To R-SIG-Mac, with a copy to Jeff Newmiller:

On R-help there's a thread about reading a remote file that is encoded in UTF-16LE with a byte-order mark (BOM). Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate that the file is little-endian.

I tried this on my Mac running R 4.4.1, and it didn't work. I get the same incorrect result from all of these commands:

 # Automatically recognizing a URL and using fileEncoding:
 read.delim(
     'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
     fileEncoding = "UTF-16"
 )

 # Using explicit url() with encoding:
 read.delim(
     url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
         encoding = "UTF-16")
 )

 # Specifying the endianness incorrectly:
 read.delim(
     url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
         encoding = "UTF-16BE")
 )

The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16".
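For anyone who wants to confirm which BOM a file actually carries, independently of any re-encoding, here is a minimal sketch that just inspects the first two bytes. The helper name bom_byte_order is hypothetical (it is not part of base R), and the temporary file is a stand-in I construct by hand rather than the real employee.txt.

```r
# Hypothetical helper (not part of base R): classify the first two bytes
# of a file as a UTF-16 byte-order mark.
bom_byte_order <- function(path) {
  first2 <- readBin(path, what = "raw", n = 2)
  if (identical(first2, as.raw(c(0xff, 0xfe)))) {
    "little-endian"    # FF FE (also the start of a UTF-32LE BOM)
  } else if (identical(first2, as.raw(c(0xfe, 0xff)))) {
    "big-endian"       # FE FF
  } else {
    "no UTF-16 BOM"
  }
}

# Stand-in for the downloaded file: BOM FF FE, then "a\n" in UTF-16LE.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xff, 0xfe, 0x61, 0x00, 0x0a, 0x00)), tmp)
bom_byte_order(tmp)
```

To apply this to the remote file, one could first save it with download.file(..., mode = "wb") and pass the local path to the helper.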

Is this a macOS bug or a bug in R for macOS?

Duncan Murdoch

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
