There seems to be something odd with "∞" on Windows (and not only with
read.table).

In the native encoding (cp1252 in my case), "∞" gets converted to "8":

x <- "∞"
Encoding(x)
#> [1] "unknown"
print(x)
#> [1] "8"
charToRaw(x)
#> [1] 38

"∞" is indeed "8":

identical(x, "8")
#> [1] TRUE

Everything seems fine if "∞" is UTF-8 encoded:

y <- "\u221E"
Encoding(y)
#> [1] "UTF-8"
print(y)
#> [1] "∞"
charToRaw(y)
#> [1] e2 88 9e

However, the problem reappears as soon as the string is converted back to
the native encoding:

format(y)
#> [1] "8"

This ought to be "<U+221E>", analogous to

format("∝")
#> [1] "<U+221D>"

Session info:

si <- sessionInfo()
si$running
#> [1] "Windows 10 x64 (build 17134)"
si$R.version$version.string
#> [1] "R version 3.5.2 (2018-12-20)"
si$locale
#> [1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"

On Thu, 7 Feb 2019 at 14:33, David Byrne <david.byrne...@gmail.com> wrote:

> I can confirm that it doesn't happen on Ubuntu 18.04.1, so Peter is
> most likely correct; it looks like it's Windows-specific.
>
> On Thu, 7 Feb 2019 at 12:55, peter dalgaard <pda...@gmail.com> wrote:
> >
> > This doesn't seem to be happening on macOS, neither in Terminal nor
> > RStudio (R 3.5.1, R-devel, R-patched). So it is probably Windows-specific.
> >
> > -pd
> >
> > > On 7 Feb 2019, at 11:17, David Byrne <david.byrne...@gmail.com> wrote:
> > >
> > > Bug:
> > > Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> > > file containing the infinity symbol (' ∞ ') results in the infinity
> > > symbol being imported as the number 8. Other Unicode characters seem
> > > unaffected, for example Zhe: ж
> > >
> > > Expected Behavior:
> > > The imported data.frame should represent the infinity symbol as the
> > > expected 'Inf' so that normal mathematical operations can be processed.
> > >
> > > Stack Overflow Post:
> > > I created a question on Stack Overflow where one other member was able
> > > to reproduce the same issue I was having. The question can be found
> > > at:
> > >
> > > https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
> > >
> > > Method to Reproduce - 1:
> > > A simple way to reproduce this issue is to use RStudio. In the
> > > console, type the following:
> > >
> > >> read.table(text=" ∞", encoding="UTF-8")
> > >
> > > The result will be a data.frame with a single value of 8.
> > >
> > > Repeating the same with ж gives the correct, expected behavior.
> > >
> > > Method to Reproduce - 2:
> > > Create a .csv file containing the infinity and Zhe characters (I have
> > > attached the file for convenience; hopefully it is not rejected by your
> > > email service). Launch an interactive session using
> > >
> > >> R --vanilla
> > >
> > > Enter the following statement, taking care to replace
> > > <path-to-file> with the appropriate path:
> > >
> > >> read.table("<path-to-file>/unicode_chars.csv", sep=",", encoding="UTF-8")
> > >
> > > This results in a two-element data.frame: the first element is the
> > > incorrect value 8 with an additional <U+FEFF> prefix, and the second is
> > > the correct Zhe character.
> > >
> > > Note the additional <U+FEFF> prefixed to the '8'. This appears to be a
> > > hidden character (a byte order mark) that tells editors the encoding.
> > > The following link has some explanation; however, it states that this
> > > is caused by Excel. I created the file with Notepad, not Excel.
> > >
> > > https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
> > >
> > > System Details:
> > > OS:
> > >> Windows 10.0.17134 Build 17134
> > >
> > > R Version:
> > >> platform       x86_64-w64-mingw32
> > >> arch           x86_64
> > >> os             mingw32
> > >> system         x86_64, mingw32
> > >> status
> > >> major          3
> > >> minor          4.1
> > >> year           2017
> > >> month          06
> > >> day            30
> > >> svn rev        72865
> > >> language       R
> > >> version.string R version 3.4.1 (2017-06-30)
> > >> nickname       Single Candle
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd....@cbs.dk  Priv: pda...@gmail.com
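Until the underlying conversion is fixed, the behaviour the original report expected (∞ read as Inf) can be approximated in user code. The following is a minimal sketch, assuming the affected column has been read as character data (for example with read.table(..., encoding = "UTF-8", colClasses = "character")) and that the strings are still intact UTF-8. inf_to_numeric() is a hypothetical helper, not a base R function. The symbols are written as the escapes "\u221E" and "\uFEFF", which, as the Encoding(y) example above shows, yield UTF-8 strings even in a cp1252 session, so the patterns themselves are not turned into "8".

```r
# Hypothetical helper (not part of base R): convert character values that may
# contain the infinity symbol into numbers. Assumes `x` is a character vector
# whose strings are still UTF-8 (e.g. read with colClasses = "character").
inf_to_numeric <- function(x) {
  # Drop a byte order mark (U+FEFF), as seen in the file-based report;
  # fixed = TRUE means a literal match, not a regular expression
  x <- sub("\uFEFF", "", x, fixed = TRUE)
  # Replace every U+221E with "Inf", which as.numeric() parses as infinity
  x <- gsub("\u221E", "Inf", x, fixed = TRUE)
  as.numeric(x)
}

inf_to_numeric(c("\uFEFF\u221E", "-\u221E", "1.5"))
```

The call above should return Inf, -Inf and 1.5. Values that are neither numbers nor the infinity symbol still become NA with a warning, as with a plain as.numeric().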