Look at Encoding() on your two strings: the results differ, and that appears to be the root of the problem. Adding encoding="latin1" to the read.csv() call is a workaround.
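A minimal sketch of what Encoding() reveals (an illustration added here, not code from the original exchange): it builds the same text with two different declared encodings via iconv(), using a \u escape so the script does not depend on the locale or on the encoding of the script file. The names s_utf8 and s_latin1 are hypothetical.

```r
# Same text, two declared encodings. The \u escape yields a string marked UTF-8;
# iconv() with to = "latin1" returns a string marked latin1 (mark = TRUE is the default).
s_utf8   <- "div 1-2 Ver\u00e4nderungen"
s_latin1 <- iconv(s_utf8, from = "UTF-8", to = "latin1")

# Encoding() reports the declared encoding of each element.
Encoding(s_utf8)     # "UTF-8"
Encoding(s_latin1)   # "latin1"

# The underlying bytes differ: the umlaut is C3 A4 in UTF-8 but E4 in Latin-1.
identical(charToRaw(s_utf8), charToRaw(s_latin1))             # FALSE

# Converting to a common encoding first makes the bytes agree.
identical(charToRaw(enc2utf8(s_latin1)), charToRaw(s_utf8))   # TRUE

# "==" is meant to compare the characters, not the raw bytes;
# this is TRUE in current R versions.
s_utf8 == s_latin1
```

Note that the encoding argument of read.csv() (passed on to scan()) only declares, i.e. marks, the encoding of the strings read; it does not re-encode them, so encoding="latin1" is appropriate only when the file really is in Latin-1 (or CP1252, as here on Windows).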

It looks like there is a problem in the use of the CHARSXP cache: if I save the session, x0 == x becomes TRUE when I reload it, even though the encodings are still different.

I've found the immediate cause and will change this in R-patched shortly.

On Thu, 6 Nov 2008, Heinz Tuechler wrote:

Dear All!

When reading character strings containing an umlaut from a csv file, I find behaviour in R 2.8.0 that surprises me and that I did not see in R 2.7.2.
A comparison with "==" returns FALSE, while grep() does find the match.
See the example below.
The crucial line is x=="div 1-2 Veränderungen", which gives [1] FALSE in R 2.8.0 but [1] TRUE in R 2.7.2.

Thank you in advance for your help

Heinz Tüchler

##### in R 2.8.0 patched

x0 <- "div 1-2 Veränderungen" # define a character string

write.csv(x0, 'chr.csv', row.names=FALSE) # write a one-column csv file (header "x" plus one row)
rm(x0)

x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv file
x
x=="div 1-2 Veränderungen"
## [1] FALSE
grep("div 1-2 Veränderungen", x)
## [1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
## [1] "div 1-2 Veränderungen"

unlink('chr.csv') # delete file

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
.GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base


##### in R 2.7.2 patched


x0 <- "div 1-2 Veränderungen" # define a character string

write.csv(x0, 'chr.csv', row.names=FALSE) # write a one-column csv file (header "x" plus one row)
rm(x0)

x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv file
x
x=="div 1-2 Veränderungen"
## [1] TRUE
grep("div 1-2 Veränderungen", x)
## [1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
## [1] "div 1-2 Veränderungen"

unlink('chr.csv') # delete file

Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 7.2
year = 2008
month = 09
day = 02
svn rev = 46486
language = R
version.string = R version 2.7.2 Patched (2008-09-02 r46486)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
.GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595