Dear colleagues,
With the help of a colleague of mine here in Helsinki (Seppo Nyrkkö),
who looked at the innards of the R source code for the Mac, it turned
out that this was in the end an issue with the Mac locale settings,
and not with R itself.
Though we had tried this earlier by changing the LANG variable to
'fi_FI', we hadn't looked hard enough through the available encodings
(listed with locale -a) to pick exactly the right value, which is:
LANG=fi_FI.ISO8859-1;
export LANG;
With this configuration R was able to happily read in my original
table with the Scandinavian characters in the header, without any fuss.
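For anyone else running into this: a quick way to confirm from within R
that the locale change actually took effect is something like the
following (the file name is just a placeholder):

Sys.getlocale("LC_CTYPE")          # should now report "fi_FI.ISO8859-1"
data <- read.table("<filename>", header=TRUE)
names(data)                        # accented header names come through intact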
Thanks for your advice, and wishing all a good Midsummer,
-Antti Arppe
On Mon, 12 Jun 2006 [EMAIL PROTECTED] wrote:
1. Reading in a table originally with ISO-latin1 encoding
(Linux) (Jörg Beyer)
----------------------------------------------------------------------
Message: 1
Date: Sun, 11 Jun 2006 13:30:35 +0200
From: Jörg Beyer <[EMAIL PROTECTED]>
Subject: [R-SIG-Mac] Reading in a table originally with ISO-latin1
encoding (Linux)
To: <[email protected]>
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="US-ASCII"
Antti,
I think I can offer some help. I can add the following, observed with
R 2.1.1 w/ R.app 1.14
Mac OS X 10.4.6, PPC G4/400 (Oct. 1999)
If you are only interested in the solution, you can skip the following
report and jump to the last paragraph.
A tabulated data file with German umlauts in some column headers shows the
same behavior as yours, if I use your command
data <- read.table(file("<filename>", encoding="<encoding>"),
header=TRUE)
or these variations
data <- read.table(file("<filename>"), header=TRUE)
data <- read.table(file("<filename>"), header=FALSE)
In all these cases, the same strange behavior results
-- regardless of whether the file is encoded as "latin1", "utf-8" or the
generic "Mac Roman"
-- regardless of whether you choose UTF-8 with or without BOM
-- regardless of whether you choose Mac, DOS, or UNIX line feeds
-- regardless of whether you choose Apple's TextEdit, TextWrangler or BBEdit
for setting/changing the encoding (I prefer the latter for its fine tuning,
automation, and scripting features)
-- regardless of whether you try to read the file with R on the terminal, or
with R.app (the Mac GUI)
-- strangely enough, R *croaks about "incomplete lines"* even if there are no
accented characters (or multibyte characters) in your data file at all,
*just plain ASCII*... indicating that the problem may be located deeper in
the parsing process, not in the character set (a quick check for this is
sketched right after this list).
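A quick check, just as a sketch with a placeholder file name:
count.fields() reports how many fields R finds on each line, so uneven
counts point to the line/field splitting rather than to the encoding.

count.fields("<filename>", sep="")      # default splitting on white space
count.fields("<filename>", sep="\t")    # explicit tab separator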
At this point I read (again) the "read.table" help page and found it a bit
misleading -- the description of the default sep="" reads as if the file is
first read line by line (1st step), and then every line is split into columns
wherever a stretch of white space is found (2nd step).
I think this is not the case. If you modify your command and explicitly add
the separator option (tab, in this case)
data <- read.table(file("<filename>", encoding="<encoding>"), sep="\t",
header=TRUE)
my file reads in without any problems, be it Latin-1 or UTF-8 (not sure
how to handle Mac Roman files, at the moment).
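As a side note, read.delim() has sep="\t" and header=TRUE built in as
defaults, so an equivalent, slightly shorter call would be:

data <- read.delim(file("<filename>", encoding="<encoding>"))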
But keep in mind that multibyte characters, while possible in variable
names (or column headers), are not recommended there.
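If the headers do give trouble, one possible workaround (just a sketch --
iconv's transliteration support depends on the platform) is to keep the
names exactly as read and transliterate them to plain ASCII afterwards:

data <- read.table(file("<filename>", encoding="latin1"), sep="\t",
                   header=TRUE, check.names=FALSE)
# replace the accented characters with ASCII approximations
names(data) <- iconv(names(data), to="ASCII//TRANSLIT")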
Hope this helps.
Cheers
Joerg
------------------------------
_______________________________________________
R-SIG-Mac mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-mac