You seem never to have told R or us what charset these data are in. I
think it is likely that they are being transferred in latin2 (like your
email), and you are running R in UTF-8 according to Sys.getlocale. So
what you need to do is to either
1) Run R in latin2
or
2) use iconv() to convert the results from latin2 to UTF-8.
UTF-8 is relatively new in the DBMS world. For ODBC, look at the bug
reports on the MySQL site. Using RODBC with UTF-8 locales is on my TODO
list, but of no urgency at all.
On Sun, 18 Feb 2007, janek0 wrote:
Dear List
In my short life as a beginning R-user i've encountered a following
problem that i'm unable to solve myself:
I have a database in MySQL containing table and field names as well as
some data containing Polish accentuated characters (like ????),
utf8-encoded. It works just fine with just any external query browser i
can find, jdbc, odbc, native, whatever. Also mysql is happy about my
charset. It also seems to be configured correctly:
mysql> status
--------------
mysql Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline
5.1
Connection id: 16
Current database:
Current user: [EMAIL PROTECTED]
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.24a-Debian_9-log
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 1 hour 37 min 17 sec
Threads: 2 Questions: 240 Slow queries: 0 Opens: 175 Flush tables: 1
Open tables: 64 Queries per second avg: 0.041
Yet if i use R's RODBC or RMySQL to connect to my database i can't see
these accentuated characters:
library(RODBC)
con <-odbcConnect("trybunal", uid="root", pwd="mypassword")
sqlTable(con)
the output (abridged) is like that:
TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE
REMARKS
14 trybunal Wyk?adnia innych przepis<f3>w TABLE MySQL
instead of <F3> i should see "รณ" and instead of ? a "?".
It is just the same if i use RMySQL instead of RODBC:
library(RMySQL)
con <-dbConnect(dbDriver("MySQL"), dbname="trybunal", username="root",
password="mypassword")
dbListTables(con)
the output (abridged) is like that:
[13] "Ustawa" "Wyk?adnia innych przepis<f3>w"
and if i use
dbReadTable(con, "Metryczka")
("Metryczka" being one table in the database) i get:
Error in make.names(as.character(names), allow_) :
invalid multibyte string 11
It works without error if i set LC_ALL to "C", but obviously without
Polish charset.
Strange thing is that Sys.getlocale() gives me
[1]"LC_CTYPE=pl_PL.UTF-8;LC_NUMERIC=C;LC_TIME=pl_PL.UTF-8;LC_COLLATE=pl_PL.UTF-8;
LC_MONETARY=pl_PL.UTF-8;LC_MESSAGES=pl_PL.UTF-8;
LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
So it is utf8 all over the place. Also R works just fine with
read.table() if the table contains utf8-encoded chars. Thus the problem
is just with R-mysql connection. It seems therefore that dbi package
does not support non-ascii charsets.
Questions:
1. Is above conclusion correct or am i doing something wrong ?
2. If it is correct, is there any way to use table and field names as
they are now (with non-ascii chars) in my SQL queries (e.g. SELECT
`Wyk?adnia przedmiotu kontroli`.*) ?
I can live with Polish characters missing in the output if i have to but
i must address the database fields/tables. I can't change their names of
as this would mean rebuilding database frontend. I can't import data to
R by exporting the database and then importing it via read.table because
i want a "live" application and must keep things simple.
Any help will be greatly appreciated.
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.