Re: [R] How to read.table with “Hebrew” column names (in R)?
Hi

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-09 r51229)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255
[3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C
[5] LC_TIME=Hebrew_Israel.1255

attached base packages:
[1] stats     grDevices datasets  grid      utils     graphics  methods
[8] base

other attached packages:
[1] reshape_0.8.3  plyr_0.1.9     proto_0.3-8    lattice_0.18-3 fun_1.0

loaded via a namespace (and not attached):
[1] ggplot2_0.8.3 tools_2.11.0

Regards
Petr

r-help-boun...@r-project.org wrote on 19.03.2010 08:35:59:

Second, could you please supply your sessionInfo()? I wonder how your locale compares to Ista's, since it looks as if Ista has no problem with the Hebrew.
Re: [R] How to read.table with “Hebrew” column names (in R)?
I was on Windows XP (American/English edition, if that makes any difference), using a precompiled copy of R 2.11.0 downloaded from CRAN (the Simon Fraser mirror), and sessionInfo() and l10n_info() say:

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_2.11.0
> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

I cannot set the locale to Hebrew (nor to en_US or en_US.utf8):

> Sys.setlocale("LC_ALL", "Hebrew")
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "Hebrew") :
  OS reports request to set locale to "Hebrew" cannot be honored

I'd like to learn more about the issue, since we've had problems reading UTF-8 encoded XML files and using the results in R on Windows.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

From: Tal Galili [mailto:tal.gal...@gmail.com]
Sent: Friday, March 19, 2010 12:36 AM
To: William Dunlap; istaz...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] How to read.table with “Hebrew” column names (in R)?

Hello William, Ista and other R-help members,

The code you suggested:

read.table("http://www.talgalili.com/files/aa.txt", encoding = "UTF-8", check.names = FALSE, header = T, sep = "\t")

works for me the same way it does for you: I can read the data in (finally!), but some ways of using it fail (such as printing, and using the column names in lm()).

So first, thanks for the help!

Second, could you please supply your sessionInfo()? I wonder how your locale compares to Ista's, since it looks as if Ista has no problem with the Hebrew.

Thanks for helping!
Tal
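The sessions reported in this thread differ mainly in their native encoding: code page 1252 for Bill and Tal, 1255 (Hebrew) for Petr, and UTF-8 for Ista. A quick way to compare such sessions is to ask whether the native encoding can represent the Hebrew names at all. The sketch below is illustrative only: can_represent_natively() is a hypothetical helper, not part of base R, and it relies on iconv() returning NA (its default for sub) when a string cannot be converted to the native encoding (to = ""). On a cp1252 session we would expect FALSE; on cp1255 or UTF-8, TRUE.

```r
# Hypothetical helper (not part of base R): TRUE if every character of a
# string can be converted from UTF-8 to the session's native encoding.
can_represent_natively <- function(x) {
  # iconv() returns NA (the default for `sub`) when conversion fails;
  # to = "" means "the encoding of the current locale".
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = ""))
}

# First column name from aa.txt (alef, het, tav), written with \u escapes
# so that this script itself stays ASCII-only.
ahat <- "\u05d0\u05d7\u05ea"

info <- l10n_info()
cat("UTF-8 locale:", info[["UTF-8"]], "\n")
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("Hebrew representable natively:", can_represent_natively(ahat), "\n")
```

If this prints FALSE, the console has no native way to show the names, which is consistent with (though not proof of) the U+05D0-style codes Bill sees when printing the data.frame.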
Re: [R] How to read.table with “Hebrew” column names (in R)?
I tried this on R 2.11.0 unstable (2010-03-07 r51225) using encoding="UTF-8" and check.names=FALSE in read.table(). It seemed to basically work, except that the data.frame/matrix printing routine wants to print the Unicode codes for the characters in the names:

> data1 <- read.table("http://www.talgalili.com/files/aa.txt",
+     header = TRUE, sep = "\t", encoding = "UTF-8", check.names = FALSE)
> data1  # I see Unicode codes, presumably the correct ones
  U+05D0U+05D7U+05EA U+05E9U+05EAU+05D9U+05D9U+05DD
1                 12                             97
2                123                            354
3                  6                              1
  U+05E9U+05DCU+05D5U+05E9
1                        6
2                       44
3                        3
> colnames(data1)  # I see Hebrew strings (in R the first starts with aleph)
[1] "אחת"   "שתיים" "שלוש"
> colnames(data1)[1]
[1] "אחת"
> strsplit(colnames(data1)[1], "")[[1]][1]
[1] "א"
> data1[, "שתיים"]
[1]  97 354   1

I'm writing this in Outlook in the English (American) locale, and copying and pasting the Hebrew letters from the R GUI window into the Outlook window reversed the whole line of them (reversing the characters in each name and the order of the names in the line), which is why I showed a subset of the names and a substring of the first name.

However, when I try to use lm() with this data.frame, I run into trouble, which is probably the same problem I see in the data.frame printing:

> lm(`שתיים` ~ `שלוש`)
Error: \u sequences not supported inside backticks (line 1)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tal Galili
Sent: Thursday, March 18, 2010 2:41 PM
To: r-help@r-project.org
Subject: [R] How to read.table with “Hebrew” column names (in R)?

(I am reposting this question after a few months without a solution...)

Hi all,

I am trying to read a .txt file with Hebrew column names, but without success.
I uploaded an example file to:
http://www.talgalili.com/files/aa.txt

And tried the command:

read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t")

This returns:

  X.ª X...ª.. X...Å“
1  12      97       6
2 123     354      44
3   6       1       3

Instead of:

אחת שתיים שלוש
 12    97    6
123   354   44
  6     1    3

Trying something like:

read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8")

resulted in:

  V1
1  ?
Warning messages:
1: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  invalid input found on input connection 'http://www.talgalili.com/files/aa.txt'
2: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
  incomplete final line found by readTableHeader on 'http://www.talgalili.com/files/aa.txt'

I also tried this:

Sys.setlocale("LC_ALL", "en_US.UTF-8")

or this:

Sys.setlocale("LC_ALL", "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")

which gets me this:

[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored

My output for l10n_info() is:

$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252

And for Sys.getlocale():

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Finally, here is the sessionInfo():

R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1255  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1

Any suggestion or clarification will be appreciated.
Best,
Tal

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
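To experiment without the original file (the aa.txt URL may no longer be reachable), the sketch below writes a small UTF-8 file with the same three Hebrew column names and the values from the "Instead of:" table, then reads it back with the settings suggested in this thread. The temporary file is an assumption of the sketch, not part of the original setup. Per the read.table() documentation, encoding = "UTF-8" only marks the strings as UTF-8 and does not re-encode them (fileEncoding is the argument that re-encodes), while check.names = FALSE stops make.names() from rewriting the header, which is presumably where names like X.ª came from. Converting the first name to code points should give the same U+05D0 U+05D7 U+05EA sequence that Bill's print showed.

```r
# Hebrew column names via \u escapes: alef-het-tav,
# shin-tav-yod-yod-final mem, shin-lamed-vav-shin.
hebrew_names <- c("\u05d0\u05d7\u05ea",
                  "\u05e9\u05ea\u05d9\u05d9\u05dd",
                  "\u05e9\u05dc\u05d5\u05e9")

# Rebuild a tab-separated file shaped like aa.txt (values from the thread).
lines <- c(paste(hebrew_names, collapse = "\t"),
           "12\t97\t6",
           "123\t354\t44",
           "6\t1\t3")
tmp <- tempfile(fileext = ".txt")
# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the locale.
writeLines(enc2utf8(lines), tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the header strings as UTF-8 (no re-encoding);
# check.names = FALSE keeps them from being passed through make.names().
d <- read.table(tmp, header = TRUE, sep = "\t",
                encoding = "UTF-8", check.names = FALSE)
unlink(tmp)

# Code points of the first name, formatted like Bill's data.frame print.
code_points <- sprintf("U+%04X", utf8ToInt(enc2utf8(names(d)[1])))
print(code_points)
print(d[[2]])
```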
Re: [R] How to read.table with “Hebrew” column names (in R)?
My test was on Windows XP. On an old Linux distro I have access to (Ubuntu 8.04.3, "hardy") it does work better, although the PuTTY terminal emulator (on the Windows side) reverses all the lines containing any Hebrew text (pushing them against the right edge of the terminal window). When I look at your output in Outlook I also see reversed strings and lines, but that is probably a Windows problem.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: Thursday, March 18, 2010 4:01 PM
To: William Dunlap
Cc: Tal Galili; r-help@r-project.org
Subject: Re: [R] How to read.table with “Hebrew” column names (in R)?

Seems to work fine on my machine:

> data1 <- read.table("http://www.talgalili.com/files/aa.txt",
+ header = TRUE, sep = "\t", encoding = "UTF-8", check.names = FALSE)
> data1
  אחת שתיים שלוש
1  12    97    6
2 123   354   44
3   6     1    3
> colnames(data1)
[1] "אחת"   "שתיים" "שלוש"
> colnames(data1)[1]
[1] "אחת"
> strsplit(colnames(data1)[1], "")[[1]][1]
[1] "א"
> data1[, "שתיים"]
[1]  97 354   1
> lm(`שתיים` ~ `שלוש`, data = data1)

Call:
lm(formula = שתיים ~ שלוש, data = data1)

Coefficients:
(Intercept)         שלוש
     12.406        7.826

> sessionInfo()
R version 2.10.1 (2009-12-14)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> Sys.info()
                           sysname                            release
                           "Linux"            "2.6.31.12-0.1-default"
                           version                           nodename
"#1 SMP 2010-01-27 08:20:11 +0100"                       "linux-46fj"
                           machine                              login
                            "i686"                          "unknown"
                              user
                           "izahn"

-Ista
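Ista's UTF-8 Linux session accepts the backtick-quoted formula, while Bill's cp1252 Windows session stops at the parser with "\u sequences not supported inside backticks". One possible workaround, sketched below under the assumption that only the parsing step is the problem, is to build the formula from the stored column names with as.name() and call(), so no Hebrew text is ever typed or parsed; fitting by column position is a locale-independent fallback. This has not been checked on the Windows setup from this thread. The data are the three rows of aa.txt, so the coefficients should agree with Ista's output (12.406 and 7.826).

```r
# Same three columns as aa.txt; Hebrew names via \u escapes so the script
# is ASCII-only and nothing Hebrew goes through the R parser.
d <- data.frame(c(12, 123, 6), c(97, 354, 1), c(6, 44, 3))
names(d) <- c("\u05d0\u05d7\u05ea",
              "\u05e9\u05ea\u05d9\u05d9\u05dd",
              "\u05e9\u05dc\u05d5\u05e9")

# Build  <second column> ~ <third column>  as a call; evaluating the call
# to `~` returns a formula whose environment is the current one.
response  <- as.name(names(d)[2])
predictor <- as.name(names(d)[3])
f <- eval(call("~", response, predictor))
fit <- lm(f, data = d)
print(round(coef(fit), 3))

# Fallback that never touches the names: refer to the columns by position.
y <- d[[2]]
x <- d[[3]]
fit_pos <- lm(y ~ x)
print(round(coef(fit_pos), 3))
```

The positional fit loses the Hebrew labels in the printed output but gives the same numbers, so it can serve as a cross-check on the name-based fit.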