Re: [R] How to read.table with “Hebrew” column names (in R)?

2010-03-19 Thread Petr PIKAL
Hi

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-09 r51229) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255 
[3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C 
[5] LC_TIME=Hebrew_Israel.1255 

attached base packages:
[1] stats grDevices datasets  grid  utils graphics  methods 
[8] base 

other attached packages:
[1] reshape_0.8.3  plyr_0.1.9     proto_0.3-8    lattice_0.18-3 fun_1.0

loaded via a namespace (and not attached):
[1] ggplot2_0.8.3 tools_2.11.0

Regards
Petr


r-help-boun...@r-project.org wrote on 19.03.2010 08:35:59:

 Hello William, Ista and other R-help members,
 
 The code you suggested:
 read.table("http://www.talgalili.com/files/aa.txt", encoding="UTF-8",
 check.names=FALSE, header = T, sep = "\t")
 works for me the same way it does for you: I can read the data in
 (finally!), but some ways of using it fail (such as printing,
 and the attempt to use the column names in lm).
 
 So first thanks for the help!
 
 Second, could you please supply your sessionInfo()?
 I wonder how your locale compares to that of Ista, since it looks as if
 for Ista there is no problem with the Hebrew.
 
 Thanks for helping!
 Tal
 
 
 
 

Re: [R] How to read.table with “Hebrew” column names (in R)?

2010-03-19 Thread William Dunlap
 
 


 




From: Tal Galili [mailto:tal.gal...@gmail.com] 
Sent: Friday, March 19, 2010 12:36 AM
To: William Dunlap; istaz...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] How to read.table with “Hebrew” column names (in R)?


Hello William, Ista and other R-help members,

The code you suggested:
read.table("http://www.talgalili.com/files/aa.txt", encoding="UTF-8",
check.names=FALSE, header = T, sep = "\t")
works for me the same way it does for you: I can read the data in
(finally!), but some ways of using it fail (such as printing, and
the attempt to use the column names in lm).

So first thanks for the help!

Second, could you please supply your sessionInfo()?
I wonder how your locale compares to that of Ista, since it looks as
if for Ista there is no problem with the Hebrew.

I was on Windows XP (American/English edition, if that makes
any difference) using a precompiled copy of R 2.11.0 downloaded
from CRAN (the Simon Fraser mirror) and sessionInfo() and
l10n_info() say:

  > sessionInfo()
  R version 2.11.0 Under development (unstable) (2010-03-07 r51225) 
  i386-pc-mingw32 

  locale:
  [1] LC_COLLATE=English_United States.1252 
  [2] LC_CTYPE=English_United States.1252   
  [3] LC_MONETARY=English_United States.1252
  [4] LC_NUMERIC=C  
  [5] LC_TIME=English_United States.1252

  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base 

  loaded via a namespace (and not attached):
  [1] tcltk_2.11.0
  > l10n_info()
  $MBCS
  [1] FALSE

  $`UTF-8`
  [1] FALSE
  
  $`Latin-1`
  [1] TRUE

  $codepage
  [1] 1252

I cannot set the locale to Hebrew (nor to en_US or
en_US.utf8).
  > Sys.setlocale("LC_ALL", "Hebrew")
  [1] ""
  Warning message:
  In Sys.setlocale("LC_ALL", "Hebrew") :
    OS reports request to set locale to "Hebrew" cannot be honored

I'd like to learn more about the issue since we've had problems
reading UTF-8 encoded XML files and using the results in R on
Windows.
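
For anyone comparing sessions as discussed above, here is a minimal sketch of how the locale checks can be collected in one go. It uses only base R calls that appear in this thread (l10n_info(), Sys.getlocale(), Sys.setlocale()); the helper name try_locale is hypothetical and exists only for this illustration.

```r
# Report how the current session handles character encodings.
info <- l10n_info()
utf8_session <- isTRUE(info[["UTF-8"]])
ctype <- Sys.getlocale("LC_CTYPE")
cat("UTF-8 session:", utf8_session, "\n")
cat("LC_CTYPE:", ctype, "\n")

# Hypothetical helper: test whether the OS accepts a locale name for LC_CTYPE,
# then restore the previous setting.
# Sys.setlocale() returns "" (with a warning) when the request is refused.
try_locale <- function(name) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old))
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", name))
  nzchar(res)
}

cat("Can switch LC_CTYPE to 'Hebrew':", try_locale("Hebrew"), "\n")
```

Whether "Hebrew" is accepted depends on the operating system and its installed locales, so the last line may print TRUE on one machine and FALSE on another.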

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 


Thanks for helping!
Tal





Re: [R] How to read.table with “Hebrew” column names (in R)?

2010-03-18 Thread William Dunlap
I tried this on R 2.11.0 unstable (2010-03-07 r51225) using
encoding="UTF-8" and check.names=FALSE in read.table().
It seemed to basically work, except that the data.frame/matrix printing
routine wants to print the Unicode codes for the characters
in the names:

> data1 <- read.table("http://www.talgalili.com/files/aa.txt",
+    header = TRUE, sep = "\t", encoding="UTF-8", check.names=FALSE)
> data1 # I see Unicode codes, presumably the correct ones
  <U+05D0><U+05D7><U+05EA> <U+05E9><U+05EA><U+05D9><U+05D9><U+05DD>
1                       12                                       97
2                      123                                      354
3                        6                                        1
  <U+05E9><U+05DC><U+05D5><U+05E9>
1                                6
2                               44
3                                3
> colnames(data1) # I see Hebrew strings (in R the first starts with aleph)
[1] "אחת"   "שתיים" "שלוש"
> colnames(data1)[1]
[1] "אחת"
> strsplit(colnames(data1)[1], "")[[1]][1]
[1] "א"
> data1[,"שתיים"]
[1]  97 354   1
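
As a way to reproduce the read step without depending on the remote file, here is a minimal sketch that writes a small UTF-8 file with the same three Hebrew headers and values and reads it back. The temporary file and the variable names are assumptions made for this illustration. It relies on the documented behaviour of read.table(): `encoding` only marks the strings as UTF-8 and does not re-encode them, whereas `fileEncoding` asks R to convert the input to the native encoding, which cannot represent Hebrew in a codepage 1252 session.

```r
# Hebrew column names written with \u escapes so this source stays ASCII-only.
hdr <- c("\u05d0\u05d7\u05ea",
         "\u05e9\u05ea\u05d9\u05d9\u05dd",
         "\u05e9\u05dc\u05d5\u05e9")

# Write a tab-separated UTF-8 file (binary mode avoids any re-encoding).
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeLines(enc2utf8(paste(hdr, collapse = "\t")), con, useBytes = TRUE)
writeLines(c("12\t97\t6", "123\t354\t44", "6\t1\t3"), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" marks the strings as UTF-8 without converting them;
# check.names = FALSE stops make.names() from mangling the headers.
d <- read.table(tmp, header = TRUE, sep = "\t",
                encoding = "UTF-8", check.names = FALSE)
print(Encoding(names(d)))
print(d[[2]])
unlink(tmp)
```

How the names are displayed afterwards still depends on the session locale and the console, which is the printing problem described in this message.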

I'm writing this in Outlook in the English (American) locale
and the copy-n-paste from the R gui window to the Outlook window
of the Hebrew letters reversed the whole line of them (reversing
the characters in each name and the names in the line), which is
why I showed a subset of the names and a substring of the first name.

However, when I try to use lm() with this data.frame then I run into
trouble, which is probably the same problem as I see in the
data.frame printing:

> lm(`שתיים` ~ `שלוש`)
   Error: \u sequences not supported inside backticks (line 1)
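
One possible way around the backtick error, offered as a sketch and not as something tested on the Windows setup above, is to keep the Hebrew names out of the formula altogether: select the columns by position and fit the model on plain ASCII variable names. The data frame is rebuilt by hand from the values shown in this thread, and `y` and `x` are names chosen only for this illustration.

```r
# Rebuild the example data from the values shown in the thread.
# \u escapes inside double-quoted strings keep this source ASCII-only.
hebrew_names <- c("\u05d0\u05d7\u05ea",
                  "\u05e9\u05ea\u05d9\u05d9\u05dd",
                  "\u05e9\u05dc\u05d5\u05e9")
data1 <- data.frame(c(12, 123, 6), c(97, 354, 1), c(6, 44, 3))
names(data1) <- hebrew_names

# Select columns by position so no Hebrew name appears in the formula.
y <- data1[[2]]
x <- data1[[3]]
fit <- lm(y ~ x)
print(coef(fit))
```

The coefficients can be compared with the output of a session where the backticked formula parses; the cost of this approach is that the model terms are labelled `x` and `y` instead of carrying the original column names.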

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -----Original Message-----
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Tal Galili
 Sent: Thursday, March 18, 2010 2:41 PM
 To: r-help@r-project.org
 Subject: [R] How to read.table with “Hebrew” column names (in R)?
 
 (I am reposting this question after a few months without a 
 solution...)
 
 
 Hi all,
 
 I am trying to read a .txt file, with Hebrew column names, but without
 success.
 
 I uploaded an example file to: http://www.talgalili.com/files/aa.txt
 
 And tried the command:
 
 read.table("http://www.talgalili.com/files/aa.txt", header = T, sep = "\t")
 
 This returns:
 
   X.ª X...ª.. X...œ
 1  12      97     6
 2 123     354    44
 3   6       1     3
 
 Instead of:
 
 אחת שתיים   שלוש
 12  97  6
 123 354 44
 6   1   3
 
 
  Trying to use something like:
 
 read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8")
 
 Has resulted in:
 
   V1
 1  ?
 Warning messages:
 1: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
   invalid input found on input connection 'http://www.talgalili.com/files/aa.txt'
 2: In read.table("http://www.talgalili.com/files/aa.txt", fileEncoding = "iso8859-8") :
   incomplete final line found by readTableHeader on 'http://www.talgalili.com/files/aa.txt'
 
 While also trying this:
 
 Sys.setlocale("LC_ALL", "en_US.UTF-8")
 
 Or this:
 
 Sys.setlocale("LC_ALL", "en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8")
 
 Gets me this:
 
 [1] ""
 Warning message:
 In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
   OS reports request to set locale to "en_US.UTF-8" cannot be honored
 
 
 
 My output for:
 
 l10n_info()
 
 Is:
 
 $MBCS
 [1] FALSE
 
 $`UTF-8`
 [1] FALSE
 
 $`Latin-1`
 [1] TRUE
 
 $codepage
 [1] 1252
 
 And for:
 
 Sys.getlocale()
 
 Is:
 
 [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
 
 Finally, here is the  sessionInfo()
 
 R version 2.10.1 (2009-12-14)
 
 i386-pc-mingw32
 
 locale:
 [1] LC_COLLATE=English_United States.1255  LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 loaded via a namespace (and not attached):
 [1] tools_2.10.1
 
 
 Any suggestion or clarification will be appreciated.
 
 
 
 Best,
 
 Tal
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il 
 (Hebrew) |
 www.r-statistics.com (English)
 --
 
 
   [[alternative HTML version deleted]]
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to read.table with “Hebrew” column names (in R)?

2010-03-18 Thread William Dunlap
My test was on Windows XP.  On an old Linux distro
I have access to (Ubuntu 8.04.3 hardy) it does work
better, although the PuTTY terminal emulator (on
the Windows side) reverses all the lines
containing any Hebrew text (pushing them against
the right edge of the terminal window).

When I look at your output in Outlook I also
see reversed strings and lines, but that is probably
a Windows problem.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -----Original Message-----
 From: Ista Zahn [mailto:istaz...@gmail.com] 
 Sent: Thursday, March 18, 2010 4:01 PM
 To: William Dunlap
 Cc: Tal Galili; r-help@r-project.org
 Subject: Re: [R] How to read.table with “Hebrew” column names (in R)?
 
 Seems to work fine on my machine:
 
 > data1 <- read.table("http://www.talgalili.com/files/aa.txt",
 +   header = TRUE, sep = "\t", encoding="UTF-8", check.names=FALSE)
 > data1
   אחת שתיים שלוש
 1  12    97    6
 2 123   354   44
 3   6     1    3
 > colnames(data1)
 [1] "אחת"   "שתיים" "שלוש"
 > colnames(data1)[1]
 [1] "אחת"
 > strsplit(colnames(data1)[1], "")[[1]][1]
 [1] "א"
 > data1[,"שתיים"]
 [1]  97 354   1
 > lm(`שתיים` ~ `שלוש`, data=data1)
 
 Call:
 lm(formula = שתיים ~ שלוש, data = data1)
 
 Coefficients:
 (Intercept)         שלוש
      12.406        7.826
 
  sessionInfo()
 R version 2.10.1 (2009-12-14)
 i686-pc-linux-gnu
 
 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 > Sys.info()
                              sysname                              release
                              "Linux"              "2.6.31.12-0.1-default"
                              version                             nodename
   "#1 SMP 2010-01-27 08:20:11 +0100"                         "linux-46fj"
                              machine                                login
                               "i686"                            "unknown"
                                 user
                              "izahn"
 
 
 -Ista
 