Sorry, forgot to cc the list...
Begin forwarded message:
> From: Ivan Alves <[EMAIL PROTECTED]>
> Date: 28 September 2006 18:45:59 GMT+02:00
> To: Simon Urbanek <[EMAIL PROTECTED]>
> Subject: Re: [R-SIG-Mac] A bug in the Mac GUI and a request
>
> Many thanks for the prompt response, dear Simon
>
> On 28 Sep 2006, at 17:24, Simon Urbanek wrote:
>
>> Ok, we'll have a look. There are issues with Cocoa since 10.4 that
>> are still plaguing us ...
>>
> Again, thanks
>>
>>> Now the request:
>>> UTF-8 encoding support. Since I started working with CSV
>>> files encoded in UTF-8 (which appears to be the native.enc in OS
>>> X), I have had quite a headache subsetting data frames by values of
>>> a given factor (with text encoded in UTF-8, I presume). For
>>> instance, "Compañia" appears as "Compa\361ia".
>>
>> The problem here is that you are *not* using UTF-8. The UTF-8
>> encoding of "ñ" is (if printed as octal codes) "\303\261". So what
>> you see is in a different encoding; as far as I can see it is
>> latin1, as you can easily check:
>>
>> In command-line R (without locale):
>> > iconv("Compa\361ia","latin1","UTF-8")
>> [1] "Compa\303\261ia"
>>
>> In the GUI:
>> > iconv("Compa\361ia","latin1","UTF-8")
>> [1] "Compañia"
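The byte-level difference between the two encodings can be checked directly in R. This is a minimal sketch, assuming an R version recent enough to support `\u` escapes and the `toRaw` argument of `iconv()` (both newer than the R release discussed in this thread); the variable names are illustrative only.

```r
# "ñ" is Unicode code point U+00F1; the \u escape creates it as a UTF-8 string
n_tilde <- "\u00f1"

# UTF-8 needs two bytes for it: 0xC3 0xB1 (octal \303\261)
utf8_bytes <- charToRaw(enc2utf8(n_tilde))
print(utf8_bytes)

# latin1 (ISO-8859-1) needs a single byte: 0xF1 (octal \361)
latin1_bytes <- iconv(n_tilde, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]
print(latin1_bytes)

# The same bytes shown in octal, matching the escapes quoted in this thread
print(as.octmode(as.integer(utf8_bytes)))
print(as.octmode(as.integer(latin1_bytes)))
```

So a string that prints as "Compa\361ia" carries the single latin1 byte, not the two-byte UTF-8 sequence.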
>>
> Here is the problem (I thought), since I get
>
> > iconv("Compa\361ia","latin1","UTF-8")
> [1] "Compa\303\261ia"
>
> in both the command-line R (in Terminal) and in the GUI.
>>
>> So when you are loading your table, you should be using something
>> like
>> read.table(file("my.table",encoding="latin1"))
>> to get the correct result.
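A self-contained sketch of reading latin1 text from a CSV file, assuming a recent R version. It writes a small hypothetical file to a temporary path and uses the `encoding` argument of `read.csv()`, which marks the strings read as latin1 without re-encoding them; this differs from the connection-level `file(..., encoding = "latin1")` suggested above, which re-encodes the input to the session's native encoding. The file contents and column names are made up for illustration.

```r
# Hypothetical two-column CSV whose non-ASCII text is latin1-encoded
path <- tempfile(fileext = ".csv")
latin1_csv <- c(charToRaw("name,n\nCompa"),
                as.raw(0xf1),              # "ñ" as a single latin1 byte
                charToRaw("ia,1\n"))
writeBin(latin1_csv, path)

# encoding = "latin1" marks the strings read from the file as latin1;
# colClasses keeps the first column as plain character data
df <- read.csv(path, encoding = "latin1",
               colClasses = c("character", "integer"))
print(Encoding(df$name))

# Convert to UTF-8 explicitly when a UTF-8 string is needed
name_utf8 <- enc2utf8(df$name)
print(name_utf8 == "Compa\u00f1ia")

unlink(path)
```

Marking rather than re-encoding means the read itself does not depend on the session locale; the explicit `enc2utf8()` step is where the conversion happens.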
>>
> This was among the many things that I tried and I got
>
> > test <- read.table(file("~/Projects/Isis/data/large_general.csv", encoding="latin1"))
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>   line 1 did not have 93 elements
> In addition: Warning messages:
> 1: invalid input found on input connection '~/Projects/Isis/data/large_general.csv'
> 2: incomplete final line found by readTableHeader on '~/Projects/Isis/data/large_general.csv'
>
> and reading stops, I guess as a consequence of the first "return"
> that it hits. With read.csv at least the error disappears, but the
> warnings are the same:
>
> > test <- read.csv(file("~/Projects/Isis/data/large_general.csv", encoding="latin1"))
> Warning messages:
> 1: invalid input found on input connection '~/Projects/Isis/data/large_general.csv'
> 2: incomplete final line found by readTableHeader on '~/Projects/Isis/data/large_general.csv'
>
> I am confused.
>>
>> You may want to read Brian Ripley's article about encodings and
>> localization in R News, Vol. 5/1.
>>
> Will do, many thanks.
>>
>> Cheers,
>> Simon
>
_______________________________________________
R-SIG-Mac mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-mac