On Mon, Jan 23, 2012 at 6:22 PM, Peter Vandenabeele
<[email protected]> wrote:

> On Mon, Jan 23, 2012 at 6:10 PM, Henrique Testa <[email protected]> wrote:
>
>> Hi all,
>>
>> This problem is driving me nuts. I am using Iconv.conv to convert from
>> UTF-8 to ISO-8859-1:
>>
>> Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe
>>
>> Both locally and on production the Ruby version is 1.9.3p0 (Rails
>> 3.0.3), but it raises the following exception only on production:
>>
>> A Iconv::IllegalSequence occurred in newsletters#show:
>>
>>  "e acompanham, na"...
>>  app/controllers/newsletters_controller.rb:19:in `conv'
>>
>> If I delete that part of the text, it raises again in another location.
>> This is really strange because the contents locally and on production
>> are exactly the same. Here is the text I am trying to convert (user-
>> created data): https://gist.github.com/1664294. Any ideas?
>>
>> Thanks!
>>
>> Henrique
>>
>
> FWIW, I was able to reproduce the exception
>
>   Iconv::IllegalSequence
>
> with a simple ruby program (rvm ruby 1.9.3).
>
> $ wget https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
> --2012-01-23 18:16:02--  https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
> Resolving raw.github.com... 207.97.227.243
> Connecting to raw.github.com|207.97.227.243|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 50089 (49K) [text/plain]
> Saving to: `gistfile1.txt'
>
> 100%[======================================>] 50,089      --.-K/s   in 0.08s
>
> 2012-01-23 18:16:03 (584 KB/s) - `gistfile1.txt' saved [50089/50089]
>
> $ cat convert.rb
> @data
> File.open('gistfile1.txt') do |f|
>   @data = f.read
> end
>
> require 'iconv'
>
> Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe
>
> $ ruby convert.rb
> /home/peterv/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
> convert.rb:7:in `conv': " style=\"padding-"... (Iconv::IllegalSequence)
>     from convert.rb:7:in `<main>'
>
>
>

Some relevant links:

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
http://blog.grayproductions.net/articles/ruby_19s_string
http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html#method-i-convert

Here is code that seems to work fairly well:

$ cat convert.rb
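# :undef => :replace substitutes characters that have no ISO-8859-1
# equivalent instead of raising Encoding::UndefinedConversionError.
ec = Encoding::Converter.new("UTF-8", "ISO-8859-1", :undef => :replace)
ec.replacement = "UNDEFINED"

# Read explicitly as UTF-8 so the result does not depend on the locale.
File.open('gistfile1.txt', 'r:UTF-8') do |f|
  f.each_line do |line|
    puts "###############################################"
    puts line.valid_encoding? # always true
    puts ec.convert(line)
  end
end

$ ruby convert.rb > result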

This code converts your entire document (line by line)
without throwing exceptions.
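For comparison, here is a minimal sketch of the same conversion with String#encode (the method the iconv deprecation warning points to), assuming the input is UTF-8. The helper name `to_latin1` and the sample string are only illustrative. Passing an empty replacement drops the unmappable characters, which is roughly what `//IGNORE` was meant to do.

```ruby
# Hypothetical helper: convert UTF-8 text to ISO-8859-1 without raising.
# :undef => :replace handles characters with no ISO-8859-1 equivalent;
# :invalid => :replace also covers malformed UTF-8 byte sequences.
def to_latin1(text, replacement = "UNDEFINED")
  text.encode("ISO-8859-1", "UTF-8",
              :undef => :replace, :invalid => :replace,
              :replace => replacement)
end

sample  = "institucional do Grupo Zaffari \u2013 ali\u00e1s"
marked  = to_latin1(sample)      # en dash becomes "UNDEFINED"
dropped = to_latin1(sample, "")  # en dash is removed entirely

puts marked.encoding # ISO-8859-1
```

In the controller this would presumably become something like `to_latin1(@data, "").html_safe`, though I have not tried that in a Rails app.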

The source text appears to be valid UTF-8 throughout (valid_encoding? is true for every line).
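To check the whole file at once, and to locate a bad byte if there is one, a sketch like the following could be used. `first_invalid_utf8_offset` is a hypothetical name; reading with File.binread keeps the bytes untouched, so the check does not depend on the server's locale.

```ruby
# Hypothetical helper: return the byte offset of the first byte that is not
# part of a valid UTF-8 character, or nil if the whole input is valid.
def first_invalid_utf8_offset(bytes)
  text = bytes.dup.force_encoding("UTF-8")
  return nil if text.valid_encoding?

  offset = 0
  text.each_char do |ch|
    # In a broken string, each_char yields invalid bytes as separate,
    # invalid characters.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Usage (File.binread returns raw ASCII-8BIT bytes):
# p first_invalid_utf8_offset(File.binread('gistfile1.txt'))
```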

However, some characters have no ISO-8859-1 equivalent and cannot be
converted, e.g. the en dash (U+2013) in this piece of text:

  "... institucional do Grupo Zaffari – aliás ..."

In the output it appears as the replacement string "UNDEFINED" that I set.

Without the :undef option, the conversion raised:

  convert.rb:9:in `convert': U+2013 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)
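If you would rather find out which character is the problem than replace it, the exception object carries that information. A minimal sketch, assuming UTF-8 input; `describe_unmappable` is a hypothetical helper name:

```ruby
# Hypothetical helper: return nil if the text converts cleanly, otherwise
# a message naming the first character the target encoding cannot represent.
def describe_unmappable(text, target = "ISO-8859-1")
  text.encode(target)
  nil
rescue Encoding::UndefinedConversionError => e
  # Tag the offending character with its source encoding to get its code point.
  char = e.error_char.dup.force_encoding(e.source_encoding)
  format("U+%04X cannot be converted from %s to %s",
         char.ord, e.source_encoding_name, e.destination_encoding_name)
end

puts describe_unmappable("Grupo Zaffari \u2013 ali\u00e1s")
# prints: U+2013 cannot be converted from UTF-8 to ISO-8859-1
```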

That is to be expected: UTF-8 can encode any Unicode code point, but
ISO-8859-1 is a single-byte encoding that covers only the 256 code points
U+0000 to U+00FF. The en dash (U+2013) falls outside that range.
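A sketch of both points, assuming UTF-8 input. The first helper lists the characters above U+00FF, which are exactly the ones ISO-8859-1 cannot represent. The second maps a few common typographic characters to ASCII look-alikes before converting, so the en dash becomes "-" rather than a placeholder. Both method names and the substitution table are hypothetical; the table is only a starting point.

```ruby
# Hypothetical helper: characters above U+00FF have no ISO-8859-1 byte.
def non_latin1_chars(text)
  text.each_char.select { |ch| ch.ord > 0xFF }.uniq
end

# Hypothetical substitution table; extend it to suit the actual content.
TYPOGRAPHIC = {
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Replace known characters first, then fall back to "?" for anything else
# that still has no ISO-8859-1 equivalent.
def to_latin1_transliterated(text)
  cleaned = text.gsub(Regexp.union(TYPOGRAPHIC.keys), TYPOGRAPHIC)
  cleaned.encode("ISO-8859-1", :undef => :replace, :replace => "?")
end

p non_latin1_chars("Grupo Zaffari \u2013 ali\u00e1s")
```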

I hope this can put you on the right track,

Peter

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rubyonrails-talk?hl=en.
