On Mon, Jan 23, 2012 at 6:22 PM, Peter Vandenabeele <[email protected]> wrote:
> On Mon, Jan 23, 2012 at 6:10 PM, Henrique Testa <[email protected]> wrote:
>
>> Hi all,
>>
>> This problem is making me nuts. I am using Iconv.conv to convert from
>> UTF-8 to ISO-8859-1:
>>
>> Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe
>>
>> Both locally and on production the Ruby version is 1.9.3p0 (Rails
>> 3.0.3), but it raises the following exception only on production:
>>
>> A Iconv::IllegalSequence occurred in newsletters#show:
>>
>> "e acompanham, na"...
>> app/controllers/newsletters_controller.rb:19:in `conv'
>>
>> If I delete that part of the text, it raises again in other location.
>> This is really strange because the contents locally and on production
>> are exactly the same. Here is the text I am trying to convert (user
>> created data): https://gist.github.com/1664294. Any ideas?
>>
>> Thanks!
>>
>> Henrique
>>
>
> FWIW, I was able to reproduce the exception
>
> Iconv::IllegalSequence
>
> with a simple ruby program (rvm ruby 1.9.3).
>
> $ wget https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
> --2012-01-23 18:16:02--  https://raw.github.com/gist/1664294/17c4e28a1bf87b331c0425e9ddbb48284d096b00/gistfile1.txt
> Resolving raw.github.com... 207.97.227.243
> Connecting to raw.github.com|207.97.227.243|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 50089 (49K) [text/plain]
> Saving to: `gistfile1.txt'
>
> 100%[======================================>] 50,089 --.-K/s in 0.08s
>
> 2012-01-23 18:16:03 (584 KB/s) - `gistfile1.txt' saved [50089/50089]
>
> $ cat convert.rb
> @data
> File.open('gistfile1.txt') do |f|
>   @data = f.read
> end
>
> require 'iconv'
>
> Iconv.conv('iso-8859-1//IGNORE', 'utf-8', @data).html_safe
>
> $ ruby convert.rb
> /home/peterv/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require': iconv will be deprecated in the future, use String#encode instead.
> convert.rb:7:in `conv': " style=\"padding-"... (Iconv::IllegalSequence)
>     from convert.rb:7:in `<main>'
>

Some relevant links:

http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
http://blog.grayproductions.net/articles/ruby_19s_string
http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html#method-i-convert

The code below seems to work fairly well:

$ cat convert.rb
File.open('gistfile1.txt') do |f|
  f.readlines.each do |line|
    puts "###############################################"
    puts line.valid_encoding? # always true
    # Replace characters that have no ISO-8859-1 equivalent instead of raising.
    ec = Encoding::Converter.new("utf-8", "ISO-8859-1", :undef => :replace)
    ec.replacement = "UNDEFINED"
    puts ec.convert(line)
  end
end

$ ruby convert.rb > result

This code converts your entire document (line by line) without
throwing exceptions.

The source text appears to be valid UTF-8 throughout. But some
characters in it have no equivalent in ISO-8859-1, e.g. the en dash
(U+2013) in this piece of text:

"... institucional do Grupo Zaffari – aliás ..."

It shows up in the output as the "UNDEFINED" marker that I set as the
replacement. Without the :undef option, the conversion raised:

convert.rb:9:in `convert': U+2013 from UTF-8 to ISO-8859-1 (Encoding::UndefinedConversionError)

That is plausible: UTF-8 can encode every Unicode code point, whereas
ISO-8859-1 is a single-byte encoding that covers only 256 characters
(U+0000 to U+00FF), so anything outside that range cannot be converted
and has to be replaced or dropped.

I hope this can put you on the right track,

Peter
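
Since the deprecation warning above points at String#encode, here is a minimal sketch of the same conversion without Iconv. It is untested against the gist itself; the method name to_latin1 and the FALLBACK table are hypothetical names of mine, and the choice of ASCII substitutes is only an assumption about what would be acceptable in the newsletter text.

```ruby
# Sketch only: convert UTF-8 text to ISO-8859-1 with String#encode,
# substituting characters that ISO-8859-1 cannot represent.

# Hypothetical substitution table for common typographic characters.
FALLBACK = {
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"'    # right double quotation mark
}

# Hypothetical helper. The :fallback proc is called once for each
# character that has no ISO-8859-1 equivalent; anything not listed in
# FALLBACK becomes "?". Invalid byte sequences in the input are also
# replaced with "?" instead of raising.
def to_latin1(text)
  text.encode("ISO-8859-1",
              :invalid  => :replace,
              :replace  => "?",
              :fallback => lambda { |char| FALLBACK.fetch(char, "?") })
end

sample = "Grupo Zaffari \u2013 ali\u00E1s"
latin1 = to_latin1(sample)
puts latin1.encoding   # ISO-8859-1
puts latin1.bytesize   # one byte per character
```

Compared with a fixed "UNDEFINED" marker, a fallback table keeps the output readable, at the cost of deciding a substitute for each character you care about.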

