You probably need to figure out the actual encoding and explicitly convert from that to UTF-8. This is a snippet of code that I have in a real project:

      open(DATAFEED_URI) do |file|
        local_filename = local_path
        local_filename.open('w') do |outf|
          file.each do |line|
            begin
              outf.write Iconv.conv('UTF-8//TRANSLIT//IGNORE', 'WINDOWS-1252', line)
            rescue Iconv::IllegalSequence => e
              shlogger.error { "#{DATAFEED_URI} line #{file.lineno} could not be translated:\n#{line}" }
            end
          end
        end
        local_filename.open('r') {|opened| yield opened }
      end

The part you're going to be interested in is the line that calls Iconv and, in particular, its second argument, 'WINDOWS-1252', which is likely the encoding of your data. There are also a couple of aliases for that code page:

$ iconv -l | grep -e 1252
CP1252 MS-ANSI WINDOWS-1252

(`iconv -l` prints a list of all the encodings known by iconv.)
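
Iconv was removed from Ruby's standard library in 2.0, so on a newer Ruby the same conversion would probably be written with String#encode. Here is a minimal sketch, assuming the incoming bytes really are Windows-1252; the helper name `cp1252_to_utf8` and the sample string are made up for illustration and are not part of the project above. Note that String#encode does not transliterate the way `//TRANSLIT` does; it only replaces what it cannot convert.

```ruby
# Hypothetical helper (not from the original project): transcode bytes that
# are assumed to be Windows-1252 into UTF-8.
def cp1252_to_utf8(bytes)
  bytes.dup
       .force_encoding('Windows-1252')   # relabel only; the bytes are unchanged
       .encode('UTF-8',                  # this is the actual transcoding step
               invalid: :replace, undef: :replace, replace: '?')
end

# Sample input: 0x93/0x94 are Word "smart quotes" and 0x96 is an en dash
# in Windows-1252; 0xE9 is e-acute.
raw = "\x93quoted\x94 \x96 caf\xE9".b
converted = cp1252_to_utf8(raw)

puts converted
puts converted.encoding
puts converted.valid_encoding?
```

This also shows why force_encoding('utf-8') on its own is not enough: it only changes the label on the string, so Windows-1252 bytes stay invalid as UTF-8 until they are transcoded with encode.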

I hope that helps.

-Rob

On Jun 20, 2011, at 7:33 PM, Erica wrote:

Thanks for your response.  I tried this on a string that was causing
the error and it didn't work.  The problem is with Microsoft Word
special characters.  I can't find a way to replace these characters.
Here is one website I found that describes the special characters:
http://www.toao.net/48-replacing-smart-quotes-and-em-dashes-in-mysql
(although it's not about Rails).

Can anyone help me out?

Thanks,

Erica

On Jun 17, 7:38 pm, Jeff Lewis <[email protected]> wrote:
Hi Erica,

I ran into a similar situation a while ago for a webservice app I was
working on where I had to handle a lot of bad / untrusted non-utf8
data, and found a fix that met the needs of the app using Iconv
(http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html)
following a strategy outlined by Paul Battley
(http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/):

...
  def AppUtil.force_utf8(str)
    ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
    return ic.iconv("#{str} ")[0..-2]
  end
...
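
On Ruby 2.1 or later, where Iconv is no longer in the standard library, roughly the same "drop the invalid bytes" strategy can be sketched with String#scrub. This is only an approximation of the helper above, written as a plain top-level method for illustration, and it assumes the data is mostly valid UTF-8 with a few stray bytes:

```ruby
# Hypothetical stand-in for the Iconv-based helper: treat the input as UTF-8
# and silently drop any byte sequences that are not valid UTF-8.
def force_utf8(str)
  str.dup.force_encoding('UTF-8').scrub('')
end

cleaned = force_utf8("abc\xFFdef".b)   # 0xFF can never appear in valid UTF-8
puts cleaned
puts cleaned.valid_encoding?
```

Like the Iconv version, this discards data rather than converting it, so Word-style smart quotes that arrive as Windows-1252 bytes would be removed, not translated.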

Jeff

On Jun 16, 5:27 pm, Erica <[email protected]> wrote:

What's a good solution for fixing character encoding problems for
compatibility between ascii and utf-8?  The database is postgres and
is encoded in utf-8.

Once in a while there will be a compatibility error on strings from a
webform.

Is there a command to fix this besides using
a_string.force_encoding('utf-8')?  Even that doesn't always seem to
work.

Thanks,

Erica

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group. To post to this group, send email to rubyonrails- [email protected]. To unsubscribe from this group, send email to [email protected] . For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en .


Rob Biedenharn          
[email protected]     http://AgileConsultingLLC.com/
[email protected]               http://GaslightSoftware.com/
