You probably need to figure out the actual encoding and explicitly
convert from that to UTF-8. This is a snippet of code that I have in a
real project:
    open(DATAFEED_URI) do |file|
      local_filename = local_path
      local_filename.open('w') do |outf|
        file.each do |line|
          begin
            outf.write Iconv.conv('UTF-8//TRANSLIT//IGNORE', 'WINDOWS-1252', line)
          rescue Iconv::IllegalSequence => e
            shlogger.error { "#{DATAFEED_URI} line #{file.lineno} could not be translated:\n#{line}" }
          end
        end
      end
      local_filename.open('r') {|opened| yield opened }
    end
The part that you're going to be interested in is the line that calls
Iconv and, in particular, the second argument, 'WINDOWS-1252', which
is likely the encoding of your data. There are also a couple of aliases
for that code page:
    $ iconv -l | grep -e 1252
    CP1252 MS-ANSI WINDOWS-1252
(`iconv -l` prints a list of all the encodings known by iconv.)
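For anyone on Ruby 1.9 or later, here is a rough sketch of the same conversion using String#encode instead of Iconv (Iconv was removed from the standard library in Ruby 2.0). The helper name cp1252_to_utf8 is made up for illustration, and it assumes the incoming bytes really are Windows-1252; adjust the source encoding if yours differ.

```ruby
# Hypothetical helper (not from the project above): treat the raw bytes as
# Windows-1252 and transcode them to UTF-8.
def cp1252_to_utf8(bytes)
  bytes.dup
       .force_encoding('Windows-1252')   # label the bytes with their real encoding
       .encode('UTF-8',
               invalid: :replace,        # malformed input becomes '?'
               undef:   :replace,        # unmapped bytes become '?'
               replace: '?')
end

# Word-style "smart quotes" as they arrive in Windows-1252 (bytes 0x93 and 0x94).
raw = "\x93Hello\x94".b
puts cp1252_to_utf8(raw)
```

Unlike the Iconv call, this replaces characters it cannot convert rather than transliterating or dropping them, so the two are not exact equivalents.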
I hope that helps.
-Rob
On Jun 20, 2011, at 7:33 PM, Erica wrote:
Thanks for your response. I tried this on a string that was causing
the error and it didn't work. The problem is with Microsoft Word
special characters. I can't find a way to replace these characters.
Here is one website I found that describes the special characters:
http://www.toao.net/48-replacing-smart-quotes-and-em-dashes-in-mysql,
although it's not about Rails.
Can anyone help me out?
Thanks,
Erica
On Jun 17, 7:38 pm, Jeff Lewis <[email protected]> wrote:
Hi Erica,
I ran into a similar situation a while ago for a webservice app I was
working on where I had to handle a lot of bad / untrusted non-UTF-8
data, and found a fix that met the needs of the app using Iconv
(http://www.ruby-doc.org/stdlib/libdoc/iconv/rdoc/index.html)
following a strategy outlined by Paul Battley
(http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/):
...
    def AppUtil.force_utf8(str)
      ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
      return ic.iconv("#{str} ")[0..-2]
    end
...
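On Ruby 2.1 and later, roughly the same effect (dropping bytes that are not valid UTF-8) can be had without Iconv via String#scrub. This is only a sketch under that assumption; the standalone method name force_utf8 mirrors the one above but is otherwise hypothetical.

```ruby
# Hypothetical stand-in for the Iconv-based version: relabel the bytes as
# UTF-8, then remove any byte sequences that are not valid UTF-8.
def force_utf8(str)
  str.to_s.dup.force_encoding('UTF-8').scrub('')
end

# 0xFF can never appear in well-formed UTF-8, so it is stripped.
puts force_utf8("abc\xFFdef".b)
```

Note that this silently discards the offending bytes, so it suits untrusted input where losing a stray character is acceptable; it will not turn Windows-1252 smart quotes into their UTF-8 equivalents.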
Jeff
On Jun 16, 5:27 pm, Erica <[email protected]> wrote:
What's a good solution for fixing character encoding problems for
compatibility between ASCII and UTF-8? The database is Postgres and
is encoded in UTF-8.
Once in a while there will be a compatibility error from strings
from a web form.
Is there a command to fix this besides using
a_string.force_encoding('utf-8')? Even this doesn't always seem to
work.
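A small illustration of why force_encoding alone may not be enough: it only changes the label on the string and leaves the bytes untouched, whereas encode actually converts them. The sample bytes and the assumption that they are Windows-1252 are made up for the example.

```ruby
# "café" as it would arrive in Windows-1252: the é is the single byte 0xE9.
bytes = "caf\xE9".b

# force_encoding just relabels the same bytes; 0xE9 on its own is not valid UTF-8.
relabelled = bytes.dup.force_encoding('UTF-8')
puts relabelled.valid_encoding?   # false

# Labelling the bytes with their real encoding and then calling encode
# rewrites them as proper UTF-8.
converted = bytes.dup.force_encoding('Windows-1252').encode('UTF-8')
puts converted.valid_encoding?    # true
```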
Thanks,
Erica
--
You received this message because you are subscribed to the Google
Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-
[email protected].
To unsubscribe from this group, send email to [email protected]
.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en
.
Rob Biedenharn
[email protected] http://AgileConsultingLLC.com/
[email protected] http://GaslightSoftware.com/