$ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
iconv: illegal input sequence at position 114500

Any ideas how the job can be accomplished reliably?

Also, my database may contain data in multiple encodings,
such as WINDOWS-1251 and WINDOWS-1256, in various places,
since the data was inserted by different people using
different sources and client software.

You could use a simple program like this (in Python):

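# Python 3: read the dump as raw bytes and try each candidate encoding in turn.
# "whatever" is a placeholder; replace it with real codec names, strictest first.
with open("your dump", "rb") as dump, open("unidump", "w", encoding="utf-8", newline="") as output:
        for line in dump:
                for encoding in "utf-8", "iso-8859-15", "whatever":
                        try:
                                output.write(line.decode(encoding))
                                break
                        except UnicodeError:
                                pass
                else:
                        print("No suitable encoding for line...")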

I'd say this might work, as long as UTF-8 cannot absorb an apostrophe inside a multibyte character. Can it?
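
For what it's worth, in valid UTF-8 every byte of a multibyte sequence has its high bit set (0x80 or above), so an ASCII apostrophe (0x27) can only ever appear as a character of its own. A quick sketch to check this in Python 3 (the function name is made up for illustration):

```python
APOSTROPHE = 0x27  # the ASCII apostrophe byte

def multibyte_bytes_are_non_ascii(code_point):
    """Return True if this code point encodes to more than one UTF-8 byte
    and every one of those bytes is >= 0x80 (so none can be an apostrophe)."""
    encoded = chr(code_point).encode("utf-8")
    return len(encoded) > 1 and all(byte >= 0x80 for byte in encoded)

# Check every code point that needs two UTF-8 bytes (U+0080..U+07FF).
two_byte_ok = all(multibyte_bytes_are_non_ascii(cp) for cp in range(0x80, 0x800))
```

The same check can be run over the rest of the Unicode range (skipping the surrogates U+D800..U+DFFF, which cannot be encoded). This only says something about lines that really are UTF-8; it does not cover the other encodings tried as fallbacks.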

Or you could do that to all your tables using SELECTs, but it's going to be painful...
