On Thu, Feb 2, 2012 at 6:58 PM, Jeremy Evans <[email protected]> wrote:
> On Feb 2, 7:29 am, Christian MICHON <[email protected]>
> wrote:
> > > So I believe there might be a mismatch between what the logger
> > > reported and what truly was inserted in H2.
> >
> > I did some more experiments just now, testing:
> > - jdbc + H2: FAIL
> > - jdbc + sqlite (sqlitejdbc-v056-pure.jar): FAIL
> > - using MRI, sequel (3.31.0) and sqlite3 (1.3.4 x86-mingw32): PASS
> >
> > So I believe it would be either a jruby issue or a jdbc issue (I mean
> > the common jdbc subpart in sequel).
> >
> > I can revert to older version of jruby to see if this is a regression
> > or not, but for the jdbc subpart, I'm not yet familiar enough with the
> > code).
>
> Sequel can only report what ruby string is sent to the SQL statement.
> Apparently, JDBC (or something below it) monkeys with the string. I
> set TRACE_LEVEL_FILE=2 when connecting to H2 and when inserting
> "f\224tter", H2 is showing the inserted value as 'f\ufffdtter'. When
> retrieving the value, it is returned as "f\357\277\275tter". That's
> UTF-8, as I expected:
>
>   "\u{fffd}".encode('UTF-8').force_encoding('BINARY').split('').map{|c| "%o" % c.ord}
>   => ["357", "277", "275"]
>
> Basically, H2 is storing unicode. Now, unicode codepoint fffd is the
> "replacement character". My guess is that H2 doesn't understand the
> \224 character, and instead of raising an exception, silently replaces
> it with garbage (the replacement character). H2 does have a file
> encoding setting
> (http://www.h2database.com/javadoc/org/h2/constant/SysProperties.html#file.encoding),
> but supposedly it defaults to
> cp1252, which should map \224 to unicode codepoint 201d (in ISO-8859-1
> and ISO-8859-15 it maps to unicode codepoint 0094).
>
> This may be a general issue in JRuby, see
> http://jira.codehaus.org/browse/JRUBY-2688.
> It was closed, but apparently not for a good reason.
>
> To work around your issue, I recommend the following:
>
>   require 'iconv'
>   class String
>     def to_utf8
>       Iconv.iconv("UTF8", "ISO-8859-15", self).first
>     end
>     def from_utf8
>       Iconv.iconv("ISO-8859-15", "UTF8", self).first
>     end
>   end
>
> Then you can do:
>
>   o = Word.new
>   o.text = "f\224tter".to_utf8
>   o.save
>   Word.first.text.from_utf8
>

As I noted in this thread:

  http://www.ruby-forum.com/topic/3496096

Iconv may end up deprecated in Ruby, and the same functionality is
probably better implemented with Encoding::Converter:

  http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html

With it you can also control the "replacement" character used for
characters that cannot be translated from one encoding to another.
E.g. these options:

<quote from the doc>
:invalid => nil
    Raise error on invalid byte sequence. This is a default behavior.
:invalid => :replace
    Replace invalid byte sequence by replacement string.
:undef => nil
    Raise an error if a character in source_encoding is not defined in
    destination_encoding. This is a default behavior.
:undef => :replace
    Replace undefined character in destination_encoding with
    replacement string.
:replace => string
    Specify the replacement string. If not specified, "\uFFFD" is used
    for Unicode encodings and "?" for others.
</quote>

HTH,

Peter
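
For illustration, here is a minimal sketch of the Iconv workaround redone with Ruby 1.9's built-in transcoding. It uses String#encode, which accepts the same :invalid, :undef and :replace options as the ones quoted from the Encoding::Converter docs. The LegacyText module and its method names are hypothetical, and Windows-1252 (cp1252) as the default is an assumption; pass "ISO-8859-15" to match the Iconv version.

```ruby
# Hypothetical stand-ins for the to_utf8/from_utf8 helpers in the quoted
# workaround, built on String#encode instead of Iconv.
module LegacyText
  # Assumption: the raw bytes are cp1252. Pass "ISO-8859-15" instead to
  # mirror the encoding used in the Iconv workaround.
  DEFAULT_LEGACY = "Windows-1252"

  # Tag the raw bytes as `legacy` and transcode them to UTF-8.
  def self.to_utf8(bytes, legacy = DEFAULT_LEGACY)
    bytes.dup.force_encoding(legacy).encode("UTF-8")
  end

  # Transcode UTF-8 text back to `legacy`. Invalid byte sequences and
  # characters that `legacy` cannot represent (U+FFFD, for instance)
  # become `replacement` instead of raising an exception.
  def self.from_utf8(text, legacy = DEFAULT_LEGACY, replacement = "?")
    text.encode(legacy, invalid: :replace, undef: :replace, replace: replacement)
  end
end

stored = LegacyText.to_utf8("f\224tter")   # "\224" is the single byte 0x94
p stored.encoding                          # encoding of the converted string
p LegacyText.from_utf8(stored).bytes       # bytes after the round trip
p LegacyText.from_utf8("f\u{fffd}tter")    # U+FFFD has no cp1252 equivalent
```

Leaving out the :invalid and :undef options restores the default behavior described in the doc quote, i.e. an exception is raised rather than a character being silently replaced.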
