On Thu, Feb 2, 2012 at 6:58 PM, Jeremy Evans <[email protected]> wrote:

> On Feb 2, 7:29 am, Christian MICHON <[email protected]>
> wrote:
> > > So I believe there might be a mismatch between what the logger
> > > reported and what truly was inserted in H2.
> >
> > I did some more experiments just now, testing:
> > - jdbc + H2: FAIL
> > - jdbc + sqlite (sqlitejdbc-v056-pure.jar): FAIL
> > - using MRI, sequel (3.31.0) and sqlite3 (1.3.4 x86-mingw32): PASS
> >
> > So I believe it would be either a jruby issue or a jdbc issue (I mean
> > the common jdbc subpart in sequel).
> >
> > I can revert to older version of jruby to see if this is a regression
> > or not, but for the jdbc subpart, I'm not yet familiar enough with the
> > code).
>
> Sequel can only report what ruby string is sent to the SQL statement.
> Apparently, JDBC (or something below it) monkeys with the string.  I
> set TRACE_LEVEL_FILE=2 when connecting to H2 and when inserting "f
> \224tter", H2 is showing the inserted value as 'f\ufffdtter'.  When
> retrieving the value, it is returned as "f\357\277\275tter".  That's
> UTF-8, as I expected:
>
>  "\u{fffd}".encode('UTF-8').force_encoding('BINARY').split('').map{|c| "%o" % c.ord}
>  => ["357", "277", "275"]
>
> Basically, H2 is storing unicode.  Now, unicode codepoint fffd is the
> "replacement character".  My guess is that H2 doesn't understand the
> \224 character, and instead of raising an exception, silently replaces
> it with garbage (the replacement character).  H2 does have a file
> encoding setting
> (http://www.h2database.com/javadoc/org/h2/constant/SysProperties.html#file.encoding),
> but supposedly it defaults to
> cp1252, which should map \224 to unicode codepoint 201d (in ISO-8859-1
> and ISO-8859-15 it maps to unicode codepoint 0094).
>
> This may be a general issue in JRuby, see
> http://jira.codehaus.org/browse/JRUBY-2688.
> It was closed, but apparently not for a good reason.
>
> To work around your issue, I recommend the following:
>
>  require 'iconv'
>  class String
>    def to_utf8
>      Iconv.iconv("UTF8", "ISO-8859-15", self).first
>    end
>    def from_utf8
>      Iconv.iconv("ISO-8859-15", "UTF8", self).first
>    end
>  end
>
> Then you can do:
>
>  o = Word.new
>  o.text = "f\224tter".to_utf8
>  o.save
>  Word.first.text.from_utf8
>

As I noted in this thread

  http://www.ruby-forum.com/topic/3496096

Iconv may get deprecated in Ruby, and its
functionality is better implemented with

Encoding::Converter

http://www.ruby-doc.org/core-1.9.3/Encoding/Converter.html

With it you can also control the replacement string
used for characters that cannot be translated from
one encoding to another, e.g.:

<quote from the doc>
:invalid => nil
Raise error on invalid byte sequence. This is a default behavior.

:invalid => :replace
Replace invalid byte sequence by replacement string.

:undef => nil
Raise an error if a character in source_encoding is not defined in
destination_encoding. This is a default behavior.

:undef => :replace
Replace undefined character in destination_encoding with replacement string.

:replace => string
Specify the replacement string. If not specified, "\uFFFD" is used for
Unicode encodings and "?" for others.

</quote>
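
As a rough sketch of how the Iconv helpers above might look with
Encoding::Converter instead: the method names below are made up for
illustration, and ISO-8859-15 is only assumed as the legacy encoding
because that is what the Iconv workaround used. I have not run this
against JRuby or H2.

```ruby
# Sketch only: legacy_to_utf8 and utf8_to_legacy are hypothetical names.
# ISO-8859-15 is an assumption carried over from the Iconv workaround.

# Convert raw bytes in a legacy single-byte encoding to a UTF-8 string.
def legacy_to_utf8(bytes, legacy = "ISO-8859-15")
  Encoding::Converter.new(legacy, "UTF-8").convert(bytes)
end

# Convert a UTF-8 string back to the legacy encoding.
# :undef => :replace substitutes "?" for characters the legacy encoding
# lacks, instead of raising Encoding::UndefinedConversionError.
def utf8_to_legacy(text, legacy = "ISO-8859-15")
  ec = Encoding::Converter.new("UTF-8", legacy,
                               :undef => :replace, :replace => "?")
  ec.convert(text)
end

# 0xF6 is o-umlaut in ISO-8859-15; .b tags the literal as raw bytes.
utf8 = legacy_to_utf8("f\xF6tter".b)

# U+201D has no ISO-8859-15 equivalent, so it is replaced with "?".
lossy = utf8_to_legacy("f\u201Dtter")
```

Leaving out the :undef option gives the default behavior from the
quote: the converter raises instead of silently replacing, which is
probably what you want when you need to notice bad data early.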

HTH,

Peter
