Hi,

I'm using Thrift to communicate with HBase. The code runs on Ruby
1.9.2 with version 0.5.0 of the Thrift gem.
I quickly ran into an encoding issue in the Thrift library, specifically
in the BufferedTransport class.

So I've written a few tests to give you a concrete example:

# encoding: utf-8
require 'spec_helper'

describe "encoding" do

  before do
    transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
    protocol  = Thrift::BinaryProtocol.new(transport)
    @client   = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)

    transport.open()

    @table_name = "encoding_test"
    @column_family = "info:"
  end

  it "should create a new table" do
    column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new { |c| c.name = @column_family }
    @client.createTable(@table_name, [column]).should be_nil
  end

  it "should save standard caracteres" do
    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
    m.column = "info:first_name"
    m.value  = "Vincent"

    m.value.encoding.should == Encoding::UTF_8
    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
  end

  it "should save UTF8 caracteres" do
    m        = Apache::Hadoop::Hbase::Thrift::Mutation.new
    m.column = "info:first_name"
    m.value  = "Thorbjørn"

    m.value.encoding.should == Encoding::UTF_8
    @client.mutateRow(@table_name, "ID1", [m]).should be_nil
  end

  it "should destroy the table" do
    @client.disableTable(@table_name).should be_nil
    @client.deleteTable(@table_name).should be_nil
  end
end

It fails when it tries to save the UTF-8 string containing the character 'ø'.

Here is the output:

  1) encoding should save UTF8 caracteres
     Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
     incompatible character encodings: ASCII-8BIT and UTF-8
     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
     # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
     # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
     # ./lib/thrift/hbase.rb:284:in `mutateRow'
     # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
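
The same kind of error can be reproduced without Thrift or HBase. This is
only a minimal sketch, assuming that the transport's write buffer is a binary
(ASCII-8BIT) string that already holds bytes above 0x7F, and that `write`
appends to it with `<<`; I have not confirmed either point against the Thrift
source. `new_buffer` and `try_append` are made-up names for illustration.

```ruby
# encoding: utf-8

# Stand-in for the transport's write buffer (assumption): packed integers
# come back as ASCII-8BIT strings and may contain bytes above 0x7F.
def new_buffer
  [0x80010001].pack('N')
end

# Hypothetical helper: appends value to a fresh buffer and returns :ok,
# or the error message if Ruby refuses to mix the two encodings.
def try_append(value)
  buffer = new_buffer
  buffer << value
  :ok
rescue Encoding::CompatibilityError => e
  e.message
end

ascii_result = try_append("Vincent")    # ASCII-only UTF-8 is compatible
utf8_result  = try_append("Thorbjørn")  # non-ASCII UTF-8 is not

puts ascii_result
puts utf8_result
```

If that assumption holds, it would explain why "Vincent" is saved and
"Thorbjørn" is not: Ruby 1.9 lets an ASCII-only string be appended to a
binary one, but raises as soon as both sides contain non-ASCII bytes.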


I was wondering if Thrift has been tested with Ruby 1.9.2 or only 1.8.7?
Any idea on how to fix that?
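
One workaround I can think of is to tag the value as binary before handing it
to Thrift. The following is a sketch in plain Ruby only, not verified against
HBase; `to_binary` is a hypothetical helper and the packed buffer is again
just a stand-in for whatever the transport holds internally.

```ruby
# encoding: utf-8

# Hypothetical helper: returns a copy of value tagged as binary.
# The bytes are unchanged; only the encoding label differs.
def to_binary(value)
  value.dup.force_encoding(Encoding::BINARY)
end

buffer = [0x80010001].pack('N')  # stand-in binary buffer with a high byte
value  = to_binary("Thorbjørn")

buffer << value                  # binary << binary does not raise

# The original text is recoverable by relabelling the same bytes as UTF-8.
restored = value.dup.force_encoding(Encoding::UTF_8)
puts restored
```

In the spec this would mean assigning `to_binary("Thorbjørn")` to `m.value`,
in which case the `m.value.encoding.should == Encoding::UTF_8` expectation
would no longer hold and values read back would need to be relabelled as
UTF-8 by the caller.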

Thank you!
Vincent
