So I tried the latest heliotrope with the leveldb-ruby 0.6 gem, whistlepig 0.7 and MeCab hooks for Japanese text support, and it works better than before. Unfortunately, I ran into two issues:
First, any attempt to search using Japanese text fails with the dreaded incompatible character encodings error:

#####################################################
[2011-07-05 10:22:17] INFO WEBrick 1.3.1
[2011-07-05 10:22:17] INFO ruby 1.9.2 (2010-08-18) [x86_64-linux]
[2011-07-05 10:22:17] INFO WEBrick::HTTPServer#start: pid=13523 port=8042
search(body:"手紙", 0, 20) took 2.1ms
Encoding::CompatibilityError - incompatible character encodings: ASCII-8BIT and UTF-8:
 bin/heliotrope-server:223:in `block in <class:HeliotropeServer>'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `block in compile!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in `instance_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in `route_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in `block (2 levels) in route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in `block in process_route'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `process_route'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in `block in route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in `dispatch!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `block in call!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `instance_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `block in invoke'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in `service'
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
 /usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
127.0.0.1 - - [05/Jul/2011 10:22:20] "GET /search?q=%E6%89%8B%E7%B4%99 HTTP/1.1" 500 89118 0.0331
localhost - - [05/Jul/2011:10:22:20 JST] "GET /search?q=%E6%89%8B%E7%B4%99 HTTP/1.0" 500 89118
- -> /search?q=%E6%89%8B%E7%B4%99
[2011-07-05 10:22:20] ERROR Errno::ECONNRESET: Connection reset by peer
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `eof?'
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `run'
#######################################################

The problem seems to be the header method in heliotrope-server, which builds its output from multiline strings (e.g. <<- EOS). After forcing the resulting text to UTF-8 encoding, search works as expected with both Japanese and non-Japanese text (see attached patch).

The second problem is not actually a heliotrope problem; it is an artificial limitation imposed by Gmail. After running heliotrope-add for some time, it would fail when the IMAP fetch returned nil. Just after it failed I tried my usual email reader (kmail) and got an interesting error saying "exceeded IMAP bandwidth limits". This indicates that the nil is due to Gmail limiting the maximum bandwidth I can consume downloading emails.
The latest heliotrope now catches this error and ignores it, but after ignoring it for a while I started getting sys-write errors on the socket. I believe this is also Gmail abruptly breaking the socket connection to enforce its bandwidth limits. Maybe limiting the rate of gmail-dumper so it reads mails at a lower pace would eliminate these problems, or it could simply stop reading emails for some time when we get the first nil response.

Overall, heliotrope is now usable for Japanese language users (at least for me). Now I will start playing with turnsole to see if it can handle Japanese.

--
regards,
Horacio Sanson
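The idea of pausing after a nil response could look roughly like the sketch below. It is only an illustration: `fetch_with_backoff`, its parameters and the delay values are hypothetical and not part of heliotrope or gmail-dumper, and the block stands in for whatever call performs the real IMAP fetch.

```ruby
# Hypothetical helper: calls the given block until it returns a non-nil
# value, sleeping between attempts and doubling the pause each time.
# The sleeper is injectable so the pauses can be observed without waiting.
def fetch_with_backoff(max_attempts: 5, base_delay: 30, sleeper: Kernel.method(:sleep))
  delay = base_delay
  max_attempts.times do |attempt|
    result = yield
    return result unless result.nil?      # got data, stop retrying
    break if attempt == max_attempts - 1  # out of attempts, give up
    sleeper.call(delay)                   # back off before the next try
    delay *= 2
  end
  nil
end

# Simulated fetch that returns nil twice before yielding a message.
responses = [nil, nil, :message]
pauses = []
result = fetch_with_backoff(base_delay: 1, sleeper: ->(s) { pauses << s }) do
  responses.shift
end
puts "result=#{result.inspect} pauses=#{pauses.inspect}"
```

Whether a growing pause is enough to stay under Gmail's limits is untested here; the delays would need tuning against the real server.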
From a056837d1ebe5054106e65ac7155b4e8e422a382 Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsan...@gmail.com>
Date: Tue, 5 Jul 2011 10:31:33 +0900
Subject: [PATCH] Fix encoding exception.

---
 bin/heliotrope-server |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 15dc897..def9909 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -557,6 +557,9 @@ td {
 <input type="submit" value="go"/>
 </form></div>
     EOS
+
+    title.force_encoding(Encoding::UTF_8) if title.respond_to?(:force_encoding) # sigh...
+    title
   end
 
   def footer
-- 
1.7.4.1
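The error class from the log, and the effect of re-tagging a string as the patch does, can be reproduced in isolation. This is a minimal sketch with hypothetical stand-in strings and a hypothetical `try_concat` helper; it is not code from heliotrope-server and assumes only that one string is tagged ASCII-8BIT while the other is UTF-8, both containing non-ASCII bytes.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the concatenation, or the exception
# object if Ruby refuses to combine the two encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

utf8_text   = "手紙"                                       # tagged UTF-8
binary_text = "手紙".dup.force_encoding(Encoding::BINARY)  # same bytes, tagged ASCII-8BIT

# Mixing the two fails because both contain non-ASCII bytes.
failure = try_concat(binary_text, utf8_text)
puts failure.class

# force_encoding only changes the tag, not the bytes, so the
# re-tagged copy can be combined with other UTF-8 text.
retagged = binary_text.dup.force_encoding(Encoding::UTF_8)
combined = try_concat(retagged, utf8_text)
puts combined.encoding
```

Note that `force_encoding` does not transcode anything; it is only safe when the bytes really are valid UTF-8, which `String#valid_encoding?` can confirm.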
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel