I managed to stop the crash when searching for Japanese text by forcing UTF-8 encoding on the parsed query string (see patch).
But it seems that Whistlepig cannot speak Japanese. I tried the following small test and, as you can see, I get no results:

> require 'rubygems'
=> true
> require 'whistlepig'
=> true
> include Whistlepig
=> Object
> index = Index.new "index"
=> #<Whistlepig::Index:0x00000002093f60>
> entry1 = Entry.new
=> #<Whistlepig::Entry:0x0000000207d328>
> entry1.add_string "body", "研究会"
=> #<Whistlepig::Entry:0x0000000207d328>
> docid1 = index.add_entry entry1
=> 1
> q1 = Query.new "body", "研究"
=> body:"研究"
> results1 = index.search q1
=> []

I will now dig into the Whistlepig source code to see if I can fix this, but any pointers, directions, or tips on where to start looking would be greatly appreciated.

On Mon, May 2, 2011 at 12:46 AM, Horacio Sanson <hsan...@gmail.com> wrote:
> I also tried with ruby 1.8 and heliotrope does not crash, but searching
> for any Japanese word returns no matches, even for search terms I know
> have matches.
>
> And by the way, the installation instructions should mention that for
> ruby 1.8 we also need to install the json gem or heliotrope won't
> start.
>
> regards,
> Horacio
>
> On Mon, May 2, 2011 at 12:35 AM, Horacio Sanson <hsan...@gmail.com> wrote:
>> Installed whistlepig 0.6 but now I get a different error that looks
>> similar to the turnsole problem. Below the backtrace:
>>
>> http://localhost:8042/search?q=primo -> /search?q=%7Einbox&start=0&num=20
>> 127.0.0.1 - - [02/May/2011 00:31:58] "GET /favicon.ico HTTP/1.1" 404 447
>> 0.0008
>> localhost - - [02/May/2011:00:31:58 JST] "GET /favicon.ico HTTP/1.1" 404 447
>> - -> /favicon.ico
>> search(body:"会", 0, 20) took 0.0ms
>> Encoding::CompatibilityError - incompatible character encodings: UTF-8
>> and ASCII-8BIT:
>> bin/heliotrope-server:154:in `block in <class:HeliotropeServer>'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in
>> `block in compile!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
>> `instance_eval'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
>> `route_eval'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in
>> `block (2 levels) in route!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in
>> `block in process_route'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in
>> `process_route'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in
>> `block in route!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in
>> `dispatch!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in
>> `block in call!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
>> `instance_eval'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
>> `block in invoke'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
>> /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in
>> `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
>> /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in
>> `service'
>> /usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
>> /usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
>> /usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /search?q=%E4%BC%9A
>> HTTP/1.1" 500 89861 0.0228
>> localhost - - [02/May/2011:00:32:09 JST] "GET /search?q=%E4%BC%9A
>> HTTP/1.1" 500 89861
>> http://localhost:8042/search?q=%7Einbox&start=0&num=20 -> /search?q=%E4%BC%9A
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /favicon.ico HTTP/1.1" 404 447
>> 0.0009
>> localhost - - [02/May/2011:00:32:09 JST] "GET /favicon.ico HTTP/1.1" 404 447
>> - -> /favicon.ico
>>
>> regards,
>> Horacio
>>
>> On Fri, Apr 29, 2011 at 1:52 PM, William Morgan
>> <wmorgan-...@masanjin.net> wrote:
>>> Reformatted excerpts from William Morgan's message of 2011-04-26:
>>>> Thanks for the bug report on this one too. It's great to have someone
>>>> testing this stuff with non-ASCII code. This is a known bug in
>>>> Whistlepig and I should be releasing a fix soon.
>>>
>>> This is fixed in Whistlepig 0.6. Heliotrope should now be fine with
>>> utf-8 input. I'm still working on this issue in turnsole.
>>>
>>> Let me know if you have any more issues!
>>> --
>>> William <wmorgan-...@masanjin.net>
>>> _______________________________________________
>>> Sup-devel mailing list
>>> Sup-devel@rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/sup-devel
>>>
>>
>
From 0881630c8b410b6f78df578bf686afacbb78ec64 Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsan...@gmail.com>
Date: Tue, 3 May 2011 23:18:22 +0900
Subject: [PATCH] Fix crash for non ASCII chars.

---
 bin/heliotrope-server | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 4793ac2..ed9c3be 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -151,7 +151,7 @@ class HeliotropeServer < Sinatra::Base
     nav += "</div>"
     header("Search: #{query.original_query_s}", query.original_query_s) +
-      "<div>Parsed query: #{escape_html query.parsed_query_s}</div>" +
+      "<div>Parsed query: #{escape_html query.parsed_query_s.force_encoding('UTF-8')}</div>" +
       "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:continued] ? 'was' : 'was NOT'} continued</div>" +
       "#{nav}<table>" +
       results.map { |r| threadinfo_to_html r }.join +
-- 
1.7.4.1
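
For anyone who wants to reproduce the encoding error outside heliotrope, here is a minimal standalone sketch. It assumes (I have not confirmed this in the Whistlepig source) that parsed_query_s comes back tagged as ASCII-8BIT while the rest of the page is UTF-8. The variable names and the try_concat helper are made up for the illustration and are not part of heliotrope or Whistlepig.

```ruby
# Hypothetical stand-in for query.parsed_query_s: the UTF-8 bytes of "会"
# (E4 BC 9A, as seen in the failing request) tagged as ASCII-8BIT.
parsed = "\xE4\xBC\x9A".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical stand-in for the surrounding page text: UTF-8 with a
# non-ASCII character in it.
page = "Search: \u4F1A "

# Illustration-only helper: returns the joined string, or the name of the
# exception class if Ruby refuses to join the two encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e.class.name
end

before = try_concat(page, parsed)

# force_encoding only relabels the bytes; it does not transcode them.
fixed = parsed.dup.force_encoding("UTF-8")
after = try_concat(page, fixed)

puts before                 # Encoding::CompatibilityError
puts after                  # Search: 会 会
puts fixed.valid_encoding?  # true
```

Relabeling is only safe when the bytes really are UTF-8, so checking valid_encoding? after force_encoding is a cheap guard if the input could come from elsewhere.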