Wow, that is a fair slab of text! Good move in disabling it again in production.
So, one thing that’s relevant to this is that Sphinx’s SphinxQL protocol only has one match mode: extended. So, specifying :match_mode => :phrase has no impact. I’d recommend wrapping your search queries in double-quotes to get the same behaviour. Perhaps this will help with the excerpt results as well?

> On 19 Jan 2015, at 2:50 am, Walter Lee Davis <[email protected]> wrote:
>
> Here's what the results look like (no found text in view):
>
> http://oll.libertyfund.org/search/results?page=21&q=broken+windows
>
> I searched through that snippet, and found that the phrase broken windows did
> not occur within it, although there were multiple instances of broken and one
> of Window.
>
> I just did another search, this time on the phrase broken window (not
> windows) and am seeing much better results:
>
> http://oll.libertyfund.org/search/results?q=broken+window
>
> By the way, this is the title that I would have expected to come up almost at
> the top:
>
> http://oll.libertyfund.org/search/title/2393?q=broken+window
>
> since it advances the (now-discredited) theory.
>
> So the issue seems to be what behavior Sphinx has when it cannot find the
> exact phrase. It seems as though it falls back to breaking up the term, and
> returning a much wider range of results as a consequence. I also wonder what
> the stemmer makes of exact phrase -- clearly it did not shorten broken
> windows to broken window, which would have been wonderful if it could have
> done.
>
> Thanks for your help so far, and I hope you have some further insights into
> how I could make this work better. (I would particularly like it if there was
> a way to "fail fast" in the case of an exact phrase not being found at all.)
>
> Thanks,
>
> Walter
>
> On Jan 18, 2015, at 10:36 AM, Walter Lee Davis <[email protected]> wrote:
>
>> Okay, that worked.
>> Here is one such snippet call:
>>
>> http://oll.libertyfund.org/snippet.txt
>>
>> Logging is turned back down at the moment, as this is clearly pretty
>> excessive. I am full-text searching in books, which are each between 2 and 8
>> MB of text.
>>
>> Walter
>>
>> On Jan 17, 2015, at 10:49 PM, Pat Allan <[email protected]> wrote:
>>
>>> Sorry, turns out excerpt calls weren’t logged. I’ve just fixed this now on
>>> the develop branch, if you want to give that a spin?
>>>
>>> gem 'thinking-sphinx', '~> 3.1.2',
>>>   :git => 'git://github.com/pat/thinking-sphinx.git',
>>>   :branch => 'develop',
>>>   :ref => 'de070904a8'
>>>
>>> --
>>> Pat
>>>
>>>> On 18 Jan 2015, at 1:42 pm, Walter Lee Davis <[email protected]> wrote:
>>>>
>>>> I turned my production log level up to debug, and ran some searches, but I
>>>> did not get any CALL SNIPPETS statements in the production.log.
>>>>
>>>> Here's an example search from the debug-level log:
>>>>
>>>> I, [2015-01-18T02:39:30.798473 #33311] INFO -- : Started GET "/search/results?page=2&q=elbows" for 173.161.197.5 at 2015-01-18 02:39:30 +0000
>>>> I, [2015-01-18T02:39:30.799777 #33311] INFO -- : Processing by SearchController#results as HTML
>>>> I, [2015-01-18T02:39:30.799836 #33311] INFO -- : Parameters: {"page"=>"2", "q"=>"elbows"}
>>>> D, [2015-01-18T02:39:30.801638 #33311] DEBUG -- : Sphinx Query (0.8ms) SELECT * FROM `page_core`, `person_core`, `title_core` WHERE MATCH('elbows') AND `sphinx_deleted` = 0 LIMIT 20, 20
>>>> D, [2015-01-18T02:39:30.801768 #33311] DEBUG -- : Sphinx Found 58 results
>>>> D, [2015-01-18T02:39:30.805019 #33311] DEBUG -- : Title Load (2.5ms) SELECT `titles`.* FROM `titles` WHERE `titles`.`id` IN (970, 991, 1169, 1228, 1229, 1338, 1616, 1621, 1637, 1656, 1691, 1696, 1701, 1705, 1747, 1755, 1784, 1929, 1960, 1976)
>>>> D, [2015-01-18T02:39:30.886971 #33541] DEBUG -- : Content Load (27.1ms) SELECT `contents`.* FROM `contents` WHERE `contents`.`title_id` = 2081 LIMIT 1
>>>> D, [2015-01-18T02:39:30.905921 #33311] DEBUG -- : Content Load (91.8ms) SELECT `contents`.* FROM `contents` WHERE `contents`.`title_id` = 970 LIMIT 1
>>>> I, [2015-01-18T02:39:31.258680 #33541] INFO -- : Rendered search/results.html.erb within layouts/application (14482.7ms)
>>>> I, [2015-01-18T02:39:31.261015 #33541] INFO -- : Rendered layouts/_footer.html.erb (0.2ms)
>>>> I, [2015-01-18T02:39:31.262004 #33541] INFO -- : Rendered layouts/_top_nav.html.erb (0.8ms)
>>>> I, [2015-01-18T02:39:31.262441 #33541] INFO -- : Completed 200 OK in 14574ms (Views: 14178.9ms | ActiveRecord: 313.6ms)
>>>>
>>>>
>>>> Walter
>>>>
>>>> On Jan 17, 2015, at 8:28 PM, Pat Allan <[email protected]> wrote:
>>>>
>>>>> They’re output at a `debug` level, so they’re probably not on production…
>>>>> but should be in development.log by default, yes.
>>>>>
>>>>>> On 18 Jan 2015, at 12:20 pm, Walter Lee Davis <[email protected]> wrote:
>>>>>>
>>>>>> I was getting results that were missing any highlighted search terms.
>>>>>> I'm not seeing those queries at all -- would they be in development log
>>>>>> only? I can turn up the log level, but this is a very busy server.
>>>>>>
>>>>>> Walter
>>>>>>
>>>>>> On Jan 17, 2015, at 7:14 PM, Pat Allan <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Walter
>>>>>>>
>>>>>>> It looks right to me, and I’m not spotting anything in the TS or Riddle
>>>>>>> code that suggests behaviour should be different. Are you getting an
>>>>>>> error when you include those options? Or just no results?
>>>>>>>
>>>>>>> In your Rails log file, there should be a SphinxQL ‘CALL SNIPPETS’
>>>>>>> statement for each excerpt generation call - can you share one of those
>>>>>>> queries here?
>>>>>>>
>>>>>>> --
>>>>>>> Pat
>>>>>>>
>>>>>>>> On 18 Jan 2015, at 7:39 am, Walter Lee Davis <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I had a working Rails 3.0 site with TS 2.0.10 that used the following
>>>>>>>> controller method to gather cross-model results:
>>>>>>>>
>>>>>>>> def results
>>>>>>>>   @results = ThinkingSphinx.search Riddle.escape(params[:q].to_s),
>>>>>>>>     :page => params[:page],
>>>>>>>>     :match_mode => :phrase,
>>>>>>>>     :order => 'class_crc ASC, @relevance DESC',
>>>>>>>>     :excerpt_options => {
>>>>>>>>       :exact_phrase => true,
>>>>>>>>       :limit => 2000,
>>>>>>>>       :around => 20,
>>>>>>>>       :force_all_words => true,
>>>>>>>>       :chunk_separator => '</li><li>'
>>>>>>>>     }.reject{ |r| r.class.to_s == 'NilClass' }
>>>>>>>>   @hits = @results.total_entries rescue 0
>>>>>>>> end
>>>>>>>>
>>>>>>>>
>>>>>>>> In my current Rails 4.1 iteration of this application, I was not able
>>>>>>>> to get the exact_phrase and force_all_words keys to work at all. My
>>>>>>>> controller has those commented out, so the search results work, but
>>>>>>>> are greedier than I'd like them to be.
>>>>>>>>
>>>>>>>> Is there anything new in this area that I have missed from your
>>>>>>>> otherwise great upgrade guide?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Walter
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Thinking Sphinx" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>>>> an email to [email protected].
>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>> Visit this group at http://groups.google.com/group/thinking-sphinx.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
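As a footnote to the double-quote suggestion at the top of this message, here is a minimal sketch of what that could look like in the controller. `phrase_query` and `DEFAULT_ESCAPER` are hypothetical names, not part of Thinking Sphinx or Riddle, and the default escaper is only a stand-in that handles double quotes and backslashes; in the app you would probably pass the same `Riddle.escape` call the original controller already uses.

```ruby
# Hypothetical helper (not part of Thinking Sphinx or Riddle): turn user input
# into an extended-syntax phrase query by wrapping it in double quotes.
#
# Assumption: the default escaper below is a stand-in that only
# backslash-escapes double quotes and backslashes. In the real app you would
# likely pass ->(s) { Riddle.escape(s) } so other operators are escaped too.
DEFAULT_ESCAPER = ->(s) { s.gsub(/["\\]/) { |m| "\\#{m}" } }

def phrase_query(raw, escaper: DEFAULT_ESCAPER)
  term = raw.to_s.strip
  return '' if term.empty? # nothing to search for, so no empty "" phrase

  # Escape first, then wrap, so the wrapping quotes stay unescaped.
  "\"#{escaper.call(term)}\""
end

# In the controller this would stand in for the :match_mode option, e.g.:
#   ThinkingSphinx.search phrase_query(params[:q]), :page => params[:page]
puts phrase_query('broken windows') # prints "broken windows" with the quotes
```

Escaping before wrapping matters here: if the whole quoted string were escaped afterwards, the outer quotes would be escaped as well and the phrase operator would be lost.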
