Hi Walter,

Sorry for the slow response… and to be honest, I don’t have a good answer for this behaviour. I’m really not sure what’s going on.

I did look over the available settings for excerpts:
http://sphinxsearch.com/docs/current.html#api-func-buildexcerpts

… and anything that I feel would influence what you’re seeing (e.g. exact_phrase) defaults to what would be ideal for your site anyway.

I’m not sure if upgrading Sphinx would have any impact, but it may be worthwhile, at least to 2.2.11. That said, there’s nothing in the release notes for 2.2.10/11 that I can spot that suggests any change in behaviour.

If you really want to dig into it, I’d suggest building a test app that can reproduce the problem with a smaller dataset, and potentially sharing that here so I can have a look as well. Of course, it very much sounds like a Sphinx issue rather than anything to do with Thinking Sphinx, so it’s not very likely that I can actually fix things.

Wish I could be more helpful!

Pat

> On 9 Jul 2019, at 8:46 am, Walter Lee Davis <[email protected]> wrote:
>
> I wonder if anyone knows how Sphinx goes about constructing the snippets that
> are returned along with the matches to a search term. This page illustrates a
> wild variety of examples of how one search term can be interpreted:
>
> https://oll.libertyfund.org/search/results?q=power+corrupts
>
> Note the first hit, from Alvis on Shakespeare. The exact phrase exists in the
> third line of the snippet (on a desktop screen, YMMV). It is not highlighted.
> In the third example, the result is from deep in the weeds of the footnotes,
> and hits on the word power, and actually highlights it. The fifth hit gets
> both power (twice) and corrupts, but misses the stem of corrupts in corrupt.
> The second-to-the-last hit on that page, in Liberty, Order, and Justice, goes
> on for several screens (208,135 words), with a single snippet that has grown
> to encompass 725 individual keyword hits in one "paragraph" of source text.
>
> I'm using Thinking Sphinx 3.1.2, and Sphinx is version 2.2.9
>
> Here's the controller method that constructed this page:
>
>   @results = ThinkingSphinx.search "\"#{ThinkingSphinx::Query.escape(params[:q].to_s)}\"",
>     :page => params[:page],
>     :star => true,
>     :excerpts => {
>       :limit => 1000,
>       :around => 40,
>       :force_all_words => true,
>       :chunk_separator => '</li><li>'
>     }.reject{ |r| r.class.to_s == 'NilClass' } rescue Kaminari::paginate_array []
>   @results.context[:panes] << ThinkingSphinx::Panes::ExcerptsPane
>   @hits = @results.total_entries rescue 0
>
> And these results are from mostly titles, but some pages. Here's the
> definition for both:
>
>   # titles_index.rb
>   ThinkingSphinx::Index.define :title, :with => :active_record do
>     set_property :group_concat_max_len => 10.megabytes
>
>     indexes :title, :sortable => true
>     indexes teaser
>     indexes content.plain, :as => :plain_text
>     indexes author_name, :sortable => true
>     has roles(:person_id), :as => :people_ids
>     has :id, :as => :title_id
>     has author_id, created_at, updated_at
>     has set, :as => :title_set
>     where sanitize_sql(["publish", true])
>   end
>
>   # pages_index.rb
>   ThinkingSphinx::Index.define :page, :with => :active_record do
>
>     indexes :title, :sortable => true
>     indexes teaser
>     indexes body
>     has created_at, updated_at
>   end
>
> In the view, I'm using this tortured bit of ERB:
>
>   <%= content_tag( :ol,
>     "<li>#{result.excerpts.plain_contents}</li>".gsub(/<li>\s*<\/li>/,'').html_safe
>   ) if result.respond_to?(:plain_contents) %>
>
> And there's no way to explain why some results are wrapped in the <span
> class="match"> in the output from Sphinx, while others (nearby, in the same
> set of results) are not.
>
> Thanks in advance if anyone can enlighten me or point me toward documentation
> of this feature. This is all very old code, maybe 6 or 8 years since I last
> touched it. I've moved it to a newer server since I wrote all this, but
> nothing much changed when I did that.
> My client would like to know, and I don't have any good answers.
>
> Walter
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/thinking-sphinx.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/thinking-sphinx/AAECECFD-619C-49AC-B4E7-63A6C87C2595%40wdstudio.com.
> For more options, visit https://groups.google.com/d/optout.
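
For the reduced test case suggested in the reply above, it may help to measure the inconsistency rather than eyeball it. What follows is only a minimal Ruby sketch and not part of Thinking Sphinx or Sphinx: `highlight_report` is a hypothetical helper name, and it assumes excerpts use the `<span class="match">` wrapper and the `</li><li>` chunk separator described in the original message. It matches exact word forms only, so stemmed variants such as "corrupt" are not counted.

```ruby
# Hypothetical helper (an assumption for illustration, not a Thinking Sphinx API).
# Tallies, per query word, how many occurrences in an excerpt are wrapped in
# the highlight markup and how many appear outside it.

# Assumed highlight markup, taken from the message above.
MATCH_SPAN = %r{<span class="match">(.*?)</span>}m

def highlight_report(excerpt, words)
  # Text that appears inside highlight spans.
  highlighted_text = excerpt.scan(MATCH_SPAN).flatten.join(' ').downcase

  # Everything else: drop the highlighted spans, then any remaining tags
  # (such as the </li><li> chunk separator).
  bare_text = excerpt.gsub(MATCH_SPAN, ' ').gsub(/<[^>]+>/, ' ').downcase

  words.each_with_object({}) do |word, report|
    # Whole-word, case-insensitive match on the exact form of the word.
    pattern = /\b#{Regexp.escape(word.downcase)}\b/
    report[word] = {
      highlighted: highlighted_text.scan(pattern).size,
      bare: bare_text.scan(pattern).size
    }
  end
end

# Usage with a made-up excerpt string:
sample = 'absolute <span class="match">power</span> corrupts absolutely'
p highlight_report(sample, %w[power corrupts])
```

Running this over the excerpts from a small dataset would show which hits contain query words that Sphinx left unhighlighted, which is the kind of detail worth including when sharing a reproduction.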
