Hi Walter,

Sorry for the slow response… and to be honest, I don’t have a good answer for 
this behaviour. I’m really not sure what’s going on.

I did look over the available settings for excerpts:
http://sphinxsearch.com/docs/current.html#api-func-buildexcerpts 
…and every setting I’d expect to influence what you’re seeing (e.g.
exact_phrase) already defaults to what would be ideal for your site.
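If it helps to experiment with those settings outside Rails: Sphinx also accepts excerpt requests over SphinxQL as `CALL SNIPPETS(data, index, query, value AS option, ...)`, taking the same options as BuildExcerpts. Here’s a minimal sketch of a hypothetical helper (`build_call_snippets` and `sphinxql_quote` are illustrative names, not part of Thinking Sphinx or Riddle) that turns an options hash like the one in your controller into such a statement. The index name `title_core` is an assumption based on Thinking Sphinx’s usual naming, and the text is made-up sample data.

```ruby
# Hypothetical helper (not Thinking Sphinx or Riddle API): quote a Ruby value
# as a SphinxQL literal. Booleans become 1/0, which is what Sphinx expects for
# flag-style excerpt options such as exact_phrase.
def sphinxql_quote(value)
  case value
  when String
    # Escape backslashes first, then single quotes.
    escaped = value.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
    "'#{escaped}'"
  when true  then "1"
  when false then "0"
  else value.to_s
  end
end

# Hypothetical helper: builds
#   CALL SNIPPETS(data, index, query, value AS option, ...)
def build_call_snippets(text, index, query, options = {})
  args = [text, index, query].map { |arg| sphinxql_quote(arg) }
  options.each { |name, value| args << "#{sphinxql_quote(value)} AS #{name}" }
  "CALL SNIPPETS(#{args.join(', ')})"
end

# Example: the options from the controller, plus explicit values for flags
# you might toggle while testing. The index name is an assumption.
puts build_call_snippets(
  "Power tends to corrupt, and absolute power corrupts absolutely.",
  "title_core",
  "power corrupts",
  limit: 1000, around: 40, force_all_words: true,
  exact_phrase: false, query_mode: false,
  before_match: '<span class="match">', after_match: "</span>"
)
```

The printed statement could then be run with a MySQL client connected to searchd’s SphinxQL listener, changing one option at a time (e.g. exact_phrase or query_mode) to see whether any of them affects the highlighting.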

I’m not sure whether upgrading Sphinx would have any impact, but it may be
worthwhile, at least to 2.2.11. That said, I can’t spot anything in the release
notes for 2.2.10/11 that suggests a change in behaviour.

If you really want to dig into it, I’d suggest building a test app that can
reproduce the problem with a smaller dataset, and potentially sharing that here
so I can have a look as well. That said, it very much sounds like a Sphinx
issue rather than anything to do with Thinking Sphinx, so I can’t promise I’d
actually be able to fix it.
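For that kind of reproduction, it may help to make "not highlighted" measurable. Here’s a minimal sketch, assuming the excerpts use the `<span class="match">` markup seen in your output; `unhighlighted_terms` is a hypothetical helper that reports which search terms still appear (whole-word, case-insensitive) outside those spans. It does no stemming, so it won’t flag forms like corrupt vs. corrupts.

```ruby
# Hypothetical diagnostic helper: lists search terms that occur in an excerpt
# outside <span class="match">...</span>, i.e. occurrences Sphinx did not
# highlight. Matching is case-insensitive and whole-word only; no stemming.
HIGHLIGHT_SPAN = %r{<span class="match">.*?</span>}m

def unhighlighted_terms(excerpt_html, terms)
  # Blank out every highlighted span, then search what is left.
  remainder = excerpt_html.gsub(HIGHLIGHT_SPAN, " ")
  terms.select { |term| remainder.match?(/\b#{Regexp.escape(term)}\b/i) }
end

# Made-up sample excerpt: the first "power" is highlighted, the rest are not.
sample = 'Lord Acton: <span class="match">power</span> tends to corrupt, ' \
         'and absolute power corrupts absolutely.'
p unhighlighted_terms(sample, %w[power corrupts])
# => ["power", "corrupts"]
```

Running this over each result’s excerpt for a fixed query would turn "some hits aren’t highlighted" into a concrete list of documents to compare.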

Wish I could be more helpful!

— 
Pat

> On 9 Jul 2019, at 8:46 am, Walter Lee Davis <[email protected]> wrote:
> 
> I wonder if anyone knows how Sphinx goes about constructing the snippets that 
> are returned along with the matches to a search term. This page illustrates a 
> wild variety of examples of how one search term can be interpreted:
> 
> https://oll.libertyfund.org/search/results?q=power+corrupts
> 
> Note the first hit, from Alvis on Shakespeare. The exact phrase exists in the 
> third line of the snippet (on a desktop screen, YMMV). It is not highlighted. 
> In the third example, the result is from deep in the weeds of the footnotes, 
> and hits on the word power, and actually highlights it. The fifth hit gets
> both power (twice) and corrupts, but misses corrupt, the stem of corrupts.
> The second-to-the-last hit on that page, in Liberty, Order, and Justice, goes 
> on for several screens (208,135 words), with a single snippet that has grown 
> to encompass 725 individual keyword hits in one "paragraph" of source text.
> 
> I'm using Thinking Sphinx 3.1.2, and Sphinx is version 2.2.9.
> 
> Here's the controller method that constructed this page:
> 
>    # Note: the .reject below is called on the :excerpts options hash (it
>    # follows that hash's closing brace), not on the search results.
>    @results = ThinkingSphinx.search "\"#{ThinkingSphinx::Query.escape(params[:q].to_s)}\"",
>      :page     => params[:page],
>      :star     => true,
>      :excerpts => {
>        :limit           => 1000,
>        :around          => 40,
>        :force_all_words => true,
>        :chunk_separator => '</li><li>'
>      }.reject { |r| r.class.to_s == 'NilClass' } rescue Kaminari.paginate_array([])
>    @results.context[:panes] << ThinkingSphinx::Panes::ExcerptsPane
>    @hits = @results.total_entries rescue 0
> 
> These results come mostly from titles, with a few from pages. Here are the
> index definitions for both:
> 
> # titles_index.rb
> ThinkingSphinx::Index.define :title, :with => :active_record do
>  set_property :group_concat_max_len => 10.megabytes
> 
>  indexes :title, :sortable => true
>  indexes teaser
>  indexes content.plain, :as => :plain_text
>  indexes author_name, :sortable => true
>  has roles(:person_id), :as => :people_ids 
>  has :id, :as => :title_id
>  has author_id, created_at, updated_at
>  has set, :as => :title_set
>  where sanitize_sql(["publish", true])
> end
> 
> #pages_index.rb
> ThinkingSphinx::Index.define :page, :with => :active_record do
> 
>  indexes :title, :sortable => true
>  indexes teaser
>  indexes body
>  has created_at, updated_at
> end
> 
> In the view, I'm using this tortured bit of ERB:
> 
>        <%= content_tag(:ol,
>              "<li>#{result.excerpts.plain_contents}</li>".gsub(/<li>\s*<\/li>/, '').html_safe
>            ) if result.respond_to?(:plain_contents) %>
> 
> And there's no way to explain why some results are wrapped in the <span 
> class="match"> in the output from Sphinx, while others (nearby, in the same 
> set of results) are not.
> 
> Thanks in advance if anyone can enlighten me or point me toward documentation 
> of this feature. This is all very old code, maybe 6 or 8 years since I last 
> touched it. I've moved it to a newer server since I wrote all this, but 
> nothing much changed when I did that. My client would like to know, and I 
> don't have any good answers.
> 
> Walter
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/thinking-sphinx.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/thinking-sphinx/AAECECFD-619C-49AC-B4E7-63A6C87C2595%40wdstudio.com.
> For more options, visit https://groups.google.com/d/optout.

