Hi Pat,

Thanks for your reply! Sounds like there is no simple solution to this. In 
my particular case, I think I will opt for adding the period to the 
charset_table after all. I don't have big blobs of text with too much 
punctuation and in fact the "punctuation" is often part of the "word", so I 
think I will go with that solution for now.
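
For reference, here is a minimal sketch of the change I have in mind, based on the thinking_sphinx.yml quoted below. The "production" environment key is an assumption (my original paste left the environment level out), and U+2E is the code point for the period in Sphinx's charset_table syntax:

```yaml
# thinking_sphinx.yml (sketch only; the "production" key is assumed)
production:
  mysql41: 5532
  # U+2E is the period. With it in the charset_table, "2.7.1.2" should be
  # indexed as one token instead of four, so "1.2.7.1" no longer matches.
  # Side effect: "foo." and "foo" become different words.
  charset_table: "0..9, A..Z->a..z, a..z, U+2E"
```

As far as I understand, the existing indices have to be rebuilt (e.g. with rake ts:rebuild) before the new charset_table takes effect.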

Also thanks for the tip about the extended match mode being the default now 
- I will remove all the redundant :match_mode options in my queries!

Andrea

On Tuesday, November 3, 2015 at 1:57:45 PM UTC-8, Pat Allan wrote:
>
> Hi Andrea
>
> This is certainly a tricky situation.
>
> I’d consider creating a new field with a modified version of the string, 
> replacing the periods with something else… perhaps even letters. For 
> example: 2and7and1and2 - which remains a single word from Sphinx’s 
> perspective without needing to modify the charset_table, and is distinct. 
> Of course, you’d need to do that translation for both the new field, and 
> any search queries. I can’t think of a neater way to handle this at the 
> moment, but perhaps others on the Sphinx forum have ideas as well?
> http://sphinxsearch.com/forum/
>
> Also, it’s worth noting that TS v3 uses the SphinxQL protocol, and that 
> *only* has the one match mode: extended. Everything else can be done within 
> that match mode, so Sphinx does not support any of the others with this 
> newer protocol.
>
> Hope this helps!
>
> — 
> Pat
>
> On 3 Nov 2015, at 7:02 pm, Andrea S. <[email protected]> wrote:
>
> I need to enable boolean queries that optionally contain keywords with 
> periods / full stops as part of their name. For instance such a keyword may 
> look like this: "2.7.1.2". I noticed that a search for "2.7.1.2" also 
> returns results for "1.2.7.1", which is something I want to avoid. 
>
> What setting would I need to tweak to distinguish between "2.7.1.2" and 
> "1.2.7.1"? I know that the period is treated as a word separator unless 
> explicitly added to the charset_table, but I'm also aware that adding it to 
> the charset table would have other side effects, e.g. the word "foo." will 
> be different from just "foo". So this doesn't seem to be a viable solution.
>
> This is the search I'm attempting:
>
> Feature.search("\"2.7.1.2\"", :match_mode=>"boolean")
>
> I'm explicitly *quoting* the search query, which I hoped would treat it 
> as an *exact phrase*. I've also tried other match_modes, such as 
> "extended" (no match mode / default) and "phrase", but every time results 
> for "1.2.7.1" cropped up. 
>
> Essentially, I'm confused about the source of the problem - if the period 
> is treated as a word separator, does that mean that Sphinx searches for the 
> phrase "2 7 1 2"? If yes, then why does the phrase "1 2 7 1" also match? 
> Does that have to do with the min_prefix/infix length settings? By the way, 
> my thinking_sphinx.yml file looks like this:
>
>  mysql41: 5532
>  charset_table: 0..9, A..Z->a..z, a..z 
>  max_matches: 10000 
>  sql_query_range: "SELECT MIN(id),MAX(id) FROM features" 
>  sql_range_step: 1000 
>  mem_limit: 128M 
>  bin_path: '/usr/bin'
>
>
> Any tips would be greatly appreciated.
>
>
> Thanks,
>
>
> Andrea
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.
