Hi Pat,

Thanks for your reply! Sounds like there is no simple solution to this. In my particular case, I think I will opt for adding the period to the charset_table after all. I don't have big blobs of text with too much punctuation and in fact the "punctuation" is often part of the "word", so I think I will go with that solution for now.

Also, thanks for the tip about the extended match mode being the default now - I will remove all the redundant :match_mode options in my queries!

Andrea

On Tuesday, November 3, 2015 at 1:57:45 PM UTC-8, Pat Allan wrote:
>
> Hi Andrea
>
> This is certainly a tricky situation.
>
> I’d consider creating a new field with a modified version of the string,
> replacing the periods with something else… perhaps even letters. For
> example: 2and7and1and2 - which remains a single word from Sphinx’s
> perspective without needing to modify the charset_table, and is distinct.
> Of course, you’d need to do that translation for both the new field, and
> any search queries. I can’t think of a neater way to handle this at the
> moment, but perhaps others on the Sphinx forum have ideas as well?
> http://sphinxsearch.com/forum/
>
> Also, it’s worth noting that TS v3 uses the SphinxQL protocol, and that
> *only* has the one match mode: extended. Everything else can be done within
> that match mode, so Sphinx does not support any of the others with this
> newer protocol.
>
> Hope this helps!
>
> --
> Pat
>
> On 3 Nov 2015, at 7:02 pm, Andrea S. <[email protected]> wrote:
>
> I need to enable boolean queries that optionally contain keywords with
> periods / full stops as part of their name. For instance such a keyword may
> look like this: "2.7.1.2". I noticed that a search for "2.7.1.2" also
> returns results for "1.2.7.1", which is something I want to avoid.
>
> What setting would I need to tweak to distinguish between "2.7.1.2" and
> "1.2.7.1"? I know that the period is treated as a word separator unless
> explicitly added to the charset_table, but I'm also aware that adding it to
> the charset table would have other side effects, e.g. the word "foo." will
> be different from just "foo". So this doesn't seem to be a viable solution.
>
> This is the search I'm attempting:
>
> Feature.search("\"2.7.1.2\"", :match_mode=>"boolean")
>
> I'm explicitly *quoting* the search query, which I hoped would treat it
> as an *exact phrase*. I've also tried other match_modes, such as
> "extended" (no match mode / default) and "phrase", but every time results
> for "1.2.7.1" cropped up.
>
> Essentially, I'm confused about the source of the problem - if the period
> is treated as a word separator, does that mean that sphinx searches for the
> phrase "2 7 1 2"? If yes, then why does the phrase "1 2 7 1" also match?
> Does that have to do with the min_prefix/infix length settings? By the way,
> my thinking_sphinx.yml file looks like this:
>
> mysql41: 5532
> charset_table: 0..9, A..Z->a..z, a..z
> max_matches: 10000
> sql_query_range: "SELECT MIN(id),MAX(id) FROM features"
> sql_range_step: 1000
> mem_limit: 128M
> bin_path: '/usr/bin'
>
> Any tips would be greatly appreciated.
>
> Thanks,
>
> Andrea
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/thinking-sphinx.
> For more options, visit https://groups.google.com/d/optout.
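
For anyone finding this thread later: the alternative Pat describes (a separate field where the periods are replaced with letters, with the same translation applied to search queries) could be sketched roughly as below. This is only a sketch under assumptions. The `DottedKeyword` module, its `encode` method and the `name_for_search` column mentioned in the comments are hypothetical names, not part of Thinking Sphinx or Sphinx, and the regular expression assumes the keywords are purely numeric like "2.7.1.2".

```ruby
# Hypothetical helper, not part of Thinking Sphinx: rewrites dotted numeric
# keywords into a single token, as in Pat's example 2and7and1and2.
module DottedKeyword
  # Stand-in for "." inside a keyword. With the charset_table from the
  # original post (0..9, a..z) the result stays one word for Sphinx.
  SEPARATOR = "and"

  # Matches runs of digits joined by periods, e.g. "2.7.1.2".
  # A trailing, sentence-ending period is not part of the match.
  PATTERN = /\d+(?:\.\d+)+/

  # Returns a copy of text with every dotted keyword encoded.
  def self.encode(text)
    text.gsub(PATTERN) { |keyword| keyword.gsub(".", SEPARATOR) }
  end
end

puts DottedKeyword.encode("2.7.1.2")
puts DottedKeyword.encode("1.2.7.1")
puts DottedKeyword.encode("foo.")

# Assumed usage (not run here): store the encoded value in a separate
# column (the hypothetical name_for_search), index that column as a field,
# and pass every query through the same helper before searching:
#
#   Feature.search(DottedKeyword.encode('"2.7.1.2"'))
```

The key point is that the same translation has to run on both sides. If the field is encoded but the query is not (or the other way round), the two will no longer match.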
