Thanks for the speedy reply. I tried adding the charset table as recommended, but I am not seeing any difference in my search results. I did differ from the directions slightly, in that I put the character set in the default block at the top of my YAML file, since it's then included in all of the environments. I figured that should work, but in case it doesn't, can you explain why?
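The YAML merge itself can be checked independently of Thinking Sphinx. The following is only a minimal sketch in plain Ruby: the inline string is a trimmed, illustrative copy of the settings (in practice you would read config/thinking_sphinx.yml), and the names CONFIG_YAML, load_settings and environments_missing are made up for this example. It shows only what the YAML parser produces, not how Thinking Sphinx or Sphinx go on to use the value.

```ruby
require "yaml"

# Trimmed, illustrative copy of the settings. This inline string is an
# assumption for the example; the real file has more keys and environments.
CONFIG_YAML = <<~YAML
  default: &default
    morphology: stem_en
    charset_table: "0..9, A..Z->a..z, _, a..z, U+23"

  development:
    <<: *default

  production:
    <<: *default
YAML

# aliases: true allows the &default anchor and the *default aliases to be
# resolved; without it, safe_load rejects documents that contain aliases.
def load_settings(text)
  YAML.safe_load(text, aliases: true)
end

# Returns the names of the top-level blocks that do not end up with the
# given key after the merge keys (<<) have been applied.
def environments_missing(settings, key)
  settings.reject { |_env, values| values.is_a?(Hash) && values.key?(key) }.keys
end

SETTINGS = load_settings(CONFIG_YAML)

# Print the charset_table each block ends up with after the merge.
SETTINGS.each do |env, values|
  puts "#{env}: #{values['charset_table'].inspect}"
end
```

If every environment prints the same charset_table string, the default block is being merged as intended at the YAML level, and the cause of the missing results would have to be somewhere else.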

    default: &default
      morphology: stem_en
      html_strip: true
      batch_size: 300
      charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+23"

    development:
      <<: *default

    test:
      <<: *default

    production:
      <<: *default

    staging:
      <<: *default
      mysql41: 9320

I forced a full rebuild/reindex with rake ts:rt:rebuild. When that didn't seem to change things, I also ran rake ts:rebuild. My understanding is that the first of these should be done when you use the Real Time index. If I'm mistaken, please let me know.

Thanks again!

Walter

> On Feb 22, 2021, at 10:51 PM, Pat Allan <[email protected]> wrote:
>
> Hi Walter,
>
> I’m pretty sure Sphinx doesn’t index punctuation by default. If you want
> octothorps included, you’ll need to define a custom charset_table value (per
> environment in `config/thinking_sphinx.yml`) which includes that character.
> The Sphinx docs outline the default, so best to take that and then add in the
> octothorp (U+23).
> http://sphinxsearch.com/docs/current.html#conf-charset-table
> https://freelancing-gods.com/thinking-sphinx/v5/advanced_config.html#character-sets-and-tables
>
> Keep in mind that this will impact all uses of that character in all fields -
> there’s no way to have it apply to just some fields (or, in this case, words
> that only start with that character).
>
> Once you’ve added this configuration, a full rebuild will be required.
>
> Cheers,
>
> —
> Pat
>
>> On 23 Feb 2021, at 2:41 pm, Walter Lee Davis <[email protected]> wrote:
>>
>> I'm using GutenTag to apply tags to individual pages in a CMS. The Document
>> model uses TS5 with Real-Time Indexing. I've set up my index thusly:
>>
>> # in the model
>> def tags_for_indexing
>>   tag_names.join ' '
>> end
>>
>> # in the index
>> ThinkingSphinx::Index.define :document, :with => :real_time do
>>   scope { Document.where(id: Document.publicly.map{ |d|
>>     [d.id].concat(d.descendants.published.map(&:id)) }.flatten) }
>>
>>   indexes title
>>   indexes teaser
>>   indexes body_html
>>   indexes author_display
>>   indexes tags_for_indexing
>>
>>   has created_at, type: :timestamp
>>   has updated_at, type: :timestamp
>> end
>>
>> I've tested the method, and confirm that it outputs a space-delimited string
>> of words for the tags.
>>
>> I run rake ts:rt:rebuild and everything seems to go fine. But trying to
>> search on some of these tag names is not returning the results I am
>> imagining. The client has insisted on making some of these tags start with
>> an octothorp, because she is writing about "hashtags" on Twitter. Most tags
>> do not have punctuation in them. I am able to find other terms, even very
>> obscure ones, when I don't use punctuation in the tag names.
>>
>> Does this sound like something that I can fix, or should I advise the client
>> to lay off the octothorps?
>>
>> Walter
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Thinking Sphinx" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/thinking-sphinx/EA71574B-9EBF-484E-A5FA-BF7CD53A10BC%40wdstudio.com.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/thinking-sphinx/05B716CE-D5C7-40F6-BDE3-EC2859738632%40freelancing-gods.com.

-- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/thinking-sphinx/0822E7D4-08AD-48D6-8105-3CC26F937006%40wdstudio.com.
