Having the setting in the default block should be fine - you should be able to 
see the charset_table setting in the generated Sphinx configuration files.
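
This explains why the default block is enough. The `<<: *default` lines are YAML merge keys, so every key in the anchored default block is copied into each environment before Thinking Sphinx reads the file. The minimal sketch below assumes only Ruby’s standard yaml (Psych) library and uses an abridged copy of Walter’s file quoted below, with staging omitted. It prints the charset_table each environment ends up with. It checks the YAML layer only, not the Sphinx configuration that Thinking Sphinx generates from it.

```ruby
require "yaml"

# Abridged copy of the thinking_sphinx.yml quoted below (staging omitted).
# Each `<<: *default` merge key copies the anchored default keys into that env.
CONFIG_YAML = <<~YAML
  default: &default
    morphology: stem_en
    html_strip: true
    batch_size: 300
    charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+23"

  development:
    <<: *default

  test:
    <<: *default

  production:
    <<: *default
YAML

# safe_load rejects YAML aliases unless they are explicitly allowed.
config = YAML.safe_load(CONFIG_YAML, aliases: true)

%w[development test production].each do |env|
  puts "#{env}: #{config.fetch(env)['charset_table']}"
end
```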

Also: I generally recommend just using ts:rebuild, as that handles both 
real-time indices and SQL-backed indices (i.e. it’s running the same things as 
ts:rt:rebuild) - if you’re finding ts:rebuild is not working well for you, I’m 
keen to hear why!

All that said, it doesn’t sound like you’re doing anything wrong. I wonder if 
html_strip is somehow filtering out the octothorps? Though I’m pretty sure it 
only looks for HTML tags… still, it may be worth turning that off to 
double-check.
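
The following is a rough illustration of that hunch: stripping that only targets tags should leave a leading # alone. The sketch uses a crude regex as a stand-in for Sphinx’s html_strip. This is an assumption for illustration, not how Sphinx implements the feature, and `strip_tags_roughly` is a hypothetical helper.

```ruby
# Hypothetical helper: a crude regex stand-in for Sphinx's html_strip. The real
# feature has its own rules and options; this only shows that removing tags by
# itself leaves a leading "#" untouched.
def strip_tags_roughly(html)
  html.gsub(/<[^>]*>/, " ") # replace each tag with a space
end

sample = %(<p>Writing about <a href="/tags/test">#test</a> hashtags</p>)
puts strip_tags_roughly(sample).split.join(" ")
```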

And I’ve just run some quick tests locally - without the custom charset_table 
value, I find the string “#test” is found by Sphinx when searching by “#test” 
or “test” (because # is ignored, given it’s not an indexable character - so the 
two searches are actually identical). After adding the charset_table setting 
and rebuilding, searching for #test returns a result, but test doesn’t (as 
“test” no longer exists as a standalone word in what’s indexed).
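
You can model that behaviour with a toy tokenizer. Characters listed in charset_table count as word characters, and everything else splits words. This is a hypothetical sketch, not Sphinx’s tokenizer. It approximates the A..Z->a..z folding with downcase and ignores the Cyrillic ranges, morphology and the rest of what Sphinx does. `tokenize` and `matches?` are illustrative helpers only.

```ruby
require "set"

# Toy model only (hypothetical, not Sphinx's real tokenizer): characters in the
# charset are word characters, anything else separates words. Case folding
# (A..Z->a..z) is approximated with downcase; Cyrillic ranges are left out.
DEFAULT_CHARS = Set.new([*"0".."9", *"a".."z", "_"])
WITH_HASH = DEFAULT_CHARS | Set["#"] # the same table plus U+23

# Split text into the words that would end up in the index.
def tokenize(text, word_chars)
  words = []
  current = +""
  text.downcase.each_char do |ch|
    if word_chars.include?(ch)
      current << ch
    elsif !current.empty?
      words << current
      current = +""
    end
  end
  words << current unless current.empty?
  words
end

# A query matches when every one of its words is among the document's words.
def matches?(document, query, word_chars)
  doc_words = tokenize(document, word_chars)
  query_words = tokenize(query, word_chars)
  !query_words.empty? && query_words.all? { |w| doc_words.include?(w) }
end

doc = "#test"
{ "default table" => DEFAULT_CHARS, "with U+23" => WITH_HASH }.each do |label, chars|
  ["#test", "test"].each do |query|
    puts "#{label}: #{query.inspect} finds #{doc.inspect}? #{matches?(doc, query, chars)}"
  end
end
```

With the default table, both queries reduce to the word test and match. Once # is a word character, only #test matches. This is the same pattern as the local tests described above.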

I doubt it matters, but: which version of Sphinx are you using?

— 
Pat

> On 23 Feb 2021, at 3:10 pm, Walter Lee Davis <[email protected]> wrote:
> 
> Thanks for the speedy reply. I tried adding the charset table as recommended, 
> but I am not seeing any difference in my search results. I did differ from 
> the directions slightly, in that I put the character set in the default block 
> at the top of my Yaml file, since it's then included in all of the 
> environments. I figured that should work, but in case it doesn't can you 
> explain why?
> 
> default: &default
>  morphology: stem_en
>  html_strip: true
>  batch_size: 300
>  charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+23"
> 
> development:
>  <<: *default
> 
> test:
>  <<: *default
> 
> production:
>  <<: *default
> 
> staging:
>  <<: *default
>  mysql41: 9320
> 
> 
> I forced a full rebuild/reindex with rake ts:rt:rebuild. When that didn't 
> seem to change things, I also ran rake ts:rebuild. My understanding is that 
> the first of these should be done when you use the Real Time index. If I'm 
> mistaken, please let me know.
> 
> Thanks again!
> 
> Walter
> 
>> On Feb 22, 2021, at 10:51 PM, Pat Allan <[email protected]> wrote:
>> 
>> Hi Walter,
>> 
>> I’m pretty sure Sphinx doesn’t index punctuation by default. If you want 
>> octothorps included, you’ll need to define a custom charset_table value (per 
>> environment in `config/thinking_sphinx.yml`) which includes that character. 
>> The Sphinx docs outline the default, so best to take that and then add in 
>> the octothorp (U+23).
>> http://sphinxsearch.com/docs/current.html#conf-charset-table
>> https://freelancing-gods.com/thinking-sphinx/v5/advanced_config.html#character-sets-and-tables
>> 
>> Keep in mind that this will impact all uses of that character in all fields 
>> - there’s no way to have it apply to just some fields (or, in this case, 
>> words that only start with that character).
>> 
>> Once you’ve added this configuration, a full rebuild will be required.
>> 
>> Cheers,
>> 
>> — 
>> Pat
>> 
>>> On 23 Feb 2021, at 2:41 pm, Walter Lee Davis <[email protected]> wrote:
>>> 
>>> I'm using GutenTag to apply tags to individual pages in a CMS. The Document 
>>> model uses TS5 with Real-Time Indexing. I've set up my index thusly:
>>> 
>>> # in the model
>>> def tags_for_indexing
>>>   tag_names.join ' '
>>> end
>>> 
>>> # in the index
>>> ThinkingSphinx::Index.define :document, :with => :real_time do
>>>   scope do
>>>     # Public documents plus their published descendants.
>>>     ids = Document.publicly.flat_map do |d|
>>>       [d.id].concat(d.descendants.published.map(&:id))
>>>     end
>>>     Document.where(id: ids)
>>>   end
>>>
>>>   indexes title
>>>   indexes teaser
>>>   indexes body_html
>>>   indexes author_display
>>>   indexes tags_for_indexing
>>>
>>>   has created_at, type: :timestamp
>>>   has updated_at, type: :timestamp
>>> end
>>> 
>>> I've tested the method, and confirm that it outputs a space-delimited 
>>> string of words for the tags.
>>> 
>>> I run rake ts:rt:rebuild and everything seems to go fine. But searching 
>>> on some of these tag names is not returning the results I expect. The 
>>> client has insisted on making some of these tags start with
>>> an octothorp, because she is writing about "hashtags" on Twitter. Most tags 
>>> do not have punctuation in them. I am able to find other terms, even very 
>>> obscure ones, when I don't use punctuation in the tag names. 
>>> 
>>> Does this sound like something that I can fix, or should I advise the 
>>> client to lay off the octothorps?
>>> 
>>> Walter
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Thinking Sphinx" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/thinking-sphinx/EA71574B-9EBF-484E-A5FA-BF7CD53A10BC%40wdstudio.com.