> On Feb 23, 2021, at 12:02 AM, Pat Allan <[email protected]> wrote:
>
> Having the setting in the default block should be fine - you should be able
> to see the charset_table setting in the generated Sphinx configuration files.
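One way to sanity-check this before regenerating anything is to load the YAML the way Ruby resolves anchors and merge keys, and confirm that every environment ends up with the charset_table value. The snippet below is only a sketch: the YAML is an inline copy of the settings quoted further down this thread (a real check would read config/thinking_sphinx.yml instead), and the ENVIRONMENTS list is an assumption for illustration.

```ruby
require "yaml"

# Inline copy of the settings from this thread; a real check would read
# config/thinking_sphinx.yml instead (assumption for illustration).
CONFIG = <<~YAML
  default: &default
    morphology: stem_en
    html_strip: true
    batch_size: 300
    charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+23"

  development:
    <<: *default

  test:
    <<: *default

  production:
    <<: *default

  staging:
    <<: *default
    mysql41: 9320
YAML

# aliases: true lets safe_load resolve the &default anchor and *default aliases.
SETTINGS = YAML.safe_load(CONFIG, aliases: true)

ENVIRONMENTS = %w[development test production staging].freeze

ENVIRONMENTS.each do |env|
  table = SETTINGS.fetch(env)["charset_table"]
  status = table.to_s.include?("U+23") ? "includes U+23" : "is missing U+23"
  puts "#{env}: charset_table #{status}"
end
```

If every environment reports U+23 here, the default block is being merged as expected, and the remaining question is whether the generated Sphinx configuration and the rebuilt index picked it up.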
>
> Also: I generally recommend just using ts:rebuild, as that handles both
> real-time indices and SQL-backed indices (i.e. it’s running the same things
> as ts:rt:rebuild) - if you’re finding ts:rebuild is not working well for you,
> I’m keen to hear why!
While I was fighting with this, and fiddling with the configuration to use
`has` instead of `indexes`, I got myself into a state where ts:rebuild would
blow up with a SQL error (I think it was a Sphinx SQL error) while ts:rt:rebuild
worked fine. But with the current configuration that I shared with you, both
work.
>
> All that said, doesn’t sound like you’re doing anything wrong. I wonder if
> html_strip is somehow filtering out the octothorps? Though I’m pretty sure
> it’s looking just for HTML tags… still, may be worth turning that off to
> double-check.
>
> And I’ve just run some quick tests locally - without the custom charset_table
> value, I find the string “#test” is found by Sphinx when searching by “#test”
> or “test” (because # is ignored, given it’s not an indexable character - so
> the two searches are actually identical). Adding in the charset_table
> setting, rebuilding - searching for #test returns a result, but test doesn’t
> (as that now doesn’t exist as a standalone word in what’s indexed).
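The behaviour described above can be imitated with a toy tokenizer. This is a sketch only and not how Sphinx tokenizes internally: the `tokenize` and `matches?` helpers and the two character lists are hypothetical names made up for illustration, and they model nothing beyond "characters outside the table act as separators".

```ruby
# Toy model: any character not in the indexable list is treated as a separator.
# This is NOT Sphinx's tokenizer, just an illustration of the effect.
DEFAULT_CHARS = (("a".."z").to_a + ("0".."9").to_a + ["_"]).freeze
WITH_HASH     = (DEFAULT_CHARS + ["#"]).freeze

def tokenize(text, indexable)
  tokens  = []
  current = +""
  text.downcase.each_char do |ch|
    if indexable.include?(ch)
      current << ch
    elsif !current.empty?
      tokens << current
      current = +""
    end
  end
  tokens << current unless current.empty?
  tokens
end

# A query matches when every query token appears among the document's tokens.
def matches?(document, query, indexable)
  doc_tokens = tokenize(document, indexable)
  tokenize(query, indexable).all? { |token| doc_tokens.include?(token) }
end

[["default table", DEFAULT_CHARS], ["table with #", WITH_HASH]].each do |label, chars|
  ["#test", "test"].each do |query|
    puts "#{label}: query #{query.inspect} matches: #{matches?('#test', query, chars)}"
  end
end
```

With the default list both queries reduce to the token "test", so both match. Once # is indexable, the document holds the single token "#test", so only the query "#test" matches.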
>
> I doubt it matters, but: which version of Sphinx are you using?
Sphinx 2.2.11-id64-release (95ae9a6), TS 5.0.0.
It's definitely odd. I'm not sure whether re-indexing picks up the tag names
when it runs en masse, and it seems to be something to do with Gutentag. If I
find a document in the console, the object that I get back shows tag_names set
to nil, but if I then call tag_names on that object, I get back the array of
strings I am expecting. The nil is just the value that I see inside the <>
brackets when irb first calls inspect on the found object, so I don't know
whether that's significant at all, or whether it is getting in the way of
Sphinx extracting the values. Again, when I test in the console by calling my
tags_for_indexing method on a found object, I get back the expected string
value.
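For what it's worth, a nil in the console output followed by a real value on the first call is what any lazily loaded value looks like. The plain-Ruby sketch below reproduces that pattern. It is an assumption-laden illustration, not Gutentag's actual implementation: `FakeDocument` and `@stored_tags` are hypothetical stand-ins for the model and its persisted tags.

```ruby
# Hypothetical stand-in for a model whose tag names are loaded lazily.
# This is NOT how Gutentag is implemented; it only reproduces the symptom.
class FakeDocument
  def initialize(stored_tags)
    @stored_tags = stored_tags # pretend these live in the database
    @tag_names   = nil         # nothing loaded yet
  end

  # The first call populates the cached value; later calls reuse it.
  def tag_names
    @tag_names ||= @stored_tags.dup
  end

  # Same shape as the method used in the index definition in this thread.
  def tags_for_indexing
    tag_names.join(" ")
  end
end

doc = FakeDocument.new(["#hashtag", "sphinx"])

BEFORE   = doc.inspect           # what a console would print for a fresh object
INDEXED  = doc.tags_for_indexing # calling the method triggers the load
AFTER    = doc.inspect

puts BEFORE
puts INDEXED
puts AFTER
```

If the real model behaves like this, the nil shown by inspect would not matter for indexing, as long as the indexer calls tags_for_indexing on each record rather than reading the raw attribute. Whether that holds for the bulk rebuild is exactly the open question.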
I've told the client that she may need to get rid of her beloved hashtags in
the tagging interface, or use Gutentag in place of Sphinx to get "everything
tagged with this tag". I'm not convinced that's a bad idea, either.
Walter
>
> —
> Pat
>
>> On 23 Feb 2021, at 3:10 pm, Walter Lee Davis <[email protected]> wrote:
>>
>> Thanks for the speedy reply. I tried adding the charset table as
>> recommended, but I am not seeing any difference in my search results. I did
>> differ from the directions slightly, in that I put the character set in the
>> default block at the top of my Yaml file, since it's then included in all of
>> the environments. I figured that should work, but in case it doesn't can you
>> explain why?
>>
>> default: &default
>>   morphology: stem_en
>>   html_strip: true
>>   batch_size: 300
>>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+23"
>>
>> development:
>>   <<: *default
>>
>> test:
>>   <<: *default
>>
>> production:
>>   <<: *default
>>
>> staging:
>>   <<: *default
>>   mysql41: 9320
>>
>>
>> I forced a full rebuild/reindex with rake ts:rt:rebuild. When that didn't
>> seem to change things, I also ran rake ts:rebuild. My understanding is that
>> the first of these should be done when you use the Real Time index. If I'm
>> mistaken, please let me know.
>>
>> Thanks again!
>>
>> Walter
>>
>>> On Feb 22, 2021, at 10:51 PM, Pat Allan <[email protected]> wrote:
>>>
>>> Hi Walter,
>>>
>>> I’m pretty sure Sphinx doesn’t index punctuation by default. If you want
>>> octothorps included, you’ll need to define a custom charset_table value
>>> (per environment in `config/thinking_sphinx.yml`) which includes that
>>> character. The Sphinx docs outline the default, so best to take that and
>>> then add in the octothorp (U+23).
>>> http://sphinxsearch.com/docs/current.html#conf-charset-table
>>> https://freelancing-gods.com/thinking-sphinx/v5/advanced_config.html#character-sets-and-tables
>>>
>>> Keep in mind that this will impact all uses of that character in all fields
>>> - there’s no way to have it apply to just some fields (or, in this case,
>>> words that only start with that character).
>>>
>>> Once you’ve added this configuration, a full rebuild will be required.
>>>
>>> Cheers,
>>>
>>> —
>>> Pat
>>>
>>>> On 23 Feb 2021, at 2:41 pm, Walter Lee Davis <[email protected]> wrote:
>>>>
>>>> I'm using GutenTag to apply tags to individual pages in a CMS. The
>>>> Document model uses TS5 with Real-Time Indexing. I've set up my index
>>>> thusly:
>>>>
>>>> # in the model
>>>> def tags_for_indexing
>>>>   tag_names.join ' '
>>>> end
>>>>
>>>> # in the index
>>>> ThinkingSphinx::Index.define :document, :with => :real_time do
>>>>   scope { Document.where(id: Document.publicly.map{ |d|
>>>>     [d.id].concat(d.descendants.published.map(&:id)) }.flatten) }
>>>>
>>>>   indexes title
>>>>   indexes teaser
>>>>   indexes body_html
>>>>   indexes author_display
>>>>   indexes tags_for_indexing
>>>>
>>>>   has created_at, type: :timestamp
>>>>   has updated_at, type: :timestamp
>>>> end
>>>>
>>>> I've tested the method, and confirm that it outputs a space-delimited
>>>> string of words for the tags.
>>>>
>>>> I run rake ts:rt:rebuild and everything seems to go fine. But trying to
>>>> search on some of these tag names is not returning the results I am
>>>> imagining. The client has insisted on making some of these tags start with
>>>> an octothorp, because she is writing about "hashtags" on Twitter. Most
>>>> tags do not have punctuation in them. I am able to find other terms, even
>>>> very obscure ones, when I don't use punctuation in the tag names.
>>>>
>>>> Does this sound like something that I can fix, or should I advise the
>>>> client to lay off the octothorps?
>>>>
>>>> Walter
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups
>>>> "Thinking Sphinx" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/thinking-sphinx/EA71574B-9EBF-484E-A5FA-BF7CD53A10BC%40wdstudio.com.
>>>