Re: Search term automatically split at non-alphanumeric

Scott Derrick Tue, 30 Dec 2025 09:21:35 -0800

Thanks Ciprian!

I'll do this immediately and report back!


Scott


On 12/27/2025 1:41 PM, Ciprian Dimofte - Opensolr.com wrote:

Hi Scott,

This is a classic Solr text analysis issue. The default tokenizer (usually 
StandardTokenizer or ClassicTokenizer) treats # as a delimiter, so soul#person 
gets split into two separate tokens: soul and person.

Where to look:
Your field type definition in schema.xml (or managed-schema) - specifically the 
<analyzer> section for your text field.

Options to fix it:
  1. Use WhitespaceTokenizer - Only splits on whitespace, so soul#person
     stays as a single token. In your field type, change the tokenizer to:
     solr.WhitespaceTokenizerFactory
  2. Use PatternTokenizer with a custom regex - Gives you fine-grained
     control over what characters split tokens.
  3. Add a WordDelimiterGraphFilter with specific settings - You can
     configure exactly which characters cause splits. Set
     splitOnNumerics="0", splitOnCaseChange="0", generateWordParts="0",
     generateNumberParts="0", catenateWords="1", preserveOriginal="1".
  4. Use a MappingCharFilter - Map # to something that won't cause a split
     before tokenization.

Documentation links:
  ∙ Tokenizers: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html
  ∙ Filters: https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html

Important: Whatever you change at index time, you need to apply the same 
analysis at query time, then reindex your data.
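
As a rough illustration of option 1, a field type along these lines might be a starting point. This is only a sketch: the name text_ws_exact and the lowercase filter are assumptions for the example, not something taken from your schema. With a single <analyzer> element and no type attribute, Solr uses the same analysis chain for both indexing and querying, which covers the point above about keeping the two sides consistent.

```xml
<!-- Hypothetical field type; the name "text_ws_exact" is illustrative only. -->
<fieldType name="text_ws_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute applies at index AND query time. -->
  <analyzer>
    <!-- Splits on whitespace only, so soul#person is kept as one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Optional assumption: lowercase so matching is case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One trade-off to be aware of: because only whitespace splits tokens, trailing punctuation stays attached as well (for example "person," is indexed with the comma), so you may need an extra filter depending on your data. After pointing your field at the new type, reindex.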

Ciprian

Opensolr SRL
Your Path to Ai Search
https://opensolr.com

On 27 Dec 2025, at 23:36, Scott Derrick <[email protected]> wrote:

Hi,

     I just noticed that when searching for a term that has an embedded 
non-alphanumeric character, the default schema for Solr splits it into 
multiple terms.

     The example was soul#person, which caused a search for soul or person.  The behavior 
we want would be the equivalent of "soul#person".  We don't want the user to 
have to enter their search term in quotes.

     Looking for directions to the specific documentation so I can get this 
fixed...

thanks

Scott
