Thanks Ciprian! I'll do this immediately and report back!

Scott

On 12/27/2025 1:41 PM, Ciprian Dimofte - Opensolr.com wrote:
Hi Scott,

This is a classic Solr text analysis issue. The default tokenizer (usually StandardTokenizer or ClassicTokenizer) treats # as a delimiter, so soul#person gets split into two separate tokens: soul and person.

Where to look: your field type definition in schema.xml (or managed-schema), specifically the <analyzer> section for your text field.

Options to fix it:

1. Use WhitespaceTokenizer. It only splits on whitespace, so soul#person stays as a single token. In your field type, change the tokenizer to solr.WhitespaceTokenizerFactory.
2. Use PatternTokenizer with a custom regex. This gives you fine-grained control over which characters split tokens.
3. Add a WordDelimiterGraphFilter with specific settings. You can configure exactly which characters cause splits. Set splitOnNumerics="0", splitOnCaseChange="0", generateWordParts="0", generateNumberParts="0", catenateWords="1", preserveOriginal="1". Filters run after the tokenizer, so this only helps if the tokenizer (for example WhitespaceTokenizer) has not already split on #.
4. Use a MappingCharFilter. Map # to something that won't cause a split before tokenization.

Documentation links:
∙ Tokenizers: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html
∙ Filters: https://solr.apache.org/guide/solr/latest/indexing-guide/filters.html

Important: whatever you change at index time, you need to apply the same analysis at query time, then reindex your data.

Ciprian
Opensolr SRL
Your Path to Ai Search
https://opensolr.com

On 27 Dec 2025, at 23:36, Scott Derrick <[email protected]> wrote:

Hi,

I just noticed that when searching for a term that has an embedded non-alphanumeric character, the default schema for Solr splits it into multiple terms. The example was soul#person, which caused a search for soul or person. The behavior we want would be the equivalent of "soul#person". We don't want the user to have to enter their search term in quotes.

Looking for directions to the specific documentation so I can get this fixed...

thanks
Scott
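
For reference, option 1 from Ciprian's reply might look roughly like the field type below. This is only a sketch: the field type name text_keep_symbols is made up for illustration, and the lowercase filter is an assumption about wanting case-insensitive matching, not something stated in the thread.

```xml
<!-- Hypothetical field type name; rename to fit your schema. -->
<fieldType name="text_keep_symbols" class="solr.TextField" positionIncrementGap="100">
  <!-- One <analyzer> with no type attribute is used at both index and query time. -->
  <analyzer>
    <!-- Splits on whitespace only, so soul#person stays a single token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: searches should be case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One trade-off to keep in mind: WhitespaceTokenizer also leaves other punctuation attached to words (a trailing comma or period, for example), so a term followed by punctuation is indexed as a different token than the bare term. After pointing the field at the new type, the data has to be reindexed for the change to take effect.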
