Can you enter the text on the Analysis page of the Solr Admin UI? That would show at which stage of the analysis chain the issue occurs.

StandardTokenizer has a default token length limit of 255 characters. You can override it with the "maxTokenLength" attribute:

<tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024" />

See:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizerFactory.html
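
As a rough sketch, assuming the "text_en" field type quoted below and treating 1024 as an illustrative value only, the attribute would go on the tokenizer in both the index and query analyzers so that the two sides tokenize the same way:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <!-- 1024 is an example limit, not a recommendation -->
       <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
</fieldType>

Documents that are already indexed would need to be reindexed for a change to the index analyzer to take effect.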

But the "#" sounds like a bug.

-- Jack Krupansky

-----Original Message-----
From: Danny Watari
Sent: Tuesday, April 02, 2013 5:45 PM
To: solr-user@lucene.apache.org
Subject: Lengthy description is converted to hash symbols

Hi, I have a field that is defined to be of type "text_en".  Occasionally, I
notice that lengthy strings are converted to hash symbols.  Here is a
snippet of my field type:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
</fieldType>

<field name="description" type="text_en" indexed="true" stored="true"
required="false" />

Here is an example of the field's value:
<str
name="description">###############################################################################################################################################################################################################################################################</str>


Any ideas why this might be happening?




--
View this message in context: http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html
Sent from the Solr - User mailing list archive at Nabble.com.
