Yes, indeed, I currently use a workaround with a regex filter.

Example for limiting to 30 characters:
<filter class="solr.PatternReplaceFilterFactory" pattern="(.{1,30}).*"
        replacement="$1" replace="all"/>
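
In context, that filter sits in the analysis chain of a keyword-tokenized
field type, roughly like this (the type name is just an example):

<fieldType name="string_max30" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="(.{1,30}).*"
            replacement="$1" replace="all"/>
  </analyzer>
</fieldType>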

I just thought there might already be such a filter.
But as Karsten showed, it is pretty easy to implement.

Maybe Karsten can open an issue and add his code?

Regards
Bernd

On 08.08.2011 22:56, Markus Jelsma wrote:
There is indeed none, except using copyField and maxChars. Could you perhaps
come up with some regex that matches the group of chars beyond the desired
limit and replaces it with ''?

That would fit in a pattern replace char filter.
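
For reference, the copyField/maxChars variant mentioned above would look
roughly like this in schema.xml (field names are just an example):

<field name="title" type="string" indexed="true" stored="true"/>
<field name="title_short" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_short" maxChars="30"/>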

Hi Bernd,

I also searched for such a filter but did not find it.

Best regards
   Karsten

P.S. I am now using this filter:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Token filter that truncates each token to at most maxLength characters
 * instead of dropping it.
 */
public class CutMaxLengthFilter extends TokenFilter {

        public static final int DEFAULT_MAXLENGTH = 15;

        private final int maxLength;
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        public CutMaxLengthFilter(TokenStream in) {
                this(in, DEFAULT_MAXLENGTH);
        }

        public CutMaxLengthFilter(TokenStream in, int maxLength) {
                super(in);
                this.maxLength = maxLength;
        }

        @Override
        public final boolean incrementToken() throws IOException {
                if (!input.incrementToken()) {
                        return false;
                }
                // shorten the term in place if it exceeds the limit
                int length = termAtt.length();
                if (maxLength > 0 && length > maxLength) {
                        termAtt.setLength(maxLength);
                }
                return true;
        }
}

together with this factory:

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class CutMaxLengthFilterFactory extends BaseTokenFilterFactory {

        private int maxLength;

        @Override
        public void init(Map<String, String> args) {
                super.init(args);
                // read the "maxLength" attribute from schema.xml, falling back to the default
                maxLength = getInt("maxLength", CutMaxLengthFilter.DEFAULT_MAXLENGTH);
        }

        public TokenStream create(TokenStream input) {
                return new CutMaxLengthFilter(input, maxLength);
        }
}
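
Once both classes are packaged in a jar and placed in Solr's lib directory,
the factory could be referenced in schema.xml roughly like this (the package
and type names are just an example; the package must match the classes):

<fieldType name="string_cut" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="my.package.CutMaxLengthFilterFactory" maxLength="30"/>
  </analyzer>
</fieldType>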



-------- Original Message --------

Date: Mon, 08 Aug 2011 10:15:45 +0200
From: Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
To: solr-user@lucene.apache.org
Subject: string cut-off filter?

Hi list,

is there a string cut-off filter to limit the length
of a KeywordTokenized string?

The string should not be dropped, only limited to a
certain length.

Regards
Bernd

--
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
