On Mon, Feb 7, 2011 at 10:51 PM, Steven A Rowe sar...@syr.edu wrote:
I haven't done any benchmarking, but I'm pretty sure that ASCIIFoldingFilter
can achieve a significantly higher throughput rate than MappingCharFilter,
and given that, it probably makes sense to keep both, to allow people to
Chris Hostetter-3 wrote:
CharFilters and TokenFilters have different purposes though...
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#When_To_use_a_CharFilter_vs_a_TokenFilter
(i.e., if you use MappingCharFilter, you can't then tokenize on some of the characters you
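To make the ordering point concrete, here is a minimal, self-contained Java sketch that does not use Lucene at all. The mapping table (a single hypothetical rule folding NO-BREAK SPACE, U+00A0, to an ordinary space), the space-only toy tokenizer, and the class name are all assumptions for illustration. Because a CharFilter runs before the tokenizer, the tokenizer sees the mapped characters and token boundaries can move. A TokenFilter only sees tokens whose boundaries are already fixed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: map-then-tokenize (CharFilter order)
// versus tokenize-then-map (TokenFilter order).
public class CharVsTokenFilterOrder {
    // Hypothetical one-rule mapping table: NO-BREAK SPACE (U+00A0) -> ordinary space.
    static final Map<Character, String> MAPPING = Map.of('\u00A0', " ");

    // Apply the char -> string mapping to every character of the input.
    static String applyMapping(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = MAPPING.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Toy tokenizer: splits only on the ordinary space character U+0020.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.split(" ")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // CharFilter-style: the tokenizer sees the already-mapped text.
    public static List<String> mapThenTokenize(String s) {
        return tokenize(applyMapping(s));
    }

    // TokenFilter-style: boundaries are fixed first, then each token is mapped.
    public static List<String> tokenizeThenMap(String s) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(s)) {
            out.add(applyMapping(t));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "New\u00A0York";
        System.out.println(mapThenTokenize(input).size()); // 2 tokens: "New", "York"
        System.out.println(tokenizeThenMap(input).size()); // 1 token: "New York"
    }
}
```

The same rule yields two tokens in one order and a single token containing a space in the other, which is why the two filter types are not interchangeable.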
On Tue, Feb 8, 2011 at 9:12 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
I'm skeptical that whatever the difference is is relevant in the scheme of
things. The cost of keeping it is introducing confusion for users, and more
code to maintain.
It's pretty significant. CharFilters are
Robert Muir wrote:
On Tue, Feb 8, 2011 at 9:12 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
I'm skeptical that whatever the difference is is relevant in the scheme of
things. The cost of keeping it is introducing confusion for users, and more
code to maintain.
It's pretty
On Tue, Feb 8, 2011 at 10:05 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
Well then I see a path forward to speed up MappingCharFilter substantially.
There's your LUCENE-2788, and then you could easily add the same no-op
optimization for the smallest char value in the HashMap.
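A rough sketch of that no-op idea, assuming single-character keys only (real mapping rules can have multi-character sources, which this ignores). The class MinKeyFastPathMapper is hypothetical, not Lucene code: it records the smallest mapped char and skips the HashMap lookup for any char below it. This is similar in spirit to how ASCIIFoldingFilter, as I understand it, leaves a token untouched when every char is below U+0080.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-char mapper illustrating a "below the smallest key" no-op fast path.
public class MinKeyFastPathMapper {
    private final Map<Character, String> mapping;
    private final char minKey; // smallest char that has a mapping rule

    public MinKeyFastPathMapper(Map<Character, String> mapping) {
        if (mapping.isEmpty()) {
            throw new IllegalArgumentException("mapping must not be empty");
        }
        this.mapping = new HashMap<>(mapping);
        char min = Character.MAX_VALUE;
        for (char c : this.mapping.keySet()) {
            if (c < min) {
                min = c;
            }
        }
        this.minKey = min;
    }

    public String apply(String s) {
        // Scan for the first char that could possibly match a rule.
        int i = 0;
        while (i < s.length() && s.charAt(i) < minKey) {
            i++;
        }
        if (i == s.length()) {
            return s; // no-op: every char is below minKey, so no HashMap lookups at all
        }
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, i); // copy the untouched prefix
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            // Chars below minKey skip the lookup; all others consult the map.
            String replacement = (c < minKey) ? null : mapping.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MinKeyFastPathMapper m = new MinKeyFastPathMapper(Map.of('\u00E9', "e", '\u00FC', "u"));
        System.out.println(m.apply("caf\u00E9"));   // cafe
        System.out.println(m.apply("\u00FCber"));   // uber
        System.out.println(m.apply("plain ascii")); // plain ascii (returned unchanged)
    }
}
```

Whether this closes much of the gap with ASCIIFoldingFilter would need benchmarking; the sketch only shows where the lookups are avoided.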
only
On 2/8/11 7:05 AM, David Smiley (@MITRE.org) wrote:
Robert Muir wrote:
On Tue, Feb 8, 2011 at 9:12 AM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
I'm skeptical that whatever the difference is is relevant in the scheme of
things. The cost of keeping it is introducing
AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter provides
a superset of its mappings.
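The superset relationship can be stated as a simple check: every source-to-target rule in the smaller table appears unchanged in the larger one. The two maps below are tiny illustrative excerpts (À to A, ß to ss, and Ł to L, the last being outside Latin-1), not the real tables, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper for comparing two char-folding tables.
public class MappingSuperset {
    // True if every (source -> target) rule in `smaller` also appears, unchanged, in `bigger`.
    public static boolean isSupersetOf(Map<String, String> bigger, Map<String, String> smaller) {
        return bigger.entrySet().containsAll(smaller.entrySet());
    }

    public static void main(String[] args) {
        // Tiny illustrative excerpts only, not the real mapping tables.
        Map<String, String> latin1Accent = Map.of("\u00C0", "A", "\u00DF", "ss");
        Map<String, String> foldToAscii = Map.of("\u00C0", "A", "\u00DF", "ss", "\u0141", "L");
        System.out.println(isSupersetOf(foldToAscii, latin1Accent)); // true
        System.out.println(isSupersetOf(latin1Accent, foldToAscii)); // false
    }
}
```

Comparing whole entries (not just keys) matters: a table that maps the same source char to a different target would not count as a superset.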
I haven't done any benchmarking, but I'm pretty sure that ASCIIFoldingFilter
can achieve a significantly higher throughput rate than MappingCharFilter, and
given that, it probably makes
:
: ISOLatin1AccentFilter is deprecated, presumably because you can (and should)
: use MappingCharFilter configured with mapping-ISOLatin1Accent.txt. By that
: same reasoning, shouldn't ASCIIFoldingFilter be deprecated in favor of using
: mapping-FoldToASCII.txt ?
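For anyone trying the MappingCharFilter route, the mapping files are plain text with one rule per line. The sketch below is a simplified, hypothetical reader for rules of the form "source" => "target" (with \uXXXX hex escapes and # comment lines), written from my understanding of the file format. It is not the parser that Solr's MappingCharFilterFactory actually uses, and it ignores other escape sequences.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified reader for "source" => "target" mapping rules.
public class MappingRuleParser {
    // Assumed rule shape: "source" => "target", one per line.
    private static final Pattern RULE = Pattern.compile("\"(.*)\"\\s*=>\\s*\"(.*)\"");

    // Decodes backslash-u hex escapes, backslash-backslash and backslash-quote.
    // Any other backslash sequence is kept literally (simplification).
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char n = s.charAt(i + 1);
                if (n == 'u' && i + 5 < s.length()) {
                    sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 5; // skip 'u' and the four hex digits
                    continue;
                }
                if (n == '\\' || n == '"') {
                    sb.append(n);
                    i++;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Parses rule lines, skipping blank lines and '#' comment lines.
    public static Map<String, String> parse(String text) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            Matcher m = RULE.matcher(line);
            if (m.matches()) {
                rules.put(unescape(m.group(1)), unescape(m.group(2)));
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        String rules = String.join("\n",
                "# LATIN CAPITAL LETTER A WITH GRAVE",
                "\"\\u00C0\" => \"A\"",
                "\"\\u00DF\" => \"ss\"");
        Map<String, String> parsed = parse(rules);
        System.out.println(parsed.size());        // 2
        System.out.println(parsed.get("\u00DF")); // ss
    }
}
```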
CharFilters and TokenFilters