Thanks Adrien, got it.
2018-07-04 13:46 GMT+08:00 Adrien Grand :
> This was considered a bug as the need to early-terminate is a per-collector
> decision. If you want to do something like that again, you could fork
> MultiCollector and propagate CollectionTerminatedExceptions.
>
> On Wed, Jul 4
This was considered a bug as the need to early-terminate is a per-collector
decision. If you want to do something like that again, you could fork
MultiCollector and propagate CollectionTerminatedExceptions.
On Wed, Jul 4, 2018 at 05:34, Yonghui Zhao wrote:
> In Lucene 4.10,
> If one collector
In Lucene 4.10, if one collector throws CollectionTerminatedException, all
collectors are terminated.
In Lucene 7.2.1, CollectionTerminatedException terminates only the current
collector; the others keep collecting.
How can I keep the old behavior?
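For illustration, the difference between the two behaviors in this thread, and Adrien's suggestion to fork MultiCollector and propagate the exception, might be sketched as below. This is a self-contained model, not Lucene code: DocCollector, TerminatedException, FirstNCollector, IsolatingMultiCollector and PropagatingMultiCollector are hypothetical stand-ins invented for this example, and a real fork would have to work against Lucene's own Collector and LeafCollector classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiCollectorSketch {
    // Feeds doc IDs 0..9 to two wrapped collectors and returns how many docs each one saw.
    static int[] run(boolean propagate) {
        FirstNCollector a = new FirstNCollector(2);
        FirstNCollector b = new FirstNCollector(5);
        List<DocCollector> both = Arrays.asList(a, b);
        DocCollector multi = propagate
                ? new PropagatingMultiCollector(both)
                : new IsolatingMultiCollector(both);
        try {
            for (int doc = 0; doc < 10; doc++) {
                multi.collect(doc);
            }
        } catch (TerminatedException e) {
            // Collection stops here, as a searcher would stop when the exception surfaces.
        }
        return new int[] { a.docs.size(), b.docs.size() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false))); // [2, 5]: only the first collector stopped early
        System.out.println(Arrays.toString(run(true)));  // [2, 2]: the first termination stopped both
    }
}

// Hypothetical stand-in for CollectionTerminatedException.
class TerminatedException extends RuntimeException {}

// Hypothetical stand-in for a collector that receives one doc ID at a time.
interface DocCollector {
    void collect(int doc);
}

// Collects at most `limit` docs, then asks to terminate.
class FirstNCollector implements DocCollector {
    final List<Integer> docs = new ArrayList<>();
    private final int limit;

    FirstNCollector(int limit) {
        this.limit = limit;
    }

    @Override
    public void collect(int doc) {
        if (docs.size() >= limit) {
            throw new TerminatedException();
        }
        docs.add(doc);
    }
}

// Newer behavior as described in the thread: a terminated collector is dropped, the rest continue.
class IsolatingMultiCollector implements DocCollector {
    private final List<DocCollector> active;

    IsolatingMultiCollector(List<DocCollector> collectors) {
        this.active = new ArrayList<>(collectors);
    }

    @Override
    public void collect(int doc) {
        active.removeIf(c -> {
            try {
                c.collect(doc);
                return false;
            } catch (TerminatedException e) {
                return true; // drop only this collector
            }
        });
        if (active.isEmpty()) {
            throw new TerminatedException();
        }
    }
}

// Old behavior: the exception is not caught, so one termination ends collection for everyone.
class PropagatingMultiCollector implements DocCollector {
    private final List<DocCollector> collectors;

    PropagatingMultiCollector(List<DocCollector> collectors) {
        this.collectors = new ArrayList<>(collectors);
    }

    @Override
    public void collect(int doc) {
        for (DocCollector c : collectors) {
            c.collect(doc); // TerminatedException propagates to the caller
        }
    }
}
```

The only difference between the two wrappers is whether the exception is caught per collector or allowed to escape, which is the per-collector decision Adrien describes.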
03 July 2018, Apache Lucene™ 6.6.5 available
The Lucene PMC is pleased to announce the release of Apache Lucene 6.6.5.
Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-
Ah I see -- there is \p{Emoji} to start with, which is nice, but also this
extended pictographic -- I'll read more, and get back if I have questions.
Might be a little while before I dig in to this though. Thanks again
On Tue, Jul 3, 2018 at 11:25 AM Robert Muir wrote:
> If you customized the ru
If you customized the rules, maybe have a look at
https://issues.apache.org/jira/browse/LUCENE-8366
The rules got simpler and we also updated the customization example
used for the factory's test.
On Tue, Jul 3, 2018 at 10:46 AM, Michael Sokolov wrote:
> Yes that sounds good -- this ConditionalT
Yes that sounds good -- this ConditionalTokenFilter is going to be very
helpful. We have overridden the ICUTokenizer's rbbi rules, but I'll poke
around and see about incorporating the emoji rules from there. Thanks
Robert
On Tue, Jul 3, 2018 at 9:28 AM Robert Muir wrote:
> > Any thoughts?
>
> b
Thanks for the pointer
On Tue, Jul 3, 2018 at 9:04 AM Julien Blaize wrote:
> Hello Michael,
>
> I had previously worked on emoji detection with Lucene.
>
> I had to extend the Tokenizer class (and not the TokenFilter like
> WordDelimiterFilter) to preserve the delimiter attribute.
> I also had
> Any thoughts?
best idea I have would be to tokenize with ICUTokenizer, which will
tag emoji sequences as "<EMOJI>" token type, then use
ConditionalTokenFilter to send all tokens EXCEPT those with token type
of "<EMOJI>" to your WordDelimiterFilter. This way
WordDelimiterFilter never sees the emoji at all and
On Tue, Jul 3, 2018 at 8:00 AM, Michael Sokolov wrote:
> WDGF (and WordDelimiterFilter) treat emoji as "SUBWORD_DELIM" characters
> like punctuation and thus remove them, but we would like to be able to
> search for emoji and use this filter for handling dashes, dots and other
> intra-word punctua
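As a rough illustration of the routing idea Robert describes (send every token except emoji through the delimiter filter), here is a small self-contained model. It does not use Lucene: TypedToken, splitOnDelimiters and filterConditionally are hypothetical stand-ins for a token with a type attribute, for WordDelimiterFilter, and for ConditionalTokenFilter respectively, and the "<EMOJI>" type string is an assumption about what the tokenizer assigns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ConditionalRoutingSketch {
    // Assumed token type for emoji sequences.
    static final String EMOJI_TYPE = "<EMOJI>";

    // Simplified stand-in for a word-delimiter filter:
    // splits on anything that is not a letter or digit and drops the delimiters.
    static List<TypedToken> splitOnDelimiters(TypedToken token) {
        List<TypedToken> parts = new ArrayList<>();
        for (String part : token.text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                parts.add(new TypedToken(part, token.type));
            }
        }
        return parts;
    }

    // Applies `filter` only to tokens accepted by `shouldFilter`; all others pass through untouched.
    static List<TypedToken> filterConditionally(List<TypedToken> tokens,
                                                Predicate<TypedToken> shouldFilter,
                                                Function<TypedToken, List<TypedToken>> filter) {
        List<TypedToken> out = new ArrayList<>();
        for (TypedToken token : tokens) {
            if (shouldFilter.test(token)) {
                out.addAll(filter.apply(token));
            } else {
                out.add(token);
            }
        }
        return out;
    }

    static List<String> demo() {
        List<TypedToken> tokens = Arrays.asList(
                new TypedToken("foo.bar", "<ALPHANUM>"),
                new TypedToken("\uD83D\uDE00", EMOJI_TYPE), // grinning face emoji
                new TypedToken("x_y", "<ALPHANUM>"));
        List<TypedToken> filtered = filterConditionally(
                tokens,
                t -> !EMOJI_TYPE.equals(t.type), // everything EXCEPT emoji goes to the splitter
                ConditionalRoutingSketch::splitOnDelimiters);
        List<String> texts = new ArrayList<>();
        for (TypedToken t : filtered) {
            texts.add(t.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}

// Hypothetical stand-in for a token carrying its text and token type.
class TypedToken {
    final String text;
    final String type;

    TypedToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}
```

In this model the emoji token survives because the splitter never sees it; sent through the splitter directly, it would be treated as a delimiter and dropped, which is the problem Michael reported.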
Hello Michael,
I had previously worked on emoji detection with Lucene.
I had to extend the Tokenizer class (and not the TokenFilter like
WordDelimiterFilter) to preserve the delimiter attribute.
I also had to keep track of consecutive delimiters in the character stream
because Lucene default imp
WDGF (and WordDelimiterFilter) treat emoji as "SUBWORD_DELIM" characters
like punctuation and thus remove them, but we would like to be able to
search for emoji and use this filter for handling dashes, dots and other
intra-word punctuation.
These filters identify non-word and non-digit characters