Steve
Thanks for the contact. I believe UAX29URLEmailTokenizer tokenizes email
addresses as follows: john@mycompany.com.au john.doe
mycompany.com.au john doe mycompany com au com.au. We have an overridden
query parser that swaps out anyaddress: with to, from, cc, bcc, etc.
Inside the overri
Hi Jamie,
What does EmailFilter do?
Why is the expanded form "required for the UAX29URLEmailTokenizer"? Seems like
an exact match would work on the email address alone, without the expanded
components?
Do you have an example of a query that reproducibly matches more documents than
it shoul
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Jamie [mailto:ja...@mailarchiva.com]
> Sent: Friday, March 28, 2014 4:41 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene 4.
I beg your pardon. It's our EmailFilter class that emits the tokens. We
do it this way since users like to search using individual components
of an email address, e.g. joe or mycompany.com.au. I think we may have a
synchronization issue at play. I will perform some further testing and
will get
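
For readers following the thread, here is a minimal sketch of the kind of component expansion described above. It is not the actual EmailFilter, whose source is not shown in this thread, and it does not use the Lucene API. The class name EmailComponents and the method expand are hypothetical, and the exact set and order of components is an assumption based on the token list quoted earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: illustrates component expansion only, not the real EmailFilter.
final class EmailComponents {
    private EmailComponents() {}

    // Returns the full address followed by its searchable components,
    // in insertion order and without duplicates.
    static List<String> expand(String address) {
        Set<String> out = new LinkedHashSet<>();
        out.add(address);

        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            // Not a usable local@domain form: emit the input unchanged.
            return new ArrayList<>(out);
        }

        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        out.add(local);
        out.add(domain);

        // Dot-separated pieces of the local part, e.g. john and doe.
        for (String part : local.split("\\.")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }

        // Individual domain labels, e.g. mycompany, com, au.
        String[] labels = domain.split("\\.");
        for (String label : labels) {
            if (!label.isEmpty()) {
                out.add(label);
            }
        }

        // Multi-label domain suffixes shorter than the full domain, e.g. com.au.
        for (int i = 1; i < labels.length - 1; i++) {
            out.add(String.join(".", Arrays.copyOfRange(labels, i, labels.length)));
        }
        return new ArrayList<>(out);
    }
}
```

Under these assumptions, EmailComponents.expand("john.doe@mycompany.com.au") yields the full address plus john.doe, mycompany.com.au, john, doe, mycompany, com, au and com.au. In a real analysis chain the extra components would presumably be emitted by a TokenFilter at the same position as the original token, but that wiring is outside this sketch.
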
Jamie,
UAX29URLEmailTokenizer does not emit email components as tokens;
“john@mycompany.com.au” will be tokenized as “john@mycompany.com.au”,
nothing more. That’s why I asked what EmailFilter does.
If the filter really is ignored by Lucene, that would be a bug in Lucene. I
think some