Hi Alexandre,

CombiningFilter sounds close (though it has no option to put spaces between
the original terms), but it hasn't been committed yet:
<https://issues.apache.org/jira/browse/LUCENE-3413>.
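
Until something like that is available, one fallback is the custom-code route mentioned in the question. Below is a minimal sketch in plain Java, assuming the display name is extracted before the value reaches the analyzer (for example, in the indexing client). The class and method names are hypothetical; nothing here is a Lucene or Solr API. It drops the angle-bracketed address and any leftover term containing "@", then rejoins the remaining terms with single spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class DisplayNameExtractor {

    // Returns the display-name portion of a composite address such as
    // "John Doe <user@example.com>", or "" if no name is present.
    static String extract(String address) {
        if (address == null) {
            return "";
        }
        // Cut off the trailing "<...>" part when a matching pair exists.
        int open = address.lastIndexOf('<');
        int close = address.indexOf('>', open + 1);
        String namePart = (open >= 0 && close > open)
                ? address.substring(0, open)
                : address;

        // Drop double quotes, as in "\"John Doe\" <user@example.com>".
        namePart = namePart.replace("\"", "");

        // Keep whitespace-separated terms that do not look like an address,
        // then rejoin them with exactly one space.
        List<String> kept = new ArrayList<>();
        for (String term : namePart.trim().split("\\s+")) {
            if (!term.isEmpty() && term.indexOf('@') < 0) {
                kept.add(term);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(extract("John Doe <user@example.com>"));
    }
}
```

The result could then be indexed into a non-tokenized field so that the whole name survives as a single facet value. Names that themselves contain "<" or "@" would need more careful parsing than this sketch does.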

Steve

On Jan 8, 2013, at 4:55 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Hello,
> 
> I want to take a composite email address such as "John Doe <
> john...@example.com>" and leave "John Doe" as a facet field.
> 
> So far, I got UAX29.... Tokenizer combined with TypeTokenFilterFactory to
> filter out email type.
> 
> But that leaves me with "John" and "Doe" as tokens, which I cannot figure out
> how to combine back, with a space between them, into "John Doe".
> 
> I thought about using a regexp instead to just strip <....>, but that feels
> even less robust.
> 
> Do we have anything ready to use for that, or do I need to write custom code?
> 
> Regards,
>   Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
