On Thu, Jan 17, 2013 at 5:48 AM, Nick Wellnhofer <[email protected]> wrote:
> It can be done, but it's not trivial and probably not very performant.
> First, you have to write your own Analyzer class in Perl. See the following
> threads for some guidance:
>
> http://mail-archives.apache.org/mod_mbox/lucy-user/201111.mbox/%[email protected]%3E
> http://mail-archives.apache.org/mod_mbox/lucy-user/201207.mbox/%[email protected]%3E
>
> We really need a cookbook entry describing how to write custom analyzers.

Putting up a cookbook entry on wiki.apache.org/lucy would be great. I'd have some misgivings about adding an entry to the official docs, though, because subclassing Analyzer isn't officially supported.

(Background: Attempts to increase the speed of the current array-based Analyzer system using memory pools to allocate Tokens fell short of expectations; we need to see whether a stream-based implementation would be superior, but that would require a different subclassing API.)

Marvin Humphrey
