Thanks, Jukka. Attached please find my version of HTMLParser.java.
----- Original Message ----
From: Jukka Zitting <[email protected]>
To: [email protected]
Sent: Sunday, January 4, 2009 1:48:10 AM
Subject: Re: search results

Hi,

On Sun, Jan 4, 2009 at 3:08 AM, Cheng Zhang <[email protected]> wrote:
> It turns out that org.apache.jackrabbit.extractor.HTMLParser eats all
> digits: in the method filterAndJoin, all non-letters are removed.
> Does anybody have any idea why we do so? IMO, indexing "hf100" makes more
> sense than indexing "hf".

I don't recall any specific reason why digits should be dropped. I'd be
happy to apply the fix if you've already fixed this and would like to
attach the patch to Jira.

> Or is there any way I can configure the repository to use my HTMLParser
> instead of the default?

Look at the textFilterClasses parameter in the <SearchIndex/> configuration
of your repository.xml and workspace.xml files.

BR,

Jukka Zitting
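For illustration, the change discussed above (keeping digits instead of dropping every non-letter in filterAndJoin) might look roughly like the standalone sketch below. It is hypothetical: the class name TextFilterSketch and the exact behaviour (non-alphanumeric runs collapsed to a single space) are assumptions, and the real method in org.apache.jackrabbit.extractor.HTMLParser, as well as the attached version, may have a different signature.

```java
/**
 * Hypothetical sketch of a digit-preserving filterAndJoin.
 * Not the actual Jackrabbit implementation.
 */
public final class TextFilterSketch {

    private TextFilterSketch() {
    }

    /**
     * Keeps letters AND digits (so "hf100" survives intact), replaces every
     * other run of characters with a single space, and trims separators at
     * both ends. Works on code points so supplementary characters are kept.
     */
    public static String filterAndJoin(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean lastWasSpace = true; // true at start: suppresses a leading space
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                out.appendCodePoint(cp);
                lastWasSpace = false;
            } else if (!lastWasSpace) {
                out.append(' '); // one space per run of non-alphanumerics
                lastWasSpace = true;
            }
        }
        // Drop a trailing separator left by non-alphanumerics at the end.
        int len = out.length();
        if (len > 0 && out.charAt(len - 1) == ' ') {
            out.setLength(len - 1);
        }
        return out.toString();
    }
}
```

With this sketch, filterAndJoin("hf100, rev. 2") returns "hf100 rev 2", whereas a letters-only filter would reduce the first token to "hf".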
