Hi there,

Unfortunately, there was no response to my previous posting. I am still looking for a sample configuration that would let me specify a Lucene stop word analyzer.
Does anybody have a sample repository config file that references a stopwords.txt type file?

Thanks

** julio

-----Original Message-----
From: Julio Castillo [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2008 9:30 AM
To: '[email protected]'
Subject: RE: Excluding words

Thanks Ard,

Let me see if I understood you. The link doesn't exactly show how, so I will guess.

Currently my repository.xml has the following entry:

    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index"/>
      <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
      <param name="extractorPoolSize" value="2"/>
      <param name="supportHighlighting" value="true"/>
    </SearchIndex>

I saw an example for synonyms, so I imagine it would look like this (I just need the actual correct parameter names)?

    <param name="stopWordAnalyzerClass" value="org.apache.lucene.analysis.StopAnalyzer"/>
    <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>

Thanks

** julio

-----Original Message-----
From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2008 4:39 AM
To: [email protected]
Subject: RE: Excluding words

Hello Julio,

You can define your own Lucene analyzer in Jackrabbit (even per property, see [1] at the bottom). For example, if you configure a Lucene analyzer with a stop word list that you create yourself, you are done.

Regards
Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration

> Is there a way to perhaps on a per node insertion basis exclude words
> from being indexed by Lucene?
>
> I have to load a large volume of documents. There are certain words
> that I want to exclude as they will be present in 99% of the
> documents, but I haven't found a way to access or restrict Lucene to
> prevent it from indexing such words.
>
> Any ideas?
>
> Julio Castillo
> Edgenuity Inc.
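
For reference, here is a minimal sketch of how Ard's suggestion might look in repository.xml. It relies on the `analyzer` parameter of SearchIndex, which takes the class name of a Lucene analyzer. The class `com.example.search.CustomStopAnalyzer` is hypothetical: it stands for an Analyzer subclass you would write yourself, with a public no-argument constructor, that builds its own stop word list (for instance by reading a stopwords.txt it locates on its own). The `stopWordAnalyzerClass` and `stopWordAnalyzerConfigPath` names guessed earlier in the thread do not appear to be SearchIndex parameters, so the sketch does not use them.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical class name: replace with your own Analyzer subclass.
       Jackrabbit is assumed to instantiate it by class name, so the class
       itself must supply the stop word list; no file path is passed here. -->
  <param name="analyzer" value="com.example.search.CustomStopAnalyzer"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>
```

Changing the analyzer only affects text indexed afterwards, so an existing index would most likely need to be rebuilt before the excluded words disappear from it. For a per-property analyzer instead of a workspace-wide one, the indexing configuration described at [1] is the place to look.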
