Hi,

The parameter that allows you to configure a custom analyzer is called 'analyzer'. The default value for this parameter is org.apache.lucene.analysis.standard.StandardAnalyzer. So you just have to write your own Analyzer implementation that supports stop words and then reference it through that parameter in your workspace.xml files.
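
As a rough sketch, the SearchIndex entry in workspace.xml might then look something like the fragment below. The class name com.example.StopWordAnalyzer is purely hypothetical and stands in for whatever analyzer class you write yourself; the other lines are taken from the configuration quoted further down.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- hypothetical class name: replace with your own stop-word-aware Analyzer -->
  <param name="analyzer" value="com.example.StopWordAnalyzer"/>
</SearchIndex>
```

The assumption here is that your analyzer class is on the repository's classpath, so that Jackrabbit can load it by name.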

see also: http://wiki.apache.org/jackrabbit/Search

regards
marcel

Julio Castillo wrote:
> Hi there,
> Unfortunately there was no response to my previous posting.
>
> I am still looking for sample configuration specifications that would allow
> me to specify a lucene stop word analyzer.
>
> Anybody has a sample repository config file where they have referenced a
> stopwords.txt type file?
>
> Thanks
>
> ** julio
>
> -----Original Message-----
> From: Julio Castillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 9:30 AM
> To: '[email protected]'
> Subject: RE: Excluding words
>
> Thanks Ard,
> Let me see if I understood you, as the link doesn't exactly show how, but I
> will guess. Currently my repository.xml has the following entry:
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> <param name="path" value="${wsp.home}/index"/>
> <param name="textFilterClasses"
> value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list
> truncated>.."/>
> <param name="extractorPoolSize " value="2"/>
> <param name="supportHighlighting" value="true"/> </SearchIndex>
>
> I saw an example for synonyms, so I imagine it would look like this (I just
> need the actual correct parameter names)?
>
> <param name="stopWordAnalyzerClass"
> value="org.apache.lucene.analysis.StopAnalyzer"/>
> <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
>
> Thanks
>
> ** julio
>
> -----Original Message-----
> From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 4:39 AM
> To: [email protected]
> Subject: RE: Excluding words
>
> Hello Julio,
>
> You can define your own lucene analyzer in Jackrabbit (even per property,
> see [1] at the bottom). If you just configure a lucene analyzer having a
> list of stopwords for example, where you create the list yourself, you are
> done.
>
> Regards Ard
>
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
>
>> Is there a way to perhaps on a per node insertion basis exclude words
>> from being indexed by Lucene?
>>
>> I have to load a large volume of documents. There are certain words
>> that I want to exclude as they will be present in 99% of the
>> documents, but I haven't found a way to access or restrict Lucene to
>> prevent it from indexing such words.
>>
>> Any ideas?
>>
>> Julio Castillo
>> Edgenuity Inc.
