Hello Julio,

Currently, you cannot get a (dynamic) list of stopwords. Do you have a specific static list? Then just create your own StopAnalyzer with a final set of stopwords, or read the stopwords from a file in which you define them.

If you have a dynamic list of stopwords, I think you will have to build something smarter.
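To make the static-list option a bit more concrete, here is a minimal sketch in plain Java (no Lucene dependency) that merges a fixed default set with words read from a stopwords file, one word per line. The class name StopWordList, its DEFAULTS field and its load method are made up for illustration, and the default words are placeholders, not Lucene's built-in list. The assumption is that your own analyzer class would build this set once and hand it to its stop-word filtering step; please check the javadoc of the Lucene version bundled with your Jackrabbit for the exact constructor that accepts such a set.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: builds one fixed stop-word set from defaults plus a file. */
final class StopWordList {

    // Placeholder defaults for illustration only; this is NOT Lucene's own list.
    static final Set<String> DEFAULTS = Collections.unmodifiableSet(
            new LinkedHashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "to")));

    private StopWordList() {
    }

    /**
     * Reads one stop word per line, skipping blank lines and lines that start
     * with '#', and returns the defaults plus those words, all lower-cased.
     */
    static Set<String> load(Reader source) {
        Set<String> words = new LinkedHashSet<String>(DEFAULTS);
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (word.length() > 0 && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Surface read failures without forcing callers to declare IOException.
            throw new UncheckedIOException(e);
        }
        return Collections.unmodifiableSet(words);
    }
}
```

A custom analyzer could call something like StopWordList.load(new FileReader("stopwords.txt")) once when it is constructed; the file name here is only an example. Because the 'analyzer' parameter just names a class, the path to the file would presumably have to be known to that class itself rather than passed in through the configuration.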
-Ard

> Thanks Ard for taking the time to respond.
>
> I just responded to Marcel, I have an idea how to introduce into my
> configuration the Lucene StopAnalyzer (see my previous message:
> <param name="analyzer" value="org.apache.lucene.analysis.StopAnalyzer"/>).
> I just want to know how do I feed to this standard Analyzer input so
> that it knows which words to stop/exclude. I don't know how to get
> Jackrabbit/Lucene standard set of stop words (I wish to add some words
> to it).
>
> Thanks
>
> ** julio
>
> -----Original Message-----
> From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 22, 2008 5:54 AM
> To: [email protected]
> Subject: RE: Excluding words
>
> Sorry Julio for not responding, I was very occupied.
>
> As Marcel pointed out you can just configure your own analyzer. If your
> stop words are some default set, you can just use some other standard
> lucene analyzer, see [1] for a lot of available analyzers. Realize that
> most language analyzers also by default do stemming. At [1] you also
> have the StopAnalyzer which does what you want
>
> Regards Ard
>
> [1] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/analysis/Analyzer.html
>
> > Hi,
> >
> > the parameter that allows you to configure a custom analyzer is
> > called 'analyzer'. the default value for this parameter is
> > org.apache.lucene.analysis.standard.StandardAnalyzer. so, you just
> > have to write your own implementation that supports stop words and
> > then configure it properly in your workspace.xml files.
> >
> > see also: http://wiki.apache.org/jackrabbit/Search
> >
> > regards
> > marcel
> >
> > Julio Castillo wrote:
> > > Hi there,
> > > Unfortunately there was no response to my previous posting.
> > >
> > > I am still looking for sample configuration specifications that would
> > > allow me to specify a lucene stop word analyzer.
> > >
> > > Anybody has a sample repository config file where they have referenced
> > > a stopwords.txt type file?
> > >
> > > Thanks
> > >
> > > ** julio
> > >
> > > -----Original Message-----
> > > From: Julio Castillo [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, October 15, 2008 9:30 AM
> > > To: '[email protected]'
> > > Subject: RE: Excluding words
> > >
> > > Thanks Ard,
> > > Let me see if I understood you, as the link doesn't exactly show how,
> > > but I will guess. Currently my repository.xml has the following entry:
> > >
> > > <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> > >   <param name="path" value="${wsp.home}/index"/>
> > >   <param name="textFilterClasses"
> > >          value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
> > >   <param name="extractorPoolSize " value="2"/>
> > >   <param name="supportHighlighting" value="true"/>
> > > </SearchIndex>
> > >
> > > I saw an example for synonyms, so I imagine it would look like this (I
> > > just need the actual correct parameter names)?
> > >
> > > <param name="stopWordAnalyzerClass"
> > >        value="org.apache.lucene.analysis.StopAnalyzer"/>
> > > <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
> > >
> > > Thanks
> > >
> > > ** julio
> > >
> > > -----Original Message-----
> > > From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, October 15, 2008 4:39 AM
> > > To: [email protected]
> > > Subject: RE: Excluding words
> > >
> > > Hello Julio,
> > >
> > > You can define your own lucene analyzer in Jackrabbit (even per
> > > property, see [1] at the bottom). If you just configure a lucene
> > > analyzer having a list of stopwords for example, where you create the
> > > list yourself, you are done.
> > >
> > > Regards Ard
> > >
> > > [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
> > >
> > >> Is there a way to perhaps on a per node insertion basis exclude words
> > >> from being indexed by Lucene?
> > >>
> > >> I have to load a large volume of documents. There are certain words
> > >> that I want to exclude as they will be present in 99% of the
> > >> documents, but I haven't found a way to access or restrict Lucene to
> > >> prevent it from indexing such words.
> > >>
> > >> Any ideas?
> > >>
> > >> Julio Castillo
> > >> Edgenuity Inc.
