Thanks Ard,
Let me check that I understood you. The link doesn't show exactly how to do
this, so I am guessing. Currently my repository.xml has the following entry:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="textFilterClasses"
value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list
truncated>.."/>
<param name="extractorPoolSize" value="2"/>
<param name="supportHighlighting" value="true"/>
</SearchIndex>
I saw an example for synonyms, so I imagine the stop-word configuration would
look something like this (I just need the correct parameter names):
<param name="stopWordAnalyzerClass"
value="org.apache.lucene.analysis.StopAnalyzer"/>
<param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
Thanks
** julio
-----Original Message-----
From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 15, 2008 4:39 AM
To: [email protected]
Subject: RE: Excluding words
Hello Julio,
You can define your own Lucene analyzer in Jackrabbit (even per property,
see [1] at the bottom). For example, if you configure a Lucene analyzer with
a list of stop words that you create yourself, you are done.
Regards Ard
[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
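
For reference, a minimal sketch of how such an analyzer might be wired into
the SearchIndex entry of repository.xml. It assumes the SearchIndex "analyzer"
parameter, which takes the class name of a Lucene Analyzer.
com.example.MyStopWordAnalyzer is a hypothetical class name (it does not come
from this thread) standing in for an analyzer you write yourself around your
own stop-word list. With this approach the stop-word list lives inside the
analyzer class, so no separate stop-word parameter is involved.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumption: "analyzer" names the Lucene Analyzer class used for fulltext indexing. -->
  <!-- com.example.MyStopWordAnalyzer is hypothetical; it would wrap your own stop-word list. -->
  <param name="analyzer" value="com.example.MyStopWordAnalyzer"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>
```

Since Jackrabbit would have to instantiate the class by name, the analyzer
presumably needs a public no-argument constructor and has to be on
Jackrabbit's classpath. A changed analyzer most likely applies only to content
indexed afterwards, so an existing index would probably have to be rebuilt.
Per-property analyzers go in the separate indexing configuration file
described in [1].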
>
> Is there a way, perhaps on a per-node-insertion basis, to exclude words
> from being indexed by Lucene?
>
> I have to load a large volume of documents. There are certain words
> that I want to exclude as they will be present in 99% of the
> documents, but I haven't found a way to access or restrict Lucene to
> prevent it from indexing such words.
>
> Any ideas?
>
> Julio Castillo
> Edgenuity Inc.