RE: Excluding words

Julio Castillo Wed, 15 Oct 2008 09:30:35 -0700

Thanks, Ard.
Let me check that I understood you. The link doesn't show exactly how to do
this, so I'll take a guess. Currently my repository.xml has the following
entry:


<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
  <param name="extractorPoolSize" value="2"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>

I saw an example for synonyms, so I imagine it would look something like this
(I just need the correct parameter names):

  <param name="stopWordAnalyzerClass" value="org.apache.lucene.analysis.StopAnalyzer"/>
  <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
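A hedged sketch of an alternative: the names stopWordAnalyzerClass and stopWordAnalyzerConfigPath above are only a guess, and I'm not aware of SearchIndex supporting them. SearchIndex does accept an "analyzer" parameter that names a Lucene Analyzer class. Lucene's StopAnalyzer created with no arguments uses its built-in English stop list, so a custom word list would probably need a small analyzer class of your own. The class name com.example.search.CustomStopAnalyzer below is hypothetical. The fragment assumes that class is on the repository's classpath and has a public no-argument constructor, since only a class name is given.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- existing textFilterClasses, extractorPoolSize and
       supportHighlighting params stay as they are -->
  <!-- Lucene analyzer used for full-text indexing. HYPOTHETICAL class:
       e.g. your own StopAnalyzer subclass that supplies your word list;
       it must be on the repository classpath. -->
  <param name="analyzer" value="com.example.search.CustomStopAnalyzer"/>
</SearchIndex>
```

The analyzer is also used when full-text queries are parsed, so query terms and indexed terms are treated the same way. Content indexed before the change would presumably have to be re-indexed.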

Thanks

** julio

-----Original Message-----
From: Ard Schrijvers [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 15, 2008 4:39 AM
To: [email protected]
Subject: RE: Excluding words

Hello Julio,

You can define your own Lucene analyzer in Jackrabbit (even per property;
see [1] at the bottom). For example, if you configure a Lucene analyzer that
uses a list of stopwords you create yourself, you are done.
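For the per-property variant mentioned above, the indexing configuration described at [1] can map an analyzer to individual properties. This is a minimal sketch that assumes your Jackrabbit version supports the analyzers element; check [1] for the matching DOCTYPE. The analyzer class and the property name "mytext" are hypothetical placeholders. The file would be referenced from SearchIndex with <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>, where the file name is also just an example.

```xml
<?xml version="1.0"?>
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <!-- HYPOTHETICAL analyzer class and property name: only the value of
         "mytext" is analyzed with the custom stopword analyzer -->
    <analyzer class="com.example.search.CustomStopAnalyzer">
      <property>mytext</property>
    </analyzer>
  </analyzers>
</configuration>
```

As I understand it, properties not listed here continue to use the default analyzer configured on SearchIndex.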

Regards Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration

> 
> Is there a way to exclude words from being indexed by Lucene, perhaps on
> a per-node insertion basis?
> 
> I have to load a large volume of documents. There are certain words
> that I want to exclude because they will be present in 99% of the
> documents, but I haven't found a way to configure Lucene so that it
> skips indexing such words.
> 
> Any ideas?
> 
> Julio Castillo
> Edgenuity Inc.
> 
>
