RE: Excluding words

Julio Castillo Wed, 22 Oct 2008 09:19:39 -0700

Marcel,
I wish to use the standard Lucene Stop word analyzer:
org.apache.lucene.analysis.StopAnalyzer


So, based on the wiki page describing the search configuration parameters,
would it look something like this?

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
  <param name="analyzer" value="org.apache.lucene.analysis.StopAnalyzer"/>
</SearchIndex>

Where and how do I specify which words should be excluded (stopped)?

Thanks

** julio


-----Original Message-----
From: Marcel Reutegger [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 22, 2008 5:07 AM
To: [email protected]
Subject: Re: Excluding words

Hi,

the parameter that allows you to configure a custom analyzer is called
'analyzer'. the default value for this parameter is
org.apache.lucene.analysis.standard.StandardAnalyzer. so, you just have to
write your own implementation that supports stop words and then configure it
properly in your workspace.xml files.
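
As a rough sketch of the configuration described above: the class name
com.example.MyStopAnalyzer is a hypothetical placeholder for your own
analyzer implementation (it is not part of Jackrabbit or Lucene) and is
assumed to be on the repository's classpath. The SearchIndex entry in
workspace.xml might then look like this:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical class name: replace with your own Analyzer implementation -->
  <param name="analyzer" value="com.example.MyStopAnalyzer"/>
</SearchIndex>
```

Since only a class name is passed here, the class most likely needs a public
no-argument constructor and has to supply its stop word list itself.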

see also: http://wiki.apache.org/jackrabbit/Search

regards
 marcel

> -----Original Message-----
> From: Julio Castillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 9:30 AM
> To: '[email protected]'
> Subject: RE: Excluding words
> 
> Thanks Ard,
> Let me see if I understood you, as the link doesn't exactly show
> how, but I will guess. Currently my repository.xml has 
> the following entry:
> 
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>   <param name="path" value="${wsp.home}/index"/>
>   <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
>   <param name="extractorPoolSize" value="2"/>
>   <param name="supportHighlighting" value="true"/>
> </SearchIndex>
> 
> I saw an example for synonyms, so I imagine it would look like this (I 
> just need the actual correct parameter names)?
> 
>   <param name="stopWordAnalyzerClass" value="org.apache.lucene.analysis.StopAnalyzer"/>
>   <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
> 
> Thanks
> 
> ** julio
> 
> -----Original Message-----
> From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 4:39 AM
> To: [email protected]
> Subject: RE: Excluding words
> 
> Hello Julio,
> 
> You can define your own Lucene analyzer in Jackrabbit (even per 
> property, see [1] at the bottom). If you configure a Lucene analyzer 
> that has a list of stop words, for example one where you create the 
> list yourself, you are done.
> 
> Regards Ard
> 
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
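> 
> A hedged sketch of the per-property setup mentioned above, modelled on
> the format described on the wiki page [1]. The analyzer class
> com.example.MyStopAnalyzer and the property name mytext are hypothetical
> placeholders, not names taken from this thread:
> 
> ```xml
> <?xml version="1.0"?>
> <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>   <analyzers>
>     <!-- Hypothetical analyzer class and property name -->
>     <analyzer class="com.example.MyStopAnalyzer">
>       <property>mytext</property>
>     </analyzer>
>   </analyzers>
> </configuration>
> ```
> 
> Properties not listed under an analyzer should keep using the analyzer
> configured globally for the search index.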
> 
>> Is there a way, perhaps on a per-node insertion basis, to exclude words 
>> from being indexed by Lucene?
>>
>> I have to load a large volume of documents. There are certain words 
>> that I want to exclude as they will be present in 99% of the 
>> documents, but I haven't found a way to access or restrict Lucene to 
>> prevent it from indexing such words.
>>
>> Any ideas?
>>
>> Julio Castillo
>> Edgenuity Inc.
>>
>>
> 
>
