Marcel,
I wish to use the standard Lucene Stop word analyzer:
org.apache.lucene.analysis.StopAnalyzer
So, based on the wiki page describing the search parameter configuration, it
would look something like this?
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
  <param name="analyzer" value="org.apache.lucene.analysis.StopAnalyzer"/>
</SearchIndex>
Where and how do I specify which words should be excluded (stopped)?
Thanks
** julio
-----Original Message-----
From: Marcel Reutegger [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2008 5:07 AM
To: [email protected]
Subject: Re: Excluding words
Hi,
the parameter that allows you to configure a custom analyzer is called
'analyzer'. the default value for this parameter is
org.apache.lucene.analysis.standard.StandardAnalyzer. so, you just have to
write your own analyzer implementation that supports stop words and then
configure it in your workspace.xml files.
see also: http://wiki.apache.org/jackrabbit/Search
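
A minimal sketch of what that configuration might look like, assuming a
custom analyzer class named com.example.search.CustomStopAnalyzer (a
hypothetical name, not something Jackrabbit or Lucene ships). As far as I
can tell, the 'analyzer' parameter takes only a class name, so the stop
word list would have to be supplied by that class itself, for example in
its no-argument constructor:

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical class: an Analyzer subclass with a public no-argument
       constructor that sets up its own stop word list. -->
  <param name="analyzer" value="com.example.search.CustomStopAnalyzer"/>
</SearchIndex>
```

The class would need to be on Jackrabbit's classpath, and existing content
would probably have to be re-indexed before the new stop words take effect.
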
regards
marcel
> -----Original Message-----
> From: Julio Castillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 9:30 AM
> To: '[email protected]'
> Subject: RE: Excluding words
>
> Thanks Ard,
> Let me see if I understood you, as the link doesn't exactly show
> how, but I will guess. Currently my repository.xml has
> the following entry:
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>   <param name="path" value="${wsp.home}/index"/>
>   <param name="textFilterClasses"
>          value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
>   <param name="extractorPoolSize" value="2"/>
>   <param name="supportHighlighting" value="true"/>
> </SearchIndex>
>
> I saw an example for synonyms, so I imagine it would look like this (I
> just need the actual correct parameter names)?
>
> <param name="stopWordAnalyzerClass"
>        value="org.apache.lucene.analysis.StopAnalyzer"/>
> <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
>
> Thanks
>
> ** julio
>
> -----Original Message-----
> From: Ard Schrijvers [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 15, 2008 4:39 AM
> To: [email protected]
> Subject: RE: Excluding words
>
> Hello Julio,
>
> You can define your own Lucene analyzer in Jackrabbit (even per
> property, see [1] at the bottom). For example, if you configure a Lucene
> analyzer with a list of stop words that you create yourself, you are
> done.
>
> Regards Ard
>
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
>
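
The per-property setup Ard refers to might look roughly like the fragment
below, following the format described on the IndexingConfiguration wiki
page [1]. The property name mytext and the analyzer class
com.example.search.CustomStopAnalyzer are placeholders for illustration,
not names taken from this thread's repository:

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <analyzers>
    <!-- Hypothetical analyzer class and property name, for illustration only. -->
    <analyzer class="com.example.search.CustomStopAnalyzer">
      <property>mytext</property>
    </analyzer>
  </analyzers>
</configuration>
```

As I understand it, such a file is referenced from the SearchIndex element
through an indexingConfiguration parameter pointing at the file's location.
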
>> Is there a way, perhaps on a per-node insertion basis, to exclude words
>> from being indexed by Lucene?
>>
>> I have to load a large volume of documents. There are certain words
>> that I want to exclude as they will be present in 99% of the
>> documents, but I haven't found a way to access or restrict Lucene to
>> prevent it from indexing such words.
>>
>> Any ideas?
>>
>> Julio Castillo
>> Edgenuity Inc.