aslam bari a écrit :
hi aslam here is my patch code to configure Stop words You have to
compile Slide using this LuceneContentIndexer (make a diff with your
current
version before to pick up the changes ) and you can set up your
stopword file in the Domain.xml
I proposed this change and several others to the dev Team but I never
get any serious anwser I think they don't care about this king of problems
I you implement it succesfully you should propose it to the dev team may
be you gonna be more lucky tahan I am
Regards
B DOREL
Hi,
Yes i am interested. But plz let me know how can i set this in Slide and How I
can do this for English words.
----- Original Message ----
From: Dorel bruno <[EMAIL PROTECTED]>
To: Slide Users Mailing List <[email protected]>
Sent: Friday, 23 March, 2007 9:27:42 PM
Subject: Re: How to set StandardAnalyzer Stop Words
Ven Helsing a écrit :
Hello all,
I want to use StandardAnaylyzer for Lucene content indexing and also don't
need to Stop (ignore) common words which is default to StandardAnaylyzer.
Means I want to use StandardAnayzer's constructor with empty set.
How to do so?
Thanks...
We proposed a patch to the dev team ........................ but of no
avail ! I you are interrested we have made a patch to configure stop
words (we use french stopwords)
B DOREL
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
__________________________________________________________
Yahoo! India Answers: Share what you know. Learn something new
http://in.answers.yahoo.com/
---------------------------------------------------------------------------------------
Orange vous informe que cet e-mail a ete controle par l'anti-virus mail.
Aucun virus connu a ce jour par nos services n'a ete detecte.
/*
* $Header:
/home/cvspublic/jakarta-slide/src/stores/org/apache/slide/index/lucene/LuceneContentIndexer.java,v
1.3 2005/04/04 13:55:13 luetzkendorf Exp $
* $Revision: 1.3 $
* $Date: 2005/04/04 13:55:13 $
*
* ====================================================================
*
* Copyright 1999-2004 The Apache Software Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* V1.0 : NA/EADS, le 29/08/05
* Adaptation : la suppression d'une entrée de l'index ne contrôle plus
* la présence d'un extracteur (basé sur URI) car on ne peut pas faire
* ce contrôle sur 'displayname' (impossible de récupérer la propriété)
* V1.1 : NA/EADS, le 28/02/06
* FFT 2006/EADS/0656 : récupération du nouveau paramètre optionnel
* 'analyzer-stopwords' du fichier de configuration Domain.xml
* et création du StandardAnalizer avec ce paramètre.
* Si pas de fichier de stop-words, alors le StandardAnalyser utilise
* par défaut les ENGLISH_STOPWORDS
* V1.2 : JM/EADS, le 18/10/06
* FFT 2006/EADS/0973 : Passage de l'Uri plutôt que du contenu
*/
package org.apache.slide.index.lucene;
import java.io.File;
import java.io.IOException;
import java.util.Hashtable;
import javax.transaction.xa.XAException;
import javax.transaction.xa.Xid;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.slide.common.NamespaceAccessToken;
import org.apache.slide.common.ServiceInitializationFailedException;
import org.apache.slide.common.ServiceParameterErrorException;
import org.apache.slide.common.ServiceParameterMissingException;
import org.apache.slide.common.Uri;
import org.apache.slide.content.NodeRevisionContent;
import org.apache.slide.content.NodeRevisionDescriptor;
import org.apache.slide.content.NodeRevisionNumber;
import org.apache.slide.event.DomainEvent;
import org.apache.slide.event.EventDispatcher;
import org.apache.slide.extractor.ExtractorManager;
import org.apache.slide.search.IndexException;
/**
* IndexStore implementation for indexing content based on Jakarta Lucene.
*/
public class LuceneContentIndexer extends AbstractLuceneIndexer
{
private static final String ANALYZER_PARAM = "analyzer";
private String analyzerClassName;
private static final String ANALYZER_STOPWORDS_PARAM = "analyzer-stopwords";
private File analyzerStopWordsFile;
public void initialize(NamespaceAccessToken token)
throws ServiceInitializationFailedException
{
super.initialize(token);
try {
indexConfiguration.initDefaultConfiguration();
indexConfiguration.setContentAnalyzer(
createAnalyzer(this.analyzerClassName));
this.index = new Index(indexConfiguration, getLogger(),
"content " + this.scope);
if (this.index.needsInitialization()) {
DomainEvent.NAMESPACE_INITIALIZED.setEnabled(true);
EventDispatcher.getInstance().addEventListener(
new IndexInitializer(this.scope,
IndexInitializer.CONTENT, getLogger()));
}
}
catch (IndexException e) {
throw new ServiceInitializationFailedException(this, e);
}
}
public void setParameters(Hashtable parameters)
throws ServiceParameterErrorException,
ServiceParameterMissingException
{
super.setParameters(parameters);
// Récupération du fichier de StopWords
analyzerClassName = (String)parameters.get(ANALYZER_PARAM);
analyzerStopWordsFile =
new File((String)parameters.get(ANALYZER_STOPWORDS_PARAM));
// Contrôle de validité du fichier
if (!analyzerStopWordsFile.exists()
|| analyzerStopWordsFile.isDirectory()
|| ! analyzerStopWordsFile.canRead()) {
analyzerStopWordsFile = null;
}
}
/**
* This implementation just calls the super implementation and catches
* all exceptions to ensure that content indexing never makes a commit
failing.
*/
public void commit(Xid xid, boolean onePhase) throws XAException
{
try {
super.commit(xid, onePhase);
} catch (XAException e) {
error("Error while committing to content index ({0})", e);
}
}
/*
* @see
org.apache.slide.search.Indexer#createIndex(org.apache.slide.common.Uri,
org.apache.slide.content.NodeRevisionDescriptor,
org.apache.slide.content.NodeRevisionContent)
*/
public void createIndex(Uri uri, NodeRevisionDescriptor revisionDescriptor,
NodeRevisionContent revisionContent) throws IndexException
{
if (isIncluded(uri.toString())) {
if (ExtractorManager.getInstance().hasContentExtractor(
uri.getNamespace().getName(), uri.toString(),
revisionDescriptor))
{
TransactionalIndexResource indexResource = getCurrentTxn();
indexResource.addIndexJob(uri, revisionDescriptor, true);
}
}
}
/*
* @see
org.apache.slide.search.Indexer#updateIndex(org.apache.slide.common.Uri,
org.apache.slide.content.NodeRevisionDescriptor,
org.apache.slide.content.NodeRevisionContent)
*/
public void updateIndex(Uri uri, NodeRevisionDescriptor revisionDescriptor,
NodeRevisionContent revisionContent) throws IndexException
{
if (isIncluded(uri.toString())) {
if (ExtractorManager.getInstance().hasContentExtractor(
uri.getNamespace().getName(), uri.toString(),
revisionDescriptor))
{
TransactionalIndexResource indexResource = getCurrentTxn();
indexResource.addUpdateJob(uri, revisionDescriptor, true);
}
}
}
/*
* @see
org.apache.slide.search.Indexer#dropIndex(org.apache.slide.common.Uri,
org.apache.slide.content.NodeRevisionNumber)
*/
public void dropIndex(Uri uri, NodeRevisionNumber number)
throws IndexException
{
if (isIncluded(uri.toString())) {
// if (ExtractorManager.getInstance().hasContentExtractor(
// uri.getNamespace().getName(), uri.toString(), null))
// {
TransactionalIndexResource indexResource = getCurrentTxn();
indexResource.addRemoveJob(uri, number);
// }
}
}
protected Analyzer createAnalyzer(String clsName)
throws ServiceInitializationFailedException
{
Analyzer analyzer;
if (clsName == null || clsName.length() == 0) {
analyzer = new SimpleAnalyzer();
} else {
try {
if (clsName.indexOf("StandardAnalyzer") > -1) {
// StandardAnalyzer
if (analyzerStopWordsFile != null) {
// utilisation des Stop-Words spécifiés dans un fichier
analyzer = new StandardAnalyzer(analyzerStopWordsFile);
} else {
// utilisation des Stop-Words par défaut
analyzer = new StandardAnalyzer();
}
} else {
// Tout autre Analyzer
Class analyzerClazz = Class.forName(clsName);
analyzer = (Analyzer)analyzerClazz.newInstance();
}
} catch (ClassNotFoundException e) {
error("Error while instantiating analyzer {1} {2}",
clsName, e.getMessage());
throw new ServiceInitializationFailedException(this, e);
} catch (InstantiationException e) {
error("Error while instantiating analyzer {1} {2}",
clsName, e.getMessage());
throw new ServiceInitializationFailedException(this, e);
} catch (IllegalAccessException e) {
error("Error while instantiating analyzer {1} {2}",
clsName, e.getMessage());
throw new ServiceInitializationFailedException(this, e);
} catch (IOException e) {
error("Error while instantiating analyzer {1} {2}",
clsName, e.getMessage());
throw new ServiceInitializationFailedException(this, e);
}
}
info("using analyzer: {0}", analyzer.getClass().getName());
return analyzer;
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]