Hello Ahmad and all nutch-users,

I would like to thank you for your response. Exactly as you say, the interface is not the same in version 1.0 of Nutch. So currently I get no errors when building the plugin, but the problem is that no new field appears in the index when I display the index using Luke.
The following are the 3 new classes of my "author" plugin, which compile without any error:

1) Class AuthorIndexer:
----------------------------

    package org.apache.nutch.parse.author;

    // JDK import
    import java.util.logging.Logger;

    // Commons imports
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // Nutch imports
    import org.apache.nutch.util.LogUtil;
    import org.apache.nutch.fetcher.FetcherOutput;
    import org.apache.nutch.indexer.IndexingFilter;
    import org.apache.nutch.indexer.IndexingException;
    import org.apache.nutch.parse.Parse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.nutch.crawl.CrawlDatum;
    import org.apache.nutch.crawl.Inlinks;

    // Lucene imports
    //import org.apache.lucene.document.Field;
    //
    //import org.apache.lucene.document.Document;
    import org.apache.nutch.indexer.field.*;
    import org.apache.nutch.indexer.NutchDocument;
    import org.apache.nutch.indexer.lucene.LuceneWriter;

    public class AuthorIndexer implements IndexingFilter {

      public static final Log LOG = LogFactory.getLog(AuthorIndexer.class.getName());

      private Configuration conf;

      public AuthorIndexer() {
      }

      public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
          CrawlDatum datum, Inlinks inlinks) throws IndexingException {
        String recommendation = parse.getData().getMeta("Author");
        if (recommendation != null) {
          doc.add("author", recommendation);
          LOG.info("Added " + recommendation + " to the author Field");
        }
        return doc;
      }

      public void addIndexBackendOptions(Configuration conf) {
        // stored, indexed and un-tokenized
        LuceneWriter.addFieldOptions("author", LuceneWriter.STORE.YES,
            LuceneWriter.INDEX.UNTOKENIZED, conf);
      }

      public void setConf(Configuration conf) {
        this.conf = conf;
      }

      public Configuration getConf() {
        return this.conf;
      }
    }

2) Class AuthorParser:
----------------------------

    package org.apache.nutch.parse.author;

    // JDK imports
    import java.util.Enumeration;
    import java.util.Properties;
    import java.util.logging.Logger;

    // Nutch imports
    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.ParseResult;
    import org.apache.nutch.parse.HtmlParseFilter;
    import org.apache.nutch.protocol.Content;

    // Commons imports
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // W3C imports
    import org.w3c.dom.DocumentFragment;

    public class AuthorParser implements HtmlParseFilter {

      private static final Log LOG = LogFactory.getLog(AuthorParser.class.getName());

      private Configuration conf;

      /** The Author meta data attribute name */
      public static final String META_RECOMMENDED_NAME = "Author";

      /**
       * Scan the HTML document looking for an author meta tag.
       */
      public ParseResult filter(Content content, ParseResult parse,
          HTMLMetaTags metaTags, DocumentFragment doc) {
        // Trying to find the document's author term
        String recommendation = null;
        Properties generalMetaTags = metaTags.getGeneralTags();
        for (Enumeration tagNames = generalMetaTags.propertyNames(); tagNames.hasMoreElements(); ) {
          if (tagNames.nextElement().equals("author")) {
            recommendation = generalMetaTags.getProperty("author");
            LOG.info("Found a Recommendation for " + recommendation);
          }
        }
        if (recommendation == null) {
          LOG.info("No Recommendation");
        } else {
          LOG.info("Adding Recommendation for " + recommendation);
          // we will inject information
          parse.get("author").getData().getContentMeta().set(META_RECOMMENDED_NAME, recommendation);
        }
        return parse;
      }

      public void setConf(Configuration conf) {
        this.conf = conf;
      }

      public Configuration getConf() {
        return this.conf;
      }
    }

3) Class AuthorQueryFilter:
---------------------------------

    package org.apache.nutch.parse.author;

    import org.apache.nutch.searcher.FieldQueryFilter;
    import java.util.logging.Logger;

    // Commons imports
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class AuthorQueryFilter extends FieldQueryFilter {

      private static final Log LOG = LogFactory.getLog(AuthorParser.class.getName());

      public AuthorQueryFilter() {
        super("author", 5f);
        LOG.info("Added a author query");
      }
    }

In addition, in nutch-site.xml I have added the term "author" to the list of plugins to be used.

In the build.xml at the root of the plugins directory I have added the line

    <ant dir="author" target="deploy"/>

Also, in Nutch's schema.xml file I have added the line

    <field name="author" type="string" stored="true" indexed="true"/>

Then I built Nutch with ant, and the build works correctly. I don't know why the field "author" doesn't appear in the final index.

Afterwards I removed the "author" plugin and activated the "feed" plugin, which ships with Nutch in the plugin directory (deactivated by default) and which contains a field named "author". With this, the new field appears in the index, plus 3 other fields.

My actual goal is to add a new field that is not exactly the "author" field: for example COUNTRY or ACTIVITY, which I want to add manually to Nutch's results, perhaps using the URL domain name. So I am now stuck on adding the new field "author", and I don't know where the problem comes from.

NOTE: I am using Nutch with Solr, so I am not sure whether the problem depends on that or not.

I need your help, please.
Thanks.

2010/3/22 Ahmad Al-Amri <amri...@yahoo.com>

>
> Hello;
>
> The filter method in the 0.9 example is not the same with 1.0 ver.
> interface that implemented.
>
> note that it is return "Document" but 1.0 one returns "NutchDocument" ....
> and there is bit deference in reading meta tags ...
>
> check this very helpful links:
>
> http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html
> http://sujitpal.blogspot.com/2009/07/nutch-getting-my-feet-wet.html
>
> I assume you added your "author" folder that contains .jar file to your
> plugin directory!
>
> Regards;
> Ahmad Al-Amri
>
> ________________________________
> From: Arnaud Garcia <arnaud1...@gmail.com>
> To: nutch-user@lucene.apache.org
> Sent: Wed, March 17, 2010 8:25:26 AM
> Subject: Re: Plugin installed , deployed and works correctly but no new field in the index ????????????
>
> 2010/3/17 Arnaud Garcia <arnaud1...@gmail.com>
>
> >
> > 2010/3/17 Arnaud Garcia <arnaud1...@gmail.com>
> >
> > Hello everybody
> >>
> >> I m trying to add new plugin to Nutch as it s explain in the howto
> >> WritingPluginExample on the apache wiki.
> >>
> >> Because the Example about the plugin on the wiki is for the version 0.9 ,
> >> i m switching to Nutch 0.9 after getting a lot of error with Nutch1.0.
> >>
> >> My new plugin named author will extract the value of the tag author from
> >> pages crawled.
> >>
> >> The method used is exactly the same method on the wiki with the name
> >> "author" in place of 'recommended" .
> >>
> >> So , all things are built successfully , (plugin (separately)+ Nutch ) ,
> >> the name of the plugin ("author) was added in nutch-site.xml file ,
> >>
> >> and the tag <ant dir="author" target="deploy"/> was added correctly
> >> in the file /nutch/src/plugin/ and
> >>
> >> the "author" directory was been created on the directory /nutch/build/ .
> >>
> >>
> >> THE PROBLEM IS :
> >>
> >> No new field named "author" exists in the index .
> >>
> >> I m using Luke to read and display the index but theresn't any trace about
> >> the new field "author" .
> >>
> >> I have verified that the tag named author exists in the Web page which i
> >> crawl.
> >>
> >>
> >> ANYONE know from where the problem may come
> >>
> >> CAN ANYONE HELP ME PLEASE.
> >> Best regards
> >> THANKS
> >>
> > s
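
For the COUNTRY field mentioned in this message (a value derived from the URL domain name), the lookup itself can be sketched independently of Nutch. This is only a minimal illustration under stated assumptions: the class name CountryFromUrl, the method countryFor and the small TLD table are hypothetical and are not part of Nutch or of the plugin in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: maps a URL's top-level domain to a country.
public class CountryFromUrl {

    // Illustrative lookup table (assumption); a real one would cover all country-code TLDs.
    private static final Map<String, String> TLD_TO_COUNTRY = new HashMap<String, String>();
    static {
        TLD_TO_COUNTRY.put("fr", "France");
        TLD_TO_COUNTRY.put("de", "Germany");
        TLD_TO_COUNTRY.put("tn", "Tunisia");
    }

    /** Returns the country for the URL's top-level domain, or null if it cannot be determined. */
    public static String countryFor(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // malformed URL: no country
        }
        if (host == null) {
            return null; // URL without a host part
        }
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0 || lastDot == host.length() - 1) {
            return null; // no TLD to inspect
        }
        String tld = host.substring(lastDot + 1).toLowerCase(Locale.ROOT);
        return TLD_TO_COUNTRY.get(tld); // null when the TLD is not in the table
    }

    public static void main(String[] args) {
        System.out.println(countryFor("http://www.example.fr/index.html"));
        System.out.println(countryFor("http://www.example.com/"));
    }
}
```

Inside an indexing filter such as AuthorIndexer, the result would presumably be added the same way as the author value, for example with doc.add("country", ...) on the string form of the url argument when the lookup is not null; that wiring is an assumption and has not been tested against Nutch 1.0.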