Hi everybody,
Sorry to bring this issue up again with such a long mail, but I really
cannot get my plugin loaded.
I have read and applied the suggestions given in various previous
postings on this list,
but I still have not got any results.

Basically, I have used part of the code written for the "recommended"
plugin example from the Nutch wiki, and kept only the Parse extension.
I have ported it to Nutch 0.9 and run the inject/generate/fetch cycle.
The plugin is compiled and correctly installed in the
$NUTCH_HOME/plugins/parse-rec directory.

My problem is that my plugin never seems to be executed, even though
it appears to be correctly registered.
Another problem is that I cannot get the plugin system to produce any
logs unless I invoke it directly (see below).

I include all my code and configuration here, hoping someone can point
out my mistakes or misunderstandings.

-Corrado

I took the code from the latest nightly ("At revision 472436") and
put my plugin code in
trunk/src/plugin/parse-rec/src/java/org/apache/nutch/parse/rec/RecParseFilter.java

Here are the code and config files:
__________________________ RecParseFilter.java ______________________________________
package org.apache.nutch.parse.rec;

// JDK imports
import java.util.Enumeration;
import java.util.Properties;
import java.util.logging.Logger;

// Nutch imports
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.protocol.Content;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import org.apache.nutch.util.NutchConfiguration;
import org.apache.hadoop.conf.Configuration;

import org.w3c.dom.DocumentFragment;

public class RecParseFilter implements HtmlParseFilter {

  /** Configuration  */
  private Configuration conf;

  /** Logger named after this class */
  public static final Log LOG = LogFactory.getLog(RecParseFilter.class);

  /** The Recommended meta data attribute name */
  public static final String META_RECOMMENDED_NAME="Recommended";

  /** Scan the HTML document looking for a recommended meta tag.  */
  public Parse filter(Content content, Parse parse,
                      HTMLMetaTags metaTags, DocumentFragment doc) {

        LOG.debug("RecParseFilter::filter() --->");
        /** Trying to find the document's recommended term */
        String recommendation = null;
        Properties generalMetaTags = metaTags.getGeneralTags();
        String title = parse.getData().getTitle();
        LOG.debug("RecParseFilter::filter() - Document Title : " + title);

        for (Enumeration tagNames = generalMetaTags.propertyNames();
             tagNames.hasMoreElements(); ) {
            if (tagNames.nextElement().equals("recommended")) {
                recommendation = generalMetaTags.getProperty("recommended");
                LOG.debug("RecParseFilter::filter() - Found a Recommendation for "
                          + recommendation);
            }
        }

        if (recommendation == null)
           LOG.debug("RecParseFilter::filter() - No Recommendation");
        else {
           LOG.debug("RecParseFilter::filter() - Adding Recommendation for "
                     + recommendation);
           parse.getData().getContentMeta().set(META_RECOMMENDED_NAME,
                                                recommendation);
        }
        LOG.debug("RecParseFilter::filter() <--");
        return parse;
  }

  public Configuration getConf() {
    LOG.debug("RecParseFilter::getConf() -->");
    LOG.debug("RecParseFilter::getConf() <--");
    return this.conf;
  }

  public void setConf(Configuration conf) {
    LOG.debug("RecParseFilter::setConf() -->");
    LOG.debug("RecParseFilter::setConf() <--");
    this.conf = conf;
  }
}
________________________________________________________________
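
The loop over propertyNames() in filter() amounts to a single keyed
lookup. Here is a minimal standalone sketch of that lookup, using only
java.util.Properties and no Nutch classes. The class name MetaTagLookup
and the sample tag values are hypothetical and serve only to illustrate
the behaviour, including the fact that the key comparison is
case-sensitive.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch: mirrors the meta tag lookup in filter().
final class MetaTagLookup {

    // Returns the value stored under "recommended", or null if the key is absent.
    static String findRecommended(Properties generalMetaTags) {
        return generalMetaTags.getProperty("recommended");
    }

    public static void main(String[] args) {
        Properties tags = new Properties();
        tags.setProperty("recommended", "plugins"); // sample value, assumption
        tags.setProperty("keywords", "nutch");
        System.out.println("recommended = " + findRecommended(tags));

        // Keys are compared case-sensitively, so this entry is not found.
        Properties other = new Properties();
        other.setProperty("Recommended", "plugins");
        System.out.println("recommended = " + findRecommended(other));
    }
}
```

If the tag names handed to the filter are not lower-case, a lookup
written this way returns null just as the original loop would.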

_________________________plugin.xml_______________________________

<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="parse-rec"
   name="Recommended Parser/Filter"
   version="0.0.1"
   provider-name="nutch.org">

   <runtime>
      <!-- As defined in build.xml this plugin will end up bundled as
           parse-rec.jar -->
      <library name="parse-rec.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
    <import plugin="nutch-extensionpoints"/>
   </requires>

   <!-- The RecommendedParser extends the HtmlParseFilter to grab the 
contents of any recommended meta tags -->
   <extension id="org.apache.nutch.parse.rec.RecParseFilter"
              name="Recommended Parser"
              point="org.apache.nutch.parse.HtmlParseFilter">
      <implementation id="RecParseFilter"
                      class="org.apache.nutch.parse.rec.RecParseFilter">
         <parameter name="contentType" value="text/html"/>
         <parameter name="pathSuffix"  value=""/>
      </implementation>
   </extension>
</plugin>
________________________________________________________________

I have added this property to nutch-site.xml:

___________________________nutch-site.xml__________________________
      <property>
        <name>plugin.includes</name>
        <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|rec)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
      </property>
________________________________________________________________
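
A quick way to check a plugin.includes value outside Nutch is to test
plugin ids against it with java.util.regex. The sketch below assumes
that the value is treated as a regular expression that must match the
whole plugin id; the class name PluginIncludesCheck is hypothetical and
the pattern is a shortened variant of the value above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
final class PluginIncludesCheck {

    // Shortened variant of the plugin.includes value, for illustration only.
    static final String INCLUDES =
        "nutch-extensionpoints|protocol-http|parse-(text|html|js|rec)|index-basic";

    // Assumption: a plugin is included only if its whole id matches the expression.
    static boolean isIncluded(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        for (String id : new String[] {"parse-rec", "parse-html", "parse-pdf"}) {
            System.out.println(id + " included: " + isIncluded(INCLUDES, id));
        }
    }
}
```

Under that assumption, parse-rec and parse-html are accepted by the
pattern while parse-pdf is not.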

I have added these lines to parse-plugins.xml.
I also tried listing only my plugin, with the same results.

___________________________parse-plugins.xml__________________________
        <mimeType name="text/html">
                <plugin id="parse-rec" />
                <plugin id="parse-html" />
        </mimeType>
________________________________________________________________

Finally, I added a line to log4j.properties to make the plugin system log.
But despite this line I get no plugin logs at all.
___________________________log4j.properties__________________________
log4j.logger.org.apache.nutch.plugin=DEBUG
________________________________________________________________
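
In log4j, a level set with log4j.logger.<name> applies to the logger
with that name and to its descendants, and a logger is a descendant
when its name starts with the configured name followed by a dot. The
sketch below only re-implements that naming rule to show which loggers
the line above would reach. The class LoggerCoverage is hypothetical,
it is not log4j API, and it ignores any levels set on more specific
loggers.

```java
// Hypothetical helper, not part of log4j or Nutch.
final class LoggerCoverage {

    // True if loggerName is the configured logger itself or one of its descendants.
    static boolean isCoveredBy(String configured, String loggerName) {
        return loggerName.equals(configured)
            || loggerName.startsWith(configured + ".");
    }

    public static void main(String[] args) {
        String configured = "org.apache.nutch.plugin";
        String[] names = {
            "org.apache.nutch.plugin.PluginRepository",
            "org.apache.nutch.parse.rec.RecParseFilter"
        };
        for (String name : names) {
            System.out.println(name + " covered: " + isCoveredBy(configured, name));
        }
    }
}
```

By this rule the entry reaches the plugin repository's logger, but not
a logger named after a class in org.apache.nutch.parse.rec, which would
need an entry of its own.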

After running the fetcher I was expecting to find the "recommended"
meta tag in my segment:

nutch readseg -get test/segments/20061108110142 "http://testmachine.toto.net/index.html"
SegmentReader: get 'http://testmachine.toto.net/index.html'
Content::
Version: 2
url: http://testmachine.toto.net/index.html
base: http://testmachine.toto.net/index.html
contentType: text/html
metadata: nutch.segment.name=20061108110142 nutch.crawl.score=1.0
Content:

Crawl Generate::
Version: 4
Status: 1 (DB_unfetched)
Fetch time: Wed Nov 08 10:54:39 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

Crawl Fetch::
Version: 4
Status: 6 (fetch_retry)
Fetch time: Wed Nov 08 11:02:46 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 1
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

I then tried to invoke the plugin directly:
nutch plugin parse-rec org.apache.nutch.parse.rec.RecParseFilter

This way I got the plugin logs I wanted in hadoop.log, showing that
the plugin is registered:


.....
2006-11-08 11:07:33,520 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-rec/plugin.xml
2006-11-08 11:07:33,526 DEBUG plugin.PluginRepository - plugin: id=parse-rec name=Recommended Parser/Filter version=0.0.1 provider=nutch.orgclass=null
2006-11-08 11:07:33,527 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.rec.RecParseFilter
2006-11-08 11:07:33,528 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-text/plugin.xml
.....
Registered Plugins:
....
2006-11-08 11:07:34,014 INFO  plugin.PluginRepository -         Recommended Parser/Filter (parse-rec)
....
2006-11-08 11:07:51,827 DEBUG plugin.PluginRepository - parsing: /home/opt/nutch-0.9-dev/plugins/parse-rec/plugin.xml
2006-11-08 11:07:51,837 DEBUG plugin.PluginRepository - plugin: id=parse-rec name=Recommended Parser/Filter version=0.0.1 provider=nutch.orgclass=null
...




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
