Re: PluginRuntimeException ClassNotFound for ParseFilter plugin in Nutch 2.2 ?

Tony Mullins Wed, 12 Jun 2013 23:47:09 -0700

Hi Tejas,

I am following this example: https://github.com/veggen/nutch-element-selector
I have now tried it, unmodified, against a fresh copy of the Nutch 2.2
source.


Attached is my patch (change set) against the fresh Nutch 2.2 source.
Kindly review it and please let me know if I am missing something.

Thanks,
Tony


On Thu, Jun 13, 2013 at 11:19 AM, Tejas Patil <[email protected]> wrote:

> Weird. I would like to take a quick look at your changes. Maybe you are
> doing something wrong that is hard to predict and figure out by asking you
> a bunch of questions over email. Can you attach a patch file of your
> changes? Please remove the fluff from it and keep only the bare essentials
> in the patch. Also, if you are working for a company, make sure that
> attaching code here is not against your organisational policy.
>
> Thanks,
> Tejas
>
> On Wed, Jun 12, 2013 at 11:03 PM, Tony Mullins <[email protected]> wrote:
>
> > I have done all of this: created my plugin's ivy.xml, plugin.xml and
> > build.xml, and added the entries in nutch-site.xml and
> > src/plugin/build.xml.
> > But I am still getting "PluginRuntimeException:
> > java.lang.ClassNotFoundException".
> >
> >
> > Is there any other configuration that I am missing, or is this a Nutch 2.2
> > issue?
> >
> > Thanks,
> > Tony.
> >
> >
> > On Thu, Jun 13, 2013 at 1:09 AM, Tejas Patil <[email protected]> wrote:
> >
> > > Here is the relevant wiki page:
> > > http://wiki.apache.org/nutch/WritingPluginExample
> > >
> > > Although it's old, I think that it will help.
> > >
> > >
> > > On Wed, Jun 12, 2013 at 1:01 PM, Sebastian Nagel <[email protected]> wrote:
> > >
> > > > Hi Tony,
> > > >
> > > > you have to "register" your plugin in
> > > >  src/plugin/build.xml
> > > >
> > > > Does your
> > > >  src/plugin/myplugin/plugin.xml
> > > > properly propagate jar file,
> > > > extension point and implementing class?
> > > >
> > > > And, finally, you have to add your plugin
> > > > to the property plugin.includes in nutch-site.xml
> > > >
> > > > Cheers,
> > > > Sebastian
> > > >
> > > > On 06/12/2013 07:48 PM, Tony Mullins wrote:
> > > > > Hi,
> > > > >
> > > > > I am trying a simple ParseFilter plugin in Nutch 2.2. I can build it,
> > > > > and src/plugin/build.xml also builds successfully, but its .jar file
> > > > > is not being created in my runtime/local/plugins/myplugin directory.
> > > > >
> > > > > And on running
> > > > > "bin/nutch parsechecker http://www.google.nl"
> > > > > I get this error: "java.lang.RuntimeException:
> > > > > org.apache.nutch.plugin.PluginRuntimeException:
> > > > > java.lang.ClassNotFoundException:
> > > > > com.xyz.nutch.selector.HtmlElementSelectorFilter"
> > > > >
> > > > > If I go to MyNutch2.2Source/build/myplugin, I can see the plugin's jar,
> > > > > with test and classes directories created there. If I copy the .jar
> > > > > from there into my runtime/local/plugins/myplugin directory, along with
> > > > > the plugin.xml file, I still get the same class-not-found exception.
> > > > >
> > > > > I have not made any changes to src/plugin/build-plugin.xml.
> > > > >
> > > > > Could you please tell me what I am doing wrong here?
> > > > >
> > > > > Thanks,
> > > > > Tony
> > > > >
> > > >
> > > >
> > >
> >
>
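
The last item in Sebastian's checklist (the plugin.includes property) can be checked in isolation. The sketch below tests plugin ids against the plugin.includes value used in the attached patch. It assumes that Nutch treats the property as a regular expression that has to match the whole plugin id, which is how the property description ("Regular expression naming plugin directory names to include") reads; PluginIncludesCheck and isIncluded are hypothetical names for this illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Returns true when the whole plugin id matches the plugin.includes expression.
    // Assumption: a full match is required, not a partial "find".
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Value of plugin.includes as it appears in the attached patch.
        String includes = "protocol-http|urlfilter-regex|parse-(html|tika)"
                + "|element-selector|index-(basic|anchor)"
                + "|urlnormalizer-(pass|regex|basic)|scoring-opic";

        // "myplugin" is the directory name used earlier in the thread.
        for (String id : new String[] {"element-selector", "parse-html", "myplugin"}) {
            System.out.println(id + " included: " + isIncluded(includes, id));
        }
    }
}
```

Under that assumption, element-selector is covered by the expression in the patch, while a plugin directory named myplugin would not be.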

Index: conf/gora.properties
===================================================================
--- conf/gora.properties        (revision 1492208)
+++ conf/gora.properties        (working copy)
@@ -20,10 +20,10 @@
 # Default SqlStore properties #
 ###############################
 
-gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
-gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchtest
-gora.sqlstore.jdbc.user=sa
-gora.sqlstore.jdbc.password=
+# gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
+# gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchtest
+# gora.sqlstore.jdbc.user=sa
+# gora.sqlstore.jdbc.password=
 
 ################################
 # Default AvroStore properties #
@@ -60,7 +60,8 @@
 # CassandraStore properties #
 #############################
 
-# gora.cassandrastore.servers=localhost:9160
+ gora.cassandrastore.servers=localhost:9160
+ gora.datastore.default=org.apache.gora.cassandra.store.CassandraStore
 
 #######################
 # MemStore properties #
Index: conf/nutch-default.xml
===================================================================
--- conf/nutch-default.xml      (revision 1492208)
+++ conf/nutch-default.xml      (working copy)
@@ -60,7 +60,7 @@
 
 <property>
   <name>http.agent.name</name>
-  <value></value>
+  <value>MyIYCrawler</value>
   <description>HTTP 'User-Agent' request header. MUST NOT be empty - 
   please set this to a single word uniquely related to your organization.
 
@@ -79,7 +79,7 @@
 
 <property>
   <name>http.robots.agents</name>
-  <value>*</value>
+  <value>MyIYCrawler</value>
   <description>The agent strings we'll look for in robots.txt files,
   comma-separated, in decreasing order of precedence. You should
   put the value of http.agent.name as the first agent name, and keep the
@@ -823,7 +823,7 @@
 
 <property>
   <name>plugin.folders</name>
-  <value>plugins</value>
+ <value>/root/workspace_eclipse_new/Nutch2.2/src/plugin</value>
   <description>Directories where nutch plugins are located.  Each
   element may be a relative or absolute path.  If absolute, it is used
   as is.  If relative, it is searched for on the classpath.</description>
Index: conf/nutch-site.xml.template
===================================================================
--- conf/nutch-site.xml.template        (revision 1492208)
+++ conf/nutch-site.xml.template        (working copy)
@@ -4,5 +4,77 @@
 <!-- Put site-specific property overrides in this file. -->
 
 <configuration>
+<property>
+  <name>storage.data.store.class</name>
+  <value>org.apache.gora.cassandra.store.CassandraStore</value>
+  <description>Default class for storing data</description>
+</property>
+
+<property>
+<name>http.agent.name</name>
+<value>MyIYCrawler</value>
+<description>HTTP 'User-Agent' request header. MUST NOT be empty -
+please set this to a single word uniquely related to your organization.
+</description>
+</property>
+
+<property>
+<name>http.robots.agents</name>
+<value>MyIYCrawler</value>
+<description>The agent strings we'll look for in robots.txt files,
+comma-separated, in decreasing order of precedence. You should
+put the value of http.agent.name as the first agent name, and keep the
+default * at the end of the list. E.g.: BlurflDev,Blurfl,*
+</description>
+</property>
+
+<property>
+  <name>plugin.folders</name>
+  <value>/root/workspace_eclipse_new/Nutch2.2/src/plugin</value>
+  <description>Directories where nutch plugins are located.  Each
+  element may be a relative or absolute path.  If absolute, it is used
+  as is.  If relative, it is searched for on the classpath.</description>
+</property>
+
+<property>
+    <name>parser.html.selector.blacklist</name>
+    <value>footer,div#mngb</value>
+    <description>
+        A comma-delimited list of css like tags to identify the elements which should
+        NOT be parsed. Use this to tell the HTML parser to ignore the given elements, e.g. site navigation.
+        It is allowed to only specify the element type (required), and optional its class name ('.')
+        or ID ('#'). More complex expressions will not be parsed.
+        Valid examples: div.header,span,p#test,div#main,ul,div.footercol
+        Invalid expressions: div#head#part1,#footer,.inner#post
+        Note that the elements and their children will be silently ignored by the parser,
+        so verify the indexed content with Luke to confirm results.
+        Use either 'parser.html.selector.blacklist' or 'parser.html.selector.whitelist', but not both of them at once. If so,
+        only the whitelist is used.
+    </description>
+</property>
+<property>
+    <name>parser.html.selector.protected_urls</name>
+    <value>http://www.example.com/home</value>
+    <description>Comma separated list of URLs for pages that should be excluded from element filtering</description>
+</property>
+<property>
+    <name>parser.html.selector.storage_field</name>
+    <value>filtered_content</value>
+    <description>The name of the document field where the filtered content should be stored</description>
+</property>
+
+<property>
+    <name>plugin.includes</name>
+    <value>protocol-http|urlfilter-regex|parse-(html|tika)|element-selector|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
+    <description>
+        Regular expression naming plugin directory names to
+        include.  Any plugin not matching this expression is excluded.
+        In any case you need at least include the nutch-extensionpoints plugin. By
+        default Nutch includes crawling just HTML and plain text via HTTP,
+        and basic indexing and search plugins. In order to use HTTPS please enable
+        protocol-httpclient, but be aware of possible intermittent problems with the
+        underlying commons-httpclient library.
+    </description>
+</property>
 
 </configuration>
Index: conf/regex-urlfilter.txt.template
===================================================================
--- conf/regex-urlfilter.txt.template   (revision 1492208)
+++ conf/regex-urlfilter.txt.template   (working copy)
@@ -36,4 +36,4 @@
 -.*(/[^/]+)/[^/]+\1/[^/]+\1/
 
 # accept anything else
-+.
++^http://([a-z0-9]*\.)*lucene.apache.org/
Index: ivy/ivy.xml
===================================================================
--- ivy/ivy.xml (revision 1492208)
+++ ivy/ivy.xml (working copy)
@@ -119,9 +119,9 @@
     <dependency org="org.apache.gora" name="gora-accumulo" rev="0.3" conf="*->default" />
     -->
     <!-- Uncomment this to use Cassandra as Gora backend. -->
-    <!-- 
+     
     <dependency org="org.apache.gora" name="gora-cassandra" rev="0.3" conf="*->default" />
-    -->
+    
 
     <!--global exclusion -->
     <exclude module="ant" />
Index: src/plugin/build.xml
===================================================================
--- src/plugin/build.xml        (revision 1492208)
+++ src/plugin/build.xml        (working copy)
@@ -58,6 +58,7 @@
      <ant dir="urlnormalizer-basic" target="deploy"/>
      <ant dir="urlnormalizer-pass" target="deploy"/>
      <ant dir="urlnormalizer-regex" target="deploy"/>
+     <ant dir="element-selector" target="deploy" />
      <!--
      <ant dir="feed" target="deploy"/>
      <ant dir="parse-ext" target="deploy"/>
Index: src/plugin/element-selector/build.xml
===================================================================
--- src/plugin/element-selector/build.xml       (revision 0)
+++ src/plugin/element-selector/build.xml       (revision 0)
@@ -0,0 +1,22 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<project name="element-selector" default="jar-core">
+
+  <import file="../build-plugin.xml"/>
+
+</project>
Index: src/plugin/element-selector/ivy.xml
===================================================================
--- src/plugin/element-selector/ivy.xml (revision 0)
+++ src/plugin/element-selector/ivy.xml (revision 0)
@@ -0,0 +1,41 @@
+<?xml version="1.0" ?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+<ivy-module version="1.0">
+  <info organisation="org.apache.nutch" module="${ant.project.name}">
+    <license name="Apache 2.0"/>
+    <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
+    <description>
+        Apache Nutch
+    </description>
+  </info>
+
+  <configurations>
+   <include file="../../../ivy/ivy-configurations.xml"/>
+  </configurations>
+
+  <publications>
+    <!--get the artifact from our module name-->
+    <artifact conf="master"/>
+  </publications>
+
+  <dependencies>
+  </dependencies>
+  
+</ivy-module>
Index: src/plugin/element-selector/plugin.xml
===================================================================
--- src/plugin/element-selector/plugin.xml      (revision 0)
+++ src/plugin/element-selector/plugin.xml      (revision 0)
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<plugin
+   id="element-selector"
+   name="Blacklist and Whitelist Parser and Indexer"
+   version="1.0.0"
+   provider-name="kaqqao">
+
+   <runtime>
+      <library name="element-selector.jar">
+         <export name="*"/>
+      </library>
+   </runtime>
+
+   <extension id="kaqqao.nutch.selector.HtmlElementSelectorIndexer"
+              name="Nutch Blacklist and Whitelist Indexing Filter"
+              point="org.apache.nutch.indexer.IndexingFilter">
+      <implementation id="HtmlElementSelectorIndexer"
+                      class="kaqqao.nutch.selector.HtmlElementSelectorIndexer"/>
+   </extension>
+
+       <extension id="kaqqao.nutch.selector.HtmlElementSelectorFilter"
+              name="Nutch Blacklist and Whitelist Parsing Filter"
+              point="org.apache.nutch.parse.ParseFilter">
+      <implementation id="HtmlElementSelectorFilter"
+                      class="kaqqao.nutch.selector.HtmlElementSelectorFilter"/>
+   </extension>
+
+</plugin>
Index: src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorFilter.java
===================================================================
--- src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorFilter.java    (revision 0)
+++ src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorFilter.java    (revision 0)
@@ -0,0 +1,207 @@
+package kaqqao.nutch.plugin.selector;
+
+import org.apache.avro.util.Utf8;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.nutch.parse.HTMLMetaTags;
+import org.apache.nutch.parse.Parse;
+import org.apache.nutch.parse.ParseFilter;
+import org.apache.nutch.storage.WebPage;
+import org.apache.nutch.util.NodeWalker;
+import org.w3c.dom.DocumentFragment;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
+
+import java.nio.CharBuffer;
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.Charset;
+import java.nio.charset.CharsetEncoder;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.Set;
+
+public class HtmlElementSelectorFilter implements ParseFilter {
+
+    private Configuration conf;
+    private String[] blacklist;
+    private String[] whitelist;
+    private String storageField;
+    private Set<String> protectedURLs;
+    private Collection<WebPage.Field> fields = new HashSet<WebPage.Field>();
+
+    @Override
+    public Parse filter(String s, WebPage webPage, Parse parse, HTMLMetaTags htmlMetaTags, DocumentFragment documentFragment) {
+        DocumentFragment rootToIndex;
+        StringBuilder strippedContent = new StringBuilder();
+        if ((this.whitelist != null) && (this.whitelist.length > 0) && !protectedURLs.contains(webPage.getBaseUrl())) {
+            rootToIndex = (DocumentFragment) documentFragment.cloneNode(false);
+            whitelisting(documentFragment, rootToIndex);
+        } else if ((this.blacklist != null) && (this.blacklist.length > 0) && !protectedURLs.contains(webPage.getBaseUrl())) {
+            rootToIndex = (DocumentFragment) documentFragment.cloneNode(true);
+            blacklisting(rootToIndex);
+        } else {
+            return parse;
+        }
+
+        getText(strippedContent, rootToIndex);
+        if (storageField == null) {
+            parse.setText(strippedContent.toString());
+        } else {
+            CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
+            try {
+                webPage.putToMetadata(new Utf8(storageField), encoder.encode(CharBuffer.wrap(strippedContent.toString())));
+            } catch (CharacterCodingException e) {
+                e.printStackTrace();
+            }
+        }
+        return parse;
+    }
+
+    private void blacklisting(Node root) {
+        boolean wasStripped = false;
+        String type = root.getNodeName().toLowerCase();
+        String id = null;
+        String className = null;
+        if (root.hasAttributes()) {
+            Node node = root.getAttributes().getNamedItem("id");
+            id = node != null ? node.getNodeValue().toLowerCase() : null;
+
+            node = root.getAttributes().getNamedItem("class");
+            className = node != null ? node.getNodeValue().toLowerCase() : null;
+        }
+
+        String typeAndId = new StringBuilder().append(type).append("#").append(id).toString();
+        String typeAndClass = new StringBuilder().append(type).append(".").append(className).toString();
+
+        boolean inList = false;
+        if ((type != null) && (Arrays.binarySearch(this.blacklist, type) >= 0))
+            inList = true;
+        else if ((type != null) && (id != null) && (Arrays.binarySearch(this.blacklist, typeAndId) >= 0))
+            inList = true;
+        else if ((type != null) && (className != null) && (Arrays.binarySearch(this.blacklist, typeAndClass) >= 0)) {
+            inList = true;
+        }
+        if (inList) {
+            root.setNodeValue("");
+
+            while (root.hasChildNodes())
+                root.removeChild(root.getFirstChild());
+            wasStripped = true;
+        }
+
+        if (!wasStripped) {
+            NodeList children = root.getChildNodes();
+            if (children != null) {
+                int len = children.getLength();
+                for (int i = 0; i < len; i++) {
+                    blacklisting(children.item(i));
+                }
+            }
+        }
+    }
+
+    private void whitelisting(Node pNode, Node newNode) {
+        boolean wasStripped = false;
+        String type = pNode.getNodeName().toLowerCase();
+        String id = null;
+        String className = null;
+        if (pNode.hasAttributes()) {
+            Node node = pNode.getAttributes().getNamedItem("id");
+            id = node != null ? node.getNodeValue().toLowerCase() : null;
+
+            node = pNode.getAttributes().getNamedItem("class");
+            className = node != null ? node.getNodeValue().toLowerCase() : null;
+        }
+
+        String typeAndId = new StringBuilder().append(type).append("#").append(id).toString();
+        String typeAndClass = new StringBuilder().append(type).append(".").append(className).toString();
+
+        boolean inList = false;
+        if ((type != null) && (Arrays.binarySearch(this.whitelist, type) >= 0))
+            inList = true;
+        else if ((type != null) && (id != null) && (Arrays.binarySearch(this.whitelist, typeAndId) >= 0))
+            inList = true;
+        else if ((type != null) && (className != null) && (Arrays.binarySearch(this.whitelist, typeAndClass) >= 0)) {
+            inList = true;
+        }
+        if (inList) {
+            newNode.appendChild(pNode.cloneNode(true));
+            wasStripped = true;
+        }
+
+        if (!wasStripped) {
+            NodeList children = pNode.getChildNodes();
+            if (children != null) {
+                int len = children.getLength();
+                for (int i = 0; i < len; i++) {
+                    whitelisting(children.item(i), newNode);
+                }
+            }
+        }
+    }
+
+    private void getText(StringBuilder sb, Node node) {
+        NodeWalker walker = new NodeWalker(node);
+
+        while (walker.hasNext()) {
+            Node currentNode = walker.nextNode();
+            String nodeName = currentNode.getNodeName();
+            short nodeType = currentNode.getNodeType();
+
+            if ("script".equalsIgnoreCase(nodeName)) {
+                walker.skipChildren();
+            }
+            if ("style".equalsIgnoreCase(nodeName)) {
+                walker.skipChildren();
+            }
+            if (nodeType == 8) {
+                walker.skipChildren();
+            }
+            if (nodeType == 3) {
+                String text = currentNode.getNodeValue();
+                text = text.replaceAll("\\s+", " ");
+                text = text.trim();
+                if (text.length() > 0) {
+                    if (sb.length() > 0) sb.append(' ');
+                    sb.append(text);
+                }
+            }
+        }
+    }
+
+    public void setConf(Configuration conf) {
+        this.conf = conf;
+
+        this.blacklist = null;
+        String elementsToExclude = getConf().get("parser.html.selector.blacklist", null);
+        if ((elementsToExclude != null) && (elementsToExclude.trim().length() > 0)) {
+            elementsToExclude = elementsToExclude.toLowerCase();
+
+            this.blacklist = elementsToExclude.split(",");
+            Arrays.sort(this.blacklist);
+        }
+
+        this.whitelist = null;
+        String elementsToInclude = getConf().get("parser.html.selector.whitelist", null);
+        if ((elementsToInclude != null) && (elementsToInclude.trim().length() > 0)) {
+            elementsToInclude = elementsToInclude.toLowerCase();
+
+            this.whitelist = elementsToInclude.split(",");
+            Arrays.sort(this.whitelist);
+        }
+
+        this.storageField = getConf().get("parser.html.selector.storage_field", null);
+
+        this.protectedURLs = new HashSet<String>(Arrays.asList(getConf().get("parser.html.selector.protected_urls", "").split(",")));
+    }
+
+    @Override
+    public Configuration getConf() {
+        return this.conf;
+    }
+
+    @Override
+    public Collection<WebPage.Field> getFields() {
+        return fields;
+    }
+}
Index: src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorIndexer.java
===================================================================
--- src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorIndexer.java   (revision 0)
+++ src/plugin/element-selector/src/java/kaqqao/nutch/plugin/selector/HtmlElementSelectorIndexer.java   (revision 0)
@@ -0,0 +1,54 @@
+package kaqqao.nutch.plugin.selector;
+
+import org.apache.avro.util.Utf8;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.nutch.indexer.IndexingException;
+import org.apache.nutch.indexer.IndexingFilter;
+import org.apache.nutch.indexer.NutchDocument;
+import org.apache.nutch.storage.WebPage;
+
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.Charset;
+import java.nio.charset.CharsetDecoder;
+import java.util.Collection;
+import java.util.HashSet;
+
+public class HtmlElementSelectorIndexer implements IndexingFilter {
+
+    private Configuration conf;
+    private String storageField;
+
+    @Override
+    public NutchDocument filter(NutchDocument document, String s, WebPage webPage) throws IndexingException {
+        if (storageField != null) {
+            CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
+            try {
+                String strippedContent = decoder.decode(webPage.getFromMetadata(new Utf8(storageField))).toString();
+                if (strippedContent != null) {
+                    document.add(storageField, strippedContent);
+                }
+            } catch (CharacterCodingException e) {
+                e.printStackTrace();
+            }
+        }
+
+        return document;
+    }
+
+    @Override
+    public void setConf(Configuration entries) {
+        this.conf = entries;
+
+        this.storageField = getConf().get("parser.html.selector.storage_field", null);
+    }
+
+    @Override
+    public Configuration getConf() {
+        return this.conf;
+    }
+
+    @Override
+    public Collection<WebPage.Field> getFields() {
+        return new HashSet<WebPage.Field>();
+    }
+}
Index: urls/seed.txt
===================================================================
--- urls/seed.txt       (revision 0)
+++ urls/seed.txt       (revision 0)
@@ -0,0 +1 @@
+http://lucene.apache.org
\ No newline at end of file
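
One detail stands out in the patch above: plugin.xml names the implementation classes as kaqqao.nutch.selector.HtmlElementSelectorFilter and kaqqao.nutch.selector.HtmlElementSelectorIndexer, while both Java sources declare "package kaqqao.nutch.plugin.selector;". A mismatch of this kind is one plausible cause of a ClassNotFoundException when the plugin is loaded, although nothing in the thread confirms it. The following is a minimal sketch of how such a mismatch could be detected. It uses simple regular expressions rather than a real XML or Java parser, and PluginClassCheck with its two helper methods is hypothetical, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginClassCheck {

    // class="..." attributes, as used on <implementation> elements in plugin.xml
    private static final Pattern CLASS_ATTR =
            Pattern.compile("\\bclass\\s*=\\s*\"([^\"]+)\"");
    // package declaration of a Java source file
    private static final Pattern PACKAGE =
            Pattern.compile("(?m)^\\s*package\\s+([\\w.]+)\\s*;");
    // name of the first "public class" in the source
    private static final Pattern TYPE =
            Pattern.compile("\\bpublic\\s+class\\s+(\\w+)");

    // Collects every class name referenced by a class="..." attribute.
    static List<String> implementationClasses(String pluginXml) {
        List<String> names = new ArrayList<>();
        Matcher m = CLASS_ATTR.matcher(pluginXml);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns the fully qualified name the source declares, or null if no public class is found.
    static String declaredClass(String javaSource) {
        Matcher type = TYPE.matcher(javaSource);
        if (!type.find()) {
            return null;
        }
        Matcher pkg = PACKAGE.matcher(javaSource);
        return pkg.find() ? pkg.group(1) + "." + type.group(1) : type.group(1);
    }

    public static void main(String[] args) {
        // Fragments copied from the patch above.
        String pluginXml = "<implementation id=\"HtmlElementSelectorFilter\"\n"
                + "  class=\"kaqqao.nutch.selector.HtmlElementSelectorFilter\"/>";
        String javaSource = "package kaqqao.nutch.plugin.selector;\n\n"
                + "public class HtmlElementSelectorFilter implements ParseFilter {\n}";

        String declared = declaredClass(javaSource);
        for (String cls : implementationClasses(pluginXml)) {
            String verdict = cls.equals(declared) ? " matches " : " does NOT match ";
            System.out.println(cls + verdict + declared);
        }
    }
}
```

For the fragments shown, the check reports that the name in plugin.xml does not match the declared class. Note also that the error quoted earlier in the thread names com.xyz.nutch.selector.HtmlElementSelectorFilter, a third package, so whichever name is used has to agree in the Java sources, in plugin.xml and in the jar that is deployed.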
