Is there a way in Nutch to use a different parser for different websites?
I am trying to do this by writing a custom parser that calls a different
parser for each website.
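
A minimal sketch of the dispatching idea, assuming the choice is made on the
host name of the URL. Everything here is hypothetical: PageParser is a
simplified stand-in and not Nutch's real org.apache.nutch.parse.Parser
interface, and HostDispatchingParser, register and select are names made up
for illustration. A real plugin would delegate to actual Nutch parsers instead
of returning strings.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a parser; NOT Nutch's real Parser interface. */
interface PageParser {
    String parse(String url, String html);
}

/** Picks a delegate parser by the URL's host, falling back to a default. */
class HostDispatchingParser implements PageParser {
    private final Map<String, PageParser> byHost = new HashMap<>();
    private final PageParser fallback;

    HostDispatchingParser(PageParser fallback) {
        this.fallback = fallback;
    }

    /** Registers the parser to use for one host (compared case-insensitively). */
    void register(String host, PageParser parser) {
        byHost.put(host.toLowerCase(Locale.ROOT), parser);
    }

    /** Returns the parser registered for the URL's host, or the fallback. */
    PageParser select(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            host = null; // malformed URL: use the fallback parser
        }
        if (host == null) {
            return fallback;
        }
        return byHost.getOrDefault(host.toLowerCase(Locale.ROOT), fallback);
    }

    @Override
    public String parse(String url, String html) {
        return select(url).parse(url, html);
    }
}
```

Keeping the host-to-parser table in one place means that supporting a new
website only requires another register call, and unknown hosts still get the
default parser.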

On 14 Mar 2018 14:19, "Semyon Semyonov" <semyon.semyo...@mail.com> wrote:

> As a side note,
>
> I had to implement my own parser with extra functionality; a simple
> copy/paste of the HtmlParser code did the job.
>
> Inheriting from it instead of copying and pasting may be a bad idea.
> HtmlParser is a concrete, non-abstract class, so inheritance will not
> be as smooth as with contract implementations (the plugins are
> contracts, i.e. interfaces) and can easily break some OOP rules.
>
>
> Sent: Wednesday, March 14, 2018 at 9:18 AM
> From: "Yossi Tamari" <yossi.tam...@pipl.com>
> To: user@nutch.apache.org
> Subject: RE: Dependency between plugins
> One suggestion I can make is to ensure that the html-parse plugin is built
> before your plugin (since you are including the jars that are generated in
> its build).
>
> > -----Original Message-----
> > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> > Sent: 14 March 2018 09:55
> > To: user@nutch.apache.org
> > Subject: Re: Dependency between plugins
> >
> > Hi,
> > It didn't work in ant runtime.
> > I included "import org.apache.nutch.parse.html;" in my custom parser code,
> > but it throws an error when I run ant runtime.
> >
> > [javac] /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41: error: cannot find symbol
> > [javac] import org.apache.nutch.parse.html;
> > [javac] ^
> > [javac] symbol: class html
> > [javac] location: package org.apache.nutch.parse
> >
> >
> > Below are the XML files of my parser:
> >
> >
> > My ivy.xml
> >
> >
> > <ivy-module version="1.0">
> >   <info organisation="org.apache.nutch" module="${ant.project.name}">
> >     <license name="Apache 2.0"/>
> >     <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> >     <description>
> >       Apache Nutch
> >     </description>
> >   </info>
> >
> >   <configurations>
> >     <include file="../../../ivy/ivy-configurations.xml"/>
> >   </configurations>
> >
> >   <publications>
> >     <!--get the artifact from our module name-->
> >     <artifact conf="master"/>
> >   </publications>
> > </ivy-module>
> >
> > build.xml
> >
> > <project name="parse-custom" default="jar-core">
> >
> >   <import file="../build-plugin.xml"/>
> >
> >   <!-- Build compilation dependencies -->
> >   <target name="deps-jar">
> >     <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> >   </target>
> >
> >   <path id="plugin.deps">
> >     <fileset dir="${nutch.root}/build">
> >       <include name="**/parse-html/*.jar" />
> >     </fileset>
> >   </path>
> >
> >   <!-- Deploy Unit test dependencies -->
> >   <target name="deps-test">
> >     <ant target="deploy" inheritall="false" dir="../parse-html"/>
> >     <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
> >   </target>
> >
> > </project>
> >
> > plugin.xml
> >
> > <plugin
> >   id="parse-custom"
> >   name="Custom Parse Plug-in"
> >   version="1.0.0"
> >   provider-name="nutch.org">
> >
> >   <runtime>
> >     <library name="parse-custom.jar">
> >       <export name="*"/>
> >     </library>
> >   </runtime>
> >
> >   <requires>
> >     <import plugin="parse-html"/>
> >     <import plugin="nutch-extensionpoints"/>
> >   </requires>
> >
> >   <extension id="org.apache.nutch.parse.custom"
> >     name="CustomParse"
> >     point="org.apache.nutch.parse.Parser">
> >
> >     <implementation id="org.apache.nutch.parse.custom.CustomParser"
> >       class="org.apache.nutch.parse.custom.CustomParser">
> >       <parameter name="contentType"
> >         value="text/html|application/xhtml+xml"/>
> >       <parameter name="pathSuffix" value=""/>
> >     </implementation>
> >
> >   </extension>
> >
> > </plugin>
> >
> >
> >
> >
> > On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yossi.tam...@pipl.com>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > I don't know how to do it, I never tried, but if I had to it would be
> > > a trial and error thing....
> > >
> > > If you want to increase the chances that someone will answer your
> > > question, I suggest you provide as much information as possible:
> > > Where did it not work? In "ant runtime", or when running in Hadoop?
> > > What was the error message?
> > > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > > Is parse-html configured in your plugin-includes?
> > >
> > > If it's a problem during execution, I would suggest looking at or
> > > debugging the code of PluginClassLoader.
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> > > > Sent: 14 March 2018 08:34
> > > > To: user@nutch.apache.org
> > > > Subject: Re: Dependency between plugins
> > > >
> > > > Could anybody please help me out with this?
> > > >
> > > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > > rit2014...@iiita.ac.in> wrote:
> > > >
> > > > > I am trying to import HtmlParser in my custom parser.
> > > > > I did it the same way HtmlParser imports lib-nekohtml,
> > > > > but it didn't work.
> > > > > Can anybody please tell me how to do it?
> > > > >
> > >
> > >
>
>
