Thanks @Sebastian, but that didn't help either. I don't think this is the
right way to push documents to different cores.

On Fri, Dec 27, 2019 at 5:10 PM Sebastian Nagel
<wastl.na...@googlemail.com.invalid> wrote:

> Hi,
>
> the test compares names of the "host" and the registered domain:
>   doc.getFieldValue('host')=='urgenthomework.com'
>
> The host name is "www.urgenthomework.com". You can test it via:
>
>   $> bin/nutch indexchecker https://www.urgenthomework.com/
>   fetching: https://www.urgenthomework.com/
>   ...
>   host :  www.urgenthomework.com
>   ...
>   title : Homework Help for College, University and School Students
>   ...
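>
> Applied to the first exchange quoted below, the comparison would then use
> the full host name. This is only a minimal sketch of the one changed line,
> assuming the indexchecker output above; the other two exchanges would need
> their own host names checked the same way:
>
> ```xml
> <!-- Assumption: 'host' holds the full host name, including "www." -->
> <param name="expr"
>        value="doc.getFieldValue('host')=='www.urgenthomework.com'" />
> ```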
>
> Best,
> Sebastian
>
>
> On 12/26/19 11:29 AM, Zara Parst wrote:
> > Hi, is it possible to crawl three different websites like
> >
> > 1. https://www.urgenthomework.com/
> > 2. https://www.myassignmenthelp.net/
> > 3. https://www.assignmenthelp.net/
> >
> > in a single Nutch configuration and then send the respective indexed pages
> > to the corresponding cores [uah, mah, yah] in Solr? I tried to achieve it
> > with exchanges and writer ids. Please look below for my configurations.
> >
> > -------------exchange.xml---------------------------------
> >
> > <exchange id="uahIndexernew" class="default">
> >   <writers>
> >     <writer id="indexer_solr_1" />
> >   </writers>
> >   <params>
> >     <param name="expr"
> >            value="doc.getFieldValue('host')=='urgenthomework.com'" />
> >   </params>
> > </exchange>
> >
> > <exchange id="mahIndexernew" class="default">
> >   <writers>
> >     <writer id="indexer_solr_2" />
> >   </writers>
> >   <params>
> >     <param name="expr"
> >            value="doc.getFieldValue('host')=='myassignmenthelp.net'" />
> >   </params>
> > </exchange>
> >
> > <exchange id="yahIndexernew" class="default">
> >   <writers>
> >     <writer id="indexer_solr_3" />
> >   </writers>
> >   <params>
> >     <param name="expr"
> >            value="doc.getFieldValue('host')=='assignmenthelp.net'" />
> >   </params>
> > </exchange>
> >
> > ---------------------------------index.writers.xml----------------------------------------
> >
> >  <writer id="indexer_solr_1"
> > class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
> >     <parameters>
> >       <param name="type" value="http" />
> >       <param name="url" value="http://localhost:8983/solr/uah" />
> >       <param name="collection" value="" />
> >       <param name="weight.field" value="" />
> >       <param name="commitSize" value="1000" />
> >       <param name="auth" value="false" />
> >       <param name="username" value="username" />
> >       <param name="password" value="password" />
> >     </parameters>
> >     <mapping>
> >       <copy>
> >         <!-- <field source="title" dest="content" />
> >         <field source="metatag.description" dest="content" />
> >         <field source="metatag.keywords" dest="content" /> -->
> >       </copy>
> >       <rename></rename>
> >       <remove>
> >         <field source="segment" />
> >         <field source="host" />
> >         <field source="url" />
> >         <!-- <field source="metatag.description" />
> >         <field source="metatag.keywords" />
> >         <field source="date" />
> >         <field source="url" />
> >          -->
> >       </remove>
> >     </mapping>
> >   </writer>
> >
> >
> >   <writer id="indexer_solr_2"
> > class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
> >     <parameters>
> >       <param name="type" value="http" />
> >       <param name="url" value="http://localhost:8983/solr/mah" />
> >       <param name="collection" value="" />
> >       <param name="weight.field" value="" />
> >       <param name="commitSize" value="1000" />
> >       <param name="auth" value="false" />
> >       <param name="username" value="username" />
> >       <param name="password" value="password" />
> >     </parameters>
> >     <mapping>
> >       <copy>
> >       </copy>
> >       <rename></rename>
> >       <remove>
> >         <field source="segment" />
> >         <field source="host" />
> >         <field source="url" />
> >       </remove>
> >     </mapping>
> >   </writer>
> >
> >
> >
> >   <writer id="indexer_solr_3"
> > class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
> >     <parameters>
> >       <param name="type" value="http" />
> >       <param name="url" value="http://localhost:8983/solr/yah" />
> >       <param name="collection" value="" />
> >       <param name="weight.field" value="" />
> >       <param name="commitSize" value="1000" />
> >       <param name="auth" value="false" />
> >       <param name="username" value="username" />
> >       <param name="password" value="password" />
> >     </parameters>
> >     <mapping>
> >       <copy>
> >       </copy>
> >       <rename></rename>
> >       <remove>
> >         <field source="segment" />
> >         <field source="host" />
> >         <field source="url" />
> >       </remove>
> >     </mapping>
> >   </writer>
> >
> >
> ---------------------------------------------------------------------------------------------------------------
> >
> > But it is not pushing data into the corresponding cores; rather, it is
> > sending data from the different domains into one core. Please do let me
> > know. I am sure there has to be a way to achieve it. I didn't try with
> > subcollection.xml. Do you think I can achieve it using subcollection?
> >
>
>
