You need proper text/data extraction tools for that, such as the open-source
Boilerpipe (NUTCH-961) or other open-source or commercial tools. It is, if you
ask me, impossible to build a good search engine on crawled data without
proper text extraction and removal of boilerplate elements, blocks, widgets,
and bars.
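To illustrate the idea, here is a rough sketch in Python using only the
standard library. It is NOT Boilerpipe's API or its actual algorithm. It is a
toy heuristic, loosely in the spirit of the shallow text features (word count,
link density) that Boilerpipe builds on: split the page into text blocks and
drop blocks that are short or consist mostly of link text, such as navigation
bars and footers. The names `extract_main_text`, `min_words` and
`max_link_density`, and the threshold values, are hypothetical choices for
illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags that start/end a text block (a simplification for this sketch).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "nav", "header", "footer",
              "section", "article", "h1", "h2", "h3", "td", "tr", "table", "br"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


@dataclass
class Block:
    words: int = 0       # total words in the block
    link_words: int = 0  # words that appear inside <a> elements
    text: str = ""

    @property
    def link_density(self) -> float:
        return self.link_words / self.words if self.words else 0.0


class BlockSegmenter(HTMLParser):
    """Splits an HTML document into text blocks at block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[Block] = []
        self._current = Block()
        self._in_link = 0
        self._skip = 0

    def _flush(self) -> None:
        # Store the current block only if it holds any words.
        if self._current.words:
            self._current.text = self._current.text.strip()
            self.blocks.append(self._current)
        self._current = Block()

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        tokens = data.split()
        if self._skip or not tokens:
            return
        self._current.words += len(tokens)
        if self._in_link:
            self._current.link_words += len(tokens)
        self._current.text += " " + " ".join(tokens)

    def close(self):
        super().close()
        self._flush()  # keep any trailing block


def extract_main_text(html: str, min_words: int = 10,
                      max_link_density: float = 0.33) -> str:
    """Hypothetical helper: keep long, link-poor blocks; drop the rest."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    kept = [b.text for b in parser.blocks
            if b.words >= min_words and b.link_density <= max_link_density]
    return "\n".join(kept)
```

On a listing page, the navigation bar and the footer are short and almost
entirely links, so they are dropped, while a descriptive paragraph survives.
The thresholds are arbitrary and would need tuning; a dedicated tool such as
Boilerpipe handles far more cases than this sketch.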
 
-----Original message-----
> From:A Laxmi <[email protected]>
> Sent: Thursday 19th September 2013 21:43
> To: [email protected]
> Subject: Re: Nutch with HBase examples?
> 
> Thank you! I was wondering how you got the summary text below the title
> crawled so well?
> http://www.zwudi.com/immobilier/vente-immobiliere-appartement-maison
> 
> When I crawled, I got the text summary below the title with a lot of junk
> (navigation, footers, etc.)
> 
> 
> On Thu, Sep 19, 2013 at 3:27 PM, lsroudi abdel <[email protected]> wrote:
> 
> > Yes, www.zwudi.com is a beta version for training.
> > On 19 Sep 2013 at 21:25, "A Laxmi" <[email protected]> wrote:
> >
> > > Do you know of any search websites developed using Nutch + HBase + Solr?
> > >
> > >
> > > On Thu, Sep 19, 2013 at 3:21 PM, lsroudi abdel <[email protected]> wrote:
> > >
> > > > Actually, I use Nutch with HBase, and Solr for search and indexing.
> > > > On 19 Sep 2013 at 21:13, "A Laxmi" <[email protected]> wrote:
> > > >
> > > > > Can anyone give me some examples of search websites that use Nutch
> > > > > 2.2.1 with HBase as a backend?
> > > > >
> > > >
> > >
> >
> 
