Interesting indeed, in more than one way... This is just a plug-in, right?
So it can be compiled with Nutch 1.11?

On Thu, Dec 17, 2015 at 10:25 AM, Markus Jelsma <[email protected]>
wrote:

> Interesting! That triple extractor and WDC parser could be useful indeed!
> It already uses Any23. I wonder how easily we could integrate it into Apache
> Tika, and then use it in Nutch! But since it does use Any23, I wonder if it
> relies on SAX events, or on the HTML body as a whole, which would be bad.
>
> I am also curious whether the scoring filter supports incremental crawls
> as opposed to OPIC. If not, it might not be that interesting.
>
> Does anyone know? :)
>
> M.
>
>
>
> -----Original message-----
> > From: Christian Kunz <[email protected]>
> > Sent: Thursday 17th December 2015 7:30
> > To: [email protected]
> > Subject: Re: Anthelion from Yahoo
> >
> > Hi Otis,
> >
> > I haven't tried it yet. I wrote a short article that roughly explains how
> > it works:
> > http://www.seo-suedwest.de/1398-yahoo-crawler-strukturierte-daten-open-source.html
> >
> > If anyone has practical experience with it please let me know.
> >
> > Regards,
> > Christian
> >
> >
> >
> > -----Original Message-----
> > From: Otis Gospodnetić [mailto:[email protected]]
> > Sent: Thursday, 17 December 2015 03:55
> > To: [email protected]
> > Subject: Anthelion from Yahoo
> >
> > Hi,
> >
> > FYI: https://github.com/yahoo/anthelion
> >
> > Anyone tried using it yet?
> >
> > Otis
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
>
