Interesting indeed, in more than one way... This is just a plug-in, right? So it can be compiled with Nutch 1.11?

On Thu, Dec 17, 2015 at 10:25 AM, Markus Jelsma <[email protected]> wrote:

> Interesting! That triple extractor and WDC parser could be useful indeed!
> It already uses Any23. I wonder how easily we could integrate it into Apache
> Tika, and then use it in Nutch! But since it does use Any23, I wonder if it
> relies on SAX events or on the HTML body as a whole, which would be bad.
>
> I am also curious whether the scoring filter supports incremental crawls,
> as opposed to OPIC. If not, it might not be that interesting.
>
> Does anyone know? :)
>
> M.
>
> > -----Original message-----
> > From: Christian Kunz <[email protected]>
> > Sent: Thursday 17th December 2015 7:30
> > To: [email protected]
> > Subject: AW: Anthelion from Yahoo
> >
> > Hi Otis,
> >
> > I haven't tried it yet. I wrote a little article that explains roughly
> > how it works:
> > http://www.seo-suedwest.de/1398-yahoo-crawler-strukturierte-daten-open-source.html
> >
> > If anyone has practical experience with it, please let me know.
> >
> > Regards,
> > Christian
> >
> > -----Original message-----
> > From: Otis Gospodnetić [mailto:[email protected]]
> > Sent: Thursday, 17 December 2015 03:55
> > To: [email protected]
> > Subject: Anthelion from Yahoo
> >
> > Hi,
> >
> > FYI: https://github.com/yahoo/anthelion
> >
> > Has anyone tried using it yet?
> >
> > Otis
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/

