Pages are pages. Mahout does not care where they came from. I guess you want a parser for HTML pages.
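
To make "a parser for HTML pages" concrete, here is a minimal sketch of that step: turn raw HTML into plain text that could then be handed to whatever vectorizer or classifier you choose. It is only an illustration in Python using the standard library, not Nutch or Mahout API; the names `TextExtractor` and `html_to_text` are made up for this example, and a real crawl would also need to deal with encodings and malformed markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the collected pieces and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title><style>p{color:red}</style></head>"
        "<body><p>Hello <b>world</b></p><script>var x=1;</script></body></html>"
    )
    print(html_to_text(page))
```

The output of a step like this is just text, which is the point above: once the markup is gone, the classifier does not need to know the page came from a crawler.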

On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov <[email protected]> wrote:
> Forward it to user list and mahout group.
>
> Like-minded, any suggestions about integration? What shall I start with?
>
>
> Best Regards
> Alexander Aristov
>
>
> ---------- Forwarded message ----------
> From: Alexander Aristov <[email protected]>
> Date: 1 July 2012 23:02
> Subject: nucth and mahout integration
> To: [email protected]
>
>
> People
>
> can you give me some advises?
>
> I want to integrate nutch and mahout to classify crawled pages.
>
> 1st question: Has someone tried this and are there any libraries available?
>
> next: What is better/easier? Improve nutch and inject mahout classifier
> into the project OR improve mahout to add an ability to read and write
> nutch files?
>
> Best Regards
> Alexander Aristov

--
Lance Norskog
[email protected]
