Hi Lance,

I understand that pages are pages, but Nutch stores pages in its own format, while Mahout works with other data formats.
I would like to integrate Nutch and Mahout with minimum effort, which is why I am asking which is easier: altering Mahout and implementing logic to read/write Nutch data, or implementing a Nutch plugin that invokes Mahout.

How difficult is it to embed the Mahout engine in other Java programs? Is it enough to add the jar files, or does it also require configuration files and environment variables to be set?

Best Regards
Alexander Aristov

On 3 July 2012 06:41, Lance Norskog <[email protected]> wrote:

> Pages are pages. Mahout does not care where they came from. I guess
> you want a parser for HTML pages.
>
> On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
> <[email protected]> wrote:
> > Forward it to user list and mahout group.
> >
> > Like-minded, any suggestions about integration? What shall I start with?
> >
> >
> > Best Regards
> > Alexander Aristov
> >
> >
> > ---------- Forwarded message ----------
> > From: Alexander Aristov <[email protected]>
> > Date: 1 July 2012 23:02
> > Subject: nucth and mahout integration
> > To: [email protected]
> >
> >
> > People
> >
> > can you give me some advises?
> >
> > I want to integrate nutch and mahout to classify crawled pages.
> >
> > 1st question: Has someone tried this and are there any libraries available?
> >
> > next: What is better/easier? Improve nutch and inject mahout classifier
> > into the project OR improve mahout to add an ability to read and write
> > nutch files?
> >
> > Best Regards
> > Alexander Aristov
> >
>
> --
> Lance Norskog
> [email protected]
>
