Pages are pages: Mahout does not care where they came from. I guess
what you want is a parser for HTML pages.

On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
<[email protected]> wrote:
> Forwarding this to the user list and the Mahout group.
>
> Does anyone have suggestions about the integration? Where should I start?
>
>
> Best Regards
> Alexander Aristov
>
>
> ---------- Forwarded message ----------
> From: Alexander Aristov <[email protected]>
> Date: 1 July 2012 23:02
> Subject: nutch and mahout integration
> To: [email protected]
>
>
> Hi all,
>
> can you give me some advice?
>
> I want to integrate Nutch and Mahout to classify crawled pages.
>
> First question: has anyone tried this, and are there any libraries available?
>
> Next: which approach is better or easier? Extending Nutch to plug a
> Mahout classifier into the project, or extending Mahout so it can read
> and write Nutch files?
>
> Best Regards
> Alexander Aristov



-- 
Lance Norskog
[email protected]
