Ah, didn't know that about Nutch files. I've only used the Nutch ->
Solr integration. Does Pig make sequence files? Is there a Nutch->Pig
integration?

On Tue, Jul 3, 2012 at 3:00 AM, Alexander Aristov
<[email protected]> wrote:
> Hi Lance
>
> I understand that pages are pages, but Nutch stores pages in its own
> format, while Mahout works with other data formats.
>
> I would like to integrate Nutch and Mahout with minimum effort, which is
> why I am asking which is easier: altering Mahout to read/write Nutch
> data, or implementing a Nutch plugin that invokes Mahout.
>
> How difficult is it to embed the Mahout engine in other Java programs?
> Will it be enough to add the jar files, or does it require configuration
> files and environment variables to be set?
>
> Best Regards
> Alexander Aristov
>
>
> On 3 July 2012 06:41, Lance Norskog <[email protected]> wrote:
>
>> Pages are pages. Mahout does not care where they came from. I guess
>> you want a parser for HTML pages.
>>
>> On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
>> <[email protected]> wrote:
>> > Forwarding this to the user list and the Mahout group.
>> >
>> > Like-minded people, any suggestions about integration? Where should I
>> > start?
>> >
>> >
>> > Best Regards
>> > Alexander Aristov
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: Alexander Aristov <[email protected]>
>> > Date: 1 July 2012 23:02
>> > Subject: nucth and mahout integration
>> > To: [email protected]
>> >
>> >
>> > People
>> >
>> > can you give me some advice?
>> >
>> > I want to integrate nutch and mahout to classify crawled pages.
>> >
>> > 1st question: Has anyone tried this, and are there any libraries
>> > available?
>> >
>> > Next: Which is better/easier? Extending Nutch by injecting a Mahout
>> > classifier into the project, OR extending Mahout with the ability to
>> > read and write Nutch files?
>> >
>> > Best Regards
>> > Alexander Aristov
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]
>>



-- 
Lance Norskog
[email protected]
