Hi Lance,

Elephant Bird includes support for SequenceFile I/O from Pig:

https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java

It's available in Maven Central:

http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C3.0.1%7Cjar

<dependency>
    <groupId>com.twitter.elephantbird</groupId>
    <artifactId>elephant-bird-pig</artifactId>
    <version>3.0.1</version>
</dependency>

Andy
@sagemintblue


On Wed, Jul 4, 2012 at 5:33 PM, Lance Norskog <[email protected]> wrote:

> Ah, didn't know that about Nutch files. I've only used the Nutch ->
> Solr integration. Does Pig make sequence files? Is there a Nutch->Pig
> integration?
>
> On Tue, Jul 3, 2012 at 3:00 AM, Alexander Aristov
> <[email protected]> wrote:
> > Hi Lance
> >
> > I understand that pages are pages, but Nutch stores pages in its own
> > format, while Mahout works with other data formats.
> >
> > I would like to integrate Nutch and Mahout with minimum effort, which is
> > why I am asking which is easier: altering Mahout to read/write Nutch
> > data, or implementing a Nutch plugin that invokes Mahout.
> >
> > How difficult is it to embed the Mahout engine in other Java programs?
> > Is it enough to add the jar files, or does it also require configuration
> > files and environment variables to be set?
> >
> > Best Regards
> > Alexander Aristov
> >
> >
> > On 3 July 2012 06:41, Lance Norskog <[email protected]> wrote:
> >
> >> Pages are pages. Mahout does not care where they came from. I guess
> >> you want a parser for HTML pages.
> >>
> >> On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
> >> <[email protected]> wrote:
> >> > Forwarding this to the user list and the Mahout group.
> >> >
> >> > Like-minded people, any suggestions about integration? Where should I
> >> > start?
> >> >
> >> >
> >> > Best Regards
> >> > Alexander Aristov
> >> >
> >> >
> >> > ---------- Forwarded message ----------
> >> > From: Alexander Aristov <[email protected]>
> >> > Date: 1 July 2012 23:02
> >> > Subject: Nutch and Mahout integration
> >> > To: [email protected]
> >> >
> >> >
> >> > People
> >> >
> >> > Can you give me some advice?
> >> >
> >> > I want to integrate Nutch and Mahout to classify crawled pages.
> >> >
> >> > First question: has anyone tried this, and are there any libraries
> >> > available?
> >> >
> >> > Next: what is better/easier? Extend Nutch and inject the Mahout
> >> > classifier into the project, OR extend Mahout to add the ability to
> >> > read and write Nutch files?
> >> >
> >> > Best Regards
> >> > Alexander Aristov
> >>
> >>
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
> >>
>
>
>
> --
> Lance Norskog
> [email protected]
>
