Thank you.

It's very helpful.

Best Regards
Alexander Aristov


On 5 July 2012 20:12, Andy Schlaikjer <[email protected]> wrote:

> Hi Lance,
>
> Elephant Bird includes support for SequenceFile i/o from Pig:
>
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
>
> It's available in Maven Central:
>
>
> http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C3.0.1%7Cjar
>
> <dependency>
>     <groupId>com.twitter.elephantbird</groupId>
>     <artifactId>elephant-bird-pig</artifactId>
>     <version>3.0.1</version>
> </dependency>
>
> Andy
> @sagemintblue
>
>
> On Wed, Jul 4, 2012 at 5:33 PM, Lance Norskog <[email protected]> wrote:
>
> > Ah, didn't know that about Nutch files. I've only used the Nutch ->
> > Solr integration. Does Pig make sequence files? Is there a Nutch->Pig
> > integration?
> >
> > On Tue, Jul 3, 2012 at 3:00 AM, Alexander Aristov
> > <[email protected]> wrote:
> > > Hi Lance
> > >
> > > I understand that pages are pages, but Nutch stores pages in its own
> > > format, while Mahout operates on other data formats.
> > >
> > > I would like to integrate Nutch and Mahout with minimum effort, which is
> > > why I am asking which is easier: altering Mahout to read and write Nutch
> > > data, or implementing a Nutch plugin that invokes Mahout.
> > >
> > > How difficult is it to embed the Mahout engine in other Java programs?
> > > Is it enough to add the JAR files, or does it also require configuration
> > > files and environment variables to be set?
> > >
> > > Best Regards
> > > Alexander Aristov
> > >
> > >
> > > On 3 July 2012 06:41, Lance Norskog <[email protected]> wrote:
> > >
> > >> Pages are pages. Mahout does not care where they came from. I guess
> > >> you want a parser for HTML pages.
> > >>
> > >> On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
> > >> <[email protected]> wrote:
> > >> > Forwarding this to the user list and the Mahout group.
> > >> >
> > >> > Like-minded people, any suggestions about integration? Where should I
> > >> > start?
> > >> >
> > >> >
> > >> > Best Regards
> > >> > Alexander Aristov
> > >> >
> > >> >
> > >> > ---------- Forwarded message ----------
> > >> > From: Alexander Aristov <[email protected]>
> > >> > Date: 1 July 2012 23:02
> > >> > Subject: nutch and mahout integration
> > >> > To: [email protected]
> > >> >
> > >> >
> > >> > People
> > >> >
> > >> > can you give me some advice?
> > >> >
> > >> > I want to integrate nutch and mahout to classify crawled pages.
> > >> >
> > >> > First question: has anyone tried this, and are there any libraries
> > >> > available?
> > >> >
> > >> > Next: which is better/easier? Improve Nutch and inject a Mahout
> > >> > classifier into the project, OR improve Mahout by adding the ability
> > >> > to read and write Nutch files?
> > >> >
> > >> > Best Regards
> > >> > Alexander Aristov
> > >>
> > >>
> > >>
> > >> --
> > >> Lance Norskog
> > >> [email protected]
> > >>
> >
> >
> >
> > --
> > Lance Norskog
> > [email protected]
> >
>
