Thank you, it's very helpful.
Best Regards
Alexander Aristov

On 5 July 2012 20:12, Andy Schlaikjer <[email protected]> wrote:
> Hi Lance,
>
> Elephant Bird includes support for SequenceFile i/o from Pig:
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/store/SequenceFileStorage.java
>
> It's available in Maven Central:
>
> http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C3.0.1%7Cjar
>
> <dependency>
>   <groupId>com.twitter.elephantbird</groupId>
>   <artifactId>elephant-bird-pig</artifactId>
>   <version>3.0.1</version>
> </dependency>
>
> Andy
> @sagemintblue
>
> On Wed, Jul 4, 2012 at 5:33 PM, Lance Norskog <[email protected]> wrote:
>
> > Ah, didn't know that about Nutch files. I've only used the Nutch ->
> > Solr integration. Does Pig make sequence files? Is there a Nutch->Pig
> > integration?
> >
> > On Tue, Jul 3, 2012 at 3:00 AM, Alexander Aristov
> > <[email protected]> wrote:
> >
> > > Hi Lance
> > >
> > > I understand that pages are pages, but Nutch stores pages in its own
> > > format while Mahout works with other data formats.
> > >
> > > I would like to merge Nutch and Mahout with minimum effort; that's why
> > > I am asking which is easier: alter Mahout and implement logic to
> > > read/write Nutch data, or implement a Nutch plugin that invokes Mahout.
> > >
> > > How difficult is it to embed the Mahout engine in other Java programs?
> > > Will it be enough to add the jar files, or does it require some
> > > configuration files and environment variables to be set?
> > >
> > > Best Regards
> > > Alexander Aristov
> > >
> > >
> > > On 3 July 2012 06:41, Lance Norskog <[email protected]> wrote:
> > >
> > >> Pages are pages. Mahout does not care where they came from. I guess
> > >> you want a parser for HTML pages.
> > >>
> > >> On Mon, Jul 2, 2012 at 12:11 PM, Alexander Aristov
> > >> <[email protected]> wrote:
> > >> > Forwarding this to the user list and the Mahout group.
> > >> >
> > >> > Like-minded people, any suggestions about integration? Where should
> > >> > I start?
> > >> >
> > >> > Best Regards
> > >> > Alexander Aristov
> > >> >
> > >> >
> > >> > ---------- Forwarded message ----------
> > >> > From: Alexander Aristov <[email protected]>
> > >> > Date: 1 July 2012 23:02
> > >> > Subject: nucth and mahout integration
> > >> > To: [email protected]
> > >> >
> > >> >
> > >> > People,
> > >> >
> > >> > can you give me some advice?
> > >> >
> > >> > I want to integrate Nutch and Mahout to classify crawled pages.
> > >> >
> > >> > First question: has anyone tried this, and are there any libraries
> > >> > available?
> > >> >
> > >> > Next: what is better/easier? Improve Nutch and inject a Mahout
> > >> > classifier into the project, OR improve Mahout to add the ability to
> > >> > read and write Nutch files?
> > >> >
> > >> > Best Regards
> > >> > Alexander Aristov
> > >>
> > >>
> > >> --
> > >> Lance Norskog
> > >> [email protected]
> > >>
> >
> >
> > --
> > Lance Norskog
> > [email protected]
