Chris,

How did you get on with this? Any progress?
I wanted to have a look today but got caught up identifying problems in
gora-cassandra v0.2.1. Did you find a method to reuse generated webpage classes?

Lewis

On Wed, Sep 19, 2012 at 10:06 PM, Chris Gerken <[email protected]> wrote:
> No. This has nothing to do with ant. The nutch job has been built and it
> has been run. As part of that ant build some Avro classes were built (e.g.
> WebPage) specifically for the storage of crawled data into Cassandra via
> gora. It seems to me that as I build a completely different job - one that's
> going to run in hadoop and access the crawled data from Cassandra - that I
> can reuse the classes that the nutch build created (e.g. WebPage) instead
> of rebuilding them from scratch. So I know those Avro classes are there
> somewhere. What I don't know is which ones they are and what auxiliary files
> they prereq.
>
> So my question is: Do those files that I need to access the crawled data in
> Cassandra exist in a reusable jar somewhere as a result of the nutch build?
> I'm not interested in the source, just the actual class files.
>
> Chris Gerken
>
> On Sep 19, 2012, at 3:56 PM, Lewis John Mcgibbney wrote:
>
>> can you not just do 'ant job' from cmdline?
>>
>> Is this what you mean?
>>
>> From Nutch TLD you can do 'ant -projecthelp' to see a fully annotated
>> description of all of the possible ant tasks.
>>
>> hth
>>
>> On Wed, Sep 19, 2012 at 9:51 PM, Chris Gerken
>> <[email protected]> wrote:
>>> Hello,
>>>
>>> We've set up nutch and gora to gather some crawling data which is now
>>> stored in a Cassandra column family. Is there some easy way to get the
>>> Avro classes used for the crawl, along with any necessary supporting files,
>>> into a hadoop job? I'm building the hadoop job with maven, but am willing
>>> to consume a simple jar if there is a jar that just holds the classes and
>>> files I want.
>>>
>>> thanks
>>>
>>> - Chris
>>
>> --
>> Lewis
>

--
Lewis

