No. This has nothing to do with ant. The Nutch job has been built and it has been run. As part of that ant build, some Avro classes (e.g. WebPage) were built specifically for storing crawled data in Cassandra via Gora. I'm now building a completely different job, one that will run in Hadoop and read the crawled data from Cassandra. It seems to me that I can reuse the classes the Nutch build created (e.g. WebPage) instead of rebuilding them from scratch. So I know those Avro classes are there somewhere. What I don't know is which ones they are and which auxiliary files they require.
So my question is: do the files I need to access the crawled data in Cassandra exist in a reusable jar somewhere as a result of the Nutch build? I'm not interested in the source, just the actual class files.

Chris Gerken

On Sep 19, 2012, at 3:56 PM, Lewis John Mcgibbney wrote:

> can you not just do 'ant job' from cmdline?
>
> Is this what you mean?
>
> From Nutch TLD you can do 'ant -projecthelp' to see a fully annotated
> description of all of the possible ant tasks.
>
> hth
>
> On Wed, Sep 19, 2012 at 9:51 PM, Chris Gerken wrote:
>> Hello,
>>
>> We've set up nutch and gora to gather some crawling data which is now stored
>> in a Cassandra column family. Is there some easy way to get the Avro
>> classes used for the crawl, along with any necessary supporting files, into
>> a hadoop job? I'm building the hadoop job with maven, but am willing to
>> consume a simple jar if there is a jar that just holds the classes and files
>> I want.
>>
>> thanks
>>
>> - Chris
>
> --
> Lewis
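One way to see which generated classes and supporting files a build actually packaged is to list the contents of the jar or job file it produced. The sketch below is only a starting point under stated assumptions. It assumes a Nutch 2.x build in which the Avro-backed classes such as WebPage live under `org/apache/nutch/storage/`, and in which the supporting files have names containing `gora` (e.g. a Gora mapping or properties file) or end in `.avsc` (Avro schemas). Neither the package path nor those file-name patterns are confirmed in this thread, and `find_gora_artifacts` is a hypothetical helper name.

```python
import sys
import zipfile

# ASSUMPTIONS (not confirmed by the thread): where a Nutch 2.x build puts its
# Avro-backed storage classes, and how the supporting files are named.
STORAGE_PREFIX = "org/apache/nutch/storage/"
CONFIG_HINTS = ("gora", ".avsc")


def find_gora_artifacts(jar_path):
    """Return (storage_classes, config_files) found in a jar/job archive.

    jar_path may be a filesystem path or a binary file-like object.
    """
    classes, configs = [], []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            base = name.rsplit("/", 1)[-1]
            if name.startswith(STORAGE_PREFIX) and name.endswith(".class"):
                classes.append(name)
            elif any(hint in base for hint in CONFIG_HINTS):
                configs.append(name)
    return sorted(classes), sorted(configs)


if __name__ == "__main__" and len(sys.argv) > 1:
    found_classes, found_configs = find_gora_artifacts(sys.argv[1])
    print("Storage classes:")
    for entry in found_classes:
        print("  " + entry)
    print("Gora/Avro supporting files:")
    for entry in found_configs:
        print("  " + entry)
```

Pass it the path to whichever jar or job file your build produced. The exact file name and directory depend on the Nutch version, so check your build output. The script does not look inside nested jars. A Hadoop job file typically bundles its dependencies as jars under `lib/`, so anything that lives only in a bundled dependency will not be listed.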

