Lewis,

The nutch/gora build produces a nutch job file and a nutch jar file. The jar file seems to have everything (and a bit more) that I need. The problem is getting that jar into maven, but that seems to require only a few manual steps. I'll know for sure when I get this thing running under hadoop.
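
For reference, the manual steps for getting a locally built jar into maven could look roughly like the sketch below. It uses the Maven Install Plugin's install:install-file goal, which copies a jar into the local repository under coordinates you choose. The function name, the groupId/artifactId/version values, and the jar path in the usage comment are assumptions made up for illustration; none of them come from the nutch build.

```shell
# Sketch: install a locally built jar into the local Maven repository (~/.m2).
# The coordinates below are placeholders chosen for illustration only.
install_nutch_jar() {
  jar_path="$1"   # path to the jar produced by the nutch/gora build
  "${MVN:-mvn}" install:install-file \
    -Dfile="$jar_path" \
    -DgroupId=org.apache.nutch \
    -DartifactId=nutch \
    -Dversion=2.x-local \
    -Dpackaging=jar
}

# Usage (the path is hypothetical):
#   install_nutch_jar /path/to/nutch/build/nutch.jar
```

The hadoop job's pom.xml would then declare a dependency with the same groupId, artifactId and version. Setting MVN="echo" prints the arguments instead of running maven, which is handy for checking them first.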
thanks

- Chris

On Sep 20, 2012, at 6:43 PM, Lewis John Mcgibbney wrote:

> Chris,
>
> How did you get on with this... any progress?
>
> I wanted to have a look today but got caught up identifying problems
> in gora-cassandra v0.2.1.
>
> Did you find a method to reuse generated webpage classes?
>
> Lewis
>
> On Wed, Sep 19, 2012 at 10:06 PM, Chris Gerken
> <[email protected]> wrote:
>> No. This has nothing to do with ant. The nutch job has been built and it
>> has been run. As part of that ant build some Avro classes were built (e.g.
>> WebPage) specifically for the storage of crawled data into Cassandra via
>> gora. It seems to me that as I build a completely different job - one that's
>> going to run in hadoop and access the crawled data from Cassandra - that I
>> can reuse the classes that the nutch build created (e.g. WebPage)
>> instead of rebuilding them from scratch. So I know those Avro classes are
>> there somewhere. What I don't know is which ones they are and what
>> auxiliary files they prereq.
>>
>> So my question is: Do those files that I need to access the crawled data in
>> Cassandra exist in a reusable jar somewhere as a result of the nutch build?
>> I'm not interested in the source, just the actual class files.
>>
>> Chris Gerken
>>
>> On Sep 19, 2012, at 3:56 PM, Lewis John Mcgibbney wrote:
>>
>>> can you not just do 'ant job' from cmdline?
>>>
>>> Is this what you mean?
>>>
>>> From Nutch TLD you can do 'ant -projecthelp' to see a fully annotated
>>> description of all of the possible ant tasks.
>>>
>>> hth
>>>
>>> On Wed, Sep 19, 2012 at 9:51 PM, Chris Gerken
>>> <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> We've set up nutch and gora to gather some crawling data which is now
>>>> stored in a Cassandra column family. Is there some easy way to get the
>>>> Avro classes used for the crawl, along with any necessary supporting
>>>> files, into a hadoop job? I'm building the hadoop job with maven, but am
>>>> willing to consume a simple jar if there is a jar that just holds the
>>>> classes and files I want.
>>>>
>>>> thanks
>>>>
>>>> - Chris
>>>
>>> --
>>> Lewis
>>
>
> --
> Lewis

