Chris,

How did you get on with this? Any progress?

I wanted to have a look today but got caught up identifying problems
in gora-cassandra v0.2.1.

Did you find a method to reuse generated webpage classes?

Lewis

On Wed, Sep 19, 2012 at 10:06 PM, Chris Gerken
<[email protected]> wrote:
> No.  This has nothing to do with ant.  The nutch job has been built and it 
> has been run. As part of that ant build some Avro classes were built (e.g. 
> WebPage) specifically for the storage of crawled data into Cassandra via 
> gora. It seems to me that as I build a completely different job - one that's 
> going to run in hadoop and access the crawled data from Cassandra - that I 
> can reuse the classes that the nutch build created (e.g. WebPage) instead 
> of rebuilding them from scratch.  So I know those Avro classes are there 
> somewhere.  What I don't know is which ones they are and what auxiliary files 
> they prereq.
>
> So my question is: Do those files that I need to access the crawled data in 
> Cassandra exist in a reusable jar somewhere as a result of the nutch build?  
> I'm not interested in the source, just the actual class files.
>
> Chris Gerken
>
>
>
> On Sep 19, 2012, at 3:56 PM, Lewis John Mcgibbney wrote:
>
>> Can you not just run 'ant job' from the command line?
>>
>> Is this what you mean?
>>
>> From the Nutch top-level directory you can run 'ant -projecthelp' to see
>> an annotated description of all of the available ant targets.
>>
>> hth
>>
>> On Wed, Sep 19, 2012 at 9:51 PM, Chris Gerken
>> <[email protected]> wrote:
>>> Hello,
>>>
>>> We've set up nutch and gora to gather some crawling data which is now 
>>> stored in a Cassandra column family.  Is there some easy way to get the 
>>> Avro classes used for the crawl, along with any necessary supporting files, 
>>> into a hadoop job?  I'm building the hadoop job with maven, but am willing 
>>> to consume a simple jar if there is a jar that just holds the classes and 
>>> files I want.
>>>
>>> thanks
>>>
>>> - Chris
>>
>>
>>
>> --
>> Lewis
>



-- 
Lewis
