Hi Adriana,

You can add metadata to each seed URL as name=value pairs after the URL
(the injector expects the fields to be separated by a tab), like this:

http://www.example.com  id=123
http://www.example.com  id=456

Each CrawlDatum includes a metadata map, which you can use to store any
information about a URL.
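
If you have many URLs, you probably don't want to write the seed file by
hand. Below is a rough sketch in Python (not Nutch code) of how such a file
could be generated and read back. The helper names seed_line and
parse_seed_line are made up for illustration, and it assumes that the URL
and each name=value pair are separated by a tab.

```python
def seed_line(url, metadata):
    """Format one seed-list line: the URL, then tab-separated name=value pairs."""
    fields = [url]
    for key, value in metadata.items():
        fields.append(f"{key}={value}")
    return "\t".join(fields)


def parse_seed_line(line):
    """Split a seed-list line back into (url, metadata dict)."""
    url, *pairs = line.rstrip("\n").split("\t")
    metadata = {}
    for pair in pairs:
        # partition() splits on the first '=' only, so values may contain '='
        key, _, value = pair.partition("=")
        metadata[key.strip()] = value.strip()
    return url, metadata


# Same example entries as above (hypothetical ids)
seeds = [
    ("http://www.example.com", {"id": "123"}),
    ("http://www.example.com", {"id": "456"}),
]
lines = [seed_line(url, meta) for url, meta in seeds]

if __name__ == "__main__":
    # Write the result to seed.txt in your urls directory, for example
    print("\n".join(lines))
```

Parsing the lines back is only there as a sanity check on the file before you
inject it.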





On Fri, May 10, 2013 at 5:26 PM, Adriana Farina
<[email protected]> wrote:

> Hello,
>
> I'm using Nutch 2.1 on top of Hadoop 1.0.4, with HBase 0.90.4 as storage
> system. I run Nutch in distributed mode.
>
> I need to associate an id to each url inside the seed list of nutch and to
> store this information in HBase. I think that I have to create a new column
> family in HBase and modify the gora and hbase configuration files in the
> nutch conf folder.
>
> However, I think I need to modify the code of Nutch, but I don't know which
> classes I have to modify. I googled a bit, but I didn't find any
> documentation; I've searched inside the code but I wasn't able to solve my
> problem.
>
> Can anybody help me?
>
> Thank you!
>
>
> --
> Adriana Farina
>



-- 
Don't Grow Old, Grow Up... :-)
