Yes, the id will be stored in HBase automatically, and the outlinks extracted from a seed URL will not carry any of this information. The information is stored as part of the metadata of the current URL, that is, the URL it was attached to.
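To make the seed format concrete, here is a minimal Python sketch (not Nutch code) of how a seed line with metadata breaks down into a URL plus key=value pairs. It assumes the fields are separated by tabs, which is my understanding of what the injector expects; the function name parse_seed_line and the sample URLs are only for illustration.

```python
def parse_seed_line(line):
    """Split one seed-list line into (url, metadata dict).

    Assumption: fields are tab-separated, the first field is the URL,
    and every later field has the form key=value.
    """
    fields = line.rstrip("\n").split("\t")
    url = fields[0].strip()
    metadata = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        # Skip fields that have no '=' or an empty key.
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return url, metadata


# Hypothetical seed lines; a real seed file would hold one such line per URL.
seeds = [
    "http://www.example.com\tid=123",
    "http://www.example.org\tid=456",
]
parsed = [parse_seed_line(s) for s in seeds]
for url, meta in parsed:
    print(url, meta)
```

Only the URLs that appear in the seed list get these pairs; URLs discovered later as outlinks start with no such metadata.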

On Fri, May 10, 2013 at 10:59 PM, Renato Marroquín Mogrovejo <[email protected]> wrote:

> Hi Feng,
>
> So this means I could put any type of information for the seed urls but
> what about the ones fetched in the next cycles? They won't have any of this
> information right?
> And where is this information stored? As part of the fetched or the parsed
> information?
> Thanks!
>
> Renato M.
>
> On May 10, 2013 9:46 AM, "Adriana Farina" <[email protected]> wrote:
>
> > And the ids and will be automatically stored in HBase?
> >
> > 2013/5/10 feng lu <[email protected]>
> >
> > > Hi Adriana
> > >
> > > you can add metadata to each seed url like this
> > >
> > > http://www.example.com id=123
> > > http://www.example.com id=456
> > >
> > > each CrawlDatum include many metadatas, you can use that to store any
> > > information about url.
> > >
> > > On Fri, May 10, 2013 at 5:26 PM, Adriana Farina <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm using Nutch 2.1 on top of Hadoop 1.0.4, with HBase 0.90.4 as storage
> > > > system. I run Nutch in distributed mode.
> > > >
> > > > I need to associate an id to each url inside the seed list of nutch and to
> > > > store this information in HBase. I think that I have to create a new column
> > > > family in HBase and modify the gora and hbase configuration files in the
> > > > nutch conf folder.
> > > >
> > > > However, I think I need to modify the code of Nutch, but I don't know which
> > > > classes I have to modify. I googled a bit, but I didn't find any
> > > > documentation; I've searched inside the code but I wasn't able to solve my
> > > > problem.
> > > >
> > > > Can anybody help me?
> > > >
> > > > Thank you!
> > > >
> > > > --
> > > > Adriana Farina
> > >
> > > --
> > > Don't Grow Old, Grow Up... :-)
> >
> > --
> > Adriana Farina

--
Don't Grow Old, Grow Up... :-)

