Hi Kaveh,

The size of the crawl database is not the issue when migrating between Nutch versions; compatibility is what you need to be concerned about. As far as I know, there are currently no tools in Nutch to read URLs from HDFS and import/inject your crawl data into your HBase cluster. This is mostly due to the direction in which Nutch is moving, which is to do just crawling, at scale, quickly. We have no immediate need or appetite to maintain legacy tools within the codebase, and we have been trying to reduce that part of it. That doesn't help you, however, since there was never a tool for this specific purpose anyway.

It is, however, something I am becoming interested in (the notion of obtaining lots of data from various data stores and bootstrapping Nutch with it). I would really like to read the data with Gora and map it somewhere. I am interested in the Nutch injecting code and would be interested in extending it or writing new code to solve this issue.
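A possible stopgap, purely as a sketch and not an existing Nutch tool: dump the 1.x crawldb to text with `bin/nutch readdb <crawldb> -dump <out_dir>`, pull out just the URLs, and feed them to `bin/nutch inject <url_dir>` on the 2.x side as a plain seed list. This carries over the URLs only; fetch status, scores and other per-URL data are lost. The helper name `extract_urls` is hypothetical, and the dump layout it relies on (each record starting with the URL, a tab, then the first metadata field) is an assumption that should be checked against a real dump first.

```python
from typing import Iterable, Iterator


def extract_urls(dump_lines: Iterable[str]) -> Iterator[str]:
    """Yield the URL from each record-header line of a text crawldb dump.

    Assumption (verify against your own dump): a record starts with
    "<url>\t<first metadata field>", and the continuation lines
    (status, fetch time, ...) do not begin with a URL.
    """
    for line in dump_lines:
        key, sep, _rest = line.partition("\t")
        key = key.strip()
        # Keep only lines that have a tab and whose first field looks like a URL.
        if sep and key.startswith(("http://", "https://")):
            yield key
```

Each dump part file could be streamed through this function line by line and the results written into the seed directory given to the injector. At 2.5 B URLs a single-process script like this is only useful for trying the idea on a sample; the same filter would have to run as a distributed job to cover the whole crawldb.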
On Tue, Feb 26, 2013 at 5:03 PM, kaveh minooie <[email protected]> wrote:

> me again,
>
> is there anyway that I can import my existing crawldb from a nutch 1.4
> which has about 2.5 B (with a B) links in it and currently resides in a
> hdfs file system into webpages table in hbase?
>
> and what happened to linkdb in nutch 2.x?
>
> thanks,

--
*Lewis*

