Assume that there is no restriction on db.max.inlinks, and that we have two 
crawls: crawl_depth1, and then a continuation of the same crawl, crawl_depth2. 
There are two ways to obtain the final linkdb.
The first one is to run

./nutch invertlinks linkdb_depth1 segment_depth1
./nutch invertlinks linkdb_depth2 segment_depth2
./nutch mergelinkdb final_linkdb1 linkdb_depth1 linkdb_depth2

and the second one is to run

./nutch invertlinks final_linkdb2 segment_depth1 segment_depth2

Is there any difference between final_linkdb1 and final_linkdb2? I 
mean, is the merge operation lossless in this case?
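
To make the question concrete, here is a toy model in Python. It is not Nutch code: it assumes that a linkdb is just a map from a target URL to the set of URLs linking to it, that merging is a per-target set union, and it ignores anchor text and db.max.inlinks entirely. The helper names invert and merge are made up for this sketch.

```python
from collections import defaultdict

def invert(segments):
    """Build target -> set of source URLs from (source, target) link pairs."""
    linkdb = defaultdict(set)
    for segment in segments:
        for source, target in segment:
            linkdb[target].add(source)
    return dict(linkdb)

def merge(*linkdbs):
    """Merge linkdbs by taking the union of inlinks for each target (assumed)."""
    merged = defaultdict(set)
    for db in linkdbs:
        for target, sources in db.items():
            merged[target] |= sources
    return dict(merged)

# Hypothetical link data standing in for the two segments.
segment_depth1 = [("URLa", "URLx"), ("URLb", "URLx"), ("URLc", "URLx")]
segment_depth2 = [("URLa", "URLx"), ("URLk", "URLx")]

# Case 1: invert each segment separately, then merge the two linkdbs.
final_linkdb1 = merge(invert([segment_depth1]), invert([segment_depth2]))
# Case 2: invert both segments in one pass.
final_linkdb2 = invert([segment_depth1, segment_depth2])

print(final_linkdb1 == final_linkdb2)  # True
```

Under those assumptions the two results are identical, because a set union does not care how the inputs are grouped. Whether the real mergelinkdb behaves this way is exactly what I am asking.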


Andrzej Bialecki wrote:

> Murat Ali Bayir wrote:
>
>> Hi everybody, I want to know how the mergelinkdb function works. Assume 
>> that we have two linkdbs. In the first one,
>> URLx is referred to by URLa, URLb and URLc; in the second one, the same 
>> URLx is referred to by URLa and URLk. I want to
>> know the structure of the output linkdb.
>> Does it contain one entry for URLx, referred to by URLa, URLb, URLc and 
>> URLk, or
>> does it just append the second linkdb to the first one and contain two 
>> entries for URLx, as given below?
>> URLx <- URLa, URLb, URLc and
>> ..
>> ..
>> ..
>> URLx <- URLa, URLk
>>
>>
>
> No, these two entries are merged into one (that's why the name :) ). 
> At any given time, in a valid linkdb there is exactly zero or one 
> entries for any given target URL.
>
> You should note that there is a limit set on how many inlinks we are 
> going to store for any given URL (db.max.inlinks) - which may lead to 
> some surprises. If e.g. the linkdbA already hit that limit, and the 
> other linkdbB didn't, then two scenarios are possible - either you get 
> the list just containing all links from linkdbA and none from linkdbB, 
> or you get the list containing all links from linkdbB plus some links 
> from linkdbA ...
>
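
The order dependence described above can be illustrated with a small Python sketch. This is only a guess at the mechanism, not the actual Nutch implementation: it assumes inlinks are added one linkdb at a time and that additions simply stop once the limit is reached. The function merge_capped and its max_inlinks parameter are hypothetical names for this example.

```python
def merge_capped(first, second, max_inlinks):
    """Merge two linkdbs in order, keeping at most max_inlinks per target."""
    merged = {}
    for db in (first, second):
        for target, sources in db.items():
            kept = merged.setdefault(target, [])
            for source in sources:
                if len(kept) >= max_inlinks:
                    break  # limit reached: remaining inlinks are dropped
                if source not in kept:
                    kept.append(source)
    return merged

linkdb_a = {"URLx": ["URLa", "URLb", "URLc"]}  # already at the limit of 3
linkdb_b = {"URLx": ["URLa", "URLk"]}          # below the limit

# linkdbA processed first: all of its links, none that are new from linkdbB.
print(merge_capped(linkdb_a, linkdb_b, 3))  # {'URLx': ['URLa', 'URLb', 'URLc']}
# linkdbB processed first: all of its links plus some from linkdbA.
print(merge_capped(linkdb_b, linkdb_a, 3))  # {'URLx': ['URLa', 'URLk', 'URLb']}
```

In this model the result depends only on which linkdb is seen first, which matches the two scenarios in the quoted reply.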

