The crawl for 1M pages completed successfully. There was an issue with
copyToLocal, but it has already been filed as a Hadoop bug and the patch
will be included in 0.12.x.

Statistics for CrawlDb: crawldb
TOTAL urls:         10839170
retry 0:            10816148
retry 1:             23022

min score:      0.0090
avg score:      0.173
max score:      2119.167

status 1 (db_unfetched):        9899275
status 2 (db_fetched):          667354
status 3 (db_gone):             11195
status 4 (db_redir_temp):       219507
status 5 (db_redir_perm):       41839
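As a quick sanity check, the per-retry counts and the per-status counts above
each add up to the TOTAL of 10839170. That means the breakdown accounts for
every URL in the CrawlDb, and only about 6% of known URLs are in db_fetched.
Here is a minimal Python sketch of that check. The figures are copied by hand
from the stats output; nothing here reads a real CrawlDb.

```python
# Figures copied by hand from the CrawlDb statistics above (not read from disk).
total = 10839170
retries = {0: 10816148, 1: 23022}
status = {
    "db_unfetched": 9899275,
    "db_fetched": 667354,
    "db_gone": 11195,
    "db_redir_temp": 219507,
    "db_redir_perm": 41839,
}

# Both breakdowns should account for every URL in the CrawlDb.
assert sum(retries.values()) == total
assert sum(status.values()) == total

# Print each status with its share of the total.
for name, count in status.items():
    print(f"{name:<14} {count:>9} {100 * count / total:6.2f}%")
```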

Dennis Kubes

Andrzej Bialecki wrote:
> Dennis Kubes wrote:
>>
>>
>> Andrzej Bialecki wrote:
>>> Dennis Kubes wrote:
>>>> I agree there may be subtle bugs.
>>>>
>>>> I could do, say, a full dmoz crawl (~5M pages) with Nutch trunk and Hadoop
>>>> 0.12.1 on a small cluster of 5 machines, if that would help. We have 
>>>> already
>>>>   
>>>
>>> Certainly, that would be most welcome.
>>
>> I will start that up today.
> 
> Thanks!
> 
>>>
>>> 0.12.1 is not out the door yet. I can create a patch that uses the 
>>> latest Hadoop trunk binaries, so that we could test it.
>>
>> I can just pull it down from source. Let me know if that isn't what 
>> we want.
> 
> Great, please do.
> 

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
