Re: IndexOptimizer bug?

2005-08-04 Thread [EMAIL PROTECTED]
Dear Michael, I wrote a tool, OptimizeIndex.java; it is faster, and there is no question about what it does. After you optimize the index with IndexOptimizer, is the number of search results for 'http' the same? Regards, Ferenc Michael Nebel wrote: Hi, I fixed the problem with the following

Re: Documentation

2005-08-04 Thread Stefan Groschupf
try: http://wiki.media-style.com/display/nutchDocu/Home Stefan On 04.08.2005 at 19:54, Nishant Chandra wrote: Hi, I am new to Nutch. Are there any articles/tutorials that explain the internal workings of the crawler (crawl strategy), etc.? Nishant

Re: near-term plan

2005-08-04 Thread Stefan Groschupf
Hi Doug, The slides from my talk yesterday at OSCON give some hints on how to get started. We need a MapReduce tutorial. http://wiki.apache.org/nutch/Presentations Can you explain what this means: Page 20: - Scheduling is bottleneck, not disk, network or CPU? Thanks. Stefan

Re: near-term plan

2005-08-04 Thread Doug Cutting
Stefan Groschupf wrote: http://wiki.apache.org/nutch/Presentations Can you explain what this means: Page 20: - Scheduling is bottleneck, not disk, network or CPU? I mean that none of the CPUs, disks, or network is at 100% of capacity. Disks are running around 50% busy, CPUs a bit higher, and

Re: near-term plan

2005-08-04 Thread Piotr Kosiorowski
Hello, I think it is a good idea to release ASAP. I wanted to contribute my code for fault-tolerant searching - it is taking more time than I expected because, as some of you know, in the meantime I became a father. But I hope I will be able to send something for comments early next week. I will look at

Re: near-term plan

2005-08-04 Thread Jay Pound
Doug, I also ran into this when I was testing NDFS: the system would have to wait for the namenode to tell the datanodes what data to receive and which data to replicate. I'm currently setting up Lustre to see how it works; it operates at the kernel level. Do you think if the namenode was

Re: near-term plan

2005-08-04 Thread Doug Cutting
Jay Pound wrote: Doug, I also ran into this when I was testing NDFS: the system would have to wait for the namenode to tell the datanodes what data to receive and which data to replicate. When did you test this? Which version of Nutch? How many nodes? My benchmark results from just a few days

Detecting unmodified content patches (Re: near-term plan)

2005-08-04 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: So, I would propose a deadline of Aug 8 for the last commits, and then perhaps Aug 15 for the release? Sounds good to me. Thanks for helping with this! Unfortunately, the patches related to detecting the unmodified content will have to wait

[jira] Closed: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-08-04 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-65?page=all ] Andrzej Bialecki closed NUTCH-65. Resolution: Fixed. Patches applied. Thanks! index-more plugin can't parse large set of modification-date