Re: near-term plan

2005-08-05 Thread webmaster
I was using a nightly build that Piotr had given me, the nutch-nightly.jar (actually it was nutch-dev0.7.jar or something of that nature). I tested it on the Windows platform. I had 5 machines running it: 2 at 100 Mbit, both quad P3 Xeon, 1 Pentium 4 3 GHz with Hyper-Threading, 1 AMD Athlon XP 2600+ and

Re: near-term plan

2005-08-04 Thread Stefan Groschupf
Hi Doug, The slides from my talk yesterday at OSCON give some hints on how to get started. We need a MapReduce tutorial. http://wiki.apache.org/nutch/Presentations Can you explain what this means: Page 20: - scheduling is bottleneck, not disk, network or CPU? Thanks. Stefan

Re: near-term plan

2005-08-04 Thread Doug Cutting
Stefan Groschupf wrote: http://wiki.apache.org/nutch/Presentations Can you explain what this means: Page 20: - scheduling is bottleneck, not disk, network or CPU? I mean that neither the CPUs, disks, nor network are at 100% of capacity. Disks are running around 50% busy, CPUs a bit higher, and

Re: near-term plan

2005-08-04 Thread Piotr Kosiorowski
at the Jira to check if some more bugs can be fixed before the deadline proposed by Andrzej. Regards, Piotr. Andrzej Bialecki wrote: Doug Cutting wrote: Here's a near-term plan for Nutch. 1. Release Nutch 0.7, based on current trunk. We should do this ASAP. Are there bugs in trunk that we need

Re: near-term plan

2005-08-04 Thread Jay Pound
replication throughput running level 1) - Original Message - From: Doug Cutting [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org Sent: Thursday, August 04, 2005 3:54 PM Subject: Re: near-term plan Stefan Groschupf wrote: http://wiki.apache.org/nutch/Presentations Can you explain

Re: near-term plan

2005-08-04 Thread Doug Cutting
Jay Pound wrote: Doug, I also ran into this when I was testing NDFS: the system would have to wait for the namenode to tell the datanodes what data to receive and which data to replicate. When did you test this? Which version of Nutch? How many nodes? My benchmark results from just a few days

Detecting unmodified content patches (Re: near-term plan)

2005-08-04 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: So, I would propose a deadline of Aug 8 for the last commits, and then perhaps Aug 15 for the release? Sounds good to me. Thanks for helping with this! Unfortunately, the patches related to detecting the unmodified content will have to wait