>On 6/28/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
>>Hi Doug,
>>
>>>Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm
>>>running into a similar problem.
>>
>>We wound up dramatically increasing the number of threads, which
>>seemed to help solve the bandwidth utilization problem. With Nutch
>>0.7 we were running about 200 threads per crawler, and with Nutch 0.8
>>it's more like 2000+ threads...though you have to reduce the thread
>>stack size in this type of configuration.
>
>Hi Ken
>
>Could you please give me some clue regarding the stack size at which
>you are seeing the best bandwidth utilization...

Note that stack size twiddling is only done to allow for increasing 
the number of fetcher threads without running out of JVM or OS memory.
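
As a rough illustration of why the stack size matters at these thread counts, the sketch below multiplies thread count by per-thread stack size. It assumes every thread reserves the full configured stack, which is a simplification (actual reservation depends on the JVM and OS), and the class and method names are made up for this example.

```java
// Back-of-the-envelope estimate of the memory reserved for thread stacks.
// Assumption: each thread reserves its full configured stack size.
final class StackBudget {

    // Total stack reservation in kilobytes for a given thread count.
    static long reservedKb(int threads, long stackKb) {
        return (long) threads * stackKb;
    }

    public static void main(String[] args) {
        int[] threadCounts = {200, 2000};    // thread counts mentioned in this thread
        long[] stackSizesKb = {8192, 512};   // 8192 KB default vs. a 512 KB stack

        for (int threads : threadCounts) {
            for (long stackKb : stackSizesKb) {
                long totalMb = reservedKb(threads, stackKb) / 1024;
                System.out.printf("%d threads x %d KB = %d MB%n",
                        threads, stackKb, totalMb);
            }
        }
    }
}
```

Under that assumption, 2000 threads at 8192 KB each would reserve about 16000 MB, versus about 1000 MB at 512 KB each.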

>  I have the following
>
>core file size          (blocks, -c) 0
>data seg size           (kbytes, -d) unlimited
>max nice                        (-e) 20
>file size               (blocks, -f) unlimited
>pending signals                 (-i) unlimited
>max locked memory       (kbytes, -l) unlimited
>max memory size         (kbytes, -m) unlimited
>open files                      (-n) 1024
>pipe size            (512 bytes, -p) 8
>POSIX message queues     (bytes, -q) unlimited
>max rt priority                 (-r) unlimited
>stack size              (kbytes, -s) 8192
>cpu time               (seconds, -t) unlimited
>max user processes              (-u) unlimited
>virtual memory          (kbytes, -v) unlimited
>file locks                      (-x) unlimited
>
>What stack size should I play with? The default seems to be 8192 KB.

We use something like ulimit -s 512 to set a 512K stack size at the OS level.

>Also, any other parameters I should tweak?

We specify -Xss512K when running the fetch map-reduce task to set the 
stack size in the JVM. But I don't remember off the top of my head 
which of the many different config files this gets set in. Stefan?
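
-Xss sets the default stack size for threads created in the JVM. Purely as an illustration, Java code can also request a stack size for an individual thread through the four-argument Thread constructor. This is not how the Nutch fetcher configures its threads, and the class and method names below are hypothetical. The Javadoc describes the stackSize value as a hint that some platforms may ignore, so treat this as a sketch rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Starts many short-lived threads, each requesting a small stack.
final class SmallStackThreads {

    // Runs `count` trivial workers with the requested stack size (in bytes)
    // and returns how many of them completed.
    static int runWorkers(int count, long stackBytes) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < count; i++) {
            // Thread(ThreadGroup, Runnable, String, long): the last argument
            // is the requested stack size; the JVM may round or ignore it.
            Thread t = new Thread(null, () -> completed.incrementAndGet(),
                    "worker-" + i, stackBytes);
            workers.add(t);
            t.start();
        }

        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // 512 KB per thread, matching the -Xss512K value above.
        int done = runWorkers(200, 512L * 1024);
        System.out.println("workers completed: " + done);
    }
}
```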

>
>I often get a "too many open
>files" problem

That's a separate issue.

>and I never could use my full bandwidth. I am using
>about 10% of my bandwidth. I have played around with ulimit -n "very
>high number", which solves the "too many open files" problem, but it's
>still not utilizing all my bandwidth. Any help will be very much appreciated.

Try increasing the number of fetcher threads and reducing the stack 
size. With 10 high-end servers in a cluster, we were able to max out 
a 100 Mbps connection for brief periods, though as our crawl converged 
(because it's a vertical crawl) the max rate eventually dropped to 
about 50 Mbps.

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
