>On 6/28/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
>>Hi Doug,
>>
>>>Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm
>>>running into a similar problem.
>>
>>We wound up dramatically increasing the number of threads, which
>>seemed to help solve the bandwidth utilization problem. With Nutch
>>0.7 we were running about 200 threads per crawler, and with Nutch 0.8
>>it's more like 2000+ threads...though you have to reduce the thread
>>stack size in this type of configuration.
>
>Hi Ken
>
>Could you please give me some clue regarding the stack size you are
>seeing the best bandwidth utilization...
Note that stack size twiddling is only done to allow for increasing
the number of fetcher threads without running out of JVM or OS memory.

> I have the following
>
>core file size (blocks, -c) 0
>data seg size (kbytes, -d) unlimited
>max nice (-e) 20
>file size (blocks, -f) unlimited
>pending signals (-i) unlimited
>max locked memory (kbytes, -l) unlimited
>max memory size (kbytes, -m) unlimited
>open files (-n) 1024
>pipe size (512 bytes, -p) 8
>POSIX message queues (bytes, -q) unlimited
>max rt priority (-r) unlimited
>stack size (kbytes, -s) 8192
>cpu time (seconds, -t) unlimited
>max user processes (-u) unlimited
>virtual memory (kbytes, -v) unlimited
>file locks (-x) unlimited
>
>What stack size should I play with the default seems to be 8192kb ?

We use something like ulimit -s 512 to set a 512K stack size at the
OS level.

>also any onther parameters I should tweak?

We specify -Xss512K when running the fetch map-reduce task to set the
stack size in the JVM. But I don't remember off the top of my head
which of the many different config files this gets set in. Stefan?

>
>I often get too many open
>files problem

That's a separate issue.

>and I never could use my full bandwidth.. I am using
>about 10% of my bandwidth. I have played around with ulimit -n "very
>high number" which solves the "too many open files" but its not
>utilizing all my bandwidth, any help will be very much appreciated.

Try increasing the number of fetcher threads and reducing the stack
size. With 10 high-end servers in a cluster, we were able to max out
a 100 Mbps connection for brief periods, though as our crawl converges
(because it's a vertical crawl) the max rate eventually drops to
about 50 Mbps.

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
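
To make the thread-count versus stack-size tradeoff above concrete, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that every thread reserves its full configured stack size; real JVMs and kernels usually commit stack pages lazily, so actual resident memory will be lower. The thread counts (200 and 2000) and stack sizes (8192K and 512K) are the figures mentioned in this thread; the function name is only illustrative.

```python
# Rough estimate of address space reserved for thread stacks.
# Assumption: each thread reserves its full configured stack size,
# which overstates resident memory but matches address-space pressure.

def stack_reservation_mib(threads: int, stack_kib: int) -> float:
    """Return the total stack reservation in MiB for `threads` threads."""
    return threads * stack_kib / 1024


if __name__ == "__main__":
    # Stack sizes in KiB, as reported by `ulimit -s` in the thread.
    for label, stack_kib in (("default 8192K", 8192), ("reduced 512K", 512)):
        for threads in (200, 2000):
            total = stack_reservation_mib(threads, stack_kib)
            print(f"{label:>14}, {threads:>4} threads: {total:>8.0f} MiB")
```

Under that assumption, 2000 threads at 8192K would reserve roughly 16000 MiB of stack, while the same 2000 threads at 512K would reserve roughly 1000 MiB, which is why the stack size has to come down before the thread count can go up.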
