Ken: Thank you very much for the info. I applied it to my testing environment and could see big changes in my bandwidth utilization. I tried it on a simple server and got a fairly constant 25-29 pages/sec in a vertical crawl. Previously I was getting about 5-7 pages/sec.
Cheers
Zaheed

On 7/11/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
> >On 6/28/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
> >>Hi Doug,
> >>
> >>>Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm
> >>>running into a similar problem.
> >>
> >>We wound up dramatically increasing the number of threads, which
> >>seemed to help solve the bandwidth utilization problem. With Nutch
> >>0.7 we were running about 200 threads per crawler, and with Nutch 0.8
> >>it's more like 2000+ threads...though you have to reduce the thread
> >>stack size in this type of configuration.
> >
> >Hi Ken
> >
> >Could you please give me some clue regarding the stack size at which
> >you are seeing the best bandwidth utilization...
>
> Note that stack size twiddling is only done to allow for increasing
> the number of fetcher threads without running out of JVM or OS memory.
>
> >I have the following
> >
> >core file size          (blocks, -c)     0
> >data seg size           (kbytes, -d)     unlimited
> >max nice                (-e)             20
> >file size               (blocks, -f)     unlimited
> >pending signals         (-i)             unlimited
> >max locked memory       (kbytes, -l)     unlimited
> >max memory size         (kbytes, -m)     unlimited
> >open files              (-n)             1024
> >pipe size               (512 bytes, -p)  8
> >POSIX message queues    (bytes, -q)      unlimited
> >max rt priority         (-r)             unlimited
> >stack size              (kbytes, -s)     8192
> >cpu time                (seconds, -t)    unlimited
> >max user processes      (-u)             unlimited
> >virtual memory          (kbytes, -v)     unlimited
> >file locks              (-x)             unlimited
> >
> >What stack size should I play with? The default seems to be 8192 KB.
>
> We use something like ulimit -s 512 to set a 512K stack size at the OS level.
>
> >Also, any other parameters I should tweak?
>
> We specify -Xss512K when running the fetch map-reduce task to set the
> stack size in the JVM. But I don't remember off the top of my head
> which of the many different config files this gets set in. Stefan?
>
> >I often get the "too many open files" problem
>
> That's a separate issue.
>
> >and I never could use my full bandwidth. I am using
> >about 10% of my bandwidth. I have played around with ulimit -n "very
> >high number", which solves the "too many open files" problem, but it's
> >not utilizing all my bandwidth. Any help will be very much appreciated.
>
> Try increasing the number of fetcher threads and reducing the stack
> size. With 10 high-end servers in a cluster, we were able to max out
> a 100 Mbps connection for brief periods, though as our crawl converged
> (because it's a vertical crawl) the max rate eventually drops to
> about 50 Mbps.
>
> -- Ken
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "Find Code, Find Answers"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
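
For anyone following the thread, the reason for shrinking the stack size when raising the fetcher thread count can be sketched with some rough arithmetic. This is only an illustration, not something taken from Nutch: it assumes every thread reserves a full stack of the configured size, which is an upper bound (the OS usually commits stack pages lazily, and the JVM's own default stack size may differ from the `ulimit -s` value). The function names are made up for the example; the thread counts and stack sizes are the ones mentioned in this thread.

```python
# Rough upper bound on the address space reserved for thread stacks.
# Assumption: each thread reserves a full stack of the configured size.

def stack_reservation_kb(threads: int, stack_kb: int) -> int:
    """Total stack reservation in KB for `threads` threads of `stack_kb` each."""
    return threads * stack_kb


def kb_to_gib(kb: int) -> float:
    """Convert kilobytes (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


# (label, thread count, stack size in KB), using figures quoted in the thread.
scenarios = [
    ("200 threads, 8192 KB stacks", 200, 8192),
    ("2000 threads, 8192 KB stacks", 2000, 8192),
    ("2000 threads, 512 KB stacks", 2000, 512),
]

for label, threads, stack_kb in scenarios:
    total_kb = stack_reservation_kb(threads, stack_kb)
    print(f"{label}: {total_kb} KB (~{kb_to_gib(total_kb):.2f} GiB)")
```

Under that assumption, 2000 threads at the 8192 KB default would reserve well over ten GiB for stacks alone, while 512 KB stacks keep the same thread count to roughly one GiB, which is why the two settings are changed together.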
