Ken:

Thank you very much for the info. I applied it to my testing environment
and I could see big changes in my bandwidth utilization. I have tried
it on a simple server and I could get a rather constant 25-29
pages/sec in a vertical crawl. Previously I was getting about 5-7
pages/sec.

Cheers
Zaheed


On 7/11/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
> >On 6/28/06, Ken Krugler <[EMAIL PROTECTED]> wrote:
> >>Hi Doug,
> >>
> >>>Did you ever resolve your 0.8 vs 0.7 crawling performance question? I'm
> >>>running into a similar problem.
> >>
> >>We wound up dramatically increasing the number of threads, which
> >>seemed to help solve the bandwidth utilization problem. With Nutch
> >>0.7 we were running about 200 threads per crawler, and with Nutch 0.8
> >>it's more like 2000+ threads...though you have to reduce the thread
> >>stack size in this type of configuration.
> >
> >Hi Ken
> >
> >Could you please give me some clue regarding the stack size at which
> >you are seeing the best bandwidth utilization...
>
> Note that stack size twiddling is only done to allow for increasing
> the number of fetcher threads without running out of JVM or OS memory.
>
> >  I have the following
> >
> >core file size          (blocks, -c) 0
> >data seg size           (kbytes, -d) unlimited
> >max nice                        (-e) 20
> >file size               (blocks, -f) unlimited
> >pending signals                 (-i) unlimited
> >max locked memory       (kbytes, -l) unlimited
> >max memory size         (kbytes, -m) unlimited
> >open files                      (-n) 1024
> >pipe size            (512 bytes, -p) 8
> >POSIX message queues     (bytes, -q) unlimited
> >max rt priority                 (-r) unlimited
> >stack size              (kbytes, -s) 8192
> >cpu time               (seconds, -t) unlimited
> >max user processes              (-u) unlimited
> >virtual memory          (kbytes, -v) unlimited
> >file locks                      (-x) unlimited
> >
> >What stack size should I play with? The default seems to be 8192 KB.
>
> We use something like ulimit -s 512 to set a 512K stack size at the OS level.
>
> >Also, any other parameters I should tweak?
>
> We specify -Xss512K when running the fetch map-reduce task to set the
> stack size in the JVM. But I don't remember off the top of my head
> which of the many different config files this gets set in. Stefan?
> >
> >I often get the "too many open
> >files" problem
>
> That's a separate issue.
>
> >and I never could use my full bandwidth.. I am using
> >about 10% of my bandwidth. I have played around with ulimit -n "very
> >high number" which solves the "too many open files" but it's not
> >utilizing all my bandwidth, any help will be very much appreciated.
>
> Try increasing the number of fetcher threads and reducing the stack
> size. With 10 high-end servers in a cluster, we were able to max out
> a 100 Mbps connection for brief periods, though as our crawl converged
> (because it's a vertical crawl) the max rate eventually dropped to
> about 50 Mbps.
>
> -- Ken
> --
> Ken Krugler
> Krugle, Inc.
> +1 530-210-6378
> "Find Code, Find Answers"
>
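
For reference, a rough sketch of the arithmetic behind the advice above to
shrink the stack before adding fetcher threads. It assumes every thread
reserves its full configured stack size, which is only an upper bound on
what the JVM or OS actually commits. The function name and the pairing of
thread counts with stack sizes are illustrative and not taken from Nutch.

```python
def stack_reservation_mb(threads: int, stack_kb: int) -> float:
    """Upper bound on memory reserved for thread stacks, in MB.

    Assumes each thread reserves the full configured stack size.
    """
    return threads * stack_kb / 1024


# Thread counts and stack sizes mentioned in this thread (illustrative pairings).
for threads, stack_kb in [(200, 8192), (2000, 8192), (2000, 512)]:
    mb = stack_reservation_mb(threads, stack_kb)
    print(f"{threads} threads x {stack_kb} KB = {mb:.0f} MB")
```

Under that assumption, 2000 threads at the 8192 KB default would reserve
about 16000 MB for stacks alone, while the same 2000 threads at 512 KB
would reserve about 1000 MB.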


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
