Hi,
I'm trying to gauge whether one crawl server is performing well, and
I'm having a tough time determining whether I could increase settings
to get faster crawls, or whether I'm approaching the maximum the server
can handle. The server is a dual AMD Athlon 2200 with 2 GB of RAM
hanging off a dedicated 10Mb connection. When processing a 1-million-URL
segment, I see these speeds in the log:
281147 pages, 142413 errors, 11.4 pages/s, 1918 kb/s,
281158 pages, 142422 errors, 11.4 pages/s, 1918 kb/s,
281170 pages, 142428 errors, 11.4 pages/s, 1918 kb/s,
281188 pages, 142430 errors, 11.4 pages/s, 1918 kb/s,
281206 pages, 142444 errors, 11.4 pages/s, 1918 kb/s,
281218 pages, 142452 errors, 11.4 pages/s, 1918 kb/s,
It takes about 29 hours to process this segment: it begins on 04/21 at
10:23pm, starts running MapReduce on 04/22 at 6:25pm, and finishes on
04/23 at 1:56am.
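As a rough sanity check on these figures (a sketch only; I'm not sure whether the fetcher's "kb/s" means kilobits or kilobytes per second, so both readings are shown, and the 10Mb link is assumed to mean 10 megabits/s):

```python
# Back-of-the-envelope check of the fetch numbers from the log above.
# Assumption: "kb/s" in the Nutch fetcher log could be kilobits or
# kilobytes per second -- both interpretations are computed.

LINK_MBIT = 10            # dedicated link capacity, megabits/s (assumed)
RATE_KB = 1918            # "kb/s" figure from the fetcher log
PAGES_PER_S = 11.4        # pages/s figure from the fetcher log
SEGMENT_URLS = 1_000_000  # segment size

# Link utilization if "kb" means kilobits
util_kilobits = (RATE_KB / 1000) / LINK_MBIT * 100
# Link utilization if "kb" means kilobytes (convert to megabits first)
util_kilobytes = (RATE_KB * 8 / 1000) / LINK_MBIT * 100
# Time to work through the whole segment at the observed page rate
hours = SEGMENT_URLS / PAGES_PER_S / 3600

print(f"link utilization if kilobits:  {util_kilobits:.0f}%")
print(f"link utilization if kilobytes: {util_kilobytes:.0f}%")
print(f"hours for 1M URLs at {PAGES_PER_S} pages/s: {hours:.1f}")
```

If "kb/s" is kilobits, the link is only about 20% utilized, which would suggest headroom for more fetcher threads; if it is kilobytes, the reported rate would actually exceed the 10Mb link, which seems unlikely, so the kilobits reading looks more plausible.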
I understand that the time and speed of fetching are totally dependent
on the type of content being fetched, but I'm sure there's an average
speed for a particular type of configuration. If anyone can help me
out, or needs anything explained better, please let me know. Thank
you!
Jason
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general