Hi there Jay,

Here are some numbers that a colleague and I presented in my graduate computer science seminar class on search engines in the Spring '05 semester at USC. The numbers measure the efficiency and scalability of several of the plugin content extractors for Nutch (PDF, Word, RSS, etc.). The tests were performed on a Red Hat Linux 7.3 box with 1.3 GB RAM, a 10 GB HD, and a 500 MHz Pentium III processor.
The presentation is geared towards the parse-rss plugin that I wrote, although the numbers should give you an idea of how the other content extractors perform too.

Hope they help. Here's the link to the presentation:

http://baron.pagemewhen.com:8080/~chris/RSS-Nutch-Eval.ppt

Cheers,
  Chris

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: webmaster [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 20, 2005 8:02 PM
> To: [email protected]
> Subject: benchmarking
>
> hey could some of you post your speeds (sorting,indexing, pages a
> sec/documents a sec) and system specs I'm trying to compile a database of
> which of nutches functions are better suited to run on what hardware. also if
> any of you have a sun box could you post its specs and some of the info for
> database sorting speeds and indexing speed, anything that uses full cpu.
> whats everyones pages a sec top score??? e-mail me @
> [EMAIL PROTECTED]
> I'll post a webpage with the results
> Thanks,
> -Jay Pound
