Dear Otis and Lewis,

According to the few tests I made, MySQL has the best performance compared to HSQL and HBase. HSQL is slower and takes up a lot of disk space. HBase uses more resources. Under HBase, I couldn't get the Fetch job to complete when holding 5000 pages buffered in memory without my laptop becoming extremely slow. It finally worked when flushing to the store every 2500 pages. Under MySQL, it ran smoothly with a value of 10000.
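To make the flushing frequency above concrete, here is a minimal sketch in Python of a write buffer that holds pages in memory and flushes them to the store once a threshold is reached. This is only an illustration under stated assumptions: `BufferedStore`, `put`, `flush` and `fetch` are hypothetical names, not the Nutch or Gora API, and the "store" is just a list that records batch sizes.

```python
class BufferedStore:
    """Toy stand-in for a storage backend that buffers writes in memory.

    Hypothetical class for illustration; not part of Nutch or Gora.
    """

    def __init__(self, flush_every):
        self.flush_every = flush_every  # pages held in memory before a flush
        self.buffer = []                # pages waiting to be written
        self.flushed_batches = []       # size of each batch written to the store

    def put(self, page):
        # Buffer the page and flush once the threshold is reached.
        self.buffer.append(page)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        # Pretend to write the buffered pages out, then free the memory.
        if self.buffer:
            self.flushed_batches.append(len(self.buffer))
            self.buffer.clear()


def fetch(store, n_pages):
    """Simulate a fetch job that writes n_pages pages through the buffer."""
    for i in range(n_pages):
        store.put(f"page-{i}")
    store.flush()  # write whatever is left at the end of the job
    return store.flushed_batches
```

A lower threshold means less memory held at any time but more round trips to the store; a higher one means the opposite, which is the trade-off behind the 2500 and 10000 values mentioned above.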
NoSQL technology scales better, but for a "reasonable" volume MySQL will do the job fine and faster. It would be nice to test Cassandra as a Gora backend. Its write operations are allegedly faster than HBase's. I haven't tried it yet.

Alexis

On Sun, Jan 16, 2011 at 12:57 PM, McGibbney, Lewis John <[email protected]> wrote:
> Hi Otis,
>
> Thank you for this. From reading various posts on this list and the roadmap
> for Nutch 2.0 I had gathered that using HBase was probably the most supported
> option within the community.
>
> Lewis
>
> ________________________________________
> From: Otis Gospodnetic [[email protected]]
> Sent: 16 January 2011 10:45
> To: [email protected]
> Subject: Re: Database data storage question
>
> There are lots of factors to consider, so one can't give a good general
> answer, but:
>
> Nutch already uses HBase (trunk), so that's +1 for HBase. HBase makes it easy
> to scale and has built-in replication thanks to being built on top of HDFS.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
>> From: "McGibbney, Lewis John" <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Fri, January 14, 2011 8:00:50 AM
>> Subject: Database data storage question
>>
>> Hello List,
>>
>> I am gathering information on the above topic as I intend to integrate a
>> database to store fetched data. I would like community input of any
>> experiences using different database implementations before doing so.
>> E.g. comparison between HBase & MySQL etc.
>>
>> Thank you
>>
>> Lewis
>>
>> Glasgow Caledonian University is a registered Scottish charity, number
>> SC021474
>>
>> Winner: Times Higher Education's Widening Participation Initiative of the
>> Year 2009 and Herald Society's Education Initiative of the Year 2009
>> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

