Dear Otis and Lewis,

According to the few tests I ran, I feel MySQL has the best
performance compared to HSQL and HBase. HSQL is slower and takes up
much more disk space. HBase uses more resources: under HBase, I couldn't
get the Fetch job to complete with 5000 pages buffered in memory
without my laptop becoming extremely slow. It finally worked when
flushing to the store every 2500 pages. Under MySQL, it ran smoothly
with a value of 10000.
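
To illustrate what that flush setting trades off, here is a minimal
sketch of a write buffer that holds pages in memory and pushes them to
the store once a threshold is reached. It is not Nutch or Gora code:
the class BufferedStore, the backend_write callback and the flush_every
parameter are hypothetical names made up for this example.

```python
class BufferedStore:
    """Hypothetical write buffer, not the Nutch/Gora implementation.

    Pages accumulate in memory and are handed to the backend in one
    batch every `flush_every` pages.
    """

    def __init__(self, backend_write, flush_every):
        self.backend_write = backend_write  # callable taking a list of pages
        self.flush_every = flush_every      # e.g. 2500 or 10000
        self.buffer = []
        self.flush_count = 0
        self.peak = 0                       # largest number of pages held at once

    def put(self, page):
        self.buffer.append(page)
        self.peak = max(self.peak, len(self.buffer))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        self.backend_write(self.buffer)
        self.buffer = []
        self.flush_count += 1


if __name__ == "__main__":
    batches = []
    store = BufferedStore(lambda batch: batches.append(len(batch)), flush_every=2500)
    for page_id in range(10000):
        store.put(page_id)
    store.flush()  # push any remainder at the end of the job
    print(store.flush_count, store.peak)
```

A lower threshold caps how many pages sit in memory at once, at the
cost of more round trips to the store; a higher one does the opposite.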

NoSQL technology scales better, but for a "reasonable" volume MySQL
will do the job fine and faster.

It would be nice to test Cassandra as a Gora backend. Its write
operations are allegedly faster than HBase's. I haven't tried it yet.

Alexis


On Sun, Jan 16, 2011 at 12:57 PM, McGibbney, Lewis John
<[email protected]> wrote:
> Hi Otis,
>
> Thank you for this. From reading various posts on this list and the roadmap
> for Nutch 2.0 I had gathered that using HBase was probably the most supported 
> option within the community.
>
> Lewis
>
> ________________________________________
> From: Otis Gospodnetic [[email protected]]
> Sent: 16 January 2011 10:45
> To: [email protected]
> Subject: Re: Database data storage question
>
> There are lots of factors to consider, so one can't give a good general
> answer, but:
>
> Nutch already uses HBase (trunk), so that's +1 for HBase.  HBase makes it easy
> to scale and has built-in replication thanks to being built on top of HDFS.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> ----- Original Message ----
>> From: "McGibbney, Lewis John" <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Fri, January 14, 2011 8:00:50 AM
>> Subject: Database data storage question
>>
>> Hello List,
>>
>> I am gathering information on the above topic as I intend to integrate a
>> database to store fetched data. I would like community input on any
>> experiences using different database implementations before doing so,
>> e.g. a comparison between HBase & MySQL.
>>
>> Thank you
>>
>> Lewis
>>
>>
>> Glasgow Caledonian University is a registered Scottish charity, number
>> SC021474
>>
>> Winner: Times Higher Education's Widening Participation Initiative of the
>> Year 2009 and Herald Society's Education Initiative of the Year 2009
>> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>>
