Thanks, all, for the comments. Ganglia setup is in progress. We'll keep plugging away.
I should mention that this is our first real dev cluster for evaluation; production would likely be a 6-7+ node cluster of better machines. But we are certainly the small-fry leprechauns Ted Dunning refers to in his presentations - we're trying to understand the potential and do some cost calculations before buying hardware. I do feel the HBase project would benefit from some published example metrics for various operations and hardware configurations; otherwise it will remain a difficult technology for some people to get into with confidence. We'll blog our findings, and hopefully they will be of benefit to other leprechauns. If we can prove the concept, we're more likely to be able to get $ to grow.

(For reference: the PerformanceEvaluation invocation and the MSLAB/GC settings being asked about are sketched below the quoted thread.)

On Thu, Feb 2, 2012 at 5:24 AM, Michel Segel <[email protected]> wrote:
> Tim,
>
> Here's the problem in a nutshell:
> With respect to hardware, you have 5.4K RPM drives? Six drives and 8 cores?
> Small, slow drives, and still a ratio of less than one when you compare
> spindles to cores.
>
> I appreciate that you want to maximize performance, but when it comes to
> tuning, you have to start before you get your hardware.
>
> You are asking a question about tuning, but how can we answer whether the
> numbers are OK?
> Have you looked at your GCs and enabled MSLAB? We don't know. Network
> configuration?
>
> I mean that there's a lot missing, and fine-tuning a cluster is something you
> have to do on your own. I guess I could say your numbers look fine to me for
> that config... but honestly, it would be a swag.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 1, 2012, at 7:09 AM, Tim Robertson <[email protected]> wrote:
>
>> Thanks Michael,
>>
>> It's a small cluster, but is the hardware so bad? We are particularly
>> interested in a relatively low load of random reads and writes (2,000
>> transactions per second on rows of <1 KB) combined with a decent full
>> table scan speed, as we aim to mount Hive tables on HBase-backed tables.
>>
>> Regarding tuning... I'm not exactly sure which settings you would be
>> interested in seeing. The config is all here:
>> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
>>
>> Cheers,
>> Tim
>>
>>
>>
>> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <[email protected]>
>> wrote:
>>> No.
>>> What tuning did you do?
>>> Why such a small cluster?
>>>
>>> Sorry, but when you start off with a bad hardware configuration, you can
>>> get Hadoop/HBase to work, but performance will always be sub-optimal.
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson" <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have a 3-node cluster (CDH3u2) with the following hardware:
>>>>
>>>> RegionServers (+ DN + TT)
>>>> CPU: 2x Intel(R) Xeon(R) E5630 @ 2.53GHz (quad core)
>>>> Disks: 6x 250GB SATA 5.4K
>>>> Memory: 24GB
>>>>
>>>> Master (+ ZK, JT, NN)
>>>> CPU: Intel(R) Xeon(R) X3363 @ 2.83GHz, 2x6MB (quad core)
>>>> Disks: 2x 500GB SATA 7.2K
>>>> Memory: 8GB
>>>>
>>>> Memory-wise, we have:
>>>> Master:
>>>> NN: 1GB
>>>> JT: 1GB
>>>> HBase master: 6GB
>>>> ZK: 1GB
>>>> RegionServers:
>>>> RegionServer: 6GB
>>>> TaskTracker: 1GB
>>>> 11 mappers @ 1GB each
>>>> 7 reducers @ 1GB each
>>>>
>>>> HDFS was empty, and I ran randomWrite and scan, both with the number of
>>>> clients set to 50 (it seemed to spawn 500 mappers, though...)
>>>>
>>>> randomWrite:
>>>> 12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
>>>> 12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
>>>>
>>>> scan:
>>>> 12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
>>>> 12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
>>>>
>>>> Would I be correct in thinking that this is way below what is to be
>>>> expected of this hardware?
>>>> We're setting up Ganglia now to start debugging, but any suggestions
>>>> on how to diagnose this would be greatly appreciated.
>>>>
>>>> Thanks!
>>>> Tim
>>
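For anyone wanting to reproduce the numbers above: they look like counters from the stock
PerformanceEvaluation tool. A minimal sketch of the invocation, assuming HBase 0.90.x (CDH3)
defaults - exact options may differ between versions:

    # Sketch only - 0.90.x-era PerformanceEvaluation, run in MapReduce mode
    hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 50
    hbase org.apache.hadoop.hbase.PerformanceEvaluation scan 50

As I understand it, PE of that era splits each client's workload across 10 map tasks, which
would explain 50 clients showing up as ~500 mappers. Taking the counters at face value,
52,428,500 rows / 84,504,886 ms ≈ 0.62 rows/ms for randomWrite and 52,428,500 / 8,158,664 ms
≈ 6.4 rows/ms for scan; but if ELAPSED_TIME is accumulated across map tasks rather than being
wall-clock time (which I believe it is), these are closer to per-client rates, and a rough
cluster-wide figure would be that rate multiplied by the number of concurrently running clients.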

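On the MSLAB/GC question raised in the thread, here is a minimal sketch of the settings involved,
assuming HBase 0.90.x on CDH3, where MSLAB is off by default. The values are illustrative
starting points only, not recommendations tuned for this cluster:

    <!-- hbase-site.xml: enable the MemStore-Local Allocation Buffer -->
    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>true</value>
    </property>

    # hbase-env.sh: CMS collector plus GC logging (illustrative values only)
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/var/log/hbase/gc-hbase.log"

Checking the resulting GC log for long stop-the-world pauses during the randomWrite run should
show whether garbage collection is a factor at all before any further tuning.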