Himanish, it's hard to say without trend graphs. Set up Ganglia and graph fsReadLatency, as well as thread counts, to see what the issue might be.

-Jack
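(A minimal sketch of that Ganglia hookup on an HBase 0.90-era install, assuming a gmond listening on ganglia-host:8649, which is a placeholder host; Ganglia 3.1+ wants GangliaContext31 instead of GangliaContext. Edit conf/hadoop-metrics.properties on the region server and restart:)

    # conf/hadoop-metrics.properties (sketch, not the poster's actual config)
    # HBase metrics, including the region server's fsReadLatency:
    hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    hbase.period=10
    hbase.servers=ganglia-host:8649

    # JVM metrics, including thread counts (threadsRunnable, threadsBlocked, ...):
    jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    jvm.period=10
    jvm.servers=ganglia-host:8649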
On Thu, May 19, 2011 at 11:46 AM, Himanish Kushary <[email protected]> wrote:
> Hi,
>
> Could anybody suggest what the issue may be? I ran YCSB on both the
> development and production servers.
>
> Loading the data performs better on the production cluster, but the 50%
> read-50% write workloada performs better on development. The average read
> latency shoots up to 30-40 ms on production, while on development it stays
> between 10-20 ms. This was while running with 10 threads targeting 1000 tps,
> using this command:
>
>   java -cp build/ycsb.jar:db/hbase/conf:db/hbase/lib/* com.yahoo.ycsb.Client \
>     -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
>     -p columnfamily=data -p operationcount=1000000 -s -threads 10 -target 1000
>
> The clusters seem to perform similarly under YCSB when the tps and
> operationcount are lowered to 500 and 100000 respectively.
>
> We also ran our map-reduces on the two clusters (assuming that we would not
> reach 1000 tps or that much of an operationcount from the map-reduce), but
> strangely the development cluster still performed better.
>
> Any suggestions would be really helpful.
>
> Thanks
> Himanish
>
> On Mon, May 16, 2011 at 4:43 PM, Himanish Kushary <[email protected]> wrote:
>> *PRODUCTION SERVER CPU INFO*
>> processor : 0
>> vendor_id : AuthenticAMD
>> cpu family : 16
>> model : 9
>> model name : AMD Opteron(tm) Processor 6174
>> stepping : 1
>> cpu MHz : 2200.022
>> cache size : 512 KB
>> physical id : 1
>> siblings : 12
>> core id : 0
>> cpu cores : 12
>> apicid : 16
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 5
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>>   pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
>>   rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm
>>   cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse
>>   3dnowprefetch osvw
>> bogomips : 4400.03
>> TLB size : 1024 4K pages
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 48 bits physical, 48 bits virtual
>> power management: ts ttp tm stc 100mhzsteps hwpstate [8]
>>
>> *DEVELOPMENT SERVER CPU INFO*
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 30
>> model name : Intel(R) Core(TM) i7 CPU Q 740 @ 1.73GHz
>> stepping : 5
>> cpu MHz : 933.000
>> cache size : 6144 KB
>> physical id : 0
>> siblings : 8
>> core id : 0
>> cpu cores : 4
>> apicid : 0
>> initial apicid : 0
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
>>   pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
>>   lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
>>   aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm
>>   sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
>> bogomips : 3457.61
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>> power management:
>>
>> On Mon, May 16, 2011 at 4:26 PM, Jack Levin <[email protected]> wrote:
>>> What is the clock rate of your CPUs (desktop vs blade)?
>>>
>>> -Jack
>>>
>>> On Mon, May 16, 2011 at 1:24 PM, Himanish Kushary <[email protected]> wrote:
>>> > Yes, it is only the HW that was changed. All the configurations are
>>> > kept at the defaults from the Cloudera installer.
>>> >
>>> > The region server logs seem OK.
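(For reference, a sketch of the YCSB load phase that would precede the -t run quoted above, assuming the same classpath and a recordcount matching the later operationcount; both flags are standard YCSB:)

    java -cp build/ycsb.jar:db/hbase/conf:db/hbase/lib/* com.yahoo.ycsb.Client \
      -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
      -p columnfamily=data -p recordcount=1000000 -s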
>>> >
>>> > On Mon, May 16, 2011 at 3:20 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> >
>>> >> OK, I see... so the only thing that changed is the HW, right? No
>>> >> upgrades to a new version? Also, could it be possible that you changed
>>> >> some configs (or missed them)? BTW, counting has a parameter for
>>> >> scanner caching; you would write:
>>> >>
>>> >>   count "myTable", CACHE => 1000
>>> >>
>>> >> and it should stream through your data.
>>> >>
>>> >> Anything weird in the region server logs?
>>> >>
>>> >> J-D
>>> >>
>>> >> On Mon, May 16, 2011 at 12:13 PM, Himanish Kushary <[email protected]> wrote:
>>> >> > Thanks for the reply. We ran the TestDFSIO benchmark on both the
>>> >> > development and production clusters and found production to be
>>> >> > better. The statistics are shown below.
>>> >> >
>>> >> > But once we bring HBase into the picture, things get reversed :-(
>>> >> >
>>> >> > The count operation, map-reduces, etc. perform worse on the
>>> >> > production box. We are using pseudo-distributed mode on both the
>>> >> > development and production servers, for both Hadoop and HBase.
>>> >> >
>>> >> > *DEVELOPMENT SERVER*
>>> >> >
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Date & time: Sun May 15 21:26:26 EDT 2011
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Number of files: 10
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Total MBytes processed: 10000
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Throughput mb/sec: 58.09495038691237
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Average IO rate mb/sec: 59.699485778808594
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: IO rate std deviation: 10.54547265175703
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO: Test exec time sec: 163.354
>>> >> > 11/05/15 21:26:26 INFO fs.TestDFSIO:
>>> >> >
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Date & time: Sun May 15 21:28:44 EDT 2011
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Number of files: 10
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Total MBytes processed: 10000
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Throughput mb/sec: 682.4075337791729
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Average IO rate mb/sec: 755.5845947265625
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: IO rate std deviation: 229.60029445080488
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO: Test exec time sec: 63.896
>>> >> > 11/05/15 21:28:44 INFO fs.TestDFSIO:
>>> >> >
>>> >> > *PRODUCTION SERVER*
>>> >> >
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: ----- TestDFSIO ----- : *WRITE PERFORMANCE*
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Date & time: Mon May 16 01:00:43 GMT+00:00 2011
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Number of files: 10
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Total MBytes processed: 10000
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Throughput mb/sec: 69.25447557048375
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Average IO rate mb/sec: 70.06581115722656
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: IO rate std deviation: 7.243961483443693
>>> >> > 11/05/16 01:00:43 INFO fs.TestDFSIO: Test exec time sec: 126.896
>>> >> >
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: ----- TestDFSIO ----- : *READ PERFORMANCE*
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: Date & time: Mon May 16 01:25:01 GMT+00:00 2011
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: Number of files: 10
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: Total MBytes processed: 10000
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: Throughput mb/sec: 1487.20999405116
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: Average IO rate mb/sec: 1525.230712890625
>>> >> > 11/05/16 01:25:01 INFO fs.TestDFSIO: IO rate std deviation: 239.54492784268226
>>> >>
>>> >
>>> > --
>>> > Thanks & Regards
>>> > Himanish
>>> >
>>
>> --
>> Thanks & Regards
>> Himanish
>>
>
> --
> Thanks & Regards
> Himanish
>
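(The TestDFSIO figures quoted above are consistent with a 10-file, 1000 MB-per-file run. A sketch of the invocation on a Hadoop 0.20/CDH3-era install; the exact test jar name varies by version:)

    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000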

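(J-D's CACHE tip applies from the Java client too. A minimal sketch against the 0.90-era API, counting rows with scanner caching; "myTable" is a placeholder table name:)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachedCount {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "myTable");
        Scan scan = new Scan();
        scan.setCaching(1000); // ship 1000 rows per RPC instead of the default 1
        ResultScanner scanner = table.getScanner(scan);
        long rows = 0;
        for (Result r : scanner) {
          rows++; // each Result is one row
        }
        scanner.close();
        table.close();
        System.out.println("rows: " + rows);
      }
    }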