Re: Is ARM64 officially supported ?
memtier is trash. Check the README for mc-crusher, I just updated it a bit a day or two ago.

Those numbers are incredibly low, I'd have to dig a laptop out of the 90's to get something to perform that badly. mc-crusher runs blindly and you use the other utilities that come with it to find command rates and sample the latency while the benchmark runs.

Almost all 3rd party memcached benchmarks end up benchmarking the benchmark tool, not the server. I know mc-crusher doesn't make it very obvious how to use though, sorry.

A really quick untuned test against my raspberry pi 3 nets 92,000 gets/sec. (mc-crusher running on a different machine). On a xeon machine I can get tens of millions of ops/sec depending on the read/write ratio.

On Thu, 19 Mar 2020, Martin Grigorov wrote:

> Hi
>
> I've made some local performance testing
>
> First I tried with https://github.com/memcached/mc-crusher but it seems it
> doesn't calculate any statistics after the load runs.
>
> The results below are from https://github.com/RedisLabs/memtier_benchmark
>
> 1) Text
> ./memtier_benchmark --server XYZ --port 12345 -P memcache_text
>
> ARM64 text
> ====================================================================
> Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
> --------------------------------------------------------------------
> Sets          985.28         ---         ---    20.02700       67.22
> Gets         9842.00        0.00     9842.00    20.01900      248.83
> Waits           0.00         ---         ---         0.0         ---
> Totals      10827.28        0.00     9842.00    20.02000      316.05
>
>
> X86 text
> ====================================================================
> Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
> --------------------------------------------------------------------
> Sets          931.04         ---         ---    20.06800       63.52
> Gets         9300.21        0.00     9300.21    20.32600      235.13
> Waits           0.00         ---         ---         0.0         ---
> Totals      10231.26        0.00     9300.21    20.30200      298.66
>
>
>
> 2) Binary
> ./memtier_benchmark --server XYZ --port 12345 -P memcache_binary
>
> ARM64 binary
> ====================================================================
> Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
> --------------------------------------------------------------------
> Sets          829.68         ---         ---    23.46500       63.90
> Gets         8287.69        0.00     8287.69    23.56100      314.75
> Waits           0.00         ---         ---         0.0         ---
> Totals       9117.37        0.00     8287.69    23.55200      378.65
>
> X86 binary
> ====================================================================
> Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
> --------------------------------------------------------------------
> Sets          829.32         ---         ---    23.63600       63.87
> Gets         8284.10        0.00     8284.10    23.58600      314.61
> Waits           0.00         ---         ---         0.0         ---
> Totals       9113.42        0.00     8284.10    23.59100      378.48
>
>
>
> Text is faster on the ARM64. Binary is similar for both.
>
> The benchmarking tool runs on different machine than the ones running
> Memcached:
>
> The ARM64 server has this spec:
>
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              4
> On-line CPU(s) list: 0-3
> Thread(s) per core:  1
> Core(s) per socket:  4
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           0x48
> Model:               0
> Stepping:            0x1
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-3
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
>
>
> The x64 one:
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              4
> On-line CPU(s) list: 0-3
> Thread(s) per core:  2
> Core(s) per socket:  2
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
> Stepping:            7
> CPU MHz:             3000.000
> BogoMIPS:            6000.00
> Hypervisor vendor:   KVM
> Virtualization type: full
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            30976K
> NUMA node0 CPU(s):   0-3
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36
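
For readers wondering what "find command rates while the benchmark runs" could look like in practice, here is a minimal sketch. It is not one of the utilities shipped with mc-crusher; it only assumes the server answers the standard text-protocol `stats` command, and the helper names (`parse_stats`, `command_rates`, `fetch_stats`, `sample_rates`) are made up for illustration. It takes two snapshots of the `cmd_get` and `cmd_set` counters and divides the difference by the elapsed time.

```python
import socket
import time


def parse_stats(raw: str) -> dict:
    """Parse the text reply of memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        # Data lines look like "STAT <name> <value>"; the final "END" is skipped.
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def command_rates(before: dict, after: dict, seconds: float,
                  counters=("cmd_get", "cmd_set")) -> dict:
    """Per-second rate of each counter between two stats snapshots."""
    return {c: (int(after[c]) - int(before[c])) / seconds for c in counters}


def fetch_stats(host: str, port: int) -> dict:
    """Send `stats` to a memcached server and read until the END marker."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


def sample_rates(host: str, port: int, interval: float = 1.0) -> dict:
    """Snapshot the counters twice, `interval` seconds apart, and return rates."""
    before = fetch_stats(host, port)
    start = time.monotonic()
    time.sleep(interval)
    after = fetch_stats(host, port)
    return command_rates(before, after, time.monotonic() - start)
```

Running `sample_rates("XYZ", 12345)` from a third machine while the load generator is active would give gets/sec and sets/sec as seen by the server itself, independent of whatever the benchmark client reports.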
Re: Is ARM64 officially supported ?
Hi

I've made some local performance testing

First I tried with https://github.com/memcached/mc-crusher but it seems it doesn't calculate any statistics after the load runs.

The results below are from https://github.com/RedisLabs/memtier_benchmark

1) Text
./memtier_benchmark --server XYZ --port 12345 -P memcache_text

ARM64 text
====================================================================
Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
--------------------------------------------------------------------
Sets          985.28         ---         ---    20.02700       67.22
Gets         9842.00        0.00     9842.00    20.01900      248.83
Waits           0.00         ---         ---         0.0         ---
Totals      10827.28        0.00     9842.00    20.02000      316.05

X86 text
====================================================================
Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
--------------------------------------------------------------------
Sets          931.04         ---         ---    20.06800       63.52
Gets         9300.21        0.00     9300.21    20.32600      235.13
Waits           0.00         ---         ---         0.0         ---
Totals      10231.26        0.00     9300.21    20.30200      298.66

2) Binary
./memtier_benchmark --server XYZ --port 12345 -P memcache_binary

ARM64 binary
====================================================================
Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
--------------------------------------------------------------------
Sets          829.68         ---         ---    23.46500       63.90
Gets         8287.69        0.00     8287.69    23.56100      314.75
Waits           0.00         ---         ---         0.0         ---
Totals       9117.37        0.00     8287.69    23.55200      378.65

X86 binary
====================================================================
Type         Ops/sec    Hits/sec  Misses/sec     Latency      KB/sec
--------------------------------------------------------------------
Sets          829.32         ---         ---    23.63600       63.87
Gets         8284.10        0.00     8284.10    23.58600      314.61
Waits           0.00         ---         ---         0.0         ---
Totals       9113.42        0.00     8284.10    23.59100      378.48

Text is faster on the ARM64. Binary is similar for both.
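
To put a number on "faster" and "similar", the Totals rows above can be compared directly. The snippet below is only an arithmetic sketch over the figures already reported; the function name `relative_gain` is made up for illustration.

```python
# Totals ops/sec copied from the four memtier_benchmark tables above.
totals = {
    "text": {"arm64": 10827.28, "x86": 10231.26},
    "binary": {"arm64": 9117.37, "x86": 9113.42},
}


def relative_gain(arm64: float, x86: float) -> float:
    """Percentage by which the ARM64 total exceeds the x86 total."""
    return (arm64 - x86) / x86 * 100.0


gains = {proto: relative_gain(**row) for proto, row in totals.items()}

for proto, pct in gains.items():
    print(f"{proto}: ARM64 is {pct:.2f}% ahead of x86")
```

This works out to a gap of a bit under 6% for the text protocol and well under 0.1% for binary. Note also that every get in these runs was a miss (Hits/sec is 0.00 in all four tables).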

The benchmarking tool runs on different machine than the ones running Memcached:

The ARM64 server has this spec:

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           0x48
Model:               0
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
                     asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

The x64 one:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Stepping:            7
CPU MHz:             3000.000
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            30976K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                     cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
                     pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology
                     nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma
                     cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
                     tsc_deadline_timer aes xsave avx f16c rdrand hypervisor
                     lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb
                     stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep
                     bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
                     clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
                     xgetbv1 arat avx512_vnni md_clear flush_l1d
                     arch_capabilities

Both with 16GB RAM.

Regards,
Martin

On Mon, Mar 9, 2020 at 11:23 AM Martin Grigorov wrote:

> Hi Dormando,
>
> On Mon, Mar 9, 2020 at 9:19 AM Martin Grigorov
> wrote:
>
>> Hi Dormando,
>>
>> On Fri, Mar 6, 2020 at 10:15 PM dormando wrote:
>>
>>> Yo,
>>>
>>> Just to add in: yes we support ARM64. Though my build test platform is a
>>> raspberry pi 3 and I haven't done any serious performance work.
>>> packet.net
>>> had an arm test