Hi, I was going to suggest htop. That would tell you which process is taking so much CPU. Also, can you post the output of 'uptime'?
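For reading the 'uptime' numbers, here is a rough sketch that normalizes the load averages by core count. It assumes a Linux or other Unix box where Python's os.getloadavg() is available; the helper name load_per_core is made up for illustration. Keep in mind that on Linux the load average also counts tasks blocked in uninterruptible I/O, so it is only a hint, not a pure CPU measure.

```python
import os


def load_per_core(loadavg, cpu_count):
    """Divide each load average (1, 5, 15 min) by the number of cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return tuple(round(x / cpu_count, 2) for x in loadavg)


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    cores = os.cpu_count() or 1
    print("cores:", cores)
    print("load per core (1/5/15 min):",
          load_per_core(os.getloadavg(), cores))
```

Values that stay well above 1.0 per core would suggest more runnable work than the 16 cores can handle.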
-- Y

On 4 April 2015 at 00:17, Shrinand Javadekar <[email protected]> wrote:

> Thanks Clay.
>
> On Fri, Apr 3, 2015 at 3:03 AM, Clay Gerrard <[email protected]> wrote:
> > On a single node where network transfers are cheaper, and a small-object,
> > request-rate-oriented workload - a good load generator should be able
> > to reach cpu limits with enough concurrency. If you're targeting a
> > disk-saturating, throughput-oriented workload - larger object sizes
> > (1-10MB) are the way to go.
>
> Yes, I am aware of this. But the object sizes may not be in my
> control. Therefore, I will have to stick to 256K objects.
>
> > Is the load generator also running on the same box? You should try to
> > validate your observations with a well-known Swift benchmarking tool like
> > ssbench. What's your total requests per second?
>
> Nope, the load generator is running on a separate machine connected to
> the Swift instance by a 1G link.
>
> I want to get as much throughput from Swift as possible. During these
> experiments, I have 256 PUTs happening in parallel and a total of
> 102400 PUTs. I have seen ~300 Obj/s. But, I'm getting this at the cost
> of 100% CPU utilization.
>
> I am reasonably confident that the benchmarking tool is not at fault
> here. We have tested several different object stores with the same
> tool and the results there have been consistent with the expectations.
>
> > My profiling in the past has revealed that the md5 checksumming in the
> > object server(s) is the largest (but by far not the only) consumer of
> > cpu - all of the other things you mentioned take cpu cycles - tanstaafl.
> > On a single node the problem is exacerbated per replica - what are your
> > goals?
>
> I see. I'm using 2 replicas; they're being written to two different disks.
>
> > Are you sure you're saturating all the cores evenly - what does it look
> > like in `htop` - have you tried tuning your worker counts or any
> > other config settings?
>
> I have set workers to auto. Reducing the workers, esp. the proxy
> server workers, has resulted in lower throughput. Also, I have set
> threads-per-disk in the object server to 4. I experimented with 8, but
> didn't see too much difference. Analysis done using sysdig suggests
> that CPU is the bottleneck, not disk.
>
> I'll take a deeper look at this with htop and see what's happening.
>
> -Shri
>
> P.S. "tanstaafl": Knew the phrase; but learnt the acronym just now...
> Learn something new every day!! :-).
>
> > -Clay
> >
> > On Thu, Apr 2, 2015 at 10:12 PM, Shrinand Javadekar
> > <[email protected]> wrote:
> >>
> >> Top shows the CPUs pegged at ~100%. Writes are done by a tool built
> >> in-house which is similar in functionality to other object store
> >> benchmarking tools. As I mentioned, there are 256 parallel object
> >> writes (PUTs), each of 256K bytes.
> >>
> >> On Thu, Apr 2, 2015 at 9:21 PM, Yogesh Girikumar <[email protected]>
> >> wrote:
> >> > Also how are you doing the object writes to benchmark it? Are you
> >> > using dd?
> >> >
> >> > On 3 April 2015 at 09:50, Yogesh Girikumar <[email protected]>
> >> > wrote:
> >> >>
> >> >> What does top say?
> >> >>
> >> >> On 3 April 2015 at 02:34, Shrinand Javadekar <[email protected]>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I have a single node Swift instance. It has 16 cpus, 8 disks and 64GB
> >> >>> memory. As part of testing, I am doing 256 object writes in parallel
> >> >>> for ~10 mins. Each object is also 256K bytes in size.
> >> >>>
> >> >>> While my experiment is running, I see that the CPU utilization of the
> >> >>> box is always ~100%. I am trying to understand what is causing this
> >> >>> high CPU utilization. Some of this could be attributed to:
> >> >>>
> >> >>> 1. MD5 checksum calculation done to verify every PUT.
> >> >>> 2. MD5 checksum calculation by the auditor (if it runs during this
> >> >>>    interval).
> >> >>> 3. Hash calculation of the path to decide which partition the object
> >> >>>    goes to.
> >> >>>
> >> >>> Are there any other CPU intensive operations happening on the system
> >> >>> that I should be aware of?
> >> >>>
> >> >>> I see that the proxy-server has a "PUT" queue. Is there some
> >> >>> processing of the data in this queue? Would simply putting data in and
> >> >>> out of the queue, streaming the data between the proxy and object
> >> >>> server use considerable CPU?
> >> >>>
> >> >>> Thanks in advance.
> >> >>> -Shri
> >> >>>
> >> >>> _______________________________________________
> >> >>> Mailing list:
> >> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >> >>> Post to : [email protected]
> >> >>> Unsubscribe :
> >> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
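On the MD5 question in the quoted thread: one way to sanity-check how much of the CPU the checksumming alone could explain is to time it on the same box. This is only a sketch, not how Swift does it internally. The 64K chunk size, the function names and the "one MD5 pass per replica" figure are assumptions for illustration; the 256K object size, ~300 Obj/s and 2 replicas are the numbers from the thread.

```python
import hashlib
import time

OBJECT_SIZE = 256 * 1024  # 256K objects, as in the thread
CHUNK_SIZE = 64 * 1024    # assumed read size, not taken from Swift's config


def md5_object(data, chunk_size=CHUNK_SIZE):
    """Hash one object chunk by chunk, as a streaming server might."""
    h = hashlib.md5()
    for off in range(0, len(data), chunk_size):
        h.update(data[off:off + chunk_size])
    return h.hexdigest()


def md5_rate(n_objects=200, size=OBJECT_SIZE):
    """Return how many objects per second one core can hash."""
    data = b"\0" * size
    start = time.perf_counter()
    for _ in range(n_objects):
        md5_object(data)
    elapsed = time.perf_counter() - start
    return n_objects / elapsed


def cores_needed(obj_per_sec, hashes_per_object, rate_per_core):
    """Estimate cores spent on MD5 at a given PUT rate."""
    return obj_per_sec * hashes_per_object / rate_per_core


if __name__ == "__main__":
    rate = md5_rate()
    # Assumption: one MD5 pass per replica, 2 replicas.
    print("md5 objects/s per core: %.0f" % rate)
    print("cores for 300 Obj/s: %.2f" % cores_needed(300, 2, rate))
```

If the estimate comes out far below 16 cores, the rest of the CPU time is going somewhere else, and per-process numbers from htop would be the next thing to look at.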
