Carl,

I ran a test today on the Dell R710 physical machine, running qpidd with Google's tcmalloc preloaded (I exported LD_PRELOAD=/home/clive/libs/libtcmalloc_minimal.so before starting the qpidd process).
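For reference, the run was along these lines (paths local to my machine):

    # Preload tcmalloc so it replaces glibc malloc for the broker process
    export LD_PRELOAD=/home/clive/libs/libtcmalloc_minimal.so
    qpidd &

    # Drive the broker with the default workload
    qpid-perftest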

When qpid-perftest was executed using its default values, I saw the publish and consume rates rise from 85000/80000 to 108000/105000 transfers/sec - a significant increase.

Producing QPID's own thread-optimized malloc, or incorporating an existing third-party allocator into the build, might have some merit.
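As a sketch, the alternative to preloading is linking the allocator in at build time; the exact configure incantation for the qpid tree is an assumption on my part:

    # Hypothetical: bake tcmalloc in at link time instead of via LD_PRELOAD
    ./configure LIBS="-ltcmalloc_minimal"
    make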

Anyway, I thought you might like to know.

As an aside, I hope to try Intel's TBB next, so I will keep you informed on how it performs.
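TBB ships a drop-in proxy library, so the first test should be as simple as another preload (library name per the TBB docs; it needs to be on the loader's search path):

    # Route malloc/free through tbbmalloc without relinking
    export LD_PRELOAD=libtbbmalloc_proxy.so.2
    qpidd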

Clive

On 03/05/2012 22:56, Carl Trieloff wrote:

I was chatting to Kim about this this week, and I believe we should do
something along these lines (a custom memory allocator) for quite a few
reasons.

Carl.


On 05/03/2012 05:42 PM, CLIVE wrote:
Steve,

Just one other thought. On other multi-threaded applications I have
usually found a significant speed up by moving to a more thread
efficient memory allocator like that provided by Intel's Thread
Building Blocks (TBB) or Google's tcmalloc (part of google-perftools)

Is this something that you think might be worth a look, or is QPID
doing something clever already?

Clive

On 03/05/2012 22:04, Steve Huston wrote:
Ok, Clive - thanks very much for the follow-up! Glad you have this
situation in hand now.

-Steve

-----Original Message-----
From: CLIVE [mailto:[email protected]]
Sent: Thursday, May 03, 2012 4:53 PM
To: Steve Huston
Cc: [email protected]; 'James Kirkland'
Subject: Re: QPID performance on virtual machines

Steve,

Managed to run some more performance tests today using a RHEL5u4 VM on a Dell R710. Ran qpid-perftest with default values on the same VM as qpidd; each test was run several times, with the calculated averages shown in the table below.

CPUs   RAM   Publish   Consume      (rates in transfers/sec)
  2    4G      48K       46K
  4    4G      65K       60K
  6    4G      73K       66K
  2    8G      46K       44K
  4    8G      65K       61K
  6    8G      74K       67K

Basically it confirms your assertion about the broker using more threads under heavy load. Changing the VM memory had no discernible effect on performance, but increasing the number of CPUs available to the VM had a big effect on throughput.

So when defining a VM for transient QPID usage, focus on CPU allocation!
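On KVM, for example, the guest's vCPU count can be grown with virsh before a test run (the domain name here is just a placeholder, and --config needs a reasonably recent libvirt):

    # Give the guest 6 vCPUs; takes effect on the next boot
    virsh setvcpus rhel5-qpid 6 --config
    virsh shutdown rhel5-qpid && virsh start rhel5-qpid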
Thanks for the advice and help.

Clive



On 03/05/2012 15:27, Steve Huston wrote:
Hi Clive,

The broker will use threads based on load - if the broker takes longer
to process a message than qpid-perftest takes to send the next
message, the broker would need more threads.

A more pointed test for broker performance would be to run the client on another host - then you know the non-VM vs. VM differences are just the broker's actions. It may be a little confusing weeding out the actual vs. virtual NIC issues, but there would be no confusion about how much the client is taking away from resources available to the broker.
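Something along these lines, assuming the standard broker option in the C++ test tools (-b/--broker, if memory serves):

    # Run the client on a separate host, pointing it at the broker machine
    qpid-perftest -b broker-host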

-Steve

-----Original Message-----
From: CLIVE [mailto:[email protected]]
Sent: Wednesday, May 02, 2012 5:28 PM
To: [email protected]
Cc: Steve Huston; 'James Kirkland'
Subject: Re: QPID performance on virtual machines

Steve,

I thought about this as well, so I restarted the broker on the physical Dell R710 with the threads option set to just 4 and saw the same throughput values (85000 publish and 80000 consume). As reducing the thread count didn't seem to have much effect on the physical machine, I thought that this probably wasn't the issue.
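For the record, the restart was along these lines (option name as listed by qpidd --help):

    # Cap the broker's I/O thread pool at 4 threads
    qpidd --worker-threads 4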

As the qpid-perftest application was only creating 1 producer and 1 consumer, I reasoned that perhaps the broker was only using two threads to service the reads and writes from these clients, which was why reducing the thread count on the broker had no effect. Would you expect the broker to use more than two threads to service the clients in this scenario?

I will rerun the test tomorrow with an increased number of CPUs in the VM(s), just to double-check whether it is a number-of-cores issue.

I did run 'strace -c' on qpidd while the test was running to count the number of system calls, and I noted that the big hitters were futex and write. Interestingly, the reads came in 64K chunks, but the writes were only 2048 bytes at a time. As a result, the number of writes occurring was an order of magnitude bigger than the number of reads; I left the detailed results at work, so apologies for not quoting the actual figures.
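For anyone wanting to repeat it, the invocation was roughly:

    # Attach to the running broker and count system calls;
    # Ctrl-C after the test completes prints the per-call totals
    strace -c -f -p $(pidof qpidd)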

Clive

On 02/05/2012 20:23, Steve Huston wrote:
The qpid broker learns how many CPUs are available and will run more
I/O threads when more CPUs are available (#CPUs + 1 threads). It
would be interesting to see the results if your VM gets more CPUs.
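An easy way to confirm is to check the broker's thread count directly while the test runs:

    # nlwp = number of lightweight processes (threads) for the qpidd process
    ps -o nlwp= -p $(pidof qpidd)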

-Steve

-----Original Message-----
From: CLIVE [mailto:[email protected]]
Sent: Wednesday, May 02, 2012 1:30 PM
To: James Kirkland
Cc: [email protected]
Subject: Re: QPID performance on virtual machines

James,

qpid-perftest (as supplied with the qpid-0.14 source tarball) runs a direct-queue test when executed without any parameters; there is a command line option that enables this to be changed if required. The message size is 1024 bytes (again the default when not explicitly set), and 500000 messages are published by the test (again the default when not explicitly set). All messages are transient, so I wouldn't expect any file I/O overhead to interfere with the test, and this is confirmed by the vmstat results I am seeing. The only jump in the vmstat output is in the number of context switches, which climbs into the thousands.
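Spelled out explicitly, the run is equivalent to something like this (option names as I recall them from qpid-perftest --help):

    # The defaults made explicit: 500000 transient messages of 1024 bytes each
    qpid-perftest --count 500000 --size 1024

    # In a second terminal, watch the cs (context switch) column
    vmstat 1
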
Clive

On 02/05/2012 18:10, James Kirkland wrote:
What sort of messaging scenario is it? Are the messages persisted?
How big are they? If they are persisted, are you using virtual
disks or physical devices?

CLIVE wrote:
Hi all,

I have been undertaking some performance profiling of QPID version 0.14 over the last few weeks, and I have found a significant performance drop-off when running QPID in a virtual machine.

As an example, if I run qpidd on an 8-core Dell R710 with 36G RAM (RHEL5u5) and then run qpid-perftest (on the same machine, to discount any network problems) without any command line parameters, I see about 85000 publish transfers/sec and 80000 consume transfers/sec. If I run the same scenario on a VM (tried both KVM and VMWare ESXi 4.3 running RHEL5u5) with 2 cores and 8G RAM, I see only 45000 publish transfers/sec and 40000 consume transfers/sec - a significant drop in performance. Looking at the CPU and memory usage, these would not seem to be the limiting factors, as the memory consumption of qpidd stays under 200 MBytes and its CPU is up at about 150%; hence the two-core machine.
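Those CPU and memory figures came from watching the broker process directly, along these lines:

    # One-shot snapshot of qpidd's CPU and memory usage during the run
    top -b -n 1 -p $(pidof qpidd)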

I have even run the same test on my MacBook at home using VMWare Fusion 4 (2 cores, 4G RAM) and see the same 45000/40000 transfers/sec results.

I would expect a small drop off in performance when running in a
VM, but not to the extent that I am seeing.

Has anyone else seen this, and if so, were they able to get to the bottom of the issue?

Any help would be appreciated.

Clive Lilley

--
James Kirkland
Principal Enterprise Solutions Architect
3340 Peachtree Road, NE,
Suite 1200
Atlanta, GA 30326 USA.
Phone (404) 254-6457
RHCE Certificate: 805009616436562