Re: Parallel compression not using all available CPU

Jasse Jansson Mon, 12 Dec 2016 05:42:07 -0800

They recommend to turn hyperthreading off if you run studio software on your computer. That's if you run Windows, I have no idea if HT affects a Unix derivative anyway.


On 2016-12-12 07:14, PeerCorps Trust Fund wrote:

Thanks for this.
Is there ever a side case where hyperthreading might have unpredictable results or should it generally always be left on?
On 12/12/2016 01:37 AM, Matthew Dillon wrote:
That doesn't make any sense.  It sounds like it is just compressing more
slowly, so there is less idle time because the HDD/SSD is able to keep up
due to it compressing more slowly.  You don't want to turn off
hyperthreading in the BIOS and cache coherency stalls will not show up in
the idle% anyway.

-Matt
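
To make the argument above concrete, here is a minimal sketch of a two-stage pipeline (disk feeding compressor). The function name and all of the rates are hypothetical illustrations, not measurements from this thread. It only shows why a slower compressor on the same disk reports less idle time without finishing any sooner.

```python
def idle_fraction(disk_mb_s: float, cpu_mb_s: float) -> float:
    """Fraction of CPU capacity left idle when a disk feeds a compressor.

    disk_mb_s: sustained read rate of the storage device (assumed figure).
    cpu_mb_s:  rate at which all cores together could compress if they
               were never starved for input (assumed figure).
    """
    if disk_mb_s <= 0 or cpu_mb_s <= 0:
        raise ValueError("rates must be positive")
    # The pipeline as a whole runs at the speed of its slower stage.
    throughput = min(disk_mb_s, cpu_mb_s)
    return 1.0 - throughput / cpu_mb_s


# Illustrative numbers only.
print(f"{idle_fraction(100, 1000):.0%}")  # fast compressor, slow disk
print(f"{idle_fraction(100, 125):.0%}")   # slower compressor, same disk
print(f"{idle_fraction(500, 125):.0%}")   # disk faster than the compressor
```

In the first two cases the job still moves data at the disk's 100 MB/s, so the lower idle figure in the second case does not mean the compression finished faster.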

On Sun, Dec 11, 2016 at 1:22 PM, PeerCorps Trust Fund <
[email protected]> wrote:
Hi,

It turns out that it was a combination of two things - turning off
hyperthreading in BIOS and using a faster disk.

I found a post from the author of lbzip2 which seems to describe what
might be happening in this case, but reference was made to a user using an
i5 mobile CPU:
########################################################################
"bzip2 author here. I strongly suspect that you see what you see because
your Intel core i5 is probably only dual core PLUS hyper-threaded, not real
quad-core. Meaning, you have two instances of the L2 per-core cache, not
four, and each two hyperthreads share an L2 cache.

Since the bzip2 compression/decompression is very cache sensitive (see
"man bzip2"), the scaling factor will be determined mostly by how many
OS-threads can dispose over a dedicated cache each. In your case this
number is probably 2.
Since you run two threads per core, those contend for the shared L2 cache,
basically each messing with the other (flushing / invalidating the shared
cache for the other). This contention shows up as double CPU time, because
"waiting for cache" (or "waiting for main memory") is accounted for as CPU
time.

Hyperthreading is not useful but detrimental for lbzip2; so you should
export LBZIP2="-n 2". You should not run more worker threads per core than:
core-dedicated-cache-size divided by 8MB."
########################################################################
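
The rule of thumb in the quoted post can be sketched as follows. This is only an illustration: the function names and the cache sizes passed in are hypothetical, and the floor of one worker per physical core is my assumption, chosen because the quote itself recommends "-n 2" for a dual-core CPU whose per-core cache is well under 8 MB.

```python
def lbzip2_workers(physical_cores: int, cache_per_core_mb: float,
                   mb_per_worker: float = 8.0) -> int:
    """Suggested lbzip2 worker count under the quoted heuristic.

    Quoted rule: no more workers per core than
    (core-dedicated cache size) / 8 MB.
    Assumption (not from the quote): never go below one worker per
    physical core, even when the dedicated cache is smaller than 8 MB.
    """
    if physical_cores < 1:
        raise ValueError("need at least one physical core")
    per_core = int(cache_per_core_mb // mb_per_worker)
    return physical_cores * max(1, per_core)


def lbzip2_env(workers: int) -> str:
    """Shell line that sets the LBZIP2 variable as in the quote."""
    return f'export LBZIP2="-n {workers}"'


# Hypothetical dual-core CPU with 0.25 MB of dedicated cache per core.
print(lbzip2_env(lbzip2_workers(2, 0.25)))
```

The point is that the count is derived from physical cores, not from the hardware thread count the OS reports when hyperthreading is on.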
Running the compression again on the same file from an SSD with
hyperthreading turned off, I was able to fully saturate all of the cores
using lbzip2. None of this seemed obvious at first, but it rectified the
situation. The biggest difference came from turning off hyperthreading
(idle CPU - 20% vs the previous 90%) and then running from an SSD with
hyperthreading turned off (idle CPU = 0%).

Previously, the compression was run from a single HDD, not an SSD.
Concerning the compression test using the same HDD under FreeBSD, well I
don't know why it was able to saturate the CPU. Perhaps it has something to
do with ZFS's aggressive caching. Turning that off and re-running the test
would likely answer the question. Pixz performed similarly when the above
two modifications were made.




On 12/11/2016 02:00 AM, Jasse Jansson wrote:
Have you tried to disable hyperthreading in the BIOS?
It's a long shot, I know, but it might help.

On 2016-12-10 22:14, PeerCorps Trust Fund wrote:
Hi,

On both systems HAMMER was used. One small correction concerning the
2c/2t machine, both compression programs did effectively utilize that CPU
which had an idle % of 0.0. It is the bigger machine, 16c/32t, where the CPU
isn't effectively maxed out. I'll continue to try and investigate why and
report back if I find anything.


On 12/10/2016 10:26 PM, Justin Sherrill wrote:
On the two DragonFly systems, was it Hammer or UFS?  I would be
surprised if that made a difference, but it might?

On Sat, Dec 10, 2016 at 6:19 AM, PeerCorps Trust Fund
<[email protected]> wrote:
Hi,
I've observed that parallel compression tools such as pixz and lbzip2 do not
make use of all of the available CPU under DragonFly. On other OSes, they do.

When testing on a 50 GB file, using top I've observed that CPU idle
percentages consistently hover around the 90% range for pixz and ~70% for
lbzip2. These values under FreeBSD and Linux are typically ~0.0% idle until
compression is complete. Correspondingly, compression takes significantly
longer under DragonFly, so the CPU is really being underutilized in this
case as opposed to erroneous reporting by top.
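
One way to cross-check top is to compare the CPU time a compression run consumed against its wall-clock time. The following is a rough sketch, assuming a Unix-like system with Python's resource module; the helper name is made up for illustration and the command to measure is supplied by the caller.

```python
import os
import resource
import subprocess
import sys
import time


def cpu_utilization(cmd, ncpu=None):
    """Run cmd and return (cpu_seconds, wall_seconds, utilization).

    utilization is CPU time divided by (wall time * number of CPUs),
    so 1.0 would mean every logical CPU was busy for the whole run.
    """
    ncpu = ncpu or os.cpu_count() or 1
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    # Discard the compressed stream; only the timing matters here.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = ((after.ru_utime - before.ru_utime)
           + (after.ru_stime - before.ru_stime))
    util = cpu / (wall * ncpu) if wall > 0 else 0.0
    return cpu, wall, util


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python measure.py lbzip2 -k -c bigfile
    cpu, wall, util = cpu_utilization(sys.argv[1:])
    print(f"cpu={cpu:.1f}s wall={wall:.1f}s utilization={util:.0%}")
```

A low ratio together with a long wall time points at genuine underutilization rather than a reporting problem in top.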

This was tested on two systems, one 16c/32t and a 2c/2t system on a recent
master DragonFly v4.7.0.973.g8d7da-DEVELOPMENT #2: Wed Dec 7 11:44:04 EET
2016.

Has anyone else possibly observed this?

--
Mike
