On 2014-11-05 00:25, Peter Meerwald wrote:

preliminary benchmarking on Intel i5-2400S, 64-bit, Linux 3.13:

running 'paplay --latency-msec=10 stereo_48KHz.wav' with output on the internal
soundcard (Intel HDA), measuring the maximum CPU% shown by top for the
pulseaudio and paplay processes

code             flags          PA       paplay
master 6d1fd4d1  -O2            < 14.0%  < 3.7%
master 6d1fd4d1  -O2 -DNDEBUG   < 13.3%  < 3.3%
proposed v3      -O2             < 8.3%  < 1.3%
proposed v3      -O2 -DNDEBUG    < 7.6%  < 1.3%

Cool stuff!

Seems we can get the low-latency story somewhat better, and perhaps even more so if we can communicate directly with the I/O threads through srbchannels, but that's a different project...

But to focus on the 6.0 release: which of these patches have a large enough performance impact, relative to the risk of regression, to be worth squeezing into 6.0 rc1, and which ones can just be deferred to the -next branch?

Also, is v4 going to be 90 patches...? ;-)


ARMv7 benchmarking soonish


this patch series aims to save memory allocations and some system calls
related to PA's client/server protocol implementation

v3 adds inlining and saves a call to snd_pcm_avail(); the v2 code is largely
unchanged (minibuffers are larger and better used)


patches 1 to 5 ('tagstruct:') introduce a new tagstruct type _APPENDED
which can hold tagstruct data up to a certain size; tagstructs are now
kept in a specific free-list -- this typically replaces two malloc()/free()s
with one flist push()/pop()
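
A minimal sketch of the idea behind patches 1 to 5, assuming a single-threaded stack as a stand-in for PA's lock-free flist. All names (struct tagstruct, ts_new(), ts_put(), ts_free(), APPENDED_SIZE) are hypothetical. Small payloads live in a buffer appended to the struct, and freed structs are pushed onto a free-list, so steady-state reuse needs no malloc()/free() at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: names and sizes are illustrative, not PulseAudio's. */
#define APPENDED_SIZE 128

enum ts_type { TS_APPENDED, TS_DYNAMIC };

struct tagstruct {
    enum ts_type type;
    unsigned char *data;          /* points at appended[] or at a heap buffer */
    size_t length, allocated;
    struct tagstruct *next_free;  /* free-list link */
    unsigned char appended[APPENDED_SIZE];
};

/* Single-threaded stand-in for a lock-free flist. */
static struct tagstruct *free_list = NULL;
static unsigned long n_mallocs = 0;

static struct tagstruct *ts_new(void) {
    struct tagstruct *t = free_list;
    if (t) {
        free_list = t->next_free;     /* pop: reuse, no malloc() */
    } else {
        t = malloc(sizeof(*t));
        if (!t)
            return NULL;
        n_mallocs++;
    }
    t->type = TS_APPENDED;
    t->data = t->appended;            /* payload lives inside the struct */
    t->length = 0;
    t->allocated = APPENDED_SIZE;
    t->next_free = NULL;
    return t;
}

/* Append bytes; move to a heap buffer only when the inline area overflows. */
static int ts_put(struct tagstruct *t, const void *p, size_t n) {
    if (t->length + n > t->allocated) {
        size_t na = (t->length + n) * 2;
        unsigned char *nd;
        if (t->type == TS_APPENDED) {
            nd = malloc(na);
            if (!nd)
                return -1;
            memcpy(nd, t->appended, t->length);
            n_mallocs++;
        } else {
            nd = realloc(t->data, na);
            if (!nd)
                return -1;
        }
        t->data = nd;
        t->allocated = na;
        t->type = TS_DYNAMIC;
    }
    memcpy(t->data + t->length, p, n);
    t->length += n;
    return 0;
}

static void ts_free(struct tagstruct *t) {
    if (t->type == TS_DYNAMIC)
        free(t->data);                /* only oversized payloads hit the heap */
    t->next_free = free_list;         /* push: the struct itself is kept */
    free_list = t;
}
```

After the first allocation, a new/free cycle with a small payload is just a pointer pop and push.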

patches 6 to 9 ('packet:') make packets fixed-size (typically); packets are
kept in a specific free-list -- this replaces one malloc()/free() with one
flist push()/pop()

patches 10 to 14 ('pstream:') allow sending tagstructs directly to a pstream
without encapsulating them in a packet -- this saves one flist push()/pop()
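
A rough illustration of why "pstream: Unionize item_info" makes this possible. The types below are hypothetical stand-ins for pa_packet and pa_tagstruct. If a send-queue entry can hold either object in a union, a tagstruct can be queued as-is, and the writer only needs a pointer and a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for pa_packet / pa_tagstruct. */
struct packet { const uint8_t *data; size_t length; };
struct tagstruct { const uint8_t *data; size_t length; };

enum item_type { ITEM_PACKET, ITEM_TAGSTRUCT };

/* One send-queue entry: the union lets a tagstruct be queued directly
 * instead of first being wrapped into a packet. */
struct item_info {
    enum item_type type;
    union {
        struct packet *packet;
        struct tagstruct *tagstruct;
    } u;
};

/* The writer only needs a pointer and a length, whatever the item holds. */
static const uint8_t *item_payload(const struct item_info *i, size_t *length) {
    switch (i->type) {
    case ITEM_PACKET:
        *length = i->u.packet->length;
        return i->u.packet->data;
    case ITEM_TAGSTRUCT:
        *length = i->u.tagstruct->length;
        return i->u.tagstruct->data;
    }
    assert(0 && "unknown item type");
    return NULL;
}
```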

patches 15 and 16 ('pstream') often save a read() call by reading more than
just the descriptor (up to 40 bytes, e.g. descriptor (20 bytes) + shm
info (16 bytes)); the idea is similar to b4342845d, "Optimize write
of smaller packages", but for read -- this costs some extra memcpy() but
saves a read(); in v3 the buffer size has been increased to 256 bytes
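
A sketch of the read-side minibuffer under some stated assumptions. The descriptor is taken to be 20 bytes, with the payload length in the first big-endian 32-bit word. Payloads too large for the 256-byte minibuffer are simply rejected here rather than handled. struct reader and read_frame() are hypothetical names. A single read() of up to 256 bytes usually brings in the descriptor and a small payload together, and the payload is then copied out with memcpy() instead of being fetched with a second read().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DESCRIPTOR_SIZE 20   /* assumed: five 32-bit BE words, word 0 = payload length */
#define MINIBUF_SIZE 256

struct reader {
    uint8_t minibuf[MINIBUF_SIZE];
    size_t fill;             /* bytes currently buffered */
    unsigned reads;          /* read() calls issued, for illustration */
};

/* Read one frame (descriptor + payload); returns payload length or -1. */
static ssize_t read_frame(int fd, struct reader *r, uint8_t *out, size_t out_size) {
    while (r->fill < DESCRIPTOR_SIZE) {
        ssize_t n = read(fd, r->minibuf + r->fill, MINIBUF_SIZE - r->fill);
        r->reads++;
        if (n <= 0)
            return -1;
        r->fill += (size_t) n;
    }
    uint32_t len_be;
    memcpy(&len_be, r->minibuf, 4);
    size_t len = ntohl(len_be);
    /* Large payloads would be read directly into their own buffer (not shown). */
    if (len > out_size || DESCRIPTOR_SIZE + len > MINIBUF_SIZE)
        return -1;
    while (r->fill < DESCRIPTOR_SIZE + len) {
        ssize_t n = read(fd, r->minibuf + r->fill, MINIBUF_SIZE - r->fill);
        r->reads++;
        if (n <= 0)
            return -1;
        r->fill += (size_t) n;
    }
    memcpy(out, r->minibuf + DESCRIPTOR_SIZE, len);   /* memcpy() instead of read() */
    size_t used = DESCRIPTOR_SIZE + len;
    memmove(r->minibuf, r->minibuf + used, r->fill - used);  /* keep read-ahead bytes */
    r->fill -= used;
    return (ssize_t) len;
}
```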

patch 17 ('iochannel') fixes a strange behaviour in iochannel/mainloop: the
input_event was deleted on every read, which caused the pollfds to be rebuilt
for every read()!
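
The general principle behind patch 17 can be sketched as "only invalidate the pollfd array when the watched events actually change". This is not the patch itself. struct mainloop, io_event_enable() and prepare_poll() are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical mainloop state: the pollfd array is rebuilt lazily. */
struct mainloop {
    bool rebuild_pollfds;   /* set whenever the set of watched events changes */
    unsigned n_rebuilds;
};

struct io_event {
    struct mainloop *m;
    short events;           /* POLLIN, POLLOUT, ... currently requested */
};

/* Re-requesting the same events is a no-op, so the pollfds stay valid. */
static void io_event_enable(struct io_event *e, short events) {
    if (e->events == events)
        return;
    e->events = events;
    e->m->rebuild_pollfds = true;
}

/* Called once per loop iteration before poll(). */
static void prepare_poll(struct mainloop *m) {
    if (!m->rebuild_pollfds)
        return;
    m->n_rebuilds++;        /* a real loop would refill its struct pollfd array here */
    m->rebuild_pollfds = false;
}
```

Re-enabling input after every read then costs a comparison rather than a delete, a new event, and a pollfd rebuild.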

patches 18 to 20 ('queue', 'pstream') aim to combine two (v3: or more) write
items into one minibuffer by peeking ahead in the send queue
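
A sketch of write combining with a peek operation, assuming a fixed-capacity FIFO as a stand-in for pa_queue. queue_peek() plays the role of the new pa_queue_peek(), and flush_combined() is a hypothetical name. Items are popped only while the next one still fits, and the whole minibuffer goes out in one write().

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MINIBUF_SIZE 256
#define QUEUE_MAX 16

struct item { const void *data; size_t length; };

/* Minimal FIFO with a peek operation (hypothetical stand-in for pa_queue). */
struct queue { struct item items[QUEUE_MAX]; size_t head, count; };

static int queue_push(struct queue *q, struct item it) {
    if (q->count == QUEUE_MAX)
        return -1;
    q->items[(q->head + q->count) % QUEUE_MAX] = it;
    q->count++;
    return 0;
}

static const struct item *queue_peek(const struct queue *q) {
    return q->count ? &q->items[q->head] : NULL;
}

static struct item queue_pop(struct queue *q) {
    struct item it = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_MAX;
    q->count--;
    return it;
}

/* Pack queued items into one minibuffer while they fit; one write() total. */
static ssize_t flush_combined(int fd, struct queue *q) {
    unsigned char minibuf[MINIBUF_SIZE];
    size_t fill = 0;
    const struct item *next;

    while ((next = queue_peek(q)) && fill + next->length <= MINIBUF_SIZE) {
        struct item it = queue_pop(q);
        memcpy(minibuf + fill, it.data, it.length);
        fill += it.length;
    }
    if (fill == 0)
        return 0;   /* empty queue, or head item too big (written separately) */
    return write(fd, minibuf, fill);
}
```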

patch 21 stops calling mainloop's defer_enable() after queuing a SHMRELEASE;
this increases the chance that items can be combined (i.e. by patch 20)

patch 22 inlines pa_run_once(), as this function came out near the top in profiling
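
A sketch of why inlining a run-once helper pays off. Once initialization is done, the inlined fast path is a single atomic load at the call site, and only the first caller reaches the out-of-line slow path. This uses C11 atomics and a pthread mutex. struct once, run_once(), once_slow() and init_tables() are hypothetical, not PA's pa_once API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inlineable run-once helper. */
struct once {
    atomic_bool done;
    pthread_mutex_t mutex;
};

#define ONCE_INIT { false, PTHREAD_MUTEX_INITIALIZER }

/* Out-of-line slow path: taken only until initialization has completed. */
static void once_slow(struct once *o, void (*func)(void)) {
    pthread_mutex_lock(&o->mutex);
    if (!atomic_load_explicit(&o->done, memory_order_relaxed)) {
        func();
        atomic_store_explicit(&o->done, true, memory_order_release);
    }
    pthread_mutex_unlock(&o->mutex);
}

/* Inlined fast path: one acquire load, no call and no lock once done. */
static inline void run_once(struct once *o, void (*func)(void)) {
    if (atomic_load_explicit(&o->done, memory_order_acquire))
        return;
    once_slow(o, func);
}

/* Example initializer (hypothetical), counting how often it runs. */
static int init_calls = 0;
static void init_tables(void) { init_calls++; }
```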

patches 23 and 24 ('rtpoll') are cleanup

patch 25 ('mainloop') only clears the wakeup pipe when poll() indicates that
the pipe is readable; if the only ready file descriptor is the wakeup pipe,
searching io_events can be avoided
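
A sketch of the patch 25 logic with plain poll(). The layout is an assumption: pfds[0] is the read end of a non-blocking wakeup pipe and the remaining entries are I/O fds. make_wakeup_pipe(), drain() and dispatch() are hypothetical names. The wakeup pipe is drained only when revents says it is readable, and if it was the only ready fd, the scan over the I/O entries is skipped entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Create the wakeup pipe; the read end is non-blocking so drain() terminates. */
static int make_wakeup_pipe(int fds[2]) {
    if (pipe(fds) < 0)
        return -1;
    return fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0 ? -1 : 0;
}

static void drain(int fd) {
    char buf[64];
    while (read(fd, buf, sizeof(buf)) > 0)
        ;
}

/* pfds[0] = wakeup pipe, pfds[1..] = I/O fds. Returns I/O fds dispatched. */
static int dispatch(struct pollfd *pfds, int nfds, int timeout_ms) {
    int ready = poll(pfds, (nfds_t) nfds, timeout_ms);
    int dispatched = 0;
    if (ready <= 0)
        return 0;
    if (pfds[0].revents & POLLIN) {
        drain(pfds[0].fd);          /* clear the pipe only when it is readable */
        if (--ready == 0)
            return 0;               /* only the wakeup fired: skip the io scan */
    }
    for (int i = 1; i < nfds && ready > 0; i++) {
        if (pfds[i].revents) {
            dispatched++;           /* a real loop would run the io_event callback */
            ready--;
        }
    }
    return dispatched;
}
```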

patches 26 and 27 ('flist') remove the volatile annotation and make the
flist_elem attributes non-atomic -- needed?

v3 material:

patches 28 to 31 annotate some branches in rtpoll, mainloop and the ALSA
sink/source thread functions with LIKELY, and save two rtclock() calls
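
For reference, branch annotations like these are usually thin wrappers around GCC's __builtin_expect(). The macro names below are placeholders rather than PA's own. The hint only affects code layout and static prediction, never semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder macros; fall back to plain expressions on non-GNU compilers. */
#if defined(__GNUC__)
#define LIKELY(x)   (__builtin_expect(!!(x), 1))
#define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* The error path is marked unlikely, so the compiler lays out the
 * common case as the fall-through path. */
static long sum_samples(const short *s, size_t n) {
    long total = 0;
    if (UNLIKELY(s == NULL))
        return 0;
    for (size_t i = 0; i < n; i++)
        total += s[i];
    return total;
}
```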

patch 32 ('resampler') is cleanup

patch 33 ('build-sys') adds --disable-statistics to configure

patches 34 to 37 make several hot functions inlineable; API functions in pulse/
do a lot of error checking that is unnecessary in the core; worse, that
checking does NOT go away with NDEBUG
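
A sketch of the API-versus-core distinction described for patches 34 to 37. The API-style function must validate untrusted input with a real error return, which survives -DNDEBUG. A core-internal inline variant can rely on its callers and use assert(), which compiles away with -DNDEBUG. The names and CHANNELS_MAX are hypothetical, assumed to mirror pa_channels_valid() and PA_CHANNELS_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS_MAX 32U   /* assumed to mirror PA_CHANNELS_MAX */

/* Header-only check: cheap enough to inline at every call site. */
static inline bool channels_valid(uint8_t channels) {
    return channels > 0 && channels <= CHANNELS_MAX;
}

/* API-style: validates on every call, even in release builds. */
static size_t api_frame_size(uint8_t channels, size_t sample_size) {
    if (!channels_valid(channels) || sample_size == 0)
        return 0;   /* error return that must stay with NDEBUG */
    return channels * sample_size;
}

/* Core-style: inputs already validated upstream, so the check is an
 * assert() that disappears with -DNDEBUG and the body inlines to a multiply. */
static inline size_t core_frame_size(uint8_t channels, size_t sample_size) {
    assert(channels_valid(channels) && sample_size > 0);
    return channels * sample_size;
}
```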

patch 38 ('resampler') precomputes the maximum block size in frames
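
The precomputation in patch 38 amounts to moving a division out of the per-call path. This is a minimal sketch with hypothetical field and function names. The bytes-to-frames conversion happens once at setup, and each call only clamps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resampler state; max_block_size is in bytes. */
struct resampler {
    size_t frame_size;          /* bytes per frame (channels * sample size) */
    size_t max_block_size;      /* bytes */
    size_t max_block_frames;    /* precomputed once at setup */
};

static void resampler_setup(struct resampler *r, size_t frame_size, size_t max_block_size) {
    assert(frame_size > 0);
    r->frame_size = frame_size;
    r->max_block_size = max_block_size;
    r->max_block_frames = max_block_size / frame_size;   /* the only division */
}

/* Per-call path: clamp a request to the maximum block without dividing. */
static size_t next_block_frames(const struct resampler *r, size_t frames_left) {
    return frames_left < r->max_block_frames ? frames_left : r->max_block_frames;
}
```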

patches 39 to 42 ('mix') make functions inlineable and clean up

patches 43 and 44 make volume-related functions inlineable
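
A sketch of what inlineable volume checks buy in a mixing hot path. Unity gain and mute can be detected with a cheap inline loop and short-circuited before any per-sample multiply. VOLUME_NORM (0x10000) and VOLUME_MUTED (0) are assumed to mirror PA_VOLUME_NORM and PA_VOLUME_MUTED. The struct, macros and apply_volume() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VOLUME_NORM  0x10000U   /* assumed to mirror PA_VOLUME_NORM */
#define VOLUME_MUTED 0U         /* assumed to mirror PA_VOLUME_MUTED */
#define CHANNELS_MAX 32

struct cvolume { uint8_t channels; uint32_t values[CHANNELS_MAX]; };

/* Header-inlineable: true if every channel is at the given volume. */
static inline bool cvolume_channels_equal_to(const struct cvolume *v, uint32_t vol) {
    for (unsigned c = 0; c < v->channels; c++)
        if (v->values[c] != vol)
            return false;
    return true;
}

#define cvolume_is_norm(v)  cvolume_channels_equal_to((v), VOLUME_NORM)
#define cvolume_is_muted(v) cvolume_channels_equal_to((v), VOLUME_MUTED)

/* Apply per-channel volume to interleaved S16 samples, skipping work
 * entirely for unity gain and using memset() for mute. */
static void apply_volume(int16_t *s, size_t n, const struct cvolume *v) {
    if (cvolume_is_norm(v))
        return;
    if (cvolume_is_muted(v)) {
        memset(s, 0, n * sizeof(*s));
        return;
    }
    for (size_t i = 0; i < n; i++) {
        int64_t x = (int64_t) s[i] * v->values[i % v->channels] / VOLUME_NORM;
        s[i] = (int16_t) (x > INT16_MAX ? INT16_MAX : x < INT16_MIN ? INT16_MIN : x);
    }
}
```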

patches 45 and 46 ('iochannel', 'asyncmsgq') drop dead code

patch 47 fixes sink_input_pop_cb() to return the entire memchunk (as per
specification)

patch 48 saves one call to snd_pcm_avail() by computing left_to_play -- this
patch probably has THE BIGGEST impact

patches 49 to 51 are cleanup and refactoring


summary:

with these patches, typical playback (i.e. after setup) runs without any
malloc()/free() thanks to the use of free-lists, and the overall number of
memory management operations is reduced

many hot functions have been made inlineable; redundant checks can be dropped
by compiling with NDEBUG defined

read() and write() syscalls are saved by combining data into minibuffers

one call to snd_pcm_avail() is saved per mmap_write()


Peter Meerwald (51):
   tagstruct: Distinguish pa_tagstruct_new() use cases
   tagstruct: Replace dynamic flag with type
   tagstruct: Get rid of pa_tagstruct_free_data()
   tagstruct: Add type _APPENDED
   tagstruct: Use flist to potentially save calls to malloc()/free()
   packet: Hide internals of pa_packet, introduce pa_packet_data()
   packet: Make pa_packet_new() create fixed-size packets
   packet: Introduce pa_packet_new_data() to copy data into a newly
     created packet
   packet: Use flist to save calls to malloc()/free()
   pstream: Unionize item_info
   pstream: Add pa_pstream_send_tagstruct()
   pstream: #define PA_PSTREAM_SHM_SIZE
   pstream: Duplicate assignment, write.data is always NULL
   pstream: Only reset memchunk if it has been used
   pstream: Split up do_read()
   pstream: Use small minibuffer to combine several read()s if possible
   iochannel: Fix channel enable
   queue: Add pa_queue_peek() function
   pstream: Add helper functions reset_descriptor(), shm_descriptor()
   pstream: Peek into next item on send queue to see if it can be put
     into minibuffer together with current item
   pstream: Don't call defer_enable() on SHMRELEASE
   once: Inline functions
   rtpoll: Fix condition for DEBUG_TIMING output
   rtpoll: Drop extra wait_op argument to pa_rtpoll_run()
   mainloop: Clear wakeup pipe only when necessary
   flist: Don't use atomic operations to manipulate ptr, next
   flist: Don't make flist volatile
   rtpoll: Annotate branches with LIKELY
   mainloop: Annotate branches with LIKELY
   alsa: Make rtpoll_run() runtime measurement compile-time code, default
     off
   alsa: Annotate branches in ALSA sink/source thread_func() with LIKELY
   resampler: Drop pointless remix variable
   build-sys: Add --disable-statistics
   sample: Make pa_sample_size_table public
   sample: Make pa_channels_valid() inlineable
   sample-util: Add inlineable functions
   core: Make use of inlineable macros
   resampler: Precompute maximum block size in frames
   mix: Make use of pa_cvolume_is_norm/muted() macros
   mix: Avoid redundant cvolume checks
   mix: pa_mix() is always called with more than one stream
   mix: Length over all chunk has already been computed by the caller
   core: Add volume-util.h
   core: Make use of volume macros
   iochannel: Remove unnecessary zero-initialization
   asyncmsgq: Drop weird assert
   protocol-native: Make sink_input_pop_cb() return entire chunk
   alsa-sink: Assume left_to_play can be computed, save one call to
     snd_pcm_avail()
   alsa: Refactor computation of sleep usec
   alsa: Precompute max_frames
   alsa: Remove redundant sample_spec parameter to reset_watermark()
     function

  configure.ac                                 |  13 +-
  src/modules/alsa/alsa-mixer.c                |   4 +-
  src/modules/alsa/alsa-sink.c                 | 187 +++----
  src/modules/alsa/alsa-source.c               | 135 ++---
  src/modules/alsa/alsa-util.c                 |  32 +-
  src/modules/bluetooth/module-bluez4-device.c |   2 +-
  src/modules/bluetooth/module-bluez5-device.c |   2 +-
  src/modules/echo-cancel/module-echo-cancel.c |  42 +-
  src/modules/echo-cancel/webrtc.cc            |  10 +-
  src/modules/module-card-restore.c            |   4 +-
  src/modules/module-combine-sink.c            |   2 +-
  src/modules/module-device-manager.c          |  12 +-
  src/modules/module-device-restore.c          |  16 +-
  src/modules/module-esound-sink.c             |   2 +-
  src/modules/module-null-sink.c               |   2 +-
  src/modules/module-null-source.c             |   2 +-
  src/modules/module-pipe-sink.c               |   2 +-
  src/modules/module-pipe-source.c             |   2 +-
  src/modules/module-sine-source.c             |   2 +-
  src/modules/module-stream-restore.c          |  12 +-
  src/modules/module-tunnel.c                  |  54 +-
  src/modules/oss/module-oss.c                 |   2 +-
  src/modules/raop/module-raop-sink.c          |   2 +-
  src/pulse/context.c                          |  29 +-
  src/pulse/ext-device-manager.c               |  14 +-
  src/pulse/ext-device-restore.c               |  10 +-
  src/pulse/ext-stream-restore.c               |  10 +-
  src/pulse/introspect.c                       |  82 +--
  src/pulse/mainloop.c                         |  70 +--
  src/pulse/sample.c                           |  18 +-
  src/pulse/sample.h                           |   4 +-
  src/pulse/scache.c                           |  10 +-
  src/pulse/stream.c                           |  43 +-
  src/pulse/subscribe.c                        |   2 +-
  src/pulsecore/asyncmsgq.c                    |   2 -
  src/pulsecore/flist.c                        |  14 +-
  src/pulsecore/flist.h                        |   2 +-
  src/pulsecore/iochannel.c                    |  37 +-
  src/pulsecore/memblock.c                     |  15 +
  src/pulsecore/memblockq.c                    |   5 +-
  src/pulsecore/mix.c                          |  42 +-
  src/pulsecore/mix.h                          |   5 +
  src/pulsecore/once.c                         |  18 +-
  src/pulsecore/once.h                         |  25 +-
  src/pulsecore/packet.c                       |  55 +-
  src/pulsecore/packet.h                       |  20 +-
  src/pulsecore/pdispatch.c                    |   9 +-
  src/pulsecore/protocol-native.c              | 162 +++---
  src/pulsecore/pstream-util.c                 |  33 +-
  src/pulsecore/pstream-util.h                 |   2 -
  src/pulsecore/pstream.c                      | 734 +++++++++++++++++----------
  src/pulsecore/pstream.h                      |   2 +
  src/pulsecore/queue.c                        |  11 +
  src/pulsecore/queue.h                        |   3 +
  src/pulsecore/resampler.c                    |  45 +-
  src/pulsecore/resampler.h                    |   3 +-
  src/pulsecore/rtpoll.c                       |  46 +-
  src/pulsecore/rtpoll.h                       |   5 +-
  src/pulsecore/sample-util.c                  |   8 +-
  src/pulsecore/sample-util.h                  |  53 ++
  src/pulsecore/sink-input.c                   |  13 +-
  src/pulsecore/sink.c                         |  23 +-
  src/pulsecore/source-output.c                |   9 +-
  src/pulsecore/source.c                       |  13 +-
  src/pulsecore/tagstruct.c                    |  67 ++-
  src/pulsecore/tagstruct.h                    |   4 +-
  src/pulsecore/volume-util.h                  |  92 ++++
  src/tests/rtpoll-test.c                      |   4 +-
  src/tests/srbchannel-test.c                  |  21 +-
  69 files changed, 1455 insertions(+), 982 deletions(-)
  create mode 100644 src/pulsecore/volume-util.h


--
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic
_______________________________________________
pulseaudio-discuss mailing list
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss
