I wonder if it makes sense to introduce a set of pseudo-classes on the video/audio elements, each reflecting a state of the media (playing, paused, error, etc.)? Then we could style media controls (whether native or custom) with CSS alone, without having to listen to DOM events just to tweak their appearance.
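For context, here is a rough sketch of the kind of event-listening glue I mean, i.e. what the pseudo-classes would let us drop. It mirrors the standard `playing`, `pause`, and `error` media events into class names that CSS can target. The names `MediaState`, `MediaLike`, `trackMediaState`, and the `media-*` classes are made up for illustration, and the element is typed structurally so the sketch does not depend on DOM typings.

```typescript
// Hypothetical sketch: mirror media events into CSS classes.
// None of these names come from a spec or library.

type MediaState = "playing" | "paused" | "error";

// Minimal structural view of a media element (a real <video> or <audio> fits it).
interface MediaLike {
  addEventListener(type: string, listener: () => void): void;
  classList: {
    add(...tokens: string[]): void;
    remove(...tokens: string[]): void;
  };
}

// Standard media event names mapped to the state each one signals.
const EVENT_TO_STATE: Record<string, MediaState> = {
  playing: "playing",
  pause: "paused",
  error: "error",
};

const ALL_STATE_CLASSES: string[] = ["media-playing", "media-paused", "media-error"];

function trackMediaState(media: MediaLike): void {
  for (const eventName of Object.keys(EVENT_TO_STATE)) {
    const state = EVENT_TO_STATE[eventName];
    media.addEventListener(eventName, () => {
      // Drop whichever state class was set before, then apply the new one.
      media.classList.remove(...ALL_STATE_CLASSES);
      media.classList.add(`media-${state}`);
    });
  }
}
```

With this in place, a stylesheet can key off something like `video.media-paused`. With state pseudo-classes, the same rule could presumably be written directly against the element (say, a hypothetical `video:paused`), and the script above would not be needed at all.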
:DG<