Re: Audio device mmap and kevents
At Tue, 29 Jan 2019 16:06:40 +0900, Tetsuya Isaki wrote: > > looks good to me. Can you test recording? > > I will try it. (I need to setup) I confirmed that GETISPACE at least looks correct with attached sample code (by watching and calulating it manually :) And I also confirmed that mplayer+pulseaudio on netbsd-8 worked even with the patch. (Although, I'm not sure why pulseaudio recording operation affects playback, too.) Anyway, I'll commit GETISPACE part. Thank you for comments. --- Tetsuya Isaki #include #include #include #include #include #include #include int main(int ac, char *av[]) { struct audio_info ai; struct audio_buf_info bufinfo; int r; int afd = open("/dev/audio0", O_RDONLY); if (afd == -1) err(1, "open audio"); r = ioctl(afd, AUDIO_GETBUFINFO, ); if (r == -1) err(1, "AUDIO_GETBUFINFO"); printf("ai.record.buffer_size=%d\n", ai.record.buffer_size); printf("ai.blocksize=%d\n", ai.blocksize); printf("ai.hiwat=%d\n", ai.hiwat); char buf[3000]; // read and discard while (1) { r = read(afd, buf, sizeof(buf)); printf("read(%d bytes)\n", r); if (r != sizeof(buf)) err(1, "read"); r = ioctl(afd, AUDIO_GETBUFINFO, ); if (r == -1) err(1, "GETBUFINFO"); r = ioctl(afd, SNDCTL_DSP_GETISPACE, ); if (r == -1) err(1, "GETISPACE"); printf("record.seek=%d : ", ai.record.seek); printf("fragments=%d fragstotal=%d fragsize=%d bytes=%d\n", bufinfo.fragments, bufinfo.fragstotal, bufinfo.fragsize, bufinfo.bytes); if (bufinfo.bytes < sizeof(buf)) { printf("sleep\n"); sleep(1); continue; } } /* NOTREACHED */ close(afd); return 0; }
Re: Audio device mmap and kevents
At Sun, 27 Jan 2019 13:30:51 - (UTC), Michael van Elst wrote: > >So how about this? > >I confirmed that both mpv and mplayer worked fine on netbsd-8. > ># I didn't test recording, though. > > looks good to me. Can you test recording? I will try it. (I need to setup) I will commit GETOSPACE part separately. Thanks. --- Tetsuya Isaki
Re: Audio device mmap and kevents
is...@pastel-flower.jp (Tetsuya Isaki) writes: >So how about this? >I confirmed that both mpv and mplayer worked fine on netbsd-8. ># I didn't test recording, though. looks good to me. Can you test recording? -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: Audio device mmap and kevents
At Sat, 26 Jan 2019 09:07:02 - (UTC), Michael van Elst wrote: > >I could run "mpv -ao=oss some.mp3" at 0~1% CPU with rev 1.32 > >on netbsd-8. > > >May I commit(revert) it? > > The change to GETOSPACE is wrong, it basically reduces the returned > number of fragments by 1, possibly returning a negative value. The > returned number of bytes is also truncated. > > The change to GETISPACE on the other hand looks correct, but I have > no idea how fixing the record operation would have helped playback. Oh, I didn't see GETISPACE chunk well... For GETOSPACE, rev1.32 is correct. For GETISPACE fragments, rev1.33 is correct. For GETISPACE fragstotal, both rev1.32 and 1.33 are not correct. hiwat is playback-only parameter. For GETISPACE bytes, both 1.32 and 1.33 are not correct. 'bytes' should return the number of bytes (not rounded down) that can be read without blocking. So how about this? I confirmed that both mpv and mplayer worked fine on netbsd-8. # I didn't test recording, though. Index: lib/libossaudio/ossaudio.c === RCS file: /cvsroot/src/lib/libossaudio/ossaudio.c,v retrieving revision 1.33 diff -u -r1.33 ossaudio.c --- lib/libossaudio/ossaudio.c 23 Mar 2017 15:50:48 - 1.33 +++ lib/libossaudio/ossaudio.c 27 Jan 2019 09:21:00 - @@ -411,11 +411,11 @@ return retval; setblocksize(fd, ); bufinfo.fragsize = tmpinfo.blocksize; - bufinfo.fragments = (tmpinfo.hiwat * tmpinfo.blocksize - - (tmpinfo.play.seek + tmpinfo.blocksize -1)) / - tmpinfo.blocksize; + bufinfo.fragments = tmpinfo.hiwat - (tmpinfo.play.seek + + tmpinfo.blocksize - 1) / tmpinfo.blocksize; bufinfo.fragstotal = tmpinfo.hiwat; - bufinfo.bytes = bufinfo.fragments * tmpinfo.blocksize; + bufinfo.bytes = tmpinfo.hiwat * tmpinfo.blocksize + - tmpinfo.play.seek; *(struct audio_buf_info *)argp = bufinfo; break; case SNDCTL_DSP_GETISPACE: @@ -425,8 +425,9 @@ setblocksize(fd, ); bufinfo.fragsize = tmpinfo.blocksize; bufinfo.fragments = tmpinfo.record.seek / tmpinfo.blocksize; - bufinfo.fragstotal = tmpinfo.hiwat; - bufinfo.bytes = bufinfo.fragments * tmpinfo.blocksize; + bufinfo.fragstotal = + tmpinfo.record.buffer_size / tmpinfo.blocksize; + bufinfo.bytes = tmpinfo.record.seek; *(struct audio_buf_info *)argp = bufinfo; break; case SNDCTL_DSP_NONBLOCK: Thanks. --- Tetsuya Isaki
Re: Audio device mmap and kevents
is...@pastel-flower.jp (Tetsuya Isaki) writes: >I could run "mpv -ao=oss some.mp3" at 0~1% CPU with rev 1.32 >on netbsd-8. >May I commit(revert) it? The change to GETOSPACE is wrong, it basically reduces the returned number of fragments by 1, possibly returning a negative value. The returned number of bytes is also truncated. The change to GETISPACE on the other hand looks correct, but I have no idea how fixing the record operation would have helped playback. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: Audio device mmap and kevents
On Sat, Jan 26, 2019 at 12:38:09PM +0900, Tetsuya Isaki wrote: > At Wed, 23 Jan 2019 16:32:01 +, > co...@sdf.org wrote: > > > the latency issue doesn't matter; > > Using an AMD Ryzen 7 2700X (or: this machine isn't weak): > > > > > mpv --no-video "https://www.youtube.com...; > > PID USERNAME PRI NICE SIZE RES STATE TIME WCPUCPU COMMAND > > 4908 fly 250 299M 41M CPU/0 2:32 98.83% 98.78% mpv > > > > (kern/53028) > > Would you split the PR into "hdaudio default latency too high" > and "mpv spins at 100% CPU playing audio" ? These are completely > different problems. > > For first one (latency problem), I was doubtful about displayed > value at boot time. Anyway, it's too large in either case. I am > rewriting it now. > > For second one (mpv spins at 100% cpu), I could reproduce and > I found src/lib/libossaudio/ossaudio.c rev 1.33 is the cause. > The calculation of GETOSPACE is obviously wrong and rev 1.32 > is correct. > > I think the scenario where mpv was spinning is as follows: > 1) mpv calls ioctl(SNDCTL_DSP_GETOSPACE) to ask how much free >space on write buffer. > 2) Due to this mis-calculation, GETOSPACE may return 0 (means >no writeable space) even writeable. > 3) Then mpv will poll(2) to wait to become writeable. > 4) However poll(2) will return immediately because it's writeable. > 5) goto 1! > > I could run "mpv -ao=oss some.mp3" at 0~1% CPU with rev 1.32 > on netbsd-8. > > May I commit(revert) it? > > According to PR kern/51999, this commit made mplayer work again. > But I doubt it. Unfortunately I don't have environment to play > mplayer now. Can someone confirm this? > > Thanks. > --- > Tetsuya Isaki I can confirm that pulse+mplayer still works with 1.33 reverted, and that the 100% cpu usage is gone. Thanks!
Re: Audio device mmap and kevents
At Wed, 23 Jan 2019 16:32:01 +, co...@sdf.org wrote: > > the latency issue doesn't matter; > Using an AMD Ryzen 7 2700X (or: this machine isn't weak): > > > mpv --no-video "https://www.youtube.com...; > PID USERNAME PRI NICE SIZE RES STATE TIME WCPUCPU COMMAND > 4908 fly 250 299M 41M CPU/0 2:32 98.83% 98.78% mpv > > (kern/53028) Would you split the PR into "hdaudio default latency too high" and "mpv spins at 100% CPU playing audio" ? These are completely different problems. For first one (latency problem), I was doubtful about displayed value at boot time. Anyway, it's too large in either case. I am rewriting it now. For second one (mpv spins at 100% cpu), I could reproduce and I found src/lib/libossaudio/ossaudio.c rev 1.33 is the cause. The calculation of GETOSPACE is obviously wrong and rev 1.32 is correct. I think the scenario where mpv was spinning is as follows: 1) mpv calls ioctl(SNDCTL_DSP_GETOSPACE) to ask how much free space on write buffer. 2) Due to this mis-calculation, GETOSPACE may return 0 (means no writeable space) even writeable. 3) Then mpv will poll(2) to wait to become writeable. 4) However poll(2) will return immediately because it's writeable. 5) goto 1! I could run "mpv -ao=oss some.mp3" at 0~1% CPU with rev 1.32 on netbsd-8. May I commit(revert) it? According to PR kern/51999, this commit made mplayer work again. But I doubt it. Unfortunately I don't have environment to play mplayer now. Can someone confirm this? Thanks. --- Tetsuya Isaki
Re: Audio device mmap and kevents
What I wanted to say is that mmap may be faster from such point of view but that difference may not make observable impact especially for application like audioplay(1). For audioplay(1), yes, I agree. That is an example of an application for which the performance differences are minor enough to be easily outweighed by the convenience. I doubt anyone would argue that the write() interface should go away. I agree too but I needed to implement mmap use (userspace) somewhere to make my point. I think audioplay is still a right choice (even if it does not improve it) because it shows how to implement it as it it always included in netbsd. I personnally never use audioplay to play audio..
Re: Audio device mmap and kevents
> the latency issue doesn't matter; Using an AMD Ryzen 7 2700X (or: this machine isn't weak): > mpv --no-video "https://www.youtube.com...; PID USERNAME PRI NICE SIZE RES STATE TIME WCPUCPU COMMAND 4908 fly 250 299M 41M CPU/0 2:32 98.83% 98.78% mpv (kern/53028)
Re: Audio device mmap and kevents
>> (a) It's the rare port on which copying to mmap()ped memory is no >> faster than a user/kernel crossing plus a copyin. > What I wanted to say is that mmap may be faster from such point of > view but that difference may not make observable impact especially > for application like audioplay(1). For audioplay(1), yes, I agree. That is an example of an application for which the performance differences are minor enough to be easily outweighed by the convenience. I doubt anyone would argue that the write() interface should go away. >> Writing into an mmap()ped ring buffer takes advantage of the speed >> differential and fixes the latency issue without requiring real-time >> behaviour out of userland. > Please let me confirm. Current audio mmap no longer points [to] the > hardware buffer. Is our understanding the same? I don't know; it's been years since I looked at more than the occasional file or two from -current, and I have never looked at the mmap()ped-audio interfaces in detail in any version. But whether the mapped buffer is the one the hardware is reading from or whether it's the source for pseudo-DMA driven off an interrupt or what is close to irrelevant; getting the bits into the kernel machinery via mmap rather than write still provides both speed and latency advantages - just not as much as if it were the hardware's output buffer being mapped into user VM. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: Audio device mmap and kevents
At Mon, 21 Jan 2019 17:05:38 -0500 (EST), Mouse wrote: > > [...], mmap'ing audio device does not lead to improve any performance > > or latency. > > Then something is wrong. > > (a) It's the rare port on which copying to mmap()ped memory is no > faster than a user/kernel crossing plus a copyin. I understand it. What I wanted to say is that mmap may be faster from such point of view but that difference may not make observable impact especially for application like audioplay(1). > Of course, if you're just playing a canned audio data stream and doing > nothing else, the latency issue doesn't matter; audioplay(1) is just such application. I want to keep it simple as possible. > Writing into an mmap()ped ring buffer takes advantage of the speed > differential and fixes the latency issue without requiring real-time > behaviour out of userland. Please let me confirm. Current audio mmap no longer points the hardware buffer. Is our understanding the same? Thanks. --- Tetsuya Isaki
Re: Audio device mmap and kevents
Of course, if you're just playing a canned audio data stream and doing nothing else, the latency issue doesn't matter; you can have the kernel do lots of buffering and you're happy. But things such as games that want to play audio but also want to be able make that audio react very fast (for human values of "very fast") to unpredictable events may be willing to accept the inconvenience of an mmap()ped ring buffer for the sake of getting both lots of buffering and low latency. live coding
Re: Audio device mmap and kevents
> It's better to use write(2) for this purpose. Often, yes, but not, I think, always. > [...], mmap'ing audio device does not lead to improve any performance > or latency. Then something is wrong. (a) It's the rare port on which copying to mmap()ped memory is no faster than a user/kernel crossing plus a copyin. (b) Once bytes are write()ten, userland cannot change them. If the kernel does a lot of buffering, userland has to know the data it wants played a significant time in advance; if the kernel does little buffering, userland has a relatively hard real-time constraint in that it has to write() on a tight schedule or output audio drops out. Writing into an mmap()ped ring buffer takes advantage of the speed differential and fixes the latency issue without requiring real-time behaviour out of userland. (The API would need fixing, though; having to do a syscall to find out where the audio is playing defeats at least some of the performance advantage. Those numbers should appear in the mmap()ped buffer, or perhaps a different mmap()ped buffer.) Of course, if you're just playing a canned audio data stream and doing nothing else, the latency issue doesn't matter; you can have the kernel do lots of buffering and you're happy. But things such as games that want to play audio but also want to be able make that audio react very fast (for human values of "very fast") to unpredictable events may be willing to accept the inconvenience of an mmap()ped ring buffer for the sake of getting both lots of buffering and low latency. Also, if you have a repeating audio track, then at least in principle you can just copy it into the ring buffer and forget it, rather than having to constantly write() - though, of course, this depends on your repeat length dividing the ring buffer size. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: Audio device mmap and kevents
mmap(2) may reduce copies between user space and the kernel compared to write(2). Generally speaking, yes. But your patch seems to increase memcpy(3) instead of decreasing write(2) system call. In audioplay, well, if the audio device is mmapped 1 call to write is replaced with 1 call to memcpy. This is perfectly normal. write(2) probably uses memcpy(3)? As you say, there's less copies between user space and kernel. In user space: no more no less. You are right in the situation where the audio file cannot be mmapped and the audio device can. Here I do read in an intermediate buffer, which is useless. There is clearly work to be done.
Re: Audio device mmap and kevents
First of all, I know I should have work best on my proposal but I had to post that now because I probably wouldn't have time for a while. Remenber that I did not implement mmap in audio.C, it was there! I just think that it is better in this case for the kernel to "pull" the data, Just as it does when the client uses write, it blocks until it needs data. I don't know about that but I find that hard to believe. Also in audio_system(9): With regard to mmapped audio, blocks are played back immediately so the latency presented to applications is one third of the latency sysctl(8) value. Here I trust manuals, I might take the time to understand how.. mmap(2) may reduce copies between user space and the kernel compared to write(2). Generally speaking, yes. But your patch seems to increase memcpy(3) instead of decreasing write(2) system call. In audioplay, well, if the audio device is mmapped 1 call to write is replaced with 1 call to memcpy. This is perfectly normal. write(2) probably uses memcpy(3)? As you say, there's less copies between user space and kernel. In user space: no more no less. You need to consider whether the performance improvement is really necessary or not. And then if yes, you need to measure it before talking. I don't think that I agree on that. I would say: "You need to consider whether the performance improvement is real." And you are right, it need to be measured, but I don't know how to do that. However, even if that is only for the latency, that worth the cost. Of course it need more cooperation from the client but I think this is always the case with mmap, am I wrong? trade performances against ease. This article is not for NetBSD but good to read about mmap. http://manuals.opensound.com/developer/mmap.html Even if that is only for specific applications. I would not have watched this without latency reduction specified in the manuals and if mmap was not already implemented. And it has to store the number of bytes writeable to kn->kn_data, not offset. See kqueue(4). audio should follow it even mmap case. EVFILT_WRITE Takes a descriptor as the identifier, and returns whenever it is possible to write to the descriptor. For sockets, pipes, fifos, and ttys, data will contain the amount of space remaining in the write buffer. I agree, I wanted to try it quick. I think I can return space instead of offset (by adding a 'last_space_returned' and assuming that at every new call to kevent, the client filled all the free space, what do you think?). And one more point: what do you do in such case? time1: Since kevent() wokes up, you were able to know the potision to write next in mmap area. time2: Copy next data to this position got in time1. time1.5: audio HW intterupt occurred. It advances the position in kernel. time3: the next kevent call returns the position (time1.5) Am I wrong? > # Although I don't know whether it's the optimal way or not. > And emulated flag has no meaning after 8.0. From audio(4): Only encodings that are not emulated (i.e. where AUDIO_ENCODINGFLAG_EMULATED is not set) work properly for a mapped device. Ah... The meaning of this flag was changed on 8.0. So this sentence itself about mmap seems not wrong. --- Tetsuya Isaki Thank you.
Re: Audio device mmap and kevents
At Sat, 19 Jan 2019 16:08:37 +0100, yarl-bau...@mailoo.org wrote: > > At first, mmap'ing audio device does not lead to improve any > > performance or latency. > From audio(4): > For audio applications that mmap(2) the audio device for play back > the > resultant latency is a third (1/3) of the value of the > hw.driverN.latency > variable. > > This is for the latency, but maybe the manual is outdated? I don't know about that but I find that hard to believe. > For the performance, while I'm not experienced, I think that mmap is > faster, less copies, no? mmap(2) may reduce copies between user space and the kernel compared to write(2). Generally speaking, yes. But your patch seems to increase memcpy(3) instead of decreasing write(2) system call. You need to consider whether the performance improvement is really necessary or not. And then if yes, you need to measure it before talking. This article is not for NetBSD but good to read about mmap. http://manuals.opensound.com/developer/mmap.html > > And it has to store the number of bytes writeable to kn->kn_data, > > not offset. > That's too bad but why not. See kqueue(4). audio should follow it even mmap case. EVFILT_WRITE Takes a descriptor as the identifier, and returns whenever it is possible to write to the descriptor. For sockets, pipes, fifos, and ttys, data will contain the amount of space remaining in the write buffer. And one more point: what do you do in such case? time1: Since kevent() wokes up, you were able to know the potision to write next in mmap area. time2: Copy next data to this position got in time1. time1.5: audio HW intterupt occurred. It advances the position in kernel. > > # Although I don't know whether it's the optimal way or not. > > And emulated flag has no meaning after 8.0. > From audio(4): > Only encodings that are not emulated (i.e. where > AUDIO_ENCODINGFLAG_EMULATED is not set) work properly for a mapped > device. Ah... The meaning of this flag was changed on 8.0. So this sentence itself about mmap seems not wrong. --- Tetsuya Isaki
re: Audio device mmap and kevents
i will have a look at the audioplay changes in the next week. .mrg.
Re: Audio device mmap and kevents
At first, mmap'ing audio device does not lead to improve any performance or latency. From audio(4): For audio applications that mmap(2) the audio device for play back the resultant latency is a third (1/3) of the value of the hw.driverN.latency variable. This is for the latency, but maybe the manual is outdated? For the performance, while I'm not experienced, I think that mmap is faster, less copies, no? And I think that write(2) already has the optimal synchronization mechanism especially for stream-type data like audio. You don't have to monitor the kernel event. You don't have to calculate the circular buffer's offset. mmap is harder to use than write, this is usually the case but you get better performances. And it has to store the number of bytes writeable to kn->kn_data, not offset. That's too bad but why not. It is already implemented as "audioctl encodings" command. Ok, I didn't know. # Although I don't know whether it's the optimal way or not. And emulated flag has no meaning after 8.0. From audio(4): Only encodings that are not emulated (i.e. where AUDIO_ENCODINGFLAG_EMULATED is not set) work properly for a mapped device. Merging play() and play_fd() makes it harder to read. I do not agree.. It was necessary for me to add mmap support. Less redundancies and better structure, I think.
Re: Audio device mmap and kevents
It's better to use write(2) for this purpose. At first, mmap'ing audio device does not lead to improve any performance or latency. And I think that write(2) already has the optimal synchronization mechanism especially for stream-type data like audio. You don't have to monitor the kernel event. You don't have to calculate the circular buffer's offset. There are comments below. At Fri, 18 Jan 2019 20:32:45 +0100, yarl-bau...@mailoo.org wrote: > Hello, > > I would like to propose a small improvement to the audio system. > I think that it is very interesting to be able to mmap the audio device > for better > performance and smaller latency but It seems neither clean nor optimal > to use > AUDIO_GETOOFFS ioctl and sleep to synchronize. > > Why don't we use kernel events? > > Something like audio.patch for the kernel. > I also implemented that use in audioplay and tried to clean a bit the > code in audioplay.patch. > > There is certainly more work to be done but I have to ask now. What do > you think? > > Thanks. : > --- a/sys/dev/audio.c > +++ b/sys/dev/audio.c > @@ -3463,11 +3466,21 @@ filt_audiowrite(struct knote *kn, long hint) > mutex_enter(sc->sc_intr_lock); > > stream = chan->vc->sc_pustream; > - kn->kn_data = (stream->end - stream->start) > - - audio_stream_get_used(stream); > + if (vc->sc_mpr.mmapped) { > + off_t offset; > + offset = stream->outp - stream->start > + + vc->sc_mpr.blksize; > + if (stream->start + offset >= stream->end) > + offset = 0; > + kn->kn_data = offset; > + } else { > + kn->kn_data = (stream->end - stream->start) > + - audio_stream_get_used(stream); > + rv = kn->kn_data > 0; > + } > mutex_exit(sc->sc_intr_lock); > > - return kn->kn_data > 0; > + return rv; > } filt_audiowrite() has to return zero if it's not writeable. And it has to store the number of bytes writeable to kn->kn_data, not offset. > --- a/usr.bin/audio/play/audioplay.1 > +++ b/usr.bin/audio/play/audioplay.1 > @@ -118,6 +118,12 @@ sample rate. > Print a help message. > .It Fl i > If the audio device cannot be opened, exit now rather than wait for it. > +.It Fl l > +List the audio encodings supported by the device and if it emulated > +or not. Useful to know what format to use when using It is already implemented as "audioctl encodings" command. # Although I don't know whether it's the optimal way or not. And emulated flag has no meaning after 8.0. > --- a/usr.bin/audio/play/play.c > +++ b/usr.bin/audio/play/play.c > -/* > - * play the file on the file descriptor fd > - */ > -static void > -play_fd(const char *file, int fd) > -{ Merging play() and play_fd() makes it harder to read. Thanks. --- Tetsuya Isaki