The issue with SIPREC (based on rtpproxy) is that you get only one stream, containing the voice from caller to callee and from callee to caller. That will give the ASR a hard time :-). I know that rtpengine has something similar to SIPREC, but I don't know the details.
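For illustration only, here is a minimal sketch of what separating the two directions could look like. It assumes the forked media still arrives as raw RTP packets and that each direction carries its own SSRC. That is an assumption, not something I have confirmed for SIPREC with rtpproxy: if the audio has already been mixed into a single stream, no demultiplexing can undo it. The function names (split_rtp, demux_by_ssrc) are hypothetical; only the header layout comes from RFC 3550.

```python
import struct
from collections import defaultdict


def split_rtp(packet: bytes):
    """Return (ssrc, payload) for one raw RTP packet (RFC 3550 fixed header)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")

    # Bytes 8..11 of the fixed header hold the SSRC, big-endian.
    ssrc = struct.unpack("!I", packet[8:12])[0]

    # Skip the CSRC list (4 bytes per entry, count in the low 4 bits).
    offset = 12 + 4 * (first & 0x0F)

    # Skip the header extension if the X bit is set.
    if first & 0x10:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        ext_words = struct.unpack("!H", packet[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words

    # Drop trailing padding if the P bit is set (last byte is the pad length).
    end = len(packet)
    if first & 0x20:
        end -= packet[-1]
    if offset > end:
        raise ValueError("malformed RTP packet")

    return ssrc, packet[offset:end]


def demux_by_ssrc(packets):
    """Group RTP payloads by SSRC, keeping arrival order within each stream."""
    streams = defaultdict(list)
    for packet in packets:
        ssrc, payload = split_rtp(packet)
        streams[ssrc].append(payload)
    return dict(streams)
```

With two distinct SSRCs, demux_by_ssrc returns two lists of payloads, which could then be decoded and fed to the STT engine as separate caller and callee streams. This sketch does no reordering by sequence number and no codec decoding.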
Bottom line, in my opinion, you need to have 2 separate streams before you can start STT.

wkr,

On 17/09/2021 11:04, Mark Allen wrote:
> I'm just starting to look at Speech-to-Text (STT) processing for calls
> - initially recordings but moving on to real-time. I would see this
> working along the lines of either:
>
> - a call is recorded, and when the call ends an event is triggered to
>   initiate transcription of the recording
> - a call starts, the RTP is forked to the STT engine which sends
>   real-time transcription
>
> I can see that with OpenSIPS, the SIPREC and Media Exchange modules
> allow for forking of the RTP, providing a means of sending the data
> for processing, but is anybody actually doing this? If so, what has
> been your experience? Is there a toolset that works well with this
> (e.g. IBM Voice Gateway, Google, Amazon etc)?
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
