On 12/09/2021 03:33, Greg Maxwell wrote:
https://speechbot.github.io/resynthesis/
https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/
The 365 bps figure is not totally fairly comparable to more
traditional codecs because they presume a per-speaker speaker
Dear David,
Any new progress on Codec2, the pitch estimator algorithm, or the
voiced/unvoiced classification algorithm?
Best regards.
Hua
--Original--
From:
Thanks, David.
For old-school low bit rate codecs, the quality of female speech is
much worse than that of male speech, simply because the fundamental
period of female voices is shorter and its variation is larger, which makes
the problem of exact estimation of female pitch
>
> -- Original --
> From: "freetel-codec2" ;
> Date: Tue, Sep 14, 2021 06:25 AM
> To: "freetel-codec2";
> Subject: Re: [Freetel-codec2] facebook speech codec at 365bps
>
> On Mon, 2021-09-13 at 07:24 +, Greg Maxwell w
In my opinion, the pitch estimator is the bottleneck of any narrowband vocoder,
whether traditional or neural-network based.
LPCNet uses the (last-century) RAPT algorithm for pitch estimation, and if
the pitch value is wrong, the output is very poor.
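To make the pitch-estimation point concrete, here is a minimal Python sketch of a plain normalized-autocorrelation pitch estimator for 8 kHz narrowband audio. It is not RAPT and not what LPCNet or Codec2 actually use; the function names, the 60 to 400 Hz search range and the 0.9 relative threshold are illustrative assumptions. It shows why a wrong pitch is easy to get: every multiple of the true period also correlates well, so the estimator has to decide which peak to trust.

```python
import math


def normalized_autocorr(frame, lag):
    """Correlation of the frame with itself delayed by `lag` samples, scaled to [-1, 1]."""
    x = frame[:len(frame) - lag]
    y = frame[lag:]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def estimate_pitch(frame, fs=8000, f0_min=60.0, f0_max=400.0, rel_thresh=0.9):
    """Return (f0_hz, peak) for one frame; hypothetical helper, not RAPT.

    Takes the shortest lag that is a local maximum of the normalized
    autocorrelation and reaches rel_thresh times the best score. Preferring
    the shortest such lag avoids returning 2x or 3x the true period (an
    octave error), since every multiple of the period also correlates well.
    """
    lag_min = int(fs / f0_max)  # shortest period searched, in samples
    lag_max = int(fs / f0_min)  # longest period searched, in samples
    if len(frame) <= lag_max + 1:
        raise ValueError("frame must be longer than the longest pitch period")
    # One extra lag so the local-maximum test at lag_max has a right neighbour.
    r = {lag: normalized_autocorr(frame, lag) for lag in range(lag_min, lag_max + 2)}
    best = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    for lag in range(lag_min + 1, lag_max + 1):
        is_peak = r[lag] >= r[lag - 1] and r[lag] >= r[lag + 1]
        if is_peak and r[lag] >= rel_thresh * r[best]:
            best = lag
            break
    return fs / best, r[best]


if __name__ == "__main__":
    fs = 8000
    for f0 in (100.0, 200.0):  # roughly "male-like" and "female-like" pitch
        frame = [math.sin(2 * math.pi * f0 * i / fs) for i in range(320)]  # 40 ms
        est, peak = estimate_pitch(frame, fs)
        print(f"true {f0:.0f} Hz -> estimated {est:.1f} Hz (peak {peak:.3f})")
```

The test signals are pure sinusoids, far easier than real speech. Note that at 8 kHz a 200 Hz voice has a period of only 40 samples versus 80 samples at 100 Hz, so one sample of lag error costs roughly 5 Hz for the higher voice and roughly 1.2 Hz for the lower one. The returned peak value is sometimes used as a crude voicing cue, though practical voiced/unvoiced classifiers combine several features.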
On Tue, Sep 14, 2021 at 07:25:28AM +0930, david wrote:
>...
>
> Another issue to address is robustness to bit errors. In Codec 2 I
> avoid inter-frame coding (i.e., coding differences) to keep some tolerance
> to high bit error rates. This costs a few bits/s compared to a
> super efficient
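A toy Python illustration of the trade-off described above: with inter-frame (delta) coding, a single flipped bit corrupts every following frame, whereas with independent per-frame coding the damage stays in one frame. The parameter track and helper names are hypothetical, Python's unbounded integers stand in for fixed-width code words, and this is not Codec 2's actual quantizer.

```python
def encode_absolute(values):
    """Each frame carries its own quantizer index: no inter-frame dependency."""
    return list(values)


def decode_absolute(codes):
    return list(codes)


def encode_delta(values):
    """First frame absolute, each later frame the difference from the previous one."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def decode_delta(codes):
    out = [codes[0]]
    for d in codes[1:]:
        out.append(out[-1] + d)  # every frame depends on all earlier frames
    return out


def flip_bit(codes, frame, bit):
    """Simulate one channel bit error in one frame's code word."""
    corrupted = list(codes)
    corrupted[frame] ^= 1 << bit
    return corrupted


def frames_in_error(decoded, reference):
    return sum(1 for d, r in zip(decoded, reference) if d != r)


if __name__ == "__main__":
    track = [40, 42, 45, 44, 43, 43, 46, 50]  # hypothetical per-frame parameter indices
    rx_abs = decode_absolute(flip_bit(encode_absolute(track), 3, 2))
    rx_delta = decode_delta(flip_bit(encode_delta(track), 3, 2))
    print("absolute:", frames_in_error(rx_abs, track), "frame(s) in error")  # 1
    print("delta:   ", frames_in_error(rx_delta, track), "frame(s) in error")  # 5
```

Delta-coded schemes commonly bound this propagation by periodically sending an absolute value, which spends back some of the bits the deltas saved.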
On Mon, 2021-09-13 at 07:24 +, Greg Maxwell wrote:
> On Mon, Sep 13, 2021 at 7:05 AM Random via Freetel-codec2
> wrote:
> > Is it speaker-independent?
>
> It's speaker independent with the additional per-speaker data
> mentioned in my post.
>
That sounds like speaker dependence to me.
I
On Mon, Sep 13, 2021 at 7:05 AM Random via Freetel-codec2
wrote:
> Is it speaker-independent?
It's speaker independent with the additional per-speaker data
mentioned in my post.
> As for amateur radio use, I would worry about the computational complexity
> and the hardware requirements.
Is it speaker-independent?
As for amateur radio use, I would worry about the computational complexity and
the hardware requirements.
--Original--
From:
https://speechbot.github.io/resynthesis/
https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/
The 365 bps figure is not totally fairly comparable to more
traditional codecs because they presume a per-speaker speaker
embedding is sent once.
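The comparability point can be put in numbers: if a one-time embedding of B bits is sent before a stream of R bit/s, the average rate over a call of T seconds is R + B/T. The Python sketch below assumes a hypothetical embedding size (the posts do not give one), so the figures it prints are illustrative only.

```python
def effective_bitrate(stream_bps, embedding_bits, duration_s):
    """Average bit/s when a one-time payload is amortized over a call of duration_s."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return stream_bps + embedding_bits / duration_s


def duration_for_overhead(embedding_bits, max_overhead_bps):
    """Shortest call length (s) for which the amortized payload adds at most max_overhead_bps."""
    return embedding_bits / max_overhead_bps


if __name__ == "__main__":
    STREAM_BPS = 365            # figure from the thread
    EMBEDDING_BITS = 256 * 32   # HYPOTHETICAL: 256 float32 values; the real size is not given here
    for seconds in (10, 60, 600):
        rate = effective_bitrate(STREAM_BPS, EMBEDDING_BITS, seconds)
        print(f"{seconds:>4} s call: {rate:7.1f} bit/s")
    print(f"overhead <= 10 bit/s after {duration_for_overhead(EMBEDDING_BITS, 10):.0f} s")
```

With the assumed 8192-bit embedding, the overhead is about 137 bit/s for a one-minute call and about 14 bit/s for a ten-minute call, which is why a traditional codec with no side payload is not a like-for-like comparison.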
This model need not