I'm currently working on a free software replacement for the nonfree MBROLA.
The hardest part of building a speech synthesis system is actually the
creation of a voice library. I decided to use human speech recordings instead
of formant synthesis. For me it began in 2011, when I was looking for
singing synthesizer software. I found many nonfree programs, such as Myriad
Virtual Singer, OGI Flinger, Vocaloid and UTAU. As I was unable to find a
free replacement, I decided to write one. In the meantime I found out that
some plugins for UTAU are free software, but I still had to replace the
nonfree GUI, which is also tied to Windows. One existing GPLv3 UTAU plugin
is v.Connect-STAND [1], which is based on WORLD [2]. v.Connect-STAND has a
more natural sound [3] than eCantorix [4], but it is limited to the Japanese
language. I was able to compile it, but I do not know how to use it.
My free program will be based on WORLD, and it will allow speech/singing
synthesis by Collaborative Creation. The algorithms used in WORLD are
described in [5]. I chose a design that can support multiple languages.
[1] http://hal-the-cat.music.coocan.jp/ritsu_e.html
[2] http://ml.cs.yamanashi.ac.jp/world/english/
[3] https://www.youtube.com/watch?v=to28rvoNYfY
[4] https://github.com/divVerent/ecantorix/wiki/Songs
[5] http://iwk.mdw.ac.at/lit_db_iwk/download.php?id=18114