Re: [sugar] An Update about Speech Synthesis
On Feb 19, 2008 4:45 PM, Samuel Klein [EMAIL PROTECTED] wrote:

Hemant and James,

Can you write something about this at a [[spoken texts]] page on the wiki ('hear and read'? some other more creative name...)? The Google Literacy Project is highlighting a number of literacy efforts for the upcoming World Book Day, and your work would be a fine suggestion for that list. You can use my article, http://www.olpcnews.com/content/ebooks/effective_adult_literacy_program.html. Let me know when you have something, and I'll drop in on the wiki page and see if I can add anything useful to your account.

SJ
Re: An Update about Speech Synthesis
Hi,

> I'd like to see an eSpeak literacy project written up -- once we have a play button, with text highlighting, we have most of the pieces to make a great read + speak platform that can load in texts and highlight words/sentences as they are being read. Ping had a nice mental model for this a while back.

Great idea :). The button will soon be there :D. I had never expected this to turn into something this big :). There are lots of things I want to get done wrt this project, and I hope to accomplish them one by one.

> Thanks for the info Hemant! Can you tell me more about your experiences with speech-dispatcher and which version you are using? The things I'm interested in are stability, ease of configuration, completeness of implementation, etc.

I'll try to explain whatever I am capable of explaining (I am not an expert like you all :) ). We had initially started out with a speech-synthesis DBUS API that connected directly to eSpeak. Those results are available on the wiki page [http://wiki.laptop.org/go/Screen_Reader]. From that point onwards we found out about speech-dispatcher and decided to analyze it against our requirements, primarily keeping the following things in mind:

1. An API that provides configuration control on a per-client basis.
2. A feature like printf(), but for speech, for developers to call -- that's precisely how Free(b)soft describe their approach to speech-dispatcher.
3. A Python interface for speech synthesis.
4. Callbacks for developers after certain events.

At this moment I am in a position to comment on the following:

1. Which modules to use - I found it extremely easy to configure speech-dispatcher to use eSpeak as a TTS engine. There are configuration files in which you simply select or unselect which TTS module is to be used. I have described how an older version of speech-dispatcher can be made to run on the XO here: http://wiki.laptop.org/go/Screen_Reader#Installing_speech-dispatcher_on_the_xo
2. There were major issues with using eSpeak with the ALSA sound system some time back [http://dev.laptop.org/ticket/5769, http://dev.laptop.org/ticket/4002]. Using speech-dispatcher resolves this, as it supports both ALSA and OSS -- so in case OLPC ever shifts to OSS we are safe. I am guessing speech-dispatcher does not directly let a TTS engine write to a sound device, but instead accepts the audio buffer and routes it to the audio subsystem itself.
3. Another major issue we had to tackle was providing callbacks while exposing the DBUS interface. The present implementation of speech-dispatcher provides callbacks for the various events that matter wrt speech synthesis. I have tested these out in Python and they were working quite nicely. In case you have not, you might be interested in checking out their Python API [http://cvs.freebsoft.org/repository/speechd/src/python/speechd/client.py?hideattic=0&view=markup].
4. Voice configuration and language selection - the API provides options to control voice parameters such as pitch, volume, voice etc. for each client.
5. Message priorities and queuing - speech-dispatcher provides various levels of priority for speech synthesis, so we can give a message played by Sugar a higher priority than one played by an Activity.
6. Compatibility with Orca - I installed Orca and used speech-dispatcher as the speech synthesis engine. It worked fine. We wanted to make sure that the speech synthesis server would work with Orca if it were ported to the XO in the future.
7. Documentation - speech-dispatcher has a lot of documentation at the moment, so it's quite easy to find our way around and figure out how to do the things we really want to. I had intended to explore gnome-speech as well, but the lack of documentation and examples turned me away.

The analysis that I did was mostly from a user's point of view, or from the simple developer requirements that we realized had to be fulfilled wrt speech synthesis, and it was definitely not as detailed as you probably might expect from me.

We are presently using speech-dispatcher 0.6.6. A dedicated eSpeak module is provided in the newer versions of speech-dispatcher, and that is a big advantage for us. In the older versions eSpeak was invoked as an external command with various parameters passed as command-line arguments, which was surely not very efficient on the XO.

Stability - I think the main point I tested here was how well speech-dispatcher responds to long strings. The latest release, 0.6.6, ships some tests in which an entire story is read out [http://cvs.freebsoft.org/repository/speechd/src/tests/long_message.c?view=markup]. However, I still need to run this test on the XO; I will do so once I have RPM packages to install on it.

In particular, speech-dispatcher is quite customizable, easily controlled through programming languages, provides callback support, and has good documentation.
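For the curious, module and audio selection is just a few lines in speechd.conf. Quoting roughly from memory of the sample file (so double-check the exact directive names in the copy that ships with 0.6.6):

    # speechd.conf (path varies by install)
    AddModule "espeak"   "sd_espeak"   "espeak.conf"
    DefaultModule espeak
    # speech-dispatcher itself talks to the sound system, not the engine:
    AudioOutputMethod "alsa"   # or "oss"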
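And this is roughly what a client of the Python API linked above looks like. An untested sketch -- the client name and parameter values here are invented for illustration, and the exact signatures are in client.py:

    import time
    import speechd   # the Python bindings shipped with speech-dispatcher

    def on_event(callback_type):
        # The bindings run a listener thread and invoke us from there.
        if callback_type == speechd.CallbackType.END:
            print('done -- a play button would un-highlight the text here')

    client = speechd.SSIPClient('read-activity')   # per-client configuration
    client.set_output_module('espeak')             # which TTS engine to use
    client.set_language('en')
    client.set_pitch(0)                            # -100 .. 100
    client.set_volume(100)
    client.set_priority(speechd.Priority.TEXT)     # Sugar itself could use IMPORTANT
    client.speak('Hello children!', callback=on_event,
                 event_types=(speechd.CallbackType.BEGIN,
                              speechd.CallbackType.END))
    time.sleep(2)   # crude wait so the END event can arrive before we close
    client.close()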
An Update about Speech Synthesis for Sugar
Hi,

It's great to see many other developers sharing the idea we have been trying to implement right within the Sugar environment. We have been working on integrating speech synthesis into Sugar for quite some time now. You can check out our ideas here: http://wiki.laptop.org/go/Screen_Reader

We are also documenting all our ideas and requirements with respect to speech synthesis in a Requirements Analysis Document here: http://www.nsitonline.in/hemant/stuff/Speech%20Synthesis%20on%20XO%20-%20Requirements%20Analysis%20v0.3.5.pdf -- it outlines some of our immediate as well as long-term goals wrt speech synthesis on the XO. Your ideas, comments and suggestions are welcome.

I'd like to update the list about our progress:

1. speech-dispatcher has been selected as the speech synthesis server, which will accept all incoming speech synthesis requests from any Sugar activity (for example Talk N Type, Speak, etc.).
2. speech-dispatcher provides a very simple to use API and client-specific configuration management.

So what's causing the delays?

1. speech-dispatcher is not packaged as an RPM for Fedora, so at present I am mostly busy making an RPM package so that it can be accepted by the Fedora community and ultimately be dropped into the OLPC builds. You can track the progress here: https://bugzilla.redhat.com/show_bug.cgi?id=432259 -- I am not an expert at RPM packaging, hence it is taking some time at my end. I'd welcome anyone to assist me and help speed up the process.
2. dotconf, a package that speech-dispatcher depends on, is being packaged by my teammate Assim. You can check its progress here: https://bugzilla.redhat.com/show_bug.cgi?id=433253

Some immediate tasks that we plan to carry out once speech-dispatcher is packaged and dropped into the OLPC builds:

1. Provide the much-needed play button, with the text-highlight features discussed by Edward.
2. Port an AI chatbot to the XO and hack it enough to make it speak to the child :).
3. Encourage other developers to make use of speech synthesis to make their activities as lively and child-friendly as possible :)
4. Explore Orca and other issues to make the XO more friendly for blind/low-vision students.

@James: We envision that speech synthesis will surely get integrated with Read in due time. I think it would be great if Gutenberg texts could maybe be loaded right from Read?

> I was not planning on anything so fancy. Basically, I was frustrated that I had a device that would be wonderfully suited to reading Gutenberg etexts and no suitable program to do it with. I have written such an Activity and am putting the finishing touches on it. As I see it, the selling points of the Activity are that it can display etexts one page at a time in a readable proportional font and remember what page you were on when you resume the Activity. The child can find his book on the Gutenberg site, save the Zip file version to the Journal, rename it, resume it, and start reading. It will also be good sample code for new Activity developers to look at, even children, because it is easy to understand yet does something that is actually useful. I have written another Activity which lets you browse through a bunch of image files stored in a Zip file; it also would be good sample code for a new developer, as well as being useful.

Warm Regards,
Hemant
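The Zip handling James describes is pleasantly small in Python. A minimal untested sketch -- the helper name, file name, and page size are made up for illustration:

    import zipfile

    def load_etext(path):
        # A Gutenberg Zip normally wraps a single .txt file.
        zf = zipfile.ZipFile(path, 'r')
        name = [n for n in zf.namelist() if n.endswith('.txt')][0]
        text = zf.read(name)
        zf.close()
        return text

    # Page the text the way such an Activity might: a fixed number of lines per page.
    LINES_PER_PAGE = 45
    lines = load_etext('alice.zip').split('\n')
    pages = [lines[i:i + LINES_PER_PAGE]
             for i in range(0, len(lines), LINES_PER_PAGE)]
    print('%d pages' % len(pages))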
Re: [sugar] An Update about Speech Synthesis for Sugar
On Feb 18, 2008 6:22 AM, Hemant Goyal [EMAIL PROTECTED] wrote:

> Hi, It's great to see many other developers sharing the idea we have been trying to implement right within the Sugar environment.

Yes, thanks to all.

> I'd like to update the list about our progress: speech-dispatcher has been selected as the speech synthesis server which will accept all incoming speech synthesis requests from any Sugar activity [...] So what's causing the delays?

I have a few questions. Let's see what the InterWebs tell us.

* How many languages does speech-dispatcher support?

http://www.freebsoft.org/doc/speechd/speech-dispatcher_5.html#SEC8 -- SD works with Festival (http://www.cstr.ed.ac.uk/projects/festival/): English, Czech, Italian, Spanish, Russian, Polish...

* What is the mechanism for adding additional languages? Phoneset recording, dictionary, and what?

http://www.freebsoft.org/doc/speechd/speech-dispatcher_23.html#SEC82 says: "Develop new voices and language definitions for Festival: In the world of Free Software, currently Festival is the most promising interface for text-to-speech processing and speech synthesis. It's an extensible and highly configurable platform for developing synthetic voices. If there is a lack of synthetic voices, or no voices at all for some language, we believe the wisest solution is to try to develop a voice in Festival. It's certainly not advisable to develop your own synthesizer if the goal is producing a quality voice system in a reasonable time. Festival developers provide nice documentation about how to develop a voice, and a lot of tools that help in doing this. We found that some language definitions can be constructed by cannibalizing the already existing definitions and can be tuned later. As for the voice samples, one can temporarily use the MBROLA project voices. But please note that, although they are downloadable for free (as in price), they are not Free Software, and it would be wonderful if we could replace them with Free Software alternatives as soon as possible."

See http://www.cstr.ed.ac.uk/projects/festival/, which in turn lists "externally configurable language independent modules": phonesets, lexicons, letter-to-sound rules, tokenizing, part-of-speech tagging, intonation and duration.

That answers most of my technical questions, including how (in principle, anyway) we are going to support tonal languages such as Yoruba.

Now for organization. Where should we put TTS projects for language support? Can we create http://dev.laptop.org/tts? Who should be in charge? What sort of process should we have for creating projects? Should we just automatically create a TTS project for every translate project?

> speech-dispatcher is not packaged as an RPM for Fedora,

I see Debian packages. Is there a converter?

> Some immediate tasks that we plan to carry out once speech-dispatcher is packaged and dropped into the OLPC builds are: Provide the much needed play button, with text highlight features as discussed by Edward.

Thank you.

> Explore Orca and other issues to make the XO more friendly for blind/low-vision students

Have you looked at Oralux, the Linux distro for the blind and visually impaired? http://oralux.net/ We should invite them to join our efforts.
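On the client side, picking a language is already a one-liner against the Python bindings. A minimal untested sketch (the client name is made up; Czech is one of the Festival languages listed above):

    import speechd

    client = speechd.SSIPClient('language-demo')
    client.set_output_module('festival')  # Festival carries most of the non-English voices
    client.set_language('cs')             # ISO 639 language code
    client.speak('Dobry den!')
    client.close()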