Eric S. Johansson wrote:
> Eric S. Johansson wrote:
>> as I was constructing my response, and was almost finished when it
>> hit me about what's wrong with the model proposed. it is the
>> equivalent of raw natural text. Full function natural text sucks a
>> little bit. The broken, unable to correct consistently, natural text
>> is horrible and ruins voice models. What you're proposing has even
>> less functionality than a broken natural text.
>
> I'm sorry, that was too harsh. I was interrupted by one too many
> things while I was writing that bit and I forgot to go back and edit
> it. Again, I apologize for being careless. Today might be a day to
> stay away from the keyboard unless I'm writing code. :-)
Hi Eric,

Looks like the original text got caught in a spam filter somewhere because of the attachment (I found it in the web archives). No worries about the tone. We are having a frank technical discussion and need to speak directly to get our points across. So my turn :) ...

I think you are too caught up in the current working model of NS to see how things can be done differently. I have not studied the details of voice recognition and voice models, but I do appreciate the need for custom voice model training over time. There is a need for feedback, but it does _not_ need to be real-time. Personally, I would prefer that it weren't. NS in theory touts this as a feature when they claim that you can record speech on a voice recorder and dump it into NS for transcription. I have no idea whether that actually works.

I don't really want to interact with the voice engine all the time; I want it to mostly stay out of my way. I don't want to look at the little voice level bar when I'm speaking or read the early guesses of the voice engine. I want to look out the window, or look at the spreadsheet that I'm writing an email about :)

The fact that NS updates the voice model incrementally is actually a bad feature. I don't want that. If I have a cold one day, or there is noise outside, or the mic is a bit displaced, the profile gets damaged. That's probably why you have to start a fresh one every six months.

Instead of saving my voice profile every day, I would like to save up a log of all the mistakes that were made during the week. I would then sit down for a training session to help NS cope with those words and phrases better. I would first take a backup of my voice profile, then say a few sample sentences to make sure everything was generally working OK. I would then read passages from the log and do the needed correction and re-training. I would save the profile and start using the new one for the next week. I would also keep profiles going back four weeks, and once a month I would do a brief test with the stored-up profiles to see whether the current one had degraded over time. If it had, I would roll back to an older one and perhaps do some training from recent logs too. There is no reason a voice profile should just automatically go bad over time. (A rough sketch of this bookkeeping follows below.)

The fact that you have to constantly interact with the voice engine is not a feature, it's a bug! It's just that you have adapted your dictation to work around it. It's not at all clear that interactive correction is better than batched correction. It certainly should not be seen as a blocker for a project like this going forward. I wouldn't want to spend years on a project simply to replicate NS on Linux. There is plenty of room for improvement in the current system.

OK, now for some replies:

> There is a system that art exists that does exactly what you've opposed.

[assuming you meant 'proposed' here]

Unlikely. If a system with that level of usability existed, it would already be in widespread use.

> While it was technically successful, it has failed in that nobody but
> the originator uses it, and even he admits this model has some serious
> shortcomings.

What system, where? What was the model, and what were the shortcomings?

> The reason I insist on feedback is very simple. A good speech
> recognition environment lets you correct recognition errors and
> create application-specific and application-neutral commands.

Yes, we agree that you need correction.
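To make the weekly workflow described above concrete, here is a minimal sketch in Python. Everything in it is an assumption: it treats the voice profile as a plain directory of files, and the invented "voice-engine batch-train" command stands in for whatever batch retraining interface an engine might actually expose; no real NS interface is implied.

import shutil
import subprocess
from datetime import date
from pathlib import Path

PROFILE = Path.home() / "voice-profile"             # live profile (assumed layout)
SNAPSHOTS = Path.home() / "voice-profile-snapshots" # weekly backups
ERROR_LOG = Path.home() / "recognition-errors.log"  # mistakes logged during the week
KEEP = 4                                            # a month of weekly snapshots

def snapshot_profile() -> Path:
    """Back up the live profile before touching it."""
    SNAPSHOTS.mkdir(exist_ok=True)
    dest = SNAPSHOTS / date.today().isoformat()
    shutil.copytree(PROFILE, dest, dirs_exist_ok=True)
    for old in sorted(SNAPSHOTS.iterdir())[:-KEEP]:  # prune snapshots beyond a month
        shutil.rmtree(old)
    return dest

def weekly_training_session() -> None:
    """Snapshot first, then retrain from the week's logged corrections."""
    snapshot_profile()
    # Hypothetical engine command: retrain the profile from a log of
    # (misrecognized text, corrected text) pairs collected all week.
    subprocess.run(
        ["voice-engine", "batch-train",
         "--profile", str(PROFILE),
         "--corrections", str(ERROR_LOG)],
        check=True,
    )
    ERROR_LOG.write_text("")  # start a clean log for next week

def rollback(snapshot_name: str) -> None:
    """Restore an older snapshot if the current profile has degraded."""
    shutil.rmtree(PROFILE)
    shutil.copytree(SNAPSHOTS / snapshot_name, PROFILE)

The point is just that snapshots plus a correction log make rollback a one-line operation, instead of the profile silently mutating a little every day.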
The application-specific features can be implemented in this model too, in the same way that Orca uses scripting.

> modern systems train incrementally. This improves the user experience
> because you don't have to put up with continual misrecognitions.

You would still have to correct the mistakes at some point. I would prefer to just dictate on and come back and correct all the mistakes at the end. One should read through before sending in any case ;) Correction and re-training do not have to be the same thing, though that's the way NS does it now.

> Apparently they also train incrementally on what's not corrected which
> means batch correction is not a good thing.

And I think that is a serious design flaw, for two (related) reasons: it gradually corrupts your voice files AND it makes the user constantly worry about whether that is happening. You have to make sure to speak as correctly as possible at all times and always make sure to stop immediately and correct all the mistakes. Otherwise your profile will be hosed. I repeat: that is a bug, not a feature. You end up adapting more to the machine than the machine adapts to you. *That is a bug.*

> I have no problem leaving the entire user interface for correction etc.
> in Windows. The only trouble is how do you make it visible if you're
> running a virtual machine full-screen? Don't run the virtual machine
> full-screen?

Sure, in a separate correction session. Personally I would have two physical machines for this task, with the text going across the network. In the correction session I would just flip the KVM switch to the Windows box (or however you choose to organise it).

> no, the grammar I gave you was a custom grammar. It didn't need a
> preamble of "macro". It demonstrates how you can create a more natural
> speech user interface.

I think this is an NS bug too. I don't want natural editing, I only want natural dictation. I want two completely separate modes: pure dictation and pure editing. If I say 'cut that' I want the words 'cut that' to be typed. To edit I want to say: 'Hal: cut that bit'. Why? Because that would improve overall recognition and would remove the worry that you might delete a paragraph by mistake. NS would only trigger its special functions on a single word, and otherwise just do its best to transcribe. You would of course select that word to be one that it would never get wrong. (You could argue that natural editing is a feature, but the fact that you cannot easily configure it to use the modes I described is a design flaw.) See the P.S. below for a tiny sketch of this dispatch.

> but also consider this. Ever wonder why the acceptance rate for speech
> recognition is only one user in five? Granted I only have a small
> sample but all of the doctors I've talked to about speech recognition
> tell me stories of purchasing a very expensive package only to drop it
> in a few months and go back to human transcription. Obviously
> recognition accuracy is a part of the problem but the other half is
> usability.

Precisely. It's because they don't want to fiddle with the program, they just want to dictate.

Henrik
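P.S. To show how little machinery the two-mode idea needs, here is a minimal sketch, again in Python. The type_out and run_command callbacks are invented for illustration; the only point is that exactly one reserved word ever switches out of pure dictation.

TRIGGER = "hal"  # choose a word the engine essentially never gets wrong

def handle_utterance(text: str, type_out, run_command) -> None:
    """Type everything literally unless the utterance starts with TRIGGER."""
    first, _, rest = text.partition(" ")
    if first.lower().rstrip(":") == TRIGGER:
        run_command(rest)  # 'Hal: cut that bit' -> editing command 'cut that bit'
    else:
        type_out(text)     # 'cut that' is just typed, never interpreted

With this dispatch 'cut that' is always text and 'Hal: cut that' is always a command; there is nothing in between to worry about.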
