Hi Eric,

I've added comments in-line below. However, this is probably beginning to get a little off topic for the list. Maybe take further discussion off list if you want to respond further. Alternatively, maybe you have some suggestions or Ubuntu-specific points that could be brought in to get things more on-topic? I'm not up-to-date enough with current VR issues to be able to provide any really constructive advice. However, I also understand how important it can be to have general discussion and possibly find the ideas or energy to carry things forward further. I'm happy to discuss further off list if you wish.
regards,

Tim

Eric S. Johansson writes:
> On 5/15/2010 8:59 PM, Tim Cross wrote:
>
> > Hi Eric,
> >
> > the points you raise and your observations are all true, but I don't think there is a good answer. What it really boils down to is that OSS is largely about solutions that have been developed by users scratching their own itch.
> > Unfortunately, voice recognition is an extremely complex and difficult to scratch itch and the number of developers with the necessary skills that want to scratch it is very small.
>
> thanks for a great series of responses to a complex question. As for scratching your own itch, there's one big difference. I can't scratch my own itch because my hands don't work right. It's roughly the same problem as telling a blind person that they can write their own code in an IDE that has lots of wonderful graphical images that tell you what you need to do... whoops
>

Yes, I understand the difficulty and frustration. I wasn't meaning to imply that you or any other individual should fix the problem directly, though I suspect there are some who would benefit from VR who are in a position to assist in code writing for OSS projects.

The other point I wanted to make is that coding is not the only way to help. To a large extent, the lobbying aspect is also important. A lot of the battle is getting recognition of the importance of OSS and low cost solutions in the adaptive technology space. This is an area most can assist with and, in fact, one you have demonstrated by starting this thread. What is needed is to move this sort of discussion more into the mainstream development area and to work towards adaptive tech being treated as a first class consideration and not as an afterthought, as is too often the situation.

> > It has been a number of years since I've looked at the status of voice recognition in the OSS world. Working on these projects would seem to be a good proactive approach.
> > In addition to this, two other approaches that might be worth pursuing, especially by anyone who is interested in this area and doesn't feel they have the technical skill to actually work in the development area, would be to lobby commercial vendors to either make some of their code open source or to provide low cost licenses and to lobby for project funding to support OSS projects that are working on VR. A significant amount of the work done to improve TTS interfaces has been made possible because of effective lobbying and gaining of support from commercial and government bodies.
>
> The vast majority of the speech recognition efforts today are for IVR, interactive voice response systems such as those you would ask "Weather in Boston" and get a text-to-speech response like "the weather in Boston is hostile to out-of-towners and not very kind to locals either"
>
> The difference between speech recognition and text-to-speech today is that usable text-to-speech is easy to create with a team of grad students. Speech recognition takes generations of grad students. Witness how little progress has been made on the Sphinx toolkit since its creation. We have three different engines all with different characteristics but all on the same problem space. We don't have proper acoustic modeling. We don't have proper language modeling etc. etc. I know I'm being a broken record but, these are huge obstacles to general-purpose use.
>

I only partially agree on both your points. Yes, much of the VR work to date has been for IVR systems, but as the technology improves, I believe this is changing. For example, the VR support I mentioned on the Nexus phone is for dictation of SMS text messages. The new phone system we recently installed at work has VR capabilities that translate voice messages to text and send them via SMS and email.
I think this type of application of VR will see rapid development over the next few years and represents the next stage and a higher level of sophistication past the IVR model with its limited recognition abilities.

Yes, this is a difficult problem. However, it is interesting to note that your arguments are very similar to the ones that were common in the mid-90s. At that time, software TTS was thought to be too computationally intensive to be practical for real-time use. Creating voices was considered to be an art that only a very few people could do and many argued it would be many years before we had a decent OSS TTS engine available.

I don't think we will see anything in the OSS world that is of production quality and able to meet the needs of adaptive tech users next month or even next year. It is a hard problem and will take considerable resources to address. However, it may not take as long or as many resources as you fear. It is very difficult to predict the rate of development in these areas. For all we know, there may be groundbreaking hardware or algorithms just around the corner that will completely change the landscape.

I feel quite positive about developments in this area because I can see generalised VR becoming more common. The growth in demand for smart phones and other small form factor devices is being hampered because keyboards, both software and hardware, are still the main interface. However, hardware keyboards are difficult to fit in small devices and software ones are slow and somewhat inconvenient. Generalised VR will be the commercial solution in this area. Initially, much of it will be limited IVR type solutions, but as shown with the Nexus, more general support to dictate messages etc will also increase in popularity. If this technology becomes part of things like the Android OS, then it will slowly find its way into the OSS world.

> I would love to see us license for little or no money the Nuance NaturallySpeaking toolkit for purposes of developing accessibility interfaces. I can't even get them to return a phone call when I'm calling about a commercial application. If it's for accessibility, they don't even pick up the phone. This tells me it may be time for some guerrilla action. If someone has a spare $2000, I have a scanner and I'm sure we can find some good friends in Europe and Japan. not that I'm saying or even suggesting we should violate Nuance's copyright of course because that would be as wrong as denying disabled people information they need to make themselves more independent and increasing their prospects for working.
>

There were a number of attempts to get IBM to make their ViaVoice Outloud TTS engine available as open source and to make the runtime free or at a low license cost before someone was actually successful in finding a model that was acceptable to the vendor and provided a reasonable outcome for users. I suspect it depends on individual tenacity, personality and possibly some degree of luck. I think having a good understanding of business and of the things that are likely to motivate a business into accepting or supporting a proposal is also essential.

Most businesses are not well motivated by altruistic concerns. Many of them still don't understand OSS - some have even believed the FUD put out by companies like Microsoft. Some even fear losing lucrative contracts with anti-OSS vendors if they are seen to support such initiatives. Trying to convince a large vendor to provide their product at a lower price for people with a disability is unlikely to gain much traction unless it boils down to good business sense. The difficult part is in identifying a strong, convincing business case that the vendor will see as a positive and which has benefits for those with a disability who need such solutions.

> > I'm possibly a little more optimistic regarding the future of OSS VR. Voice recognition is rapidly moving from living in a very specialised domain to being much more general purpose. This is largely due to the growth in small form factor devices, such as mobile phones. I've been told that the Google Nexus 1 phone has quite good VR support. This is an indication that decent VR applications that run in an OSS environment are becoming more prevalent.
>
> here's a dirty little secret. They didn't do the speech recognition in the phone. Not enough horsepower or memory space for vocabularies. They ship the audio to a server which then does speech recognition not real-time and shoves the text back to the cell phone.

I wasn't aware of that. So, if I've got this right, you speak the message you want to send, this is recorded and sent to a remote server and then a text version is returned that is sent as the SMS message? It must be fairly close to real-time as the person I was talking to said that as they speak the message it is rendered as text on the screen, which enables them to correct any errors before sending.

>
> this may unfortunately be our future for disability use. We'll no longer have control over speech recognition engines but instead rent recognition time off the cloud.

I suspect this could well be the model we are moving to generally and not just with respect to adaptive technology. From this perspective, provided the costs are reasonable, we will not be any worse off than other users who are also just as dependent for all their services. Of course, this does not address the issue of anyone being or becoming dependent on technology that we don't have control over or access to. This is largely the underlying concern that RMS had when forming the FSF.

You could argue that those with a disability are possibly at a greater disadvantage because the technology is perceived as being more important or critical to them. However, I think we need to be careful of such arguments. Yes, technology enables me as someone with a disability to do things, many of them independently, that were not possible before we had this technology. However, to argue that my needs are greater or that my pain would be greater if I lost access to this technology than it would be for someone without a disability who has lost control or access to some technology they rely on is dangerous. It runs the risk of creating an 'us and them' paradigm and is based on subjective value statements that are impossible to quantify. It distracts from the real issue - ensuring all have access and the ability to control or own the technology that becomes critical in how we live our lives.

>
> I really hate the cloud. I understand why pilots hate them as well because if you fly to the big fluffy thing, the fluffy soft thing can turn really really hard as you run into a mountain hidden within the cloud. boink!
>
> I'm waiting for the equivalent to happen in the software cloud world.
>

The 'cloud' is just marketing hype. It's like Web 2.0 - it means nothing and everything all at the same time. Technically, there is nothing new here. It is just a swing back to the old 'thin client' and centrally provided service model that I've seen come and go already during my short career. Yes, it's more sophisticated in some ways and has some improved architecture - thank god we have learnt something in the last 40 years!

Some of the cloud services being provided are good, some are bad and some are dangerous. There have already been major stuff ups - ask a Sidekick user what they think! However, I don't feel anyone should feel any more threatened by the cloud than they do regarding the many proprietary systems they have been putting data into for the last 20 years.
As a friend of mine says - "It's all just hem lines, they will go up and they will go down".

> > It is also likely as demand increases for VR solutions that more University research will occur as it will be seen as something with good commercial potential i.e. good funding opportunities.
>
> Speech recognition research is aimed at IVR. Funding has plateaued or even dropped because recognition accuracy is not improving. The techniques have run out of steam. It will take a radically new approach to put any fire under speech recognition again. Sometimes I think the only way Nuance is improving NaturallySpeaking is by fixing bugs. I doubt there's any new technology going on inside.
>

Possibly. I am not up on current research in this area and can only speculate. I once had similar concerns regarding TTS. Nearly all the research was towards the development of more natural sounding voices, usually using the concatenative approach. While this style of TTS does appear to generate more human sounding voices, it also suffers from the limitation that pronunciation quality falls dramatically as speech rates increase. It sounds wonderful at a normal speaking rate, but you cannot understand it once you increase the rate. As a blind user, I'm used to listening at high speech rates. If I had to listen at a normal speaking rate to all the data I need to process each day, I would never get things done. As a result, the newer TTS engines are less useful to me than older systems that use mathematically derived approximations of speech, which sound less natural, but at least can be understood at high speech rates.

> > Unfortunately, it is also true that the accessibility benefits of technology such as VR will all too often be a secondary issue to commercial interests. There will be a lag time between this technology existing and it being accessible to those who would really benefit from it.
> > This is probably the downside of the free market economy where developments are driven by profits. However, it is also the perceived profits that ensure commercial resources are invested into understanding the problem and developing workable solutions. We are still a long way from the sort of society that would put the accessibility needs before individual and corporate greed. In fact, we are still a long way from getting mainstream recognition of accessibility issues to the level they should be, which is why I think lobbying and raising issues outside the accessibility community is so important.
>
> yes. It would be interesting to do the calculation but I think there's a good chance that sticking disabled people on disability and low-income housing may be cheaper to society than all the efforts put into making software and life space disabilities accessible. This is also why I advocate for putting disability hooks in every machine (i.e. low-cost, at little or no administration) and every disabled person carry their own machine with a disability user interface (i.e. text-to-speech or speech recognition) so that the cost of enabling a machine for accessibility is lower than it is now.
>
> it's all economics. I think if we can come up with a way that tweaks or leverages economics in our favor, we can make a big difference. If it's strictly "do it because this is right", it cannot fail. Another example of this in a different field is light pollution. Light pollution is a good idea to control because it reduces energy, makes nighttime driving safer, makes it possible for elderly to drive, but there is insufficient economic incentive to fix streetlights and high glare security lighting to make any progress.
> Therefore any changes based on moral arguments are hard-fought, hard-won battles and usually overturned when the people driving the argument vanish from the political scene because economic/business people push back to the status quo (i.e. short-term goal driven)
>
> We will suffer the same fate with our arguments if we can't provide a good economic argument in addition to our technical and moral/ethical arguments.

Yep, the moral argument tends to fail because corporate capitalism is largely amoral. We need to demonstrate strong business cases to justify the outcomes we want. If a decision is perceived as a good business choice, it is far more likely to be adopted. However, sometimes, we really need to be quite creative and use a lot of imagination to formulate such a business case.

--
Tim Cross [email protected]

There are two types of people in IT - those who do not manage what they understand and those who do not understand what they manage.

--
Ubuntu-accessibility mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-accessibility
