On 2009-02-26, Eric Scoles <[email protected]> wrote:
>
> On Wed, Feb 25, 2009 at 7:14 PM, Dave Henn <[email protected]> wrote:
>
>> ....
>>
>> I'm only going to reply to this part because it's quick. There are LOTS
>> of other cues before a reader, far more than the individual words and
>> the punctuation. There is the context in which each word is employed,
>> which depends partly on the words surrounding it and partly on the
>> words in larger portions of the text being processed. There is the flow
>> of the text in a sentence, its rhythm, or, as you say, cadence, which
>> should be parsable using phoneme and syllable databases (I'm sure I'm
>> mangling the terminology, but you get the idea). How often have you
>> seen someone, or yourself, read a passage aloud a second time because
>> the first time didn't sound right? Something cued, or failed to cue,
>> your change in how you read the passage. All of these things are goals
>> for text-to-speech, and context is already being used in many systems.
>> I don't know about rhythm, but that shouldn't be long in coming if it's
>> not already there. As for gender, if a system has a sufficient database
>> of names, it should be able to take a good guess at that, and pitch and
>> timbre would at least partially follow from gender. Tone, I don't know,
>> but context would certainly help there.
>
> First: Those are all things that can be inferred, but many of them are
> choices. Other choices are possible. Those choices are what make a
> reading a performance.
>
> Second: The value added by a good reading seldom has much to do with
> the things that are already in the text. It has to do with, yes, the
> choices the reader makes in how to interpret or render the stuff that's
> in the text; but it also has to do with how to render the stuff that's
> either not in the text (what it makes him/her feel or think, etc.), or
> that is in the text in ways that no automated reader will be able to
> deal with for quite a while.
> ('Biff's tone oozed. "Oh, but you look wonderful, dear." There was a
> glint in his eye that I knew well, but Jane did not.') Or consider Rob
> Sawyer's Kennedy impression in the reading many of us heard him give
> some months back: Kennedy's not even named in the text, except by
> implications that only a human would get. Not naming Kennedy has a
> positive impact on the story-reading experience, because you're allowed
> to discover who it is the alien is talking about as you read the words.
> You could mark that up, though, without having a negative impact on the
> experience: you'd discover it by hearing the impression.
>
> Good readings are performances. Performance involves choice. What I'm
> suggesting is to take the automated reading to a new level -- one
> analogous to that enabled by MIDI on a good keyboard set -- which is
> absolutely not possible right now or in the truly foreseeable future
> (i.e., the future that comes before the readers have AI that allows
> them to interpret the meaning of texts). (To say that it's likely to
> happen because we've done these things before is not what I mean by
> 'foreseeable' in this context, because we don't yet have an idea of how
> it would be done.)
>
> I feel we're speaking at cross-purposes. My broader argument is that
> there are applications for a speech markup language. The automated
> reading of blog posts is simply what I see as an early application.
> This is not something that would take huge research funding to work out
> -- the markup spec could be mapped out in a weekend by someone familiar
> with XML specification, then critiqued over a period of time. The
> particulars could be figured out with low-tech methods by grad
> students. Hell, I'd be surprised if there hadn't already been Media Lab
> projects to do exactly this kind of stuff.
>
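For concreteness, here's a rough sketch of what marking up that Biff line might look like in W3C's SSML (linked below). The element names (speak, voice, prosody, emphasis, break) are genuine SSML 1.0; the particular values are one reader's interpretive choices, exactly the kind of performance decisions being discussed, not anything the spec dictates:

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Biff's line; the prosody values encode one possible reading -->
  <voice gender="male">
    <!-- "oozed": slow, low, overly smooth delivery -->
    <prosody rate="slow" pitch="low" volume="soft">
      Oh, but you look
      <emphasis level="strong">wonderful</emphasis>,
      <break time="200ms"/>
      dear.
    </prosody>
  </voice>
</speak>
```

Note that the markup carries the *choices* (the ooze, the stressed "wonderful", the pause before "dear") without trying to carry the *inference* -- a different reader could mark the same sentence up entirely differently.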
Apparently there's a long history of this kind of development -- the Media
Lab seems to have been working on it in the late 90s, and it's apparently
widely used in voice-response systems (well, of course it would be,
wouldn't it?).

Media Lab: "Tools for Expressive Text-To-Speech Markup"
  PDF: http://www.media.mit.edu/~erikb/papers/uist01-tools.pdf
  Google HTML: http://74.125.95.132/search?q=cache:NwC4evLvIi0J:www.media.mit.edu/~erikb/papers/uist01-tools.pdf+text+to+speech+markup&hl=en&ct=clnk&cd=1&gl=us&client=firefox-a

W3C: Speech Synthesis Markup Language (SSML):
  http://www.w3.org/TR/speech-synthesis/

Wikipedia: Speech synthesis markup languages
  http://en.wikipedia.org/wiki/Speech_synthesis#Speech_synthesis_markup_languages

Wikipedia: Speech Synthesis Markup Language (SSML):
  http://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language

Wikipedia: Java Speech Markup Language [JSML]
  http://en.wikipedia.org/wiki/JSML

Spoken Text Markup Language [STML]
  http://www.bell-labs.com/project/tts/stml.html

SABLE, which apparently attempts to merge STML, JSML and SSML
  http://www.bell-labs.com/project/tts/sable.html

All that's needful is for some clever MAKER-type to get obsessed with it.

--
eric scoles ([email protected])

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"R-SPEC: The Rochester Speculative Literature Association" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/r-spec?hl=en
-~----------~----~----~----~------~----~------~--~---
