On 2009-02-26, Eric Scoles <[email protected]> wrote:
>
>
>
> On Wed, Feb 25, 2009 at 7:14 PM, Dave Henn <[email protected]> wrote:
>
>>
>>
>> ....
>>
>> I'm only going to reply to this part because it's quick. There are LOTS of
>> other cues before a reader, far more than the individual words and the
>> punctuation. There is the context in which each word is employed, which is
>> dependent partly on the words surrounding it and partly on the words in
>> larger portions of the text being processed. There is the flow of the text
>> in a sentence, its rhythm or, as you say, cadence, which should be parsable
>> using phoneme and syllable databases (I'm sure I'm mangling the
>> terminology, but you get the idea). How often have you seen someone or
>> yourself read a passage aloud a second time because the first time didn't
>> sound right? Something cued, or failed to cue, the change in how you read the
>> passage. All of these things are goals for text-to-speech, and context is
>> already being used in many systems. I don't know about rhythm, but support
>> for it shouldn't be long in coming if it's not already there. As for gender,
>> if a system has a
>> sufficient database of names, it should be able to take a good guess at
>> that, and pitch and timbre would at least partially follow from gender.
>> Tone, I don't know, but context would certainly help there.
>
>
>
>
> First: Those are all things that can be inferred, but many of them are
> choices. Other choices are possible. Those choices are what make a reading a
> performance.
>
> Second: The value added by a good reading seldom has much to do with those
> things that are already in the text. It has to do with, yes, the choices
> that the reader makes in how to interpret or render the stuff that's in the
> text; but it also has to do with how to render the stuff that's either not
> in the text (what it makes him/her feel or think, etc.), or that is in the
> text in ways that no reader will be able to deal with for quite a while.
> ('Biff's tone oozed. "Oh, but you look wonderful, dear." There was a glint
> in his eye that I knew well, but Jane did not.') Or consider Rob Sawyer's
> Kennedy impression in the reading many of us heard him give some months
> back: Kennedy's not even named in the text, except by implications that only
> a human would get. Not naming Kennedy has a positive impact on the
> story-reading experience, because you're allowed to discover who it is that
> the alien is talking about as you read the words. You could mark that up,
> though, without having a negative impact on the experience: You discover it
> by hearing the impression.
>
> Good readings are performances. Performance involves choice. What I'm
> suggesting is to take the automated reading to a new level -- one analogous
> to that enabled by MIDI on a good keyboard set, which is absolutely not
> possible right now or in the truly foreseeable future (i.e., the future that
> comes before the readers have AI that allows them to interpret the meaning
> of texts). (To say that it's likely to happen because we've done these
> things before is not what I mean by 'foreseeable' in this context, because
> we don't yet have an idea of how it would be done.)
>
> I feel we're speaking at cross-purposes. My broader argument is that there
> are applications for a speech markup language. The automated reading of blog
> posts is simply what I see as an early application. This is not something
> that would take huge research funding to work out -- the markup spec could
> be mapped out in a weekend by someone familiar with writing XML
> specifications, then critiqued over time. The particulars could be worked out
> with low-tech experiments by grad students. Hell, I'd be surprised if there
> hadn't already
> been Media Lab projects to do exactly this kind of stuff.
>


Apparently there's a long history of this kind of development -- Media Lab
seems to have been working on it in the late '90s, and it's apparently widely
used in voice-response systems (well, of course it would be, wouldn't it?).

Media Lab: "Tools for Expressive Text-To-Speech Markup"
PDF: http://www.media.mit.edu/~erikb/papers/uist01-tools.pdf
Google HTML:
http://74.125.95.132/search?q=cache:NwC4evLvIi0J:www.media.mit.edu/~erikb/papers/uist01-tools.pdf+text+to+speech+markup&hl=en&ct=clnk&cd=1&gl=us&client=firefox-a

W3C: Speech Synthesis Markup Language (SSML):
http://www.w3.org/TR/speech-synthesis/

Wikipedia: Speech synthesis markup languages
http://en.wikipedia.org/wiki/Speech_synthesis#Speech_synthesis_markup_languages

Wikipedia: Speech Synthesis Markup Language (SSML):
http://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language

Wikipedia: Java Speech Markup Language (JSML)
http://en.wikipedia.org/wiki/JSML

Bell Labs: Spoken Text Markup Language (STML)
http://www.bell-labs.com/project/tts/stml.html

SABLE, which apparently attempts to merge STML, JSML and SSML
http://www.bell-labs.com/project/tts/sable.html
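
For a taste of what this looks like in practice, here's the Biff line from
above in SSML 1.0 (the voice and prosody values are my guesses, which is
rather the point -- they're a performer's choices, written down):

  <?xml version="1.0"?>
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xml:lang="en-US">
    <voice gender="male">
      <!-- "Biff's tone oozed": slow it down, drop the pitch. -->
      <prosody rate="slow" pitch="low">
        Oh, but you look <emphasis level="strong">wonderful</emphasis>, dear.
      </prosody>
      <break time="400ms"/>
    </voice>
  </speak>

Everything the synthesizer can't infer -- the ooze, the glint -- gets written
down as explicit choices, the same way a MIDI file encodes a keyboardist's.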

All that's needful is for some clever MAKER-type to get obsessed with it.
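
And as a proof of concept for the blog-reading application, the plumbing is
nearly trivial -- a few lines of Python suffice to wrap a post's paragraphs
in minimal SSML for any SSML-capable synthesizer (a rough sketch; the
per-passage prosody decisions are exactly the part that wants a human, or
much smarter software):

# Sketch: wrap a blog post's paragraphs in minimal SSML 1.0.
# Handing the result to an actual synthesizer is omitted here,
# since that step is engine-specific.
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def post_to_ssml(paragraphs):
    ET.register_namespace("", SSML_NS)
    speak = ET.Element("{%s}speak" % SSML_NS,
                       {"version": "1.0", "{%s}lang" % XML_NS: "en-US"})
    for text in paragraphs:
        para = ET.SubElement(speak, "{%s}p" % SSML_NS)
        para.text = text
        # A beat of silence between paragraphs.
        ET.SubElement(speak, "{%s}break" % SSML_NS, {"time": "500ms"})
    return ET.tostring(speak, encoding="unicode")

if __name__ == "__main__":
    print(post_to_ssml(["Oh, but you look wonderful, dear.",
                        "There was a glint in his eye that I knew well."]))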



-- 
eric scoles ([email protected])
