Is this in the general case, or only for specific kinds of speech? For example, it should be possible to build an HMM that splits medical jargon into syllables, based on the work done on splitting Simplified Chinese text. The average Simplified Chinese "word" is 1.5 ideograms, and you need a well-trained HMM (or something similar) to split Simplified Chinese well. The language is very context-specific, with both prefixes and suffixes that alter the meaning of "interior" words.
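For illustration, here is a minimal sketch of the Viterbi decode behind such an HMM, tagging each character as B/M/E/S (begin/middle/end/single unit), the usual formulation for Chinese segmenters. The transition and emission numbers are made-up placeholders, not a trained model; a real segmenter would estimate them from an annotated corpus, and the "hemoglobin" example is just a stand-in for medical jargon.

public class BmesViterbi {

    static final String[] STATES = {"B", "M", "E", "S"};

    // trans[i][j]: log P(next state j | current state i). Illustrative values
    // only; impossible B/M/E/S transitions are hard-blocked with -infinity.
    static final double[][] TRANS = {
        // to:  B                          M              E              S
        {Double.NEGATIVE_INFINITY, Math.log(0.4), Math.log(0.6), Double.NEGATIVE_INFINITY}, // from B
        {Double.NEGATIVE_INFINITY, Math.log(0.3), Math.log(0.7), Double.NEGATIVE_INFINITY}, // from M
        {Math.log(0.6), Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, Math.log(0.4)}, // from E
        {Math.log(0.5), Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, Math.log(0.5)}  // from S
    };

    // Emission stub: log P(char | state). A trained model would look the
    // character up in per-state frequency tables; here every character is
    // equally likely, so the transition structure alone drives the decode.
    static double emit(int state, char c) {
        return Math.log(0.25);
    }

    // Returns the most likely B/M/E/S tag sequence for the input characters.
    static String[] decode(String text) {
        int n = text.length(), k = STATES.length;
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];

        for (int s = 0; s < k; s++) {
            // A unit can only start with B or S.
            double start = (s == 0 || s == 3) ? Math.log(0.5) : Double.NEGATIVE_INFINITY;
            score[0][s] = start + emit(s, text.charAt(0));
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double cand = score[t - 1][p] + TRANS[p][s];
                    if (cand > best) { best = cand; arg = p; }
                }
                score[t][s] = best + emit(s, text.charAt(t));
                back[t][s] = arg;
            }
        }
        // A valid segmentation must end in E or S; trace the best path back.
        int cur = (score[n - 1][3] > score[n - 1][2]) ? 3 : 2;
        String[] tags = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            tags[t] = STATES[cur];
            cur = back[t][cur];
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "hemoglobin";
        String[] tags = decode(input);
        for (int i = 0; i < input.length(); i++) {
            System.out.println(input.charAt(i) + "\t" + tags[i]);
        }
    }
}

With real per-state character statistics (or, for syllables, per-state letter/phoneme statistics trained on something like CELEX), the same decode gives you boundaries wherever an E or S tag is followed by a B or S tag.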
On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
> That's right, better to use a lexical database. CELEX2, available fairly
> inexpensively from the Linguistic Data Consortium, has syllable
> boundaries in its phonological representations.
>
> http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>
> jds
>
> On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>> Adam,
>>
>> Sorry, OpenNLP doesn't detect syllables. What you probably need is more
>> of a dictionary with pronunciation syllables.
>> It could maybe be trained to do it, but that would be very language specific
>> and not very useful. The dictionary approach would be best, though
>> OpenNLP could help parse the words/tokens for you to use in the dictionary.
>>
>> James
>>
>> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>>> Hi all,
>>>
>>> Does OpenNLP have the ability to detect syllables? If not, could you point
>>> me to a java toolkit that can do this?
>>>
>>> Thanks,
>>> Adam
>>>

--
Lance Norskog
[email protected]
