Here is a free lexical resource: https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
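
In case it helps, here is a rough sketch of how a syllable count could be pulled out of cmudict entries. It assumes the usual cmudict layout: one entry per line, the word followed by its phonemes, vowel phonemes ending in a stress digit (0, 1 or 2), comment lines starting with ";;;", and alternate pronunciations marked with a parenthesized suffix on the word. The class and method names are made up for illustration, and the sample entries are typed by hand, so check them against the real file. Note that this gives a syllable count only; cmudict does not mark syllable boundaries.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP or cmudict itself.
final class CmuDictSyllables {

    // Counts syllables in one pronunciation such as "HH AH0 L OW1".
    // Assumption: every vowel phoneme ends in a stress digit, so the
    // number of digit-terminated phonemes equals the number of syllables.
    static int countSyllables(String pronunciation) {
        int count = 0;
        for (String phoneme : pronunciation.trim().split("\\s+")) {
            if (!phoneme.isEmpty()
                    && Character.isDigit(phoneme.charAt(phoneme.length() - 1))) {
                count++;
            }
        }
        return count;
    }

    // Builds a word -> syllable count map from dictionary lines.
    static Map<String, Integer> load(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and ";;;" comment lines.
            if (line.isEmpty() || line.startsWith(";;;")) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            String word = parts[0];
            // Skip alternate pronunciations such as "HELLO(1)".
            if (word.contains("(") && word.endsWith(")")) {
                continue;
            }
            counts.putIfAbsent(word, countSyllables(parts[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative entries only; a real run would read the cmudict file.
        List<String> sample = Arrays.asList(
                ";;; sample comment line",
                "HELLO  HH AH0 L OW1",
                "HELLO(1)  HH EH0 L OW1",
                "SYLLABLE  S IH1 L AH0 B AH0 L");
        Map<String, Integer> counts = load(sample);
        System.out.println("HELLO: " + counts.get("HELLO"));
        System.out.println("SYLLABLE: " + counts.get("SYLLABLE"));
    }
}
```

With the real file, the lines could be read with java.nio.file.Files.readAllLines and passed to load. The entries are keyed in upper case, so typed words would need to be upper-cased before lookup, and anything missing from the dictionary would still need a fallback.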

On Thu, Jul 12, 2012 at 3:30 PM, John Stewart <[email protected]> wrote:
> Syllable segmentation in English is notoriously difficult. I'd use a
> lexical resource rather than trying to do it algorithmically.
>
> jds
>
> On Thu, Jul 12, 2012 at 5:39 PM, Lance Norskog <[email protected]> wrote:
>> Phonetic encoding might help. This essentially creates a canonical
>> stream of consonants from a word. Check out the Double Metaphone
>> implementation in Lucene. Once you have your word encoded in
>> consonants, you can try making bigrams of the consonants.
>>
>> On Thu, Jul 12, 2012 at 11:49 AM, Adam Goodkind <[email protected]> wrote:
>>> It would be for typed English.
>>>
>>> On Tue, Jul 10, 2012 at 11:25 PM, Lance Norskog <[email protected]> wrote:
>>>> Is this in the general case or for specific speech? For example, it
>>>> should be possible to create an HMM that breaks medical jargon, based
>>>> on work in splitting Simplified Chinese language text. The average
>>>> Simplified Chinese "word" is 1.5 ideograms, and you need a
>>>> well-trained HMM (or similar) to split Simplified Chinese well. The
>>>> language is very context-specific with both prefixes and suffixes that
>>>> alter the meaning of "interior" words.
>>>>
>>>> On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
>>>>> That's right, better use a lexical database. CELEX2, available fairly
>>>>> inexpensively from the Linguistic Data Consortium, has syllable
>>>>> boundaries in its phonological representations.
>>>>>
>>>>> http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>>>>>
>>>>> jds
>>>>>
>>>>> On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>>>>>> Adam,
>>>>>>
>>>>>> Sorry, OpenNLP doesn't detect syllables. What you probably need is more
>>>>>> of a dictionary with pronunciation syllables.
>>>>>> It could be trained to do it maybe; but, would be very language specific
>>>>>> and not very useful. The dictionary approach would be best. Though
>>>>>> OpenNLP could help parse the words/tokens for you to use in the dictionary.
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Does OpenNLP have the ability to detect syllables? If not, could you point
>>>>>>> me to a java toolkit that can do this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Adam
>>>>
>>>> --
>>>> Lance Norskog
>>>> [email protected]
>>>
>>> --
>>> *Adam Goodkind *
>>> *w* adamgoodkind.com <http://www.adamgoodkind.com>
>>> *t* @adamgreatkind <https://twitter.com/#%21/adamgreatkind>
>>
>> --
>> Lance Norskog
>> [email protected]

--
Lance Norskog
[email protected]
