Syllable segmentation in English is notoriously difficult.  I'd use a
lexical resource rather than trying to do it algorithmically.
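
For what it's worth, here is a minimal sketch of the lookup approach. It assumes the lexicon has already been exported to a simple "word<TAB>syl-la-bles" text format. That format, the class name SyllableLexicon, and the sample entries are all hypothetical and are not the actual CELEX2 layout, so the parsing would need adapting to whatever resource you end up using.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: an in-memory syllable lexicon keyed by lower-cased word.
public class SyllableLexicon {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Adds one entry in the assumed "word<TAB>syl-la-bles" format.
    // Lines that do not have exactly two tab-separated fields are skipped.
    public void addLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            return;
        }
        String word = fields[0].toLowerCase(Locale.ROOT);
        entries.put(word, Arrays.asList(fields[1].split("-")));
    }

    // Returns the stored syllables, or an empty Optional for unknown words.
    public Optional<List<String>> syllables(String word) {
        return Optional.ofNullable(entries.get(word.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        SyllableLexicon lexicon = new SyllableLexicon();
        // Illustrative entries only; real data would come from the lexical resource.
        lexicon.addLine("syllable\tsyl-la-ble");
        lexicon.addLine("segmentation\tseg-men-ta-tion");

        System.out.println(lexicon.syllables("Segmentation"));
        System.out.println(lexicon.syllables("zyzzyva"));
    }
}
```

Words missing from the lexicon come back as an empty Optional, so the caller has to decide what to do with out-of-vocabulary tokens, which is where most of the remaining work would be.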

jds

On Thu, Jul 12, 2012 at 5:39 PM, Lance Norskog <[email protected]> wrote:
> Phonetic encoding might help. This essentially creates a canonical
> stream of consonants from a word. Check out the Double Metaphone
> implementation in Lucene. Once you have your word encoded in
> consonants, you can try making bigrams of the consonants.
>
>
>
> On Thu, Jul 12, 2012 at 11:49 AM, Adam Goodkind <[email protected]> wrote:
>> It would be for typed English.
>>
>> On Tue, Jul 10, 2012 at 11:25 PM, Lance Norskog <[email protected]> wrote:
>>
>>> Is this in the general case or for specific speech? For example, it
>>> should be possible to create an HMM that breaks medical jargon, based
>>> on work in splitting Simplified Chinese language text. The average
>>> Simplified Chinese "word" is 1.5 ideograms, and you need a
>>> well-trained HMM (or similar) to split Simplified Chinese well. The
>>> language is very context-specific with both prefixes and suffixes that
>>> alter the meaning of "interior" words.
>>>
>>> On Mon, Jul 9, 2012 at 4:39 PM, John Stewart <[email protected]> wrote:
>>> > That's right; it is better to use a lexical database.  CELEX2, available fairly
>>> > inexpensively from the Linguistic Data Consortium, has syllable
>>> > boundaries in its phonological representations.
>>> >
>>> > http://www.ldc.upenn.edu/Catalog/readme_files/celex.readme.html#overview
>>> >
>>> > jds
>>> >
>>> > On Mon, Jul 9, 2012 at 6:37 PM, James Kosin <[email protected]> wrote:
>>> >> Adam,
>>> >>
>>> >> Sorry, OpenNLP doesn't detect syllables.  What you probably need is a
>>> >> dictionary that includes syllabified pronunciations.
>>> >> A model could perhaps be trained to do it, but it would be very
>>> >> language-specific and not very useful.  The dictionary approach would be
>>> >> best, though OpenNLP could help tokenize the text into words for you to
>>> >> look up in the dictionary.
>>> >>
>>> >> James
>>> >>
>>> >> On 7/9/2012 5:26 PM, Adam Goodkind wrote:
>>> >>> Hi all,
>>> >>>
>>> >>> Does OpenNLP have the ability to detect syllables? If not, could you
>>> >>> point me to a Java toolkit that can do this?
>>> >>>
>>> >>> Thanks,
>>> >>> Adam
>>> >>>
>>> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> [email protected]
>>>
>>
>>
>>
>> --
>> *Adam Goodkind *
>> *w*  adamgoodkind.com <http://www.adamgoodkind.com>
>> *t*   @adamgreatkind <https://twitter.com/#%21/adamgreatkind>
>
>
>
> --
> Lance Norskog
> [email protected]
