Peter Norvig gave an excellent presentation in which he described one solution to this problem. You can watch it (http://videolectures.net/cikm08_norvig_slatuad/) starting from the slide "Text Data".
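
To give a rough idea of how this kind of segmentation can work, here is a minimal Python sketch. It assumes a unigram word model combined with a memoized search over all split points, which is one common way to tackle the problem and not necessarily exactly what the talk shows. The counts in COUNTS are made up for illustration, and the names COUNTS, log_prob, segment and split_domain are hypothetical; a real system would load word counts from a large corpus. This is plain Python, not OpenNLP.

```python
import math
from functools import lru_cache

# Hypothetical toy counts for illustration only; real counts would
# come from a large corpus of English text.
COUNTS = {
    "boys": 50, "boy": 80, "and": 1000, "girls": 50, "girl": 80,
    "have": 600, "a": 2000, "nice": 100, "day": 300,
    "an": 500, "ice": 60, "sand": 20,
}
TOTAL = sum(COUNTS.values())


def log_prob(word):
    """Log-probability of a word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Assumed penalty for unknown words: the longer the word, the
    # less likely it is, so one giant unknown chunk never wins.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment(text, max_word_len=20):
    """Return the most probable split of text into words."""

    @lru_cache(maxsize=None)
    def best(start):
        # Best (score, words) for the suffix text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        last = min(len(text), start + max_word_len)
        for end in range(start + 1, last + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((log_prob(word) + rest_score,
                               (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


def split_domain(domain):
    """Segment the name part of a domain and keep the TLD as is."""
    name, dot, tld = domain.rpartition(".")
    if not dot:
        return segment(domain.lower())
    return segment(name.lower()) + ["." + tld]
```

With these toy counts, split_domain("boysandgirls.com") prefers "boys", "and", "girls" over "boy", "sand", "girls" because the first split has the higher product of word probabilities. The quality of the result depends almost entirely on the quality of the word counts.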

Hope this helps,

Alexandre

On 11-11-16 01:44 PM, Ryan L. Sun wrote:
Hi all,

I'm trying to split concatenated English text, more
specifically, domain names.
For example:
boysandgirls.com ->  boy(s)|and|girl(s)|.com
haveaniceday.net ->  have|a|nice|day|.net

Can I use OpenNLP to do this? I checked the OpenNLP documentation, and
the "Learnable Tokenizer" looks promising, but I couldn't get it
to work.
Any help is appreciated.


--
Alexandre Patry
Ingénieur-Chercheur
http://KeaText.com

 Transformez vos documents en outils de décision
<<  Turn your documents into decision tools
