> If you are willing to give up precision, then you can use heuristics.
>
> It's ugly but perhaps ok for a simple editor. You can improve the
> precision with better heuristics and more data, so you get to decide
> how much is good enough...

So using white space for general word breaking and single ideographs for CJK would be an acceptable approach?

What I wonder about is how to handle all those languages I don't speak or understand (in fact almost all :-)). Can I use this simple approach for, say, the Cherokee and Arabic scripts too? I don't even know which scripts use white space between words and which do not.

Ciao, Mike
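For illustration, here is one possible reading of the heuristic discussed above, as a minimal Python sketch. It is an assumption about what "good enough" might look like, not a recommended algorithm: the function names `is_ideograph` and `extract_words` are hypothetical, the code point ranges cover only the main CJK ideograph blocks, and punctuation is simply treated as a separator. Scripts that do not put spaces between words (Thai, for example) would come out as one long "word" per run of text.

```python
import unicodedata

# Assumption: only these blocks count as "ideographs" for the heuristic.
# CJK Unified Ideographs, Extension A, Compatibility Ideographs, Extension B.
_IDEOGRAPH_RANGES = (
    (0x4E00, 0x9FFF),
    (0x3400, 0x4DBF),
    (0xF900, 0xFAFF),
    (0x20000, 0x2A6DF),
)


def is_ideograph(ch):
    """Return True if ch falls in one of the assumed ideograph ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _IDEOGRAPH_RANGES)


def is_separator(ch):
    """White space and punctuation end the current word (crude heuristic)."""
    return ch.isspace() or unicodedata.category(ch)[0] in ("Z", "P")


def extract_words(text):
    """Split text at separators; every ideograph becomes its own word."""
    words = []
    current = []

    def flush():
        # Emit the word collected so far, if any.
        if current:
            words.append("".join(current))
            current.clear()

    for ch in text:
        if is_ideograph(ch):
            flush()
            words.append(ch)
        elif is_separator(ch):
            flush()
        else:
            current.append(ch)
    flush()
    return words
```

Precision could be improved along the lines suggested in the quoted text, for instance by replacing the fixed ranges with per-script data or a dictionary, at the cost of more code and more data.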