Brian Wilson wrote:
> I posted the following question on the vi/vim stack exchange
> <http://vi.stackexchange.com/questions/5452/set-line-breaks-word-wraps-and-word-searching-for-thai-and-other-non-latin-lang>
> and was told that the vim-dev mailing list would be a more appropriate
> place to ask.
>
> Brian
>
> It is edited here as best I can, on the assumption that the entered text
> is UTF-8.
>
> My purpose is a Thai solution, but instead of a hack, a more general
> solution should be available that will help the more than 1 billion
> people who use the various Indic languages.
>
> I can set the text width and can manually line break imported paragraphs
> with the following as an example.
>
> set textwidth=72
> gqq
>
> I can also navigate English text files with the standard 'w' 'b' 'e' '*'
> commands, etc.
>
> This works well for English; however, Thai and other Brahmic scripts of
> South and Southeast Asia put spaces only between phrases, not between
> words. LibreOffice, Word, InDesign, TeX, etc. "know" where line breaks
> should occur. They also "know" where individual words are, even though
> there are no spaces. I can navigate by Thai word in these programs. I can
> even type English, Thai and Lao in the Chrome address bar and then use
> Alt (Option) + arrow on my Mac to navigate at the word level in all three
> languages. It seems that these programs are tapping into work that has
> already been done at some lower level. If vim could tap into the same
> work, then someone could edit a multi-language document without having to
> do anything fancy. 'w', 'dw', etc. would just work happily from one word
> to the next regardless of the language.
>
> Line breaking poses a different challenge: because these languages space
> at the phrasal level, the presence or absence of a trailing space at the
> end of a line has meaning when breaking and joining lines. By way of
> example, the spaces are similar to an Oxford comma and other punctuation,
> and are the difference between whether or not we had Grandma for
> breakfast. (Let's eat Grandma. vs. Let's eat, Grandma.) One, also, doesn't,
> want, random, spaces, coming, when, they, are, not, needed.
>
> *My question is twofold:* 1. How can vim tap into already available
> libraries in order to recognize words from Indic languages (including and
> especially Thai) for the purpose of navigation and other vim word-level
> commands? 2. Is it possible to add language awareness for the purpose of
> line breaking, so that vim does not strip/add spaces when breaking/joining
> lines at words in Thai or other Indic languages?

Can we see the start and/or end of a word by recognizing characters?
Or do we need to recognize words?

The spell checker does have some knowledge about where words start and
end. It's a bit slow doing it that way, but might still be acceptable.
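
To make the "recognize words" option concrete, here is a minimal sketch of
dictionary-based segmentation using greedy longest-match. It is not how
Vim's spell checker works internally; the word list, `segment` and
`word_starts` are hypothetical names made up for illustration, and a real
dictionary would be far larger and would need to handle ambiguous splits.

```python
# Hypothetical word list standing in for a spell-check dictionary.
WORDS = {"กิน", "ข้าว", "แล้ว", "ไป"}


def segment(text, words=WORDS):
    """Split text into tokens using greedy longest-match against `words`.

    Characters that start no known word are emitted as one-character tokens.
    """
    max_len = max(map(len, words))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def word_starts(text, words=WORDS):
    """Return the code point offset at which each token begins."""
    starts, pos = [], 0
    for tok in segment(text, words):
        starts.append(pos)
        pos += len(tok)
    return starts
```

The offsets from `word_starts` are the kind of positions a 'w' motion would
need to jump to. They count code points, not screen columns, so combining
vowel and tone marks would still need care. The inner loop also shows why
this approach is slower than a per-character test: every position may
require several dictionary lookups.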

I suppose we could have a character class that indicates no spaces are
used to separate words. That would assume the character is only used in
that kind of language.
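
A rough sketch of what such a character class could look like, written in
Python rather than Vim's C source. The block list, the class names and the
`char_class` / `runs` helpers are all assumptions for illustration; the
list of scripts is deliberately incomplete.

```python
from itertools import groupby

# Assumption: Unicode blocks of some scripts that are normally written
# without spaces between words. This list is illustrative, not exhaustive.
UNSPACED_BLOCKS = (
    (0x0E00, 0x0E7F),  # Thai
    (0x0E80, 0x0EFF),  # Lao
    (0x1000, 0x109F),  # Myanmar
    (0x1780, 0x17FF),  # Khmer
)


def char_class(ch):
    """Classify one character as 'space', 'unspaced', 'word' or 'punct'."""
    cp = ord(ch)
    if ch.isspace():
        return "space"
    # Checked before isalnum() so that letters and combining marks of these
    # scripts all land in the same class.
    if any(lo <= cp <= hi for lo, hi in UNSPACED_BLOCKS):
        return "unspaced"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punct"


def runs(text):
    """Group text into (class, substring) runs of the same character class."""
    return [(cls, "".join(chars)) for cls, chars in groupby(text, key=char_class)]
```

For input such as "vim กินข้าว ok" this yields one "unspaced" run for the
whole Thai stretch. So the class can tell where such a script begins and
ends, and where ordinary space-based rules stop applying, but by itself it
does not find the word boundaries inside the run. That part would still
need word recognition of some kind.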
--
hundred-and-one symptoms of being an internet addict:
102. When filling out your driver's license application, you give
your IP address.
/// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///