Re: [Python-il] Determining if a string is RTL

Dotan Cohen Sun, 04 Jan 2009 01:15:52 -0800

2009/1/4 Amit Aronovitch <[email protected]>:
> From the original regex, I guess you want to find whether the string
> contains any RTL characters, which is different than the question of
> finding the base direction (English text with some Hebrew words in the
> middle still has LTR direction).
>


I was only looking at the first character, but that might not be
obvious to anyone not versed in PHP. In any case, as was pointed out
earlier, I should skip characters that have no strong direction, such
as whitespace and numbers.
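
Skipping those characters amounts to looking for the first character that has a strong direction. Below is a minimal sketch of that idea. It uses Python 3's standard `unicodedata` module, which is a swapped-in technique and not the original PHP code or Pango. It assumes the Unicode bidi classes `L`, `R` and `AL` are the only strong ones and skips everything else (whitespace `WS`, digits `EN`/`AN`, punctuation `ON`, ...). This simplifies the Unicode Bidirectional Algorithm's paragraph-level rule because it ignores explicit embeddings and isolates. The function name `first_strong_direction` is hypothetical.

```python
import unicodedata


def first_strong_direction(text):
    """Return "rtl" or "ltr" for the first strongly-directional character,
    or None if the text has none (e.g. only spaces, digits, punctuation).

    Simplification: explicit embedding/isolate controls are not handled.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew letters are R, Arabic letters AL
            return "rtl"
        if bidi_class == "L":           # Latin, Cyrillic, etc.
            return "ltr"
        # Anything else (WS, EN, AN, ON, ...) is weak or neutral: keep looking.
    return None


print(first_strong_direction("  123 שלום world"))  # rtl
print(first_strong_direction("hello שלום"))        # ltr
print(first_strong_direction("42 !?"))             # None
```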

> In that case, the relevant function is pango_unichar_direction, and
> from python it would look something like this:
>
>>>> has_rtl = pango.DIRECTION_RTL in map(pango.unichar_direction,
> ...                                      text.decode("utf-8"))
>
> (If you wish to find the base direction, the code would be something like:
>
>>>> has_rtl = pango.DIRECTION_RTL == pango.find_base_dir(text, len(text))
> )
>

This _is_ better, thanks!
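
For comparison, the "contains any RTL character" test can also be written with only the standard library. This sketch assumes Python 3 `str` input and does not use Pango's API. It treats bidi classes `R` and `AL` as right-to-left. Because `any()` consumes a generator, it stops at the first Hebrew or Arabic letter instead of building a list for the whole string first. The name `has_rtl_chars` is hypothetical.

```python
import unicodedata

# Bidi classes of strong right-to-left characters: R (e.g. Hebrew) and
# AL (Arabic letters).
RTL_CLASSES = frozenset(("R", "AL"))


def has_rtl_chars(text):
    """True if any character in text is strongly right-to-left."""
    # The generator lets any() stop at the first RTL character found.
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


print(has_rtl_chars("English with a שלום in the middle"))  # True
print(has_rtl_chars("plain ASCII 123"))                     # False
```

It still calls a function from Python once per character in the worst case (text with no RTL characters at all).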

> 2) Plug: reviving my abandoned fribidi-py code...
>
> One problem with the above code (first case) is that the function is
> called once per character from Python (map() calls it for every
> character, and the "in" operator then scans the resulting list), which
> might be slow for a long text.
> Now, the fribidi C library does provide a function, "get_types", for
> calculating the bidi properties of a whole string in one call.
> Unfortunately, the current Python interface, pyfribidi (by Kobi and
> Nir), wraps only the main functionality (log2vis) and not this
> low-level function.
> I once started a project called fribidi-py for a complete wrapping of
> FriBidi, but abandoned it, mainly because once pyfribidi was done,
> there seemed to be no urgent need for the lower-level functionality.
> Since this post made me recall that work, I checked, and it seems to
> still be functional enough to achieve the goal described here,
> although the resulting code looks ugly (the project was at a very
> preliminary stage, and a lot of stuff was left unwrapped). If somebody
> wants to hack on it or revive it, see below. In its current state it
> is not a usable solution, but it should not be too hard to make the
> project usable.
> For lack of time, I will probably not continue this myself unless
> people tell me it would be very useful/important - but if someone
> wishes to take it up, I will gladly help and maybe join.
>
> --------
> Code is available at http://amit.freeshell.org/fribidi-py_0.1.4.tar.gz
> There is no makefile (sorry, I did say it was preliminary code...). To
> use: unpack, go to the directory, and run the following:
>
> $ . gen_types
> $ ln -s . fribidi
>
> OK, now let's check for RTL characters:
>
> $ python
>>>> from fribidi import *
>>>> u = u'abc אבג'
>>>> sbuf = (FriBidiChar * len(u))(*map(ord,u))
>>>> rbuf = (FriBidiCharType * len(u))()
>>>> get_types(sbuf, len(u), rbuf)
>>>> [x%2 for x in rbuf]
> [0, 0, 0, 0, 1, 1, 1]
> -------
>
> Of course, several Python loops are done here, which makes this even
> less efficient than the Pango method described above. However, these
> loops can be avoided using numpy and some more ctypes hacks; I just did
> not want to make the example uglier than it already is...
>
>     Amit
>

I will leave it to the Zim developer to decide if he wants to go that
route. Thanks!
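
If the per-character Python loop is the main concern, another way to push the scan into C is a compiled regular expression, in the spirit of the original regex approach. This sketch uses only the standard library (not fribidi-py, Pango or numpy). Once, at start-up, it builds a character class from every code point that `unicodedata` reports as bidi class `R` or `AL`, so each later check is a single `re.search` call. Building the class walks all Unicode code points, which takes a noticeable fraction of a second. The names `build_rtl_pattern`, `RTL_PATTERN` and `has_rtl_chars_fast` are hypothetical.

```python
import re
import sys
import unicodedata


def build_rtl_pattern():
    """Compile a regex matching any code point whose bidi class is R or AL.

    Consecutive code points are merged into ranges to keep the class short.
    """
    ranges = []          # list of (first, last) code points, inclusive
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.bidirectional(chr(cp)) not in ("R", "AL"):
            continue
        if start is None:
            start = cp
        elif cp != prev + 1:       # gap: close the current range
            ranges.append((start, prev))
            start = cp
        prev = cp
    if start is not None:
        ranges.append((start, prev))

    parts = []
    for first, last in ranges:
        if first == last:
            parts.append(re.escape(chr(first)))
        else:
            parts.append(re.escape(chr(first)) + "-" + re.escape(chr(last)))
    return re.compile("[" + "".join(parts) + "]")


# Built once; the per-string search then runs inside the regex engine.
RTL_PATTERN = build_rtl_pattern()


def has_rtl_chars_fast(text):
    """True if text contains at least one R or AL character."""
    return RTL_PATTERN.search(text) is not None


print(has_rtl_chars_fast("abc אבג"))   # True
print(has_rtl_chars_fast("abc 123"))   # False
```

The trade-off is a one-time start-up cost in exchange for fast checks on long texts. Whether it beats the Pango approach in practice would need measuring.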

-- 
Dotan Cohen

http://what-is-what.com
http://gibberish.co.il

א-ב-ג-ד-ה-ו-ז-ח-ט-י-ך-כ-ל-ם-מ-ן-נ-ס-ע-ף-פ-ץ-צ-ק-ר-ש-ת
ا-ب-ت-ث-ج-ح-خ-د-ذ-ر-ز-س-ش-ص-ض-ط-ظ-ع-غ-ف-ق-ك-ل-م-ن-ه‍-و-ي
А-Б-В-Г-Д-Е-Ё-Ж-З-И-Й-К-Л-М-Н-О-П-Р-С-Т-У-Ф-Х-Ц-Ч-Ш-Щ-Ъ-Ы-Ь-Э-Ю-Я
а-б-в-г-д-е-ё-ж-з-и-й-к-л-м-н-о-п-р-с-т-у-ф-х-ц-ч-ш-щ-ъ-ы-ь-э-ю-я
ä-ö-ü-ß-Ä-Ö-Ü
_______________________________________________
Python-il mailing list
[email protected]
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il
