On Wed, 2 Nov 2005, Luis Menina wrote:

> Ok, I've checked my code and I think you're wrong:
> As I pre-increment the pointer, the first byte is never checked (I
> assume I'm not in the middle of a character). So I'm waiting in this
> case (offset == 1) for the first byte that doesn't match the "10xx xxxx"
> pattern... which is the case of the null byte!
>
> Offset is then decremented, and everything goes smoothly...
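
The offset == 1 case described above can be checked with a minimal sketch. It assumes plain C types in place of gchar/glong so that it builds without GLib, and the names offset_to_pointer, byte_index and check_examples are made up for illustration. The loop body is the one posted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* The loop from this thread, with const char * / long standing in for
   gchar * / glong (an assumption, to avoid depending on GLib). */
const char *offset_to_pointer(const char *str, long offset)
{
    while (offset) {
        /* Pre-increment: the byte str points at on entry is never tested. */
        if ((*++str & 0xC0) != 0x80)
            --offset;   /* this byte is not a 10xxxxxx continuation byte */
    }
    return str;
}

/* Hypothetical helper: byte index reached after skipping `offset` characters. */
ptrdiff_t byte_index(const char *str, long offset)
{
    return offset_to_pointer(str, offset) - str;
}

/* Hypothetical self-check covering the cases discussed in the thread. */
void check_examples(void)
{
    /* U+00A0 alone: the loop only stops on the terminating NUL byte. */
    assert(byte_index("\xC2\xA0", 1) == 2);
    /* 'a', U+00A0, 'b': skipping two characters lands on 'b'. */
    assert(byte_index("a\xC2\xA0" "b", 2) == 3);
    /* offset 0 returns the input pointer unchanged. */
    assert(byte_index("abc", 0) == 0);
}
```

With "\xC2\xA0" and offset 1, the loop steps over the continuation byte 0xA0 and only decrements offset when it reaches the terminating NUL, so the result is str + 2. That is why this variant cannot be used on a buffer that is not NUL-terminated.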

You are right, your code is correct, although it requires a
NUL-terminated string.

BTW, your code runs almost three times slower than the original code
for Korean, which makes sense. I'm measuring all the implementations
posted here and on planet. Will post soon.

behdad

> BTW I've tried to use Federico's pango benchmark tools (
> http://primates.ximian.com/~federico/news-2005-10.html#25 ), but I'm
> left with an error...
>
> After the "import cairo" error (solved by installing pycairo) I have
> this error that I can't resolve, as I'm no python guru:
>
> ================
>
> python ./plot-languages.py -o chart.png test1.xml
> Traceback (most recent call last):
>   File "./plot-languages.py", line 373, in ?
>     main ()
>   File "./plot-languages.py", line 367, in main
>     rset = ResultSet (file)
>   File "./plot-languages.py", line 47, in __init__
>     self.parse (filename)
>   File "./plot-languages.py", line 63, in parse
>     self.parse_language_node (l)
>   File "./plot-languages.py", line 78, in parse_language_node
>     time = float_from_node (child)
>   File "./plot-languages.py", line 32, in float_from_node
>     return float (c.nodeValue)
> ValueError: invalid literal for float(): 11,560000
>
> =================
>
> Thanks to anyone that can help me...
>
>
> Behdad Esfahbod wrote:
> > On Wed, 2 Nov 2005, Luis Menina wrote:
> >
> >
> >>Can you give me more info about what is wrong with my function ?
> >>I don't understand what you mean by "it doesn't pass over the tail of
> >>the last characters"
> >
> >
> > Your code fails if the last character to be skipped is a multibyte
> > one. Suppose this is the input:
> >
> >   str = "\xC2\xA0"
> >   offset = 1
> >
> > which is the U+00A0 NO-BREAK SPACE. The output should be str +
> > 2, but your code returns str + 1.
> >
> > behdad
> >
> >
> >
> >>>>gchar * g_utf8_offset_to_pointer1 ( const gchar *str,
> >>>>                                   glong offset)
> >>>>{
> >>>>  while (offset)
> >>>>    {
> >>>>      if ((*++str & 0xC0) != 0x80)
> >>>>        --offset ;
> >>>>    }
> >>>>
> >>>>  return (gchar *)str;
> >>>>}
> >
>

--behdad
http://behdad.org/

"Commandment Three says Do Not Kill, Amendment Two says Blood Will Spill"
        -- Dan Bern, "New American Language"
_______________________________________________
Performance-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/performance-list
