Adam Dingle wrote:
> [...] Fetching the n-th character in a string is less often
> necessary, so it's OK for it to be less efficient. In the rare case
> where you really do need random access to characters by index, you
> could always iterate over all characters in a string and store them
> in a unichar[] array for that purpose, or you could construct a data
> structure similar to the one you've outlined above.
Yes, converting the whole UTF8 string to a unichar[] is definitely a
better solution than using the offset array -- at least each UTF8
character is decoded only once that way.  Where the user needs a lot
of random access, it may be worth the memory allocation and copying.
Where they are scanning from the start to the end, it is likely to be
more efficient to work on the UTF8 directly.

Basic operations on UTF8 strings which are quick (where N is the
length of the string):

- Get the unichar at a pointer, and advance the pointer: O(1)
- Compare a fixed prefix-string at a pointer (without decoding the
  UTF8), and advance the pointer if it matches: O(1)
- Search for a fixed string within the string (without decoding the
  UTF8): O(N)

You can do a lot with these basic operations.  It is quicker to match
bytes in UTF8 than to decode characters, so testing for a given
character at the pointer location is likely to be faster with a
prefix-string match than with a decode and an integer compare.

Slow operations on UTF8 strings, to avoid:

- Fetch the unichar at an index measured in unichars: O(N) for each
  character fetch, so O(N*N) in a loop

Jim

-- 
Jim Peters   j...@uazu.net   http://uazu.net   (Uazú, in Peru)
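P.S. For illustration, here is a minimal Vala sketch of the fast and
slow patterns above.  It assumes the usual glib-2.0 string bindings
(get_next_char(), char_count(), has_prefix(), index_of(),
index_of_nth_char(), get_char()); the helper to_unichar_array() is my
own, not part of any library.

// Hypothetical helper: decode the whole UTF8 string once into a
// unichar[] so that later lookups by character index are O(1).
unichar[] to_unichar_array (string s) {
    unichar[] chars = new unichar[s.char_count ()];
    int pos = 0;            // byte offset into the UTF8 data
    unichar c;
    int i = 0;
    while (s.get_next_char (ref pos, out c)) {
        chars[i++] = c;     // each character is decoded exactly once
    }
    return chars;
}

void main () {
    string s = "héllo wörld";

    // Quick: O(1) per character -- scan forward, decoding as we go.
    int pos = 0;
    unichar c;
    while (s.get_next_char (ref pos, out c)) {
        stdout.printf ("U+%04X\n", (uint) c);
    }

    // Quick: prefix match without decoding -- UTF8 can be compared
    // byte-for-byte against a fixed string.
    if (s.has_prefix ("héllo")) {
        stdout.printf ("prefix matches\n");
    }

    // Quick: O(N) substring search, again without decoding.
    int found = s.index_of ("wörld");
    stdout.printf ("found at byte offset %d\n", found);

    // Slow: index_of_nth_char() re-walks the string from the start
    // on every call, so this loop is O(N*N).
    int n = s.char_count ();
    for (int i = 0; i < n; i++) {
        unichar slow = s.get_char (s.index_of_nth_char (i));
        stdout.printf ("%s", slow.to_string ());
    }
    stdout.printf ("\n");

    // Better for heavy random access: pay the O(N) conversion once,
    // then index in O(1).
    unichar[] chars = to_unichar_array (s);
    stdout.printf ("3rd char: U+%04X\n", (uint) chars[2]);
}

The last two steps show the trade-off: the index_of_nth_char() loop
re-decodes from the start each time, whereas the one-off unichar[]
conversion pays the decode cost once and then indexes directly.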