Adam Dingle wrote:
> [...] Fetching the n-th character in a string is less often
> necessary, so it's OK for it to be less efficient. In the rare case
> where you really do need random access to characters by index, you
> could always iterate over all characters in a string and store them
> in a unichar[] array for that purpose, or you could construct a data
> structure similar to the one you've outlined above.
Yes, converting the whole UTF8 string to a unichar[] is definitely a
better solution than using the offset array -- at least each UTF8
character is decoded only once that way.  Where the user needs a lot
of random access, it may be worth the memory allocation and copying.
Where they are scanning from the start to the end, it is likely to be
more efficient to work on the UTF8 directly.

Basic operations on UTF8 strings which are quick (where N is the
length of the string):

- Get the unichar at a pointer, and advance the pointer: O(1)
- Compare a fixed prefix-string at a pointer (without decoding the
  UTF8), and advance the pointer if it matches: O(1)
- Search for a fixed string within the string (without decoding the
  UTF8): O(N)

You can do a lot with these basic operations.  It is quicker to match
bytes in UTF8 than to decode characters, so testing for a given
character at the pointer location is likely to be faster with a
prefix-string match than with a decode and an integer compare.

Slow operations on UTF8 strings, to avoid:

- Fetch the unichar at an index measured in unichars: O(N) for each
  character fetch, so O(N*N) in a loop

Jim

-- 
Jim Peters   j...@uazu.net   http://uazu.net   (Uazú, in Peru)
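P.S. For illustration, here is a minimal Vala sketch of the fast and
slow patterns above.  It assumes the usual glib-2.0 string bindings
(get_next_char(), char_count(), has_prefix(), index_of(),
index_of_nth_char(), get_char()); the helper to_unichar_array() is my
own, not part of any library.

// Hypothetical helper: decode the whole UTF8 string once into a
// unichar[] so that later lookups by character index are O(1).
unichar[] to_unichar_array (string s) {
    unichar[] chars = new unichar[s.char_count ()];
    int pos = 0;            // byte offset into the UTF8 data
    unichar c;
    int i = 0;
    while (s.get_next_char (ref pos, out c)) {
        chars[i++] = c;     // each character is decoded exactly once
    }
    return chars;
}

void main () {
    string s = "héllo wörld";

    // Quick: O(1) per character -- scan forward, decoding as we go.
    int pos = 0;
    unichar c;
    while (s.get_next_char (ref pos, out c)) {
        stdout.printf ("U+%04X\n", (uint) c);
    }

    // Quick: prefix match without decoding -- UTF8 can be compared
    // byte-for-byte against a fixed string.
    if (s.has_prefix ("héllo")) {
        stdout.printf ("prefix matches\n");
    }

    // Quick: O(N) substring search, again without decoding.
    int found = s.index_of ("wörld");
    stdout.printf ("found at byte offset %d\n", found);

    // Slow: index_of_nth_char() re-walks the string from the start
    // on every call, so this loop is O(N*N).
    int n = s.char_count ();
    for (int i = 0; i < n; i++) {
        unichar slow = s.get_char (s.index_of_nth_char (i));
        stdout.printf ("%s", slow.to_string ());
    }
    stdout.printf ("\n");

    // Better for heavy random access: pay the O(N) conversion once,
    // then index in O(1).
    unichar[] chars = to_unichar_array (s);
    stdout.printf ("3rd char: U+%04X\n", (uint) chars[2]);
}

The last two steps show the trade-off: the index_of_nth_char() loop
re-decodes from the start each time, whereas the one-off unichar[]
conversion pays the decode cost once and then indexes directly.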