On Jan 7, 2008, at 1:04 PM, Jakob Praher wrote:

just out of curiosity, I would like to ask why you decided to implement your own container structures, like Vector or HashTable/Map/Set ...

What was your driving force?

We didn't make a blanket decision to implement our own container objects. We decided separately about each one.

For HashMap and HashSet, there was no suitable standard library version available. The details of the hash algorithms are carefully tuned, as is the way we can store a RefPtr in a hash table with minimal overhead. The hash-based collections from the standard C++ library are still not present in all the compilers we need to support, and even if they were, I believe they'd be insufficient.

For Vector, one of the reasons was that WTF::Vector has a feature where it uses the vector object itself to store an initial fixed capacity. We use this in contexts where we have a variable sized object but don't want to do any memory allocation unless it exceeds the fixed size.

The standard C++ library std::vector and std::hash_map also rely on C++ exceptions, and our entire project works with a limited dialect of C++ that doesn't use RTTI or exceptions.

Note that we do use the standard C++ library functions such as std::sort in a number of places.

In addition, why did you choose to make the string internal representation (UChar) 2 bytes wide? Isn't it the case that most websites are encoded in UTF-8/Latin-1?


It's true that most websites are encoded in Latin-1 (although it's the Windows variant with different meanings for 0x80-0x9F). And many modern websites are encoded in UTF-8. Note, though, that those are two different encodings; the internal encoding couldn't be Latin-1 because it can't cover all the Unicode characters. So the candidate for internal encoding is UTF-8.

There are multiple reasons we chose UTF-16 over UTF-8.

One "reason" is that the KHTML code base was already using UTF-16 when we started the WebKit project.

Another reason is that the JavaScript language gets at the DOM with JavaScript strings, and all JavaScript string operations are defined in terms of UTF-16 code units. If things were stored as UTF-8, they'd have to be converted back and forth from UTF-16. Or we could change JavaScript to use UTF-8, but then many JavaScript string operations would require scanning from the beginning of the string to count UTF-16 code units.

I'm sure the reasons I list here are not all the reasons for any of these decisions.

The theme seems to be performance.

    -- Darin

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo/webkit-dev
