When I started using SQLite I found it natural to use the sqlite3_open16 and use UTF16 encoding on strings since my applications always use wchar_t when handeling strings. I never questioned this until now when I decided to do some benchmark, and I found it interesting enough to share with you.

In my benchmark I used a database with several tables and indexes and the table I decided to benchmark contains 10 columns and 14000 rows with different types. It's a well normalized database that is used in a real life application.

The benchmark is made on 2 different databases that are identical except for the fact that one is UTF8 encoded and the other is UTF16 encoded. I always get the 2 columns using sqlite3_column_text16 - so when getting the string from the UTF8 database - a conversion is made, but the output strings from both databases are always the same. The benchmark is looped 10 times for better average results.

Benchmark 1:
Selecting 2 columns from the table without any WHERE or ORDER BY
UTF8.db                                        0.38s
UTF16.db                                      0.33s
As expected the UTF16 encoded database is a little bit faster since no conversion is made. The difference is:
15% slower using UTF8 encoding.

Benchmark 2:
Selecting 2 columns from the table without and WHERE, but with ORDER BY on a text-column without any index (slow)
UTF8.db                                       4.34s
UTF16.db                                    11.19s
Well, this is a slow query. Sorting a UTF8 encoded string is obviously a lot faster than sorting a UTF16 encoded string. The conversion done by sqlite3_column_text16 is not noticeable in this benchmark. Difference:
66% faster using UTF8 encoding.

Benchmark 3:
Selecting 2 columns from the table without any WHERE, but with ORDER BY on text-column WITH index.
UTF8.db                                 0.58s
UTF16.db                               0.63s
Interesting. I guess the conversion done by sqlite3_column_text16 is not noticeable compared to the extra disk/mem IO for the extra data using UTF16. Difference:
8% faster using UTF8 encoding.



In the future I am using UTF8 encoded databases since the conversion of strings is a small thing for the system. The advantages of using UTF8 are many:
1. Faster in most cases
2. Smaller databases (30% smaller in benchmark test database)
3. Less memory usage OR more information will fit in memory.

I forgot to tell you that the benchmark is made on windows XP. The conversion done in sqlite3_column_text16 may be a lot slower/faster on any other platform.


Best regards
Daniel

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to