[sqlite] benchmarking UTF8 vs UTF16 encoded databases

Daniel Önnerby Thu, 22 Nov 2007 13:05:05 -0800

When I started using SQLite I found it natural to use the sqlite3_open16and use UTF16 encoding on strings since my applications always usewchar_t when handeling strings. I never questioned this until now when Idecided to do some benchmark, and I found it interesting enough to sharewith you.

In my benchmark I used a database with several tables and indexes andthe table I decided to benchmark contains 10 columns and 14000 rows withdifferent types. It's a well normalized database that is used in a reallife application.

The benchmark is made on 2 different databases that are identical exceptfor the fact that one is UTF8 encoded and the other is UTF16 encoded. Ialways get the 2 columns using sqlite3_column_text16 - so when gettingthe string from the UTF8 database - a conversion is made, but the outputstrings from both databases are always the same. The benchmark is looped10 times for better average results.


Benchmark 1:
Selecting 2 columns from the table without any WHERE or ORDER BY
UTF8.db                                        0.38s
UTF16.db                                      0.33s

As expected the UTF16 encoded database is a little bit faster since noconversion is made. The difference is:

15% slower using UTF8 encoding.

Benchmark 2:

Selecting 2 columns from the table without and WHERE, but with ORDER BYon a text-column without any index (slow)

UTF8.db                                       4.34s
UTF16.db                                    11.19s

Well, this is a slow query. Sorting a UTF8 encoded string is obviously alot faster than sorting a UTF16 encoded string. The conversion done bysqlite3_column_text16 is not noticeable in this benchmark. Difference:

66% faster using UTF8 encoding.

Benchmark 3:

Selecting 2 columns from the table without any WHERE, but with ORDER BYon text-column WITH index.

UTF8.db                                 0.58s
UTF16.db                               0.63s

Interesting. I guess the conversion done by sqlite3_column_text16 isnot noticeable compared to the extra disk/mem IO for the extra datausing UTF16. Difference:

8% faster using UTF8 encoding.

In the future I am using UTF8 encoded databases since the conversion ofstrings is a small thing for the system. The advantages of using UTF8are many:

1. Faster in most cases
2. Smaller databases (30% smaller in benchmark test database)
3. Less memory usage OR more information will fit in memory.

I forgot to tell you that the benchmark is made on windows XP. Theconversion done in sqlite3_column_text16 may be a lot slower/faster onany other platform.



Best regards
Daniel

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

[sqlite] benchmarking UTF8 vs UTF16 encoded databases

Reply via email to