Version 3.5.3 made a change to the way DISTINCT is processed.   
Probably that change is making your particular case much slower.  The  
change can be seen at:

    http://www.sqlite.org/cvstrac/chngview?cn=4538

This change was in response to grumbling on the mailing list

    http://www.mail-archive.com/[email protected]/msg28894.html

It would appear that I need to spend some time improving this  
optimization - enabling it only in cases where it seems likely to  
improve performance and disabling it in cases like yours were it makes  
things much worse.  We'll try to have a look at that for version 3.6.9.


On Jan 5, 2009, at 7:13 PM, Jason Freund wrote:

> Dan,
>
> Thanks, I sent a separate email containing the source and data for  
> my test app since it's too big (5M) to post to the group.   But I'm  
> also posting just the code for the test app here, in case anyone  
> wants to take a look.  On my target hardware, the results I got for  
> the most egregious regression (doing prepare and step on a  
> selection) were:
>
>               3.4.0     3.6.5
> 100 records     4ms      20ms
> 500 records    1.4ms     89ms
> 1000 records   1.5ms     173ms
>
> If anyone else is interested in seeing the data to try to reproduce  
> and track the problem, please let me know, and I'll send you the  
> full zip file.
>
> Thanks,
> Jason
>
>
> ===
> / 
> ******************************************************************************
> * This app prepares a statement and steps 50 times in a database of  
> 100,
> * 500 or 1000 records, using a SQL statement that seems to demonstrate
> * a performance regression between 3.4.0 and 3.6.5
> *
> * The raw results we got when running this test on some cellphone  
> reference
> * hardware running an ARM5/~600Mhz chip and NAND storage are:
> *                          3.4.0     3.6.5
> * N=100, Prepare, StepX50  4ms       20ms
> * N=500, Prepare, StepX50  1.4ms     89
> * N=1000 Prepare, StepX50  1.5ms     173ms
> *
> * While this test demonstrates the worst degradation: constant  
> performance
> * for varying data sizes with 3.4.0 and linear performance for  
> 3.6.5, there
> * were several other test cases that were linear on both 340 and  
> 365, but
> * were consistently 30% slower for 365.  These other cases included a
> * different query, and a couple of INSERT statements, but we didn't
> * include those results in this test.
> *
> * Any advice or insight on this case would be appreciated:
> * [email protected]
> *****************************************************************************/
>
> #include <string.h>
> #include <stdio.h>
> #include <sqlite3.h>
> #include <wchar.h>
> #include <glib.h>
> #include <sys/time.h>
>
> /* Return delta between oldTime - currentTime in millisecons */
> float TimeInterval(struct timeval *oldTvP)
> {
>       struct timeval tv;
>       gettimeofday(&tv, NULL);
>
>    if(oldTvP == NULL)
>        return(0);
>
>    tv.tv_sec -= oldTvP->tv_sec;
>       if (tv.tv_usec < oldTvP->tv_usec) {
>        tv.tv_usec += 1000000 - oldTvP->tv_usec;
>               tv.tv_sec--;
>       } else {
>        tv.tv_usec -= oldTvP->tv_usec;
>    }
>
>       return (((float)tv.tv_sec*1000) + tv.tv_usec/1000.0f);
> }
>
> static int my_utf8_collator(void* user_data, int Length1,const void*  
> Start1,
>       int Length2, const void* Start2)
> {
>    int Result = 0;
>
>    gchar* Char1 = (gchar*) Start1;
>    gchar* Stop1 = Char1 + Length1;
>    wchar_t Wide1[2] = {0, 0};
>
>    gchar* Char2 = (gchar*) Start2;
>    gchar* Stop2 = Char2 + Length2;
>    wchar_t Wide2[2] = {0, 0};
>
>    while (0 == Result && Char1 < Stop1 && Char2 < Stop2)
>    {
>        Wide1[0] = (wchar_t) g_utf8_get_char(Char1);
>        Wide2[0] = (wchar_t) g_utf8_get_char(Char2);
>
>        Result = wcscoll(Wide1, Wide2);
>
>        if (0 == Result)
>        {
>            Char1 = g_utf8_next_char(Char1);
>            Char2 = g_utf8_next_char(Char2);
>        }
>    }
>
>    if (Result == 0) {
>      Result = Length1 - Length2;
>    }
>
>    return Result;
> }
>
> void SelectionTest(char *dbName, int category)
> {
>       sqlite3 *dbH = NULL;
>       struct timeval oldTv;
>       float totalTime=0.0;
>       const unsigned char *junkStringP = NULL;
>       char *dummy = NULL, sqlStrP[512];
>       int sqlErr, loopCount=0, junk;
>    sqlite3_stmt *stmtP = NULL;
>    char *selectSql = "SELECT DISTINCT PERSON.PersonLUID, FirstName,  
> LastName, Company, EnglishName, Note NOTNULL, SIMPhoneBook,  
> DisplayNameLast FROM PERSON, _CATEGORY_MEMBERSHIP WHERE  
> _CATEGORY_MEMBERSHIP.EntityLUID = PERSON.PersonLUID AND  
> _CATEGORY_MEMBERSHIP.CategoryLUID = %d ORDER BY LastName, FirstName,  
> Company, SIMPhoneBook, PERSON.PersonLUID COLLATE my_utf8_collator  
> LIMIT 100 OFFSET 0;";
>    char *selectSql2 = "SELECT DISTINCT PERSON.PersonLUID, FirstName,  
> LastName, Company, EnglishName, Note NOTNULL, SIMPhoneBook,  
> DisplayNameLast FROM PERSON ORDER BY LastName, FirstName, Company,  
> SIMPhoneBook, PERSON.PersonLUID COLLATE my_utf8_collator LIMIT 100  
> OFFSET 0;";
>
>       /* Open SQLite database connection */
>       sqlErr = sqlite3_open(dbName, &dbH);
>       if (sqlErr) {
>               printf("sqlite3_open(%s) = 0x%x\n", dbName, sqlErr);
>       }
>
>       /* Install user UTF-8 collate function */
>    sqlErr = sqlite3_create_collation(
>               dbH, /* (sqlite3 *) */
>               "my_utf8_collator", /* (const char *)zName */
>               SQLITE_UTF8, /* int enc */
>               NULL, /* (void *)pCtx */
>               &my_utf8_collator /* int(xCompare) */
>       );
>       if (sqlErr) {
>               printf("sqlite3_create_collation = 0x%x\n", sqlErr);
>       }
>
>       /* Build first SQL statement, with JOIN */
>       sprintf(sqlStrP, selectSql, category);
>
>       /* Start timer, prepare statement and step up to 50 times */
>       gettimeofday(&oldTv, NULL);
>    sqlErr = sqlite3_prepare(dbH, sqlStrP, strlen(sqlStrP),
>               &stmtP, (const char **) &dummy);
>
>       while (((sqlErr = sqlite3_step(stmtP)) == SQLITE_ROW) && loopCount  
> < 50) {
>               /* Include a few column fetches from each row in the timing */
>        junk = sqlite3_column_int(stmtP, 0);
>        junkStringP = sqlite3_column_text(stmtP, 7);
>        junk = sqlite3_column_int(stmtP, 6);
>        junk = sqlite3_column_int(stmtP, 5);
>
>               loopCount++;
>       }
>
>       /* Finalize statement, stop timer, and report results */
>       sqlErr = sqlite3_finalize(stmtP);
>    totalTime = TimeInterval (&oldTv);
>    printf("A) Create select statement with Category and iterate %d "
>               "times = %3.3f ms\n", loopCount, totalTime);
>
>       /* Start timer, prepare statement and step up to 50 times */
>       gettimeofday(&oldTv, NULL);
>    sqlErr = sqlite3_prepare(dbH, selectSql2, strlen(selectSql2),
>               &stmtP, (const char **) &dummy);
>
>       while (((sqlErr = sqlite3_step(stmtP)) == SQLITE_ROW) && loopCount  
> < 50) {
>               /* Include a few column fetches from each row in the timing */
>        junk = sqlite3_column_int(stmtP, 0);
>        junkStringP = sqlite3_column_text(stmtP, 7);
>        junk = sqlite3_column_int(stmtP, 6);
>        junk = sqlite3_column_int(stmtP, 5);
>
>               loopCount++;
>       }
>
>       /* Finalize statement, stop timer, and report results */
>       sqlErr = sqlite3_finalize(stmtP);
>    totalTime = TimeInterval (&oldTv);
>    printf("B) Create select statement (no Category) and iterate %d "
>               "times = %3.3f ms\n\n", loopCount, totalTime);
>
>       /* Close DB */
>       sqlErr = sqlite3_close(dbH);
> }
>
> /* Run 3 queries and report timings */
> int main(int argc, char *argv[])
> {
>       const char *versionP;
>
>       versionP = sqlite3_libversion();
>       printf("SQLite %s\n", versionP);
>
>    SelectionTest("contacts_100.db", -1073743423);
>    SelectionTest("contacts_500.db", -1073743576);
>    SelectionTest("contacts_1000.db", -1073741927);
>
>       return 0;
> }
>
> / 
> *Schema 
> **********************************************************************
> BEGIN TRANSACTION;
>
> CREATE TABLE PERSON(
>    PersonLUID integer not null primary key,
>    FirstName text default null collate nocase,
>    MiddleName text default null,
>    LastName text default null collate nocase,
>    Title text default null,
>    Company text default null collate nocase,
>    Prefix text default null,
>    Suffix text default null,
>    NickName text default null,
>    EnglishName text default null,
>    YomiFirstName text default null,
>    YomiLastName text default null,
>    YomiCompany text default null,
>    Ringtone text default null,
>    VideoRingtone text default null,
>    Note text default null,
>
>       FriendlyName text default null,
>       DisplayNameLast text default null,
>       DisplayNameCompany text default null,
>       DisplayNameFirst text default null,
>       TwoCharLookup text default null,
>
>    Birthday integer default null,
>    BirthdayAlarm integer default null,
>    Anniversary integer default null,
>    AnniversaryAlarm integer default null,
>
>    UseCount integer default 0,
>       DontBeamNote boolean default false,
>       IsBusinessCard boolean default false,
>
>    SIMPhoneBook integer default 0,
>    SIMEntryID integer default 0
> );
>
> CREATE INDEX FirstNameIndex ON PERSON(FirstName);
> CREATE INDEX LastNameIndex ON PERSON(LastName);
> CREATE INDEX CompanyIndex ON PERSON(Company);
> CREATE INDEX UseCountIndex ON PERSON(UseCount);
> CREATE INDEX TwoCharLookupIndex ON PERSON(TwoCharLookup);
> CREATE INDEX LastFirstCompanyIndex ON  
> PERSON(LastName,FirstName,Company,SIMPhoneBook);
> CREATE INDEX FirstLastCompanyIndex ON  
> PERSON(FirstName,LastName,Company,SIMPhoneBook);
> CREATE INDEX CompanyLastFirstIndex ON  
> PERSON(Company,LastName,FirstName,SIMPhoneBook);
>
> CREATE TABLE ADDRESS(
>       AddressLUID integer primary key default 1,
>       PersonLUID integer not null,
>       Identifier text default null,
>       Address text default null,
>       City text default null,
>       State text default null,
>       PostalCode text default null,
>       Country text default null,
>       VCardSrc text default null
> );
>
> CREATE INDEX AddressPersonLUIDIndex ON ADDRESS(PersonLUID);
>
> CREATE TABLE EXTRAFIELDS(
>       ExtraFieldLUID integer primary key default 1,
>       PersonLUID integer not null,
>       Identifier text default null,
>       Value text default null,
>       VCardSrc text default null
> );
>
> CREATE INDEX ExtraFieldPersonLUIDIndex ON EXTRAFIELDS(PersonLUID);
>
> CREATE TABLE CONTACTS(
>       ContactLUID integer primary key default 1,
>       PersonLUID integer not null,
>       Identifier text default null,
>       Value text default null,
>       ContactType text default null,
>       UseCount integer default 0,
>       DefaultContact boolean default false,
>       SpeedDial integer default 0,
>       VCardSrc text default null
> );
>
> CREATE INDEX ContactPersonLUIDIndex ON CONTACTS(PersonLUID);
>
> CREATE TABLE THUMBNAIL(
>       ThumbnailLUID integer primary key default 1,
>       PersonLUID integer not null,
>       SmallThumbnail blob not null,
>       LargeThumbnail blob not null
> );
>
> CREATE INDEX ThumbnailPersonLUIDIndex ON THUMBNAIL(PersonLUID);
>
> CREATE TABLE CUSTOMLABELS(
>       CustomLabelLUID integer primary key default 1,
>       Label text not null,
>       Identifier text not null
> );
>
> CREATE TABLE VERSION(
>       Version integer primary key default 1
> );
>
> INSERT INTO VERSION VALUES(3);
>
> CREATE TABLE LOADEDDEFAULTS(
>       HasDefaults integer primary key default 0
> );
>
> CREATE TABLE CATEGORY_RINGTONES(
>       CategoryLUID integer not null,
>    Ringtone text default null,
>       RingtoneType integer default 0
> );
>
> CREATE TABLE RECORD_COUNT_CACHE(
>       CategoryLUID integer not null primary key,
>    RecordCount integer default null
> );
>
> CREATE TABLE DISTRIBUTION(
>       DistributionLUID integer primary key default 1,
>       Value text not null,
>       DistributionType text not null
> );
>
> CREATE TABLE DISTRIBUTION_MEMBERSHIP(
>       MembershipLUID integer primary key default 1,
>       DistributionLUID integer not null,
>       PersonLUID integer not null,
>       Identifier text not null
> );
>
> CREATE INDEX DistMembershipIndex ON  
> DISTRIBUTION_MEMBERSHIP(DistributionLUID,PersonLUID,Identifier);
>
> COMMIT;
>
> BEGIN TRANSACTION;
>
> CREATE TABLE _CATEGORY(
>       CategoryLUID integer not null primary key default 1,
>       Name text not null,
>       ReadOnly int not null
> );
>
> CREATE TABLE _CATEGORY_MEMBERSHIP(
>       CategoryMemberLUID integer not null primary key default 1,
>       CategoryLUID integer not null,
>       EntityLUID integer not null
> );
>
> COMMIT;";
> *****************************************************************************/
>
>
> On Dec 25, 2008, at 5:01 AM, Jason Freund wrote:
>
>> Hi,
>>
>>
>>
>> We recently upgraded from 3.4.0 to 3.6.5, and noticed that our  
>> little,
>> internal sqlite benchmark test program performed much worse on most
>> operations with the new version.  We expected to see an improvement
>> because 3.6.1 advertises a performance enhancement with the lookaside
>> buffer.
>>
>>
>>
>> Our small performance test app showed regressions averaging 25%
>> slowdown
>> for all read and write operations, but one test case was very
>> striking:
>> going from 2ms in 3.4.0 to 170ms in 3.6.5.  This test prepared a
>> "SELECT
>> DISTINCT from one table with an ORDER BY, COLLATE, and LIMIT" and  
>> then
>> stepped 50 times through that statement.
>>
>>
>>
>> I was wondering if anyone else encountered a regression, and  
>> whether I
>> can interest someone from the dev community in looking into it, or
>> whether I should file a bug?  Also, I may be able to do more builds  
>> to
>> help isolate when the regressions were introduced, but I'm not sure  
>> if
>> there is an easy way to download code for earlier releases?  I  
>> haven't
>> used cvs before - is there an easy way to checkout the code for some
>> kind of "label" that represents an official release, or is there a
>> depot
>> somewhere that I can just download earlier distros?
>
> If you post the performance test program I'll take a look at it.
>
> Dan.
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

D. Richard Hipp
[email protected]



_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to