Thank you for your answers. knowing that I have a table T (a,b,c,d,t) where d is a value a,b,c some dimensions and t the time
where I need to make a subset with a "group by" like select a,b,c,sum(d) from T where t>x1 and t<x2 group by a,b,c do you have an idea on how to choose the best index in my case? is it better to choose (a,b,c) ? or (a,b,c,t) ? (the issue is that it is like I duplicate my table right?) I don't think it is useful to index d as it is a value, right? does it make a huge difference if I choose (a,b)? Thanks again, Cheers, Sylvain On Wed, Oct 21, 2009 at 8:53 AM, John Crenshaw <johncrens...@priacta.com>wrote: > Actually, I thought exactly what you said when I saw the question. When > I saw your answer though I realized I'd been wrong, there are ways I > could slow indexing down, and therefore, ways to speed it up. > > Splitting across transactions is about the insertion of data, not the > creation of the index. This is for the case where you can't insert first > and create the index later (maybe you're inserting a lot of data into a > table that already has data for example.) The recommendation in this > case is to wrap the whole batch of inserts in a transaction, but to > commit the transaction at regular intervals, breaking the process into > multiple pieces so that you don't spill over the memory cache. > SUPPOSEDLY this positively impacts indexing performance, but I've not > personally tested that claim. The more significant impact in this case > is actually the individual transactions you avoid, which makes a huge > difference. > > John > > -----Original Message----- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov > Sent: Tuesday, October 20, 2009 2:47 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] index for a group by > > I want to notice, John, that my words are in context "I have table > with a lot of data, I want to create a particular index on it, how can > I do it quickly". In this context only your 5 bullet is applicable, I > admit I've forgot about that. And I don't understand how can one split > creating of index across several transactions. > > Pavel > > On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw > <johncrens...@priacta.com> wrote: > >> Nothing in this process can be sped up. > > > > Actually, that isn't entirely true. While it always requires a full > data > > scan, Some things can make the indexing part of the process faster. > > Since indexing is done basically using a comparative sort, anything > that > > would speed up the sort, will speed up the indexing. > > 1. It is faster to sort 1000 data points, than to insert 1000 > datapoints > > into a constantly sorted list. Creating the index after all inserts is > > faster than creating the index, then inserting. > > 2. If possible, avoid indexes on long data strings, since the compares > > can be time consuming. > > 3. If you have a field that stores one of several strings (as an > "enum") > > consider using integers instead. Integers have lower overhead, and can > > be compared (and sorted) more quickly than strings. > > 4. If you are feeling really gutsy, you could mod the code and > implement > > a radix sort or something similar for integer values. I'm not really > > recommending this, just saying, inserts and lookups in a radix index > are > > faster than a btree. > > 5. Make sure the memory cache is large enough for the sort. Writing > data > > to disk is very costly, compared to sorting in memory. Default is 2000 > > pages (2MB) worth of btree data. If you are about to build an index > that > > will require more btree than that, increase the size, or split across > > several transactions. > > > > John > > > > -----Original Message----- > > From: sqlite-users-boun...@sqlite.org > > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov > > Sent: Tuesday, October 20, 2009 7:35 AM > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] index for a group by > > > >> please could you let me know which index could be better or faster? > > > > For this particular query index on (t,a,b,c) or even on (t,a,b,c,d) > > would be better and cause the query to execute faster (of course if by > > conditions t>x1 and t<x2 you don't select almost all rows in the > > table). > > > >> also do you know by chance how to speed up the index creation? > > > > There's no way to do that. SQLite have to scan the whole table, read > > data from all rows and put necessary information into the index. > > Nothing in this process can be sped up. > > > > Pavel > > > > On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau > > <sylvain.point...@gmail.com> wrote: > >> hello, > >> I have a table T (a,b,c,d,t) > >> where c is a value > >> a,b,c some dimensions > >> and t the time > >> > >> I need to make a subset with a "group by" > >> like > >> > >> select a,b,c,sum(d) > >> from T > >> where t>x1 and t<x2 > >> group by a,b,c > >> > >> I created an index on a,b,c > >> but this table is large and the index creation is time consuming (few > > hours) > >> > >> please could you let me know which index could be better or faster? > >> also do you know by chance how to speed up the index creation? > >> > >> Best regards, > >> Sylvain > >> _______________________________________________ > >> sqlite-users mailing list > >> sqlite-users@sqlite.org > >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > >> > > _______________________________________________ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > _______________________________________________ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users