Thank you for your answers.
knowing that I have a table T (a,b,c,d,t)
where d is a value
a,b,c some dimensions
and t the time

where I need to make a subset with a "group by" like

select a,b,c,sum(d)
from T
where t>x1 and t<x2
group by a,b,c


do you have an idea on how to choose the best index in my case?

is it better to choose (a,b,c) ?
or (a,b,c,t) ? (the issue is that it is like I duplicate my table right?)

I don't think it is useful to index d as it is a value, right?

does it make a huge difference if I choose (a,b)?


Thanks again,
Cheers,
Sylvain


On Wed, Oct 21, 2009 at 8:53 AM, John Crenshaw <johncrens...@priacta.com>wrote:

> Actually, I thought exactly what you said when I saw the question. When
> I saw your answer though I realized I'd been wrong, there are ways I
> could slow indexing down, and therefore, ways to speed it up.
>
> Splitting across transactions is about the insertion of data, not the
> creation of the index. This is for the case where you can't insert first
> and create the index later (maybe you're inserting a lot of data into a
> table that already has data for example.) The recommendation in this
> case is to wrap the whole batch of inserts in a transaction, but to
> commit the transaction at regular intervals, breaking the process into
> multiple pieces so that you don't spill over the memory cache.
> SUPPOSEDLY this positively impacts indexing performance, but I've not
> personally tested that claim. The more significant impact in this case
> is actually the individual transactions you avoid, which makes a huge
> difference.
>
> John
>
> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> Sent: Tuesday, October 20, 2009 2:47 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] index for a group by
>
> I want to notice, John, that my words are in context "I have table
> with a lot of data, I want to create a particular index on it, how can
> I do it quickly". In this context only your 5 bullet is applicable, I
> admit I've forgot about that. And I don't understand how can one split
> creating of index across several transactions.
>
> Pavel
>
> On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw
> <johncrens...@priacta.com> wrote:
> >> Nothing in this process can be sped up.
> >
> > Actually, that isn't entirely true. While it always requires a full
> data
> > scan, Some things can make the indexing part of the process faster.
> > Since indexing is done basically using a comparative sort, anything
> that
> > would speed up the sort, will speed up the indexing.
> > 1. It is faster to sort 1000 data points, than to insert 1000
> datapoints
> > into a constantly sorted list. Creating the index after all inserts is
> > faster than creating the index, then inserting.
> > 2. If possible, avoid indexes on long data strings, since the compares
> > can be time consuming.
> > 3. If you have a field that stores one of several strings (as an
> "enum")
> > consider using integers instead. Integers have lower overhead, and can
> > be compared (and sorted) more quickly than strings.
> > 4. If you are feeling really gutsy, you could mod the code and
> implement
> > a radix sort or something similar for integer values. I'm not really
> > recommending this, just saying, inserts and lookups in a radix index
> are
> > faster than a btree.
> > 5. Make sure the memory cache is large enough for the sort. Writing
> data
> > to disk is very costly, compared to sorting in memory. Default is 2000
> > pages (2MB) worth of btree data. If you are about to build an index
> that
> > will require more btree than that, increase the size, or split across
> > several transactions.
> >
> > John
> >
> > -----Original Message-----
> > From: sqlite-users-boun...@sqlite.org
> > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> > Sent: Tuesday, October 20, 2009 7:35 AM
> > To: General Discussion of SQLite Database
> > Subject: Re: [sqlite] index for a group by
> >
> >> please could you let me know which index could be better or faster?
> >
> > For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
> > would be better and cause the query to execute faster (of course if by
> > conditions t>x1 and t<x2 you don't select almost all rows in the
> > table).
> >
> >> also do you know by chance how to speed up the index creation?
> >
> > There's no way to do that. SQLite have to scan the whole table, read
> > data from all rows and put necessary information into the index.
> > Nothing in this process can be sped up.
> >
> > Pavel
> >
> > On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
> > <sylvain.point...@gmail.com> wrote:
> >> hello,
> >> I have a table T (a,b,c,d,t)
> >> where c is a value
> >> a,b,c some dimensions
> >> and t the time
> >>
> >> I need to make a subset with a "group by"
> >> like
> >>
> >> select a,b,c,sum(d)
> >> from T
> >> where t>x1 and t<x2
> >> group by a,b,c
> >>
> >> I created an index on a,b,c
> >> but this table is large and the index creation is time consuming (few
> > hours)
> >>
> >> please could you let me know which index could be better or faster?
> >> also do you know by chance how to speed up the index creation?
> >>
> >> Best regards,
> >> Sylvain
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >>
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to