Re: [sqlite] Index usefulness for GROUP BY

2017-03-04 Thread Simon Slavin

On 4 Mar 2017, at 10:16pm, Jeffrey Mattox  wrote:

> Thank you for your replies.  I've found that my best index is on datetime 
> since it eliminates the most uninteresting rows.  The query plan is 
> 
> SEARCH TABLE History USING INDEX Idx_datetime (datetime>?)
> USE TEMP B-TREE FOR GROUP BY
> USE TEMP B-TREE FOR ORDER BY
> 
>> -
>> Now, I look at a recent set of rows...
>> 
>> SELECT TOTAL(gameCount), weekday  FROM History
>> WHERE datetime >= strftime('%s','now','-28 days')
>> GROUP BY weekday
>> ORDER BY 1 DESC

Create these two additional indexes on History:

(datetime, weekday)
(weekday, datetime)

Then execute the ANALYZE command.

Then find out whether this has increased or decreased the time taken for the 
SELECT.

You can delete the two indexes it turns out not to be using.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index usefulness for GROUP BY

2017-03-04 Thread Jeffrey Mattox
Thank you for your replies.  I've found that my best index is on datetime since 
it eliminates the most uninteresting rows.  The query plan is 

SEARCH TABLE History USING INDEX Idx_datetime (datetime>?)
USE TEMP B-TREE FOR GROUP BY
USE TEMP B-TREE FOR ORDER BY

In my case, it's also best to have no other indexes.  If I add an index on the 
GROUP BY column and if that column also appears in a WHERE clause, then that 
index is used and (the better) Idx_datetime is ignored.  When I get a larger 
dataset, I'll run ANALYZE to see if the optimizer chooses Idx_datetime (which I 
know would be the best index).

Jeff


> On Mar 3, 2017, at 4:29 AM, Jeffrey Mattox  wrote:
> 
> Given this DB schema (simplified, there are other columns):
> 
>  CREATE TABLE History (
>history_ID  INTEGER PRIMARY KEY,
>gameCount  INTEGER,
>weekday  INTEGER, /* 0=Sunday, 6=Saturday */
>hour  INTEGER, /* (0..23) */
>datetime  INTEGER /* unix datetime */ );
> 
>  CREATE INDEX  Idx_weekday  ON  History( weekday );
> 
> -
> Now, I look at a recent set of rows...
> 
> SELECT TOTAL(gameCount), weekday  FROM History
>  WHERE datetime >= strftime('%s','now','-28 days')
>  GROUP BY weekday
>  ORDER BY 1 DESC
> 
> QUERY PLANS:
> without the index:
>  0  0  0  SCAN TABLE History
>  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
>  0  0  0  USE TEMP B-TREE FOR ORDER BY
> 
> with the index:
>  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
>  0  0  0  USE TEMP B-TREE FOR ORDER BY
> 
> Either way, the entire table is scanned (right?).  My index covers the entire 
> table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the 
> WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index 
> on weekday worthwhile, time-wise and space-wise?  (Query speed is not a big 
> issue for me, and the DB is relatively small -- there are, at most, 60 rows 
> added per day.  Memory is plentiful, OSX).
> 
> ---
> Jeff
> 

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index usefulness for GROUP BY

2017-03-03 Thread Clemens Ladisch
Jeffrey Mattox wrote:
> is my index on weekday worthwhile, time-wise and space-wise?  (Query
> speed is not a big issue for me, and the DB is relatively small

Indexes are optimizations.  In a small DB, the effect is probably not
noticeable, which implies that you should not bother.

Where exactly the point is at which the index becomes useful in your
system is something which you have to measure yourself.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index usefulness for GROUP BY

2017-03-03 Thread Hick Gunter
YES. AFAIK if SQLite detects that the rows are/can be made to be returned in 
GROUP BY order it can use internal variables to accumulate the group results. 
This is expected to be significantly faster than locating and updating a 
temporary BTree row for each record scanned.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Jeffrey Mattox
Gesendet: Freitag, 03. März 2017 11:30
An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
Betreff: [sqlite] Index usefulness for GROUP BY

Given this DB schema (simplified, there are other columns):

  CREATE TABLE History (
history_ID  INTEGER PRIMARY KEY,
gameCount  INTEGER,
weekday  INTEGER, /* 0=Sunday, 6=Saturday */
hour  INTEGER, /* (0..23) */
datetime  INTEGER /* unix datetime */ );

  CREATE INDEX  Idx_weekday  ON  History( weekday );

-
Now, I look at a recent set of rows...

SELECT TOTAL(gameCount), weekday  FROM History
  WHERE datetime >= strftime('%s','now','-28 days')
  GROUP BY weekday
  ORDER BY 1 DESC

QUERY PLANS:
without the index:
  0  0  0  SCAN TABLE History
  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
  0  0  0  USE TEMP B-TREE FOR ORDER BY

with the index:
  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
  0  0  0  USE TEMP B-TREE FOR ORDER BY

Either way, the entire table is scanned (right?).  My index covers the entire 
table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the 
WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index on 
weekday worthwhile, time-wise and space-wise?  (Query speed is not a big issue 
for me, and the DB is relatively small -- there are, at most, 60 rows added per 
day.  Memory is plentiful, OSX).

---
Jeff

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Index usefulness for GROUP BY

2017-03-03 Thread Jeffrey Mattox
Given this DB schema (simplified, there are other columns):

  CREATE TABLE History (
history_ID  INTEGER PRIMARY KEY,
gameCount  INTEGER,
weekday  INTEGER, /* 0=Sunday, 6=Saturday */
hour  INTEGER, /* (0..23) */
datetime  INTEGER /* unix datetime */ );

  CREATE INDEX  Idx_weekday  ON  History( weekday );

-
Now, I look at a recent set of rows...

SELECT TOTAL(gameCount), weekday  FROM History
  WHERE datetime >= strftime('%s','now','-28 days')
  GROUP BY weekday
  ORDER BY 1 DESC

QUERY PLANS:
without the index:
  0  0  0  SCAN TABLE History
  0  0  0  USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
  0  0  0  USE TEMP B-TREE FOR ORDER BY

with the index:
  0  0  0  SCAN TABLE History USING INDEX Idx_weekday
  0  0  0  USE TEMP B-TREE FOR ORDER BY

Either way, the entire table is scanned (right?).  My index covers the entire 
table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the 
WHERE clause, so the TEMP B-TREE is much smaller (right?).  So, is my index on 
weekday worthwhile, time-wise and space-wise?  (Query speed is not a big issue 
for me, and the DB is relatively small -- there are, at most, 60 rows added per 
day.  Memory is plentiful, OSX).

---
Jeff

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Simon Slavin

On 21 Oct 2009, at 11:34pm, Sylvain Pointeau wrote:

> if your "book" contains all lines (a,b,c,t,d)and you create an index  
> on
> (a,b,c,t)

I assume you meant to add ',d'in there.

> then your index is as fat as your book, isn't it?

Yes.  And it still isn't as useful for any SELECT that doesn't use the  
fields in the right order.  And it's more useful for a SELECT that  
does use those fields in the right order.  And it's less useful if you  
ever want to read the text in an order that makes any sense: I cannot  
read the index to a text book and learn the information the textbook  
wants to teach me.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Nicolas Williams
On Thu, Oct 22, 2009 at 12:34:26AM +0200, Sylvain Pointeau wrote:
> if your "book" contains all lines (a,b,c,t,d)and you create an index on
> (a,b,c,t)
> 
> then your index is as fat as your book, isn't it?

Depends on the size of d.

Also, if you add a constraint declaring t, a, b, and c (you want 't'
first!) to be unique then the DB could make the whole thing smaller than
if you first create the table, then the index.  (I'm not sure of SQLite3
does that, but it could).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Sylvain Pointeau
if your "book" contains all lines (a,b,c,t,d)and you create an index on
(a,b,c,t)

then your index is as fat as your book, isn't it?

cheers,
Sylvain

On Wed, Oct 21, 2009 at 11:52 PM, Simon Slavin  wrote:

>
> On 21 Oct 2009, at 9:19pm, Sylvain Pointeau wrote:
>
> > Thank you for your answers.
> > knowing that I have a table T (a,b,c,d,t)
> > where d is a value
> > a,b,c some dimensions
> > and t the time
> >
> > where I need to make a subset with a "group by" like
> >
> > select a,b,c,sum(d)
> > from T
> > where t>x1 and t > group by a,b,c
> >
> > do you have an idea on how to choose the best index in my case?
>
> The first thing that this SELECT command is doing is rejecting all the
> rows which do not have the right value for t.  So your index should
> start with t.  Your SELECT is then grouping by a,b,c.  So my guess at
> a good index would be
>
> CREATE INDEX searchOnT ON T (t,a,b,c)
>
> You can perhaps speed up your search by replacing your 'and' with
> 'between' like this:
>
> select a,b,c,sum(d)
> from T
> where t between x1 and x2
> group by a,b,c
>
> And for other reasons it might also be better to include d in the index:
>
> CREATE INDEX searchOnT ON T (t,a,b,c)
>
> > is it better to choose (a,b,c) ?
>
> This would not be as useful because the first thing the computer is
> trying to do is reject most of the table first, by checking the value
> of t.  Only after it has done that do the values of a,b,c become
> important.
>
> > or (a,b,c,t) ?
>
> It checks the value of t first, so you want to put the t first.
>
> > (the issue is that it is like I duplicate my table right?)
>
> No.  An index is not like duplicating your table.  Think of the TABLE
> as a book, and the INDEX as the index at the back of the book.  It's
> not an entire copy of the book, it's a fast way of knowing which page
> in the book to look at.  In a normal book people do not want to find
> all the green objects, all the red objects, all the blue objects, so
> you do not make an index for object colours.  You need to know what
> people are most likely to want to find.
>
> I think you might find it helpful to read some basic information about
> databases and indexing before you start to worry about the details of
> your particular program.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Simon Slavin

On 21 Oct 2009, at 9:19pm, Sylvain Pointeau wrote:

> Thank you for your answers.
> knowing that I have a table T (a,b,c,d,t)
> where d is a value
> a,b,c some dimensions
> and t the time
>
> where I need to make a subset with a "group by" like
>
> select a,b,c,sum(d)
> from T
> where t>x1 and t group by a,b,c
>
> do you have an idea on how to choose the best index in my case?

The first thing that this SELECT command is doing is rejecting all the  
rows which do not have the right value for t.  So your index should  
start with t.  Your SELECT is then grouping by a,b,c.  So my guess at  
a good index would be

CREATE INDEX searchOnT ON T (t,a,b,c)

You can perhaps speed up your search by replacing your 'and' with  
'between' like this:

select a,b,c,sum(d)
from T
where t between x1 and x2
group by a,b,c

And for other reasons it might also be better to include d in the index:

CREATE INDEX searchOnT ON T (t,a,b,c)

> is it better to choose (a,b,c) ?

This would not be as useful because the first thing the computer is  
trying to do is reject most of the table first, by checking the value  
of t.  Only after it has done that do the values of a,b,c become  
important.

> or (a,b,c,t) ?

It checks the value of t first, so you want to put the t first.

> (the issue is that it is like I duplicate my table right?)

No.  An index is not like duplicating your table.  Think of the TABLE  
as a book, and the INDEX as the index at the back of the book.  It's  
not an entire copy of the book, it's a fast way of knowing which page  
in the book to look at.  In a normal book people do not want to find  
all the green objects, all the red objects, all the blue objects, so  
you do not make an index for object colours.  You need to know what  
people are most likely to want to find.

I think you might find it helpful to read some basic information about  
databases and indexing before you start to worry about the details of  
your particular program.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Sylvain Pointeau
Thank you for your answers.
knowing that I have a table T (a,b,c,d,t)
where d is a value
a,b,c some dimensions
and t the time

where I need to make a subset with a "group by" like

select a,b,c,sum(d)
from T
where t>x1 and twrote:

> Actually, I thought exactly what you said when I saw the question. When
> I saw your answer though I realized I'd been wrong, there are ways I
> could slow indexing down, and therefore, ways to speed it up.
>
> Splitting across transactions is about the insertion of data, not the
> creation of the index. This is for the case where you can't insert first
> and create the index later (maybe you're inserting a lot of data into a
> table that already has data for example.) The recommendation in this
> case is to wrap the whole batch of inserts in a transaction, but to
> commit the transaction at regular intervals, breaking the process into
> multiple pieces so that you don't spill over the memory cache.
> SUPPOSEDLY this positively impacts indexing performance, but I've not
> personally tested that claim. The more significant impact in this case
> is actually the individual transactions you avoid, which makes a huge
> difference.
>
> John
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> Sent: Tuesday, October 20, 2009 2:47 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] index for a group by
>
> I want to notice, John, that my words are in context "I have table
> with a lot of data, I want to create a particular index on it, how can
> I do it quickly". In this context only your 5 bullet is applicable, I
> admit I've forgot about that. And I don't understand how can one split
> creating of index across several transactions.
>
> Pavel
>
> On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw
> <johncrens...@priacta.com> wrote:
> >> Nothing in this process can be sped up.
> >
> > Actually, that isn't entirely true. While it always requires a full
> data
> > scan, Some things can make the indexing part of the process faster.
> > Since indexing is done basically using a comparative sort, anything
> that
> > would speed up the sort, will speed up the indexing.
> > 1. It is faster to sort 1000 data points, than to insert 1000
> datapoints
> > into a constantly sorted list. Creating the index after all inserts is
> > faster than creating the index, then inserting.
> > 2. If possible, avoid indexes on long data strings, since the compares
> > can be time consuming.
> > 3. If you have a field that stores one of several strings (as an
> "enum")
> > consider using integers instead. Integers have lower overhead, and can
> > be compared (and sorted) more quickly than strings.
> > 4. If you are feeling really gutsy, you could mod the code and
> implement
> > a radix sort or something similar for integer values. I'm not really
> > recommending this, just saying, inserts and lookups in a radix index
> are
> > faster than a btree.
> > 5. Make sure the memory cache is large enough for the sort. Writing
> data
> > to disk is very costly, compared to sorting in memory. Default is 2000
> > pages (2MB) worth of btree data. If you are about to build an index
> that
> > will require more btree than that, increase the size, or split across
> > several transactions.
> >
> > John
> >
> > -Original Message-
> > From: sqlite-users-boun...@sqlite.org
> > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> > Sent: Tuesday, October 20, 2009 7:35 AM
> > To: General Discussion of SQLite Database
> > Subject: Re: [sqlite] index for a group by
> >
> >> please could you let me know which index could be better or faster?
> >
> > For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
> > would be better and cause the query to execute faster (of course if by
> > conditions t>x1 and t > table).
> >
> >> also do you know by chance how to speed up the index creation?
> >
> > There's no way to do that. SQLite have to scan the whole table, read
> > data from all rows and put necessary information into the index.
> > Nothing in this process can be sped up.
> >
> > Pavel
> >
> > On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
> > <sylvain.point...@gmail.com> wrote:
> >> hello,
> >> I have a table T (a,b,c,d,t)
> >> where c is a value
> >> a,b,c some dimensions
> >> and t the time
> >>
> >> I need to make a subset with a "group by"

Re: [sqlite] index for a group by

2009-10-21 Thread John Crenshaw
Actually, I thought exactly what you said when I saw the question. When
I saw your answer though I realized I'd been wrong, there are ways I
could slow indexing down, and therefore, ways to speed it up.

Splitting across transactions is about the insertion of data, not the
creation of the index. This is for the case where you can't insert first
and create the index later (maybe you're inserting a lot of data into a
table that already has data for example.) The recommendation in this
case is to wrap the whole batch of inserts in a transaction, but to
commit the transaction at regular intervals, breaking the process into
multiple pieces so that you don't spill over the memory cache.
SUPPOSEDLY this positively impacts indexing performance, but I've not
personally tested that claim. The more significant impact in this case
is actually the individual transactions you avoid, which makes a huge
difference.

John

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
Sent: Tuesday, October 20, 2009 2:47 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] index for a group by

I want to notice, John, that my words are in context "I have table
with a lot of data, I want to create a particular index on it, how can
I do it quickly". In this context only your 5 bullet is applicable, I
admit I've forgot about that. And I don't understand how can one split
creating of index across several transactions.

Pavel

On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw
<johncrens...@priacta.com> wrote:
>> Nothing in this process can be sped up.
>
> Actually, that isn't entirely true. While it always requires a full
data
> scan, Some things can make the indexing part of the process faster.
> Since indexing is done basically using a comparative sort, anything
that
> would speed up the sort, will speed up the indexing.
> 1. It is faster to sort 1000 data points, than to insert 1000
datapoints
> into a constantly sorted list. Creating the index after all inserts is
> faster than creating the index, then inserting.
> 2. If possible, avoid indexes on long data strings, since the compares
> can be time consuming.
> 3. If you have a field that stores one of several strings (as an
"enum")
> consider using integers instead. Integers have lower overhead, and can
> be compared (and sorted) more quickly than strings.
> 4. If you are feeling really gutsy, you could mod the code and
implement
> a radix sort or something similar for integer values. I'm not really
> recommending this, just saying, inserts and lookups in a radix index
are
> faster than a btree.
> 5. Make sure the memory cache is large enough for the sort. Writing
data
> to disk is very costly, compared to sorting in memory. Default is 2000
> pages (2MB) worth of btree data. If you are about to build an index
that
> will require more btree than that, increase the size, or split across
> several transactions.
>
> John
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> Sent: Tuesday, October 20, 2009 7:35 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] index for a group by
>
>> please could you let me know which index could be better or faster?
>
> For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
> would be better and cause the query to execute faster (of course if by
> conditions t>x1 and t table).
>
>> also do you know by chance how to speed up the index creation?
>
> There's no way to do that. SQLite have to scan the whole table, read
> data from all rows and put necessary information into the index.
> Nothing in this process can be sped up.
>
> Pavel
>
> On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
> <sylvain.point...@gmail.com> wrote:
>> hello,
>> I have a table T (a,b,c,d,t)
>> where c is a value
>> a,b,c some dimensions
>> and t the time
>>
>> I need to make a subset with a "group by"
>> like
>>
>> select a,b,c,sum(d)
>> from T
>> where t>x1 and t> group by a,b,c
>>
>> I created an index on a,b,c
>> but this table is large and the index creation is time consuming (few
> hours)
>>
>> please could you let me know which index could be better or faster?
>> also do you know by chance how to speed up the index creation?
>>
>> Best regards,
>> Sylvain
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users m

Re: [sqlite] index for a group by

2009-10-20 Thread Pavel Ivanov
I want to notice, John, that my words are in context "I have table
with a lot of data, I want to create a particular index on it, how can
I do it quickly". In this context only your 5 bullet is applicable, I
admit I've forgot about that. And I don't understand how can one split
creating of index across several transactions.

Pavel

On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw <johncrens...@priacta.com> wrote:
>> Nothing in this process can be sped up.
>
> Actually, that isn't entirely true. While it always requires a full data
> scan, Some things can make the indexing part of the process faster.
> Since indexing is done basically using a comparative sort, anything that
> would speed up the sort, will speed up the indexing.
> 1. It is faster to sort 1000 data points, than to insert 1000 datapoints
> into a constantly sorted list. Creating the index after all inserts is
> faster than creating the index, then inserting.
> 2. If possible, avoid indexes on long data strings, since the compares
> can be time consuming.
> 3. If you have a field that stores one of several strings (as an "enum")
> consider using integers instead. Integers have lower overhead, and can
> be compared (and sorted) more quickly than strings.
> 4. If you are feeling really gutsy, you could mod the code and implement
> a radix sort or something similar for integer values. I'm not really
> recommending this, just saying, inserts and lookups in a radix index are
> faster than a btree.
> 5. Make sure the memory cache is large enough for the sort. Writing data
> to disk is very costly, compared to sorting in memory. Default is 2000
> pages (2MB) worth of btree data. If you are about to build an index that
> will require more btree than that, increase the size, or split across
> several transactions.
>
> John
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> Sent: Tuesday, October 20, 2009 7:35 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] index for a group by
>
>> please could you let me know which index could be better or faster?
>
> For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
> would be better and cause the query to execute faster (of course if by
> conditions t>x1 and t table).
>
>> also do you know by chance how to speed up the index creation?
>
> There's no way to do that. SQLite have to scan the whole table, read
> data from all rows and put necessary information into the index.
> Nothing in this process can be sped up.
>
> Pavel
>
> On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
> <sylvain.point...@gmail.com> wrote:
>> hello,
>> I have a table T (a,b,c,d,t)
>> where c is a value
>> a,b,c some dimensions
>> and t the time
>>
>> I need to make a subset with a "group by"
>> like
>>
>> select a,b,c,sum(d)
>> from T
>> where t>x1 and t> group by a,b,c
>>
>> I created an index on a,b,c
>> but this table is large and the index creation is time consuming (few
> hours)
>>
>> please could you let me know which index could be better or faster?
>> also do you know by chance how to speed up the index creation?
>>
>> Best regards,
>> Sylvain
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-20 Thread John Crenshaw
> Nothing in this process can be sped up.

Actually, that isn't entirely true. While it always requires a full data
scan, Some things can make the indexing part of the process faster.
Since indexing is done basically using a comparative sort, anything that
would speed up the sort, will speed up the indexing.
1. It is faster to sort 1000 data points, than to insert 1000 datapoints
into a constantly sorted list. Creating the index after all inserts is
faster than creating the index, then inserting.
2. If possible, avoid indexes on long data strings, since the compares
can be time consuming.
3. If you have a field that stores one of several strings (as an "enum")
consider using integers instead. Integers have lower overhead, and can
be compared (and sorted) more quickly than strings.
4. If you are feeling really gutsy, you could mod the code and implement
a radix sort or something similar for integer values. I'm not really
recommending this, just saying, inserts and lookups in a radix index are
faster than a btree.
5. Make sure the memory cache is large enough for the sort. Writing data
to disk is very costly, compared to sorting in memory. Default is 2000
pages (2MB) worth of btree data. If you are about to build an index that
will require more btree than that, increase the size, or split across
several transactions.

John

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
Sent: Tuesday, October 20, 2009 7:35 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] index for a group by

> please could you let me know which index could be better or faster?

For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
would be better and cause the query to execute faster (of course if by
conditions t>x1 and t also do you know by chance how to speed up the index creation?

There's no way to do that. SQLite have to scan the whole table, read
data from all rows and put necessary information into the index.
Nothing in this process can be sped up.

Pavel

On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
<sylvain.point...@gmail.com> wrote:
> hello,
> I have a table T (a,b,c,d,t)
> where c is a value
> a,b,c some dimensions
> and t the time
>
> I need to make a subset with a "group by"
> like
>
> select a,b,c,sum(d)
> from T
> where t>x1 and t group by a,b,c
>
> I created an index on a,b,c
> but this table is large and the index creation is time consuming (few
hours)
>
> please could you let me know which index could be better or faster?
> also do you know by chance how to speed up the index creation?
>
> Best regards,
> Sylvain
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-20 Thread Pavel Ivanov
> please could you let me know which index could be better or faster?

For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
would be better and cause the query to execute faster (of course if by
conditions t>x1 and t also do you know by chance how to speed up the index creation?

There's no way to do that. SQLite have to scan the whole table, read
data from all rows and put necessary information into the index.
Nothing in this process can be sped up.

Pavel

On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
 wrote:
> hello,
> I have a table T (a,b,c,d,t)
> where c is a value
> a,b,c some dimensions
> and t the time
>
> I need to make a subset with a "group by"
> like
>
> select a,b,c,sum(d)
> from T
> where t>x1 and t group by a,b,c
>
> I created an index on a,b,c
> but this table is large and the index creation is time consuming (few hours)
>
> please could you let me know which index could be better or faster?
> also do you know by chance how to speed up the index creation?
>
> Best regards,
> Sylvain
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] index for a group by

2009-10-19 Thread Sylvain Pointeau
hello,
I have a table T (a,b,c,d,t)
where c is a value
a,b,c some dimensions
and t the time

I need to make a subset with a "group by"
like

select a,b,c,sum(d)
from T
where t>x1 and thttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users