Re: [sqlite] Implicit INDEX?
You can make the timestamps unique by appending a sequence number. A 32 bit Unix timestamp can map into a 64 bit Sqlite integer. P Kishor wrote: > On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: >> Err, >> >> Make that example table data: >> >> <...> >> >> 1167611400|89|7|34 >> 1167611400|101|5|4 >> 1167611400|147|9|14 >> 1167611400|173|7|2 >> 1167611400|195|8|1 >> 1167611400|314|8|12 >> 1167611400|347|6|48 >> 1167611400|364|1|49 >> >> >> >> 1167615600|111|7|39 >> 1167615600|155|2|8 >> 1167615600|157|4|7 >> 1167615600|220|4|47 >> 1167615600|247|7|21 >> 1167615600|261|8|30 >> 1167615600|308|9|20 >> >> <...> >> >> As I'd originally said, the timestamps are inserted in ascending >> order. (I just screwed up pasting my example values in the wrong order) >> >> > > > ok, but first, I don't really understand the following -- > >> SELECT * from bridge_table WHERE timestamp = ; >> >> and almost immediately get back all the results I want, since they're >> right next to each other in the db. However, the actual SELECT doesn't >> return for almost 13 additional seconds, as the entire table is scanned >> for other rows where timestamp might be > > Your first sentence above is that you "almost immediately get back all > the results you want" then you go on to say that the "actual SELECT > doesn't return for almost 13 additional seconds." So, what is it > exactly that you "almost immediately get back"? > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Have tested this now on a table of some 30 rows (no indexes at all) and with 100 rows to find in the middle of the table, sorted asc on time-stamp field. It gave me a speed increase of about 25%. If I looked for rows at the beginning of the table the speed increase was more, some 50% faster. If I looked for rows at the end (highest timestamp) then the simple select was quite a lot faster than the one with limit etc. Maybe not the kind of increase you were interested in, but still something and with no extra overhead at all, just a different SQL. Probably only worth it if looking for early times. RBS -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: 14 April 2008 19:11 To: General Discussion of SQLite Database Subject: Re: [sqlite] Implicit INDEX? How about this: select * from table1 where rowid >= (select rowid from table1 where time_stamp = xxx limit 1) and rowid < (select rowid from table1 where time_stamp > xxx limit 1) RBS > Donald, > >> To test this, I think you'd want to create a select for some of the most >> recent data (i.e. data at the tail of the database), perhaps after >> clearing cache. I suspect this will take the full table scan time to >> return any values. > > I'd actually just thought of that. All my test SELECTs happened > to be using data close to the "front" of the file. As suspected, if I try > it on "later" data, I have to wait for the table scan, then I get the > data. I'd be erroneously assuming that SQLite was somehow searching for > the first value quickly, then scanning the table. > >> Two thoughts: >> >> 1). Easy. >> How bad is the extra 0.8 GByte cost of the index? At today's prices, >> it's only about 20 cents on ordinary 5" drives. (but maybe you're >> programming a cellphone) > > No, it's nothing terribly constrained. I'm just trying to > understand the mechanisms and do what I can to keep the size down where > possible. I was somewhat surprised to find that adding an index on a > single INTEGER column nearly doubled the size of the database and wanted > to figure out if there was a way around it, given that the column will > always be sorted. (And given my, perhaps erroneous understanding that > creating an INDEX just makes sure that the column stays sorted so SQLite > can search through it more intelligently) > >> 2). Fancy. >> You could create your own sparse index table mapping a ROWID to say, >> every thousandth timestamp. Then you could create upper and lower ROWID >> bounds on any query based on timestamps. Maybe you import the records >> in batches already and can create the new table at the same time. > > This is closer to what I'm probably going to do. The data gets > pulled in every 5 minutes, but between runs, very little actually changes. > So the idea is to store only the changes along with a full dump say once > or twice every day. Then I can just query the values from time> to and compute the state of everything from > that data. > > Thanks, > > Chris > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Does sqlite have support for index based tables as a storage means? This is an excelent case for index based storage. just my .02 Ken Dennis Cote <[EMAIL PROTECTED]> wrote: Jeffrey Rennie wrote: > On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy wrote: > inserted into the database in ascending order, and where there may be >> as many as 500 hID entries for each timestamp. After a while, this table >>> Have you considered making timestamp a PRIMARY KEY? >>> >>> So, declare it as INTEGER PRIMARY KEY NOT NULL >>Unfortunately, the timestamps aren't unique, so I can't use >> PRIMARY KEY to solve the problem. (Each run generates as many as 500 >> entries, all with the same timestamp) >> > > Are there ever identical rows? If not, just make the whole row a primary > key: > > CREATE TABLE bridge_table ( > timestamp INTEGER NOT NULL, > hID INTEGER NOT NULL, > sID INTEGER NOT NULL, > pID INTEGER NOT NULL, > *PRIMARY KEY (timestamp, hID, sID, pID) > *); This is a good suggestion. Even if there are identical rows, you can still use and index on all columns, without making it an primary key (i.e. unique index). CREATE TABLE bridge_table ( timestamp INTEGER NOT NULL, hID INTEGER NOT NULL, sID INTEGER NOT NULL, pID INTEGER NOT NULL ); CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID); This will double the size of the database file and double the row insert time, but SQLite has an optimization that allows it to use the columns stored in the index to supply the results of a query without looking at the main table. In effect this index becomes your main table because it stores all the data. HTH Dennis Cote ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On 4/14/08, Dennis Cote <[EMAIL PROTECTED]> wrote: > Jeffrey Rennie wrote: > > On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote: > > > inserted into the database in ascending order, and where there may be > >> as > many as 500 hID entries for each timestamp. After a while, this table > >>> Have you considered making timestamp a PRIMARY KEY? > >>> > >>> So, declare it as INTEGER PRIMARY KEY NOT NULL > >>Unfortunately, the timestamps aren't unique, so I can't use > >> PRIMARY KEY to solve the problem. (Each run generates as many as 500 > >> entries, all with the same timestamp) > >> > > > > Are there ever identical rows? If not, just make the whole row a primary > > key: > > > > CREATE TABLE bridge_table ( > > timestamp INTEGER NOT NULL, > > hID INTEGER NOT NULL, > > sID INTEGER NOT NULL, > > pID INTEGER NOT NULL, > > *PRIMARY KEY (timestamp, hID, sID, pID) > > *); > > > This is a good suggestion. ummm no. The OP very clearly states that an INDEX is out of question because of space constraints. > Even if there are identical rows, you can > still use and index on all columns, without making it an primary key > (i.e. unique index). > > > CREATE TABLE bridge_table ( > timestamp INTEGER NOT NULL, > hID INTEGER NOT NULL, > sID INTEGER NOT NULL, > pID INTEGER NOT NULL > > ); > CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID); > > This will double the size of the database file and double the row insert > time, but SQLite has an optimization that allows it to use the columns > stored in the index to supply the results of a query without looking at > the main table. In effect this index becomes your main table because it > stores all the data. > > HTH > > Dennis Cote > > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Puneet Kishor http://punkish.eidesis.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
How about this: select * from table1 where rowid >= (select rowid from table1 where time_stamp = xxx limit 1) and rowid < (select rowid from table1 where time_stamp > xxx limit 1) RBS > Donald, > >> To test this, I think you'd want to create a select for some of the most >> recent data (i.e. data at the tail of the database), perhaps after >> clearing cache. I suspect this will take the full table scan time to >> return any values. > > I'd actually just thought of that. All my test SELECTs happened > to be using data close to the "front" of the file. As suspected, if I try > it on "later" data, I have to wait for the table scan, then I get the > data. I'd be erroneously assuming that SQLite was somehow searching for > the first value quickly, then scanning the table. > >> Two thoughts: >> >> 1). Easy. >> How bad is the extra 0.8 GByte cost of the index? At today's prices, >> it's only about 20 cents on ordinary 5" drives. (but maybe you're >> programming a cellphone) > > No, it's nothing terribly constrained. I'm just trying to > understand the mechanisms and do what I can to keep the size down where > possible. I was somewhat surprised to find that adding an index on a > single INTEGER column nearly doubled the size of the database and wanted > to figure out if there was a way around it, given that the column will > always be sorted. (And given my, perhaps erroneous understanding that > creating an INDEX just makes sure that the column stays sorted so SQLite > can search through it more intelligently) > >> 2). Fancy. >> You could create your own sparse index table mapping a ROWID to say, >> every thousandth timestamp. Then you could create upper and lower ROWID >> bounds on any query based on timestamps. Maybe you import the records >> in batches already and can create the new table at the same time. > > This is closer to what I'm probably going to do. The data gets > pulled in every 5 minutes, but between runs, very little actually changes. > So the idea is to store only the changes along with a full dump say once > or twice every day. Then I can just query the values from time> to and compute the state of everything from > that data. > > Thanks, > > Chris > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Jeffrey Rennie wrote: > On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote: > inserted into the database in ascending order, and where there may be >> as many as 500 hID entries for each timestamp. After a while, this table >>> Have you considered making timestamp a PRIMARY KEY? >>> >>> So, declare it as INTEGER PRIMARY KEY NOT NULL >>Unfortunately, the timestamps aren't unique, so I can't use >> PRIMARY KEY to solve the problem. (Each run generates as many as 500 >> entries, all with the same timestamp) >> > > Are there ever identical rows? If not, just make the whole row a primary > key: > > CREATE TABLE bridge_table ( > timestamp INTEGER NOT NULL, > hID INTEGER NOT NULL, > sID INTEGER NOT NULL, > pID INTEGER NOT NULL, > *PRIMARY KEY (timestamp, hID, sID, pID) > *); This is a good suggestion. Even if there are identical rows, you can still use and index on all columns, without making it an primary key (i.e. unique index). CREATE TABLE bridge_table ( timestamp INTEGER NOT NULL, hID INTEGER NOT NULL, sID INTEGER NOT NULL, pID INTEGER NOT NULL ); CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID); This will double the size of the database file and double the row insert time, but SQLite has an optimization that allows it to use the columns stored in the index to supply the results of a query without looking at the main table. In effect this index becomes your main table because it stores all the data. HTH Dennis Cote ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote: > >> inserted into the database in ascending order, and where there may be > as > >> many as 500 hID entries for each timestamp. After a while, this table > > > Have you considered making timestamp a PRIMARY KEY? > > > > So, declare it as INTEGER PRIMARY KEY NOT NULL > >Unfortunately, the timestamps aren't unique, so I can't use > PRIMARY KEY to solve the problem. (Each run generates as many as 500 > entries, all with the same timestamp) > Are there ever identical rows? If not, just make the whole row a primary key: CREATE TABLE bridge_table ( timestamp INTEGER NOT NULL, hID INTEGER NOT NULL, sID INTEGER NOT NULL, pID INTEGER NOT NULL, *PRIMARY KEY (timestamp, hID, sID, pID) *); ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: > > If you need to do this from the shell, I'm not sure there is anything > > you can do. If you're doing this from the C API just look for timestamps > > that are >= your target and terminate the query as soon as you find one > > larger than the target timestamp (e.g. stop calling step() and call > > reset() on the statement). If the number of datapoints at a given > > timestamp is known (because it is fixed, or because you stashed that > > value in another table), you could also use a LIMIT clause. That has > > the added bonus of doing the right thing (even if it takes longer) if > > the rows somehow get out-of-order. > > > I'm actually using Perl's DBI interface. Not sure that I can > easily do the exit when seeing a timestamp one bigger than the one I'm > after, but it's an intriguing idea I'd not considered before. The LIMIT > I'd also considered, but since the data is variable, I'd abandoned it. > Hadn't considered storing the value in another table. But since the whole > thing was based on an invalid premise (that SQLite was somehow magically > finding the initial value quickly, ignoring that it just happened to be > closer to the start of the db) I think I'll take another approach to > solving the problem. > heck, I didn't realize you were using Perl (I thought you were working in some highly constrained environment with 2 bytes of memory, like the Mars rover or something). Just exit the loop. my $timestamp = 1167615600; while ($ary_ref = $sth->fetch) { last if $ary_ref->[0] > $timestamp; do other stuff... } what's the problem? It should be blindingly fast. > Thanks for the input though, > > > Chris > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Puneet Kishor http://punkish.eidesis.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Can you create a second table holding the minimum rowid and maximum rowid from your main data table. You query this table, get the range or rowid, then you can query the main table WHERE rowid >= MinRowid AND rowid <= MaxRowid. That would than use the built in primary index. David --- On Mon, 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: > From: Chris Tracy <[EMAIL PROTECTED]> > Subject: Re: [sqlite] Implicit INDEX? > To: "General Discussion of SQLite Database" > Date: Monday, April 14, 2008, 1:35 PM > > If you need to do this from the shell, I'm not > sure there is anything > > you can do. If you're doing this from the C API > just look for timestamps > > that are >= your target and terminate the query as > soon as you find one > > larger than the target timestamp (e.g. stop calling > step() and call > > reset() on the statement). If the number of > datapoints at a given > > timestamp is known (because it is fixed, or because > you stashed that > > value in another table), you could also use a LIMIT > clause. That has > > the added bonus of doing the right thing (even if it > takes longer) if > > the rows somehow get out-of-order. > > I'm actually using Perl's DBI interface. Not > sure that I can > easily do the exit when seeing a timestamp one bigger than > the one I'm > after, but it's an intriguing idea I'd not > considered before. The LIMIT > I'd also considered, but since the data is variable, > I'd abandoned it. > Hadn't considered storing the value in another table. > But since the whole > thing was based on an invalid premise (that SQLite was > somehow magically > finding the initial value quickly, ignoring that it just > happened to be > closer to the start of the db) I think I'll take > another approach to > solving the problem. > > Thanks for the input though, > > Chris > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Donald, > To test this, I think you'd want to create a select for some of the most > recent data (i.e. data at the tail of the database), perhaps after > clearing cache. I suspect this will take the full table scan time to > return any values. I'd actually just thought of that. All my test SELECTs happened to be using data close to the "front" of the file. As suspected, if I try it on "later" data, I have to wait for the table scan, then I get the data. I'd be erroneously assuming that SQLite was somehow searching for the first value quickly, then scanning the table. > Two thoughts: > > 1). Easy. > How bad is the extra 0.8 GByte cost of the index? At today's prices, > it's only about 20 cents on ordinary 5" drives. (but maybe you're > programming a cellphone) No, it's nothing terribly constrained. I'm just trying to understand the mechanisms and do what I can to keep the size down where possible. I was somewhat surprised to find that adding an index on a single INTEGER column nearly doubled the size of the database and wanted to figure out if there was a way around it, given that the column will always be sorted. (And given my, perhaps erroneous understanding that creating an INDEX just makes sure that the column stays sorted so SQLite can search through it more intelligently) > 2). Fancy. > You could create your own sparse index table mapping a ROWID to say, > every thousandth timestamp. Then you could create upper and lower ROWID > bounds on any query based on timestamps. Maybe you import the records > in batches already and can create the new table at the same time. This is closer to what I'm probably going to do. The data gets pulled in every 5 minutes, but between runs, very little actually changes. So the idea is to store only the changes along with a full dump say once or twice every day. Then I can just query the values from to and compute the state of everything from that data. Thanks, Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
> If you need to do this from the shell, I'm not sure there is anything > you can do. If you're doing this from the C API just look for timestamps > that are >= your target and terminate the query as soon as you find one > larger than the target timestamp (e.g. stop calling step() and call > reset() on the statement). If the number of datapoints at a given > timestamp is known (because it is fixed, or because you stashed that > value in another table), you could also use a LIMIT clause. That has > the added bonus of doing the right thing (even if it takes longer) if > the rows somehow get out-of-order. I'm actually using Perl's DBI interface. Not sure that I can easily do the exit when seeing a timestamp one bigger than the one I'm after, but it's an intriguing idea I'd not considered before. The LIMIT I'd also considered, but since the data is variable, I'd abandoned it. Hadn't considered storing the value in another table. But since the whole thing was based on an invalid premise (that SQLite was somehow magically finding the initial value quickly, ignoring that it just happened to be closer to the start of the db) I think I'll take another approach to solving the problem. Thanks for the input though, Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On Mon, Apr 14, 2008 at 10:06:27AM -0700, Chris Tracy scratched on the wall: > > Your first sentence above is that you "almost immediately get back all > > the results you want" then you go on to say that the "actual SELECT > > doesn't return for almost 13 additional seconds." So, what is it > > exactly that you "almost immediately get back"? > > When I run the select inside the sqlite3 command line, I get back > all the results I'm after within a split second. However, the actual > command itself doesn't return for 13 additional seconds, while the entire > table is scanned for other instances where timestamp equals the value I'm > after. (No additional entries exist, so no additional data is ever > returned, but it always does this scan unless I've created an INDEX) I assume that only works if you are looking for data that was inserted near the "beginning" of the database (e.g. low timestamp values). If you're looking for data that was more recently inserted (e.g at the "end" of the database), it is going to sit there for 13 seconds and then display the values you're looking for... right? So some kind of short-cut solution will help save some time on average, but not for all cases. > So my question is, can I avoid the need to scan the entire table > for additional values where time = without having to CREATE > INDEX (and use almost 100% more disk space) if I guarantee that all rows > are inserted with timestamp in ascending order? Technically no, as SQL doesn't consider rows to be ordered within a table. In theory, the same query run two times can return the rows in a different order. On a more practical side, SQLite will generally return rows in the order of their ROWID, assuming no other ordering is in place. Just be sure you understand how ROWIDs are assigned. If you need to do this from the shell, I'm not sure there is anything you can do. If you're doing this from the C API just look for timestamps that are >= your target and terminate the query as soon as you find one larger than the target timestamp (e.g. stop calling step() and call reset() on the statement). If the number of datapoints at a given timestamp is known (because it is fixed, or because you stashed that value in another table), you could also use a LIMIT clause. That has the added bonus of doing the right thing (even if it takes longer) if the rows somehow get out-of-order. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "'People who live in bamboo houses should not throw pandas.' Jesus said that." - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Hi, Chris, Regarding: "It seems to me (and I might be wrong) that since the values are already in the database in ascending order by timestamp that SQLite is able to find all the results quite quickly. ..." To test this, I think you'd want to create a select for some of the most recent data (i.e. data at the tail of the database), perhaps after clearing cache. I suspect this will take the full table scan time to return any values. Two thoughts: 1). Easy. How bad is the extra 0.8 GByte cost of the index? At today's prices, it's only about 20 cents on ordinary 5" drives. (but maybe you're programming a cellphone) 2). Fancy. You could create your own sparse index table mapping a ROWID to say, every thousandth timestamp. Then you could create upper and lower ROWID bounds on any query based on timestamps. Maybe you import the records in batches already and can create the new table at the same time. This email and any attachments have been scanned for known viruses using multiple scanners. We believe that this email and any attachments are virus free, however the recipient must take full responsibility for virus checking. This email message is intended for the named recipient only. It may be privileged and/or confidential. If you are not the named recipient of this email please notify us immediately and do not copy it or use it for any purpose, nor disclose its contents to any other person. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
> a shot in the dark. Try > > SELECT * from bridge_table WHERE timestamp < 1167615601; > > although, it would probably not help because SQLite would still have > to scan the table to ascertain there were no more rows that met the > criterion. My hunch is that without giving the program some hint (aka, > INDEX) you can't get what you want. But, others on this list likely > have much better knowledge of the internals. Unfortunately, that doesn't solve the issue of needing to scan all the records in the database (without an explicit INDEX), since SQLite doesn't realize and thus doesn't act on the fact that the data in the timestamp column is already sorted. Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: > > Your first sentence above is that you "almost immediately get back all > > the results you want" then you go on to say that the "actual SELECT > > doesn't return for almost 13 additional seconds." So, what is it > > exactly that you "almost immediately get back"? > > > When I run the select inside the sqlite3 command line, I get back > all the results I'm after within a split second. However, the actual > command itself doesn't return for 13 additional seconds, while the entire > table is scanned for other instances where timestamp equals the value I'm > after. (No additional entries exist, so no additional data is ever > returned, but it always does this scan unless I've created an INDEX) > > To wit: > > sqlite3 test.db > sqlite> SELECT * from bridge_table WHERE timestamp = 1167615600; > [within a split second I get several hundred values ending with...] > 1167615600|494|8|2 > 1167615600|495|9|7 > 1167615600|496|5|21 > 1167615600|497|8|24 > 1167615600|498|6|46 > 1167615600|499|1|14 > 1167615600|500|9|31 > [and here it sits for 13 more seconds, looking through the rest of the > table for any other values where time = 1167615600, but none exist, so > finally...] > sqlite> [I get the prompt back once it finishes, having received no > additional output, because there isn't any] > > It seems to me (and I might be wrong) that since the values are > already in the database in ascending order by timestamp that SQLite is > able to find all the results quite quickly. However, it must then scan > through the rest of the database on the off chance that some other rows > exist where timestamp = 1167615600, because it doesn't know that they're > all stored in ascending order. > > So my question is, can I avoid the need to scan the entire table > for additional values where time = without having to CREATE > INDEX (and use almost 100% more disk space) if I guarantee that all rows > are inserted with timestamp in ascending order? > a shot in the dark. Try SELECT * from bridge_table WHERE timestamp < 1167615601; although, it would probably not help because SQLite would still have to scan the table to ascertain there were no more rows that met the criterion. My hunch is that without giving the program some hint (aka, INDEX) you can't get what you want. But, others on this list likely have much better knowledge of the internals. Good luck. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
> Your first sentence above is that you "almost immediately get back all > the results you want" then you go on to say that the "actual SELECT > doesn't return for almost 13 additional seconds." So, what is it > exactly that you "almost immediately get back"? When I run the select inside the sqlite3 command line, I get back all the results I'm after within a split second. However, the actual command itself doesn't return for 13 additional seconds, while the entire table is scanned for other instances where timestamp equals the value I'm after. (No additional entries exist, so no additional data is ever returned, but it always does this scan unless I've created an INDEX) To wit: sqlite3 test.db sqlite> SELECT * from bridge_table WHERE timestamp = 1167615600; [within a split second I get several hundred values ending with...] 1167615600|494|8|2 1167615600|495|9|7 1167615600|496|5|21 1167615600|497|8|24 1167615600|498|6|46 1167615600|499|1|14 1167615600|500|9|31 [and here it sits for 13 more seconds, looking through the rest of the table for any other values where time = 1167615600, but none exist, so finally...] sqlite> [I get the prompt back once it finishes, having received no additional output, because there isn't any] It seems to me (and I might be wrong) that since the values are already in the database in ascending order by timestamp that SQLite is able to find all the results quite quickly. However, it must then scan through the rest of the database on the off chance that some other rows exist where timestamp = 1167615600, because it doesn't know that they're all stored in ascending order. So my question is, can I avoid the need to scan the entire table for additional values where time = without having to CREATE INDEX (and use almost 100% more disk space) if I guarantee that all rows are inserted with timestamp in ascending order? Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: > Err, > > Make that example table data: > > <...> > > 1167611400|89|7|34 > 1167611400|101|5|4 > 1167611400|147|9|14 > 1167611400|173|7|2 > 1167611400|195|8|1 > 1167611400|314|8|12 > 1167611400|347|6|48 > 1167611400|364|1|49 > > > > 1167615600|111|7|39 > 1167615600|155|2|8 > 1167615600|157|4|7 > 1167615600|220|4|47 > 1167615600|247|7|21 > 1167615600|261|8|30 > 1167615600|308|9|20 > > <...> > > As I'd originally said, the timestamps are inserted in ascending > order. (I just screwed up pasting my example values in the wrong order) > > ok, but first, I don't really understand the following -- >SELECT * from bridge_table WHERE timestamp = ; > >and almost immediately get back all the results I want, since they're >right next to each other in the db. However, the actual SELECT doesn't >return for almost 13 additional seconds, as the entire table is scanned >for other rows where timestamp might be Your first sentence above is that you "almost immediately get back all the results you want" then you go on to say that the "actual SELECT doesn't return for almost 13 additional seconds." So, what is it exactly that you "almost immediately get back"? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
Err, Make that example table data: <...> 1167611400|89|7|34 1167611400|101|5|4 1167611400|147|9|14 1167611400|173|7|2 1167611400|195|8|1 1167611400|314|8|12 1167611400|347|6|48 1167611400|364|1|49 1167615600|111|7|39 1167615600|155|2|8 1167615600|157|4|7 1167615600|220|4|47 1167615600|247|7|21 1167615600|261|8|30 1167615600|308|9|20 <...> As I'd originally said, the timestamps are inserted in ascending order. (I just screwed up pasting my example values in the wrong order) Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
>> inserted into the database in ascending order, and where there may be as >> many as 500 hID entries for each timestamp. After a while, this table > Have you considered making timestamp a PRIMARY KEY? > > So, declare it as INTEGER PRIMARY KEY NOT NULL Unfortunately, the timestamps aren't unique, so I can't use PRIMARY KEY to solve the problem. (Each run generates as many as 500 entries, all with the same timestamp) A simple example of the data in the table follows: <...> 1167615600|111|7|39 1167615600|155|2|8 1167615600|157|4|7 1167615600|220|4|47 1167615600|247|7|21 1167615600|261|8|30 1167615600|308|9|20 1167611400|89|7|34 1167611400|101|5|4 1167611400|147|9|14 1167611400|173|7|2 1167611400|195|8|1 1167611400|314|8|12 1167611400|347|6|48 1167611400|364|1|49 <...> Chris ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Implicit INDEX?
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote: > Hello, > > I'm a relative newcommer to SQL and SQLite and have was is likely > a silly question. However, I've not found an answer to it or even a > reference to anything similar in searching the list archives or the web. > > If I have a table laid out thusly: > > CREATE TABLE bridge_table ( > timestamp INTEGER NOT NULL, > hID INTEGER NOT NULL, > sID INTEGER NOT NULL, > pID INTEGER NOT NULL > ); > > where timestamp values (unix 32-bit integer timestamps) would always be > inserted into the database in ascending order, and where there may be as > many as 500 hID entries for each timestamp. After a while, this table > will grow to contain between 30 and 50 million records. > > In doing some tests, I find that with the fully populated table, I > can run: > > SELECT * from bridge_table WHERE timestamp = ; > > and almost immediately get back all the results I want, since they're > right next to each other in the db. However, the actual SELECT doesn't > return for almost 13 additional seconds, as the entire table is scanned > for other rows where timestamp might be . > > Of course, the simple answer is to add an index on timestamp, > which does what I want. However, doing so increases the size of this > 50 million row table from 1GB to 1.8GB. It seems to me that if I could > convince SQLite that the rows would always be inserted in ascending order > by timestamp, that I could have a sort of implicit index without consuming > the additional 0.8GB. > > Is this even theoretically possible or am I missing something > obvious that would prevent an "implicit index" from working like I'm > suggesting? Have you considered making timestamp a PRIMARY KEY? So, declare it as INTEGER PRIMARY KEY NOT NULL (the NOT NULL may be unnecessary at that point). That should help. > > Thanks, > > Chris > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Puneet Kishor http://punkish.eidesis.org/ Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users