Re: [sqlite] Implicit INDEX?

2008-04-14 Thread John Stanton
You can make the timestamps unique by appending a sequence number.  A 32 
bit Unix timestamp can map into a 64 bit Sqlite integer.

P Kishor wrote:
> On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:
>> Err,
>>
>> Make that example table data:
>>
>>  <...>
>>
>> 1167611400|89|7|34
>>  1167611400|101|5|4
>>  1167611400|147|9|14
>>  1167611400|173|7|2
>>  1167611400|195|8|1
>>  1167611400|314|8|12
>>  1167611400|347|6|48
>>  1167611400|364|1|49
>>
>> 
>>
>> 1167615600|111|7|39
>>  1167615600|155|2|8
>>  1167615600|157|4|7
>>  1167615600|220|4|47
>>  1167615600|247|7|21
>>  1167615600|261|8|30
>>  1167615600|308|9|20
>>
>> <...>
>>
>> As I'd originally said, the timestamps are inserted in ascending
>>  order.  (I just screwed up pasting my example values in the wrong order)
>>
>>
> 
> 
> ok, but first, I don't really understand the following --
> 
>> SELECT * from bridge_table WHERE timestamp = ;
>>
>> and almost immediately get back all the results I want, since they're
>> right next to each other in the db.  However, the actual SELECT doesn't
>> return for almost 13 additional seconds, as the entire table is scanned
>> for other rows where timestamp might be 
> 
> Your first sentence above is that you "almost immediately get back all
> the results you want" then you go on to say that the "actual SELECT
> doesn't return for almost 13 additional seconds." So, what is it
> exactly that you "almost immediately get back"?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread RB Smissaert
Have tested this now on a table of some 30 rows (no indexes at all)
and with 100 rows to find in the middle of the table, sorted asc on
time-stamp field.
It gave me a speed increase of about 25%.
If I looked for rows at the beginning of the table the speed increase was
more, some 50% faster.
If I looked for rows at the end (highest timestamp) then the simple select
was quite a lot faster than the one with limit etc.

Maybe not the kind of increase you were interested in, but still something
and with no extra overhead at all, just a different SQL. Probably only worth
it if looking for early times.

RBS


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: 14 April 2008 19:11
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Implicit INDEX?

How about this:

select
*
from
table1
where
rowid >=
(select
rowid
from
table1
where
time_stamp = xxx
limit 1)
and
rowid <
(select
rowid
from
table1
where
time_stamp > xxx
limit 1)

RBS


> Donald,
>
>> To test this, I think you'd want to create a select for some of the most
>> recent data (i.e. data at the tail of the database), perhaps after
>> clearing cache.  I suspect this will take the full table scan time to
>> return any values.
>
>  I'd actually just thought of that.  All my test SELECTs happened
> to be using data close to the "front" of the file.  As suspected, if I try
> it on "later" data, I have to wait for the table scan, then I get the
> data.  I'd be erroneously assuming that SQLite was somehow searching for
> the first value quickly, then scanning the table.
>
>> Two thoughts:
>>
>>   1).  Easy.
>> How bad is the extra 0.8 GByte cost of the index?  At today's prices,
>> it's only about 20 cents on ordinary 5" drives.  (but maybe you're
>> programming a cellphone)
>
>  No, it's nothing terribly constrained.  I'm just trying to
> understand the mechanisms and do what I can to keep the size down where
> possible.  I was somewhat surprised to find that adding an index on a
> single INTEGER column nearly doubled the size of the database and wanted
> to figure out if there was a way around it, given that the column will
> always be sorted.  (And given my, perhaps erroneous understanding that
> creating an INDEX just makes sure that the column stays sorted so SQLite
> can search through it more intelligently)
>
>>   2).  Fancy.
>> You could create your own sparse index table mapping a ROWID to say,
>> every thousandth timestamp.  Then you could create upper and lower ROWID
>> bounds on any query based on timestamps.  Maybe you import the records
>> in batches already and can create the new table at the same time.
>
>  This is closer to what I'm probably going to do.  The data gets
> pulled in every 5 minutes, but between runs, very little actually changes.
> So the idea is to store only the changes along with a full dump say once
> or twice every day.  Then I can just query the values from  time> to  and compute the state of everything from
> that data.
>
>  Thanks,
>
>   Chris
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Ken
Does sqlite have support for index based tables as a storage means?

This is an excelent case for  index based storage.

just my .02
Ken


Dennis Cote <[EMAIL PROTECTED]> wrote: Jeffrey Rennie wrote:
> On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy  wrote:
> 
  inserted into the database in ascending order, and where there may be
>> as
  many as 500 hID entries for each timestamp.  After a while, this table
>>> Have you considered making timestamp a PRIMARY KEY?
>>>
>>> So, declare it as INTEGER PRIMARY KEY NOT NULL
>>Unfortunately, the timestamps aren't unique, so I can't use
>> PRIMARY KEY to solve the problem.  (Each run generates as many as 500
>> entries, all with the same timestamp)
>>
> 
> Are there ever identical rows?  If not, just make the whole row a primary
> key:
> 
> CREATE TABLE bridge_table (
> timestamp INTEGER NOT NULL,
> hID INTEGER NOT NULL,
> sID INTEGER NOT NULL,
> pID INTEGER NOT NULL,
> *PRIMARY KEY (timestamp, hID, sID, pID)
> *);

This is a good suggestion. Even if there are identical rows, you can 
still use and index on all columns, without making it an primary key 
(i.e. unique index).

CREATE TABLE bridge_table (
 timestamp INTEGER NOT NULL,
 hID INTEGER NOT NULL,
 sID INTEGER NOT NULL,
 pID INTEGER NOT NULL
);
CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID);

This will double the size of the database file and double the row insert 
time, but SQLite has an optimization that allows it to use the columns 
stored in the index to supply the results of a query without looking at 
the main table. In effect this index becomes your main table because it 
stores all the data.

HTH
Dennis Cote



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread P Kishor
On 4/14/08, Dennis Cote <[EMAIL PROTECTED]> wrote:
> Jeffrey Rennie wrote:
>  > On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote:
>  >
>    inserted into the database in ascending order, and where there may be
>  >> as
>    many as 500 hID entries for each timestamp.  After a while, this table
>  >>> Have you considered making timestamp a PRIMARY KEY?
>  >>>
>  >>> So, declare it as INTEGER PRIMARY KEY NOT NULL
>  >>Unfortunately, the timestamps aren't unique, so I can't use
>  >> PRIMARY KEY to solve the problem.  (Each run generates as many as 500
>  >> entries, all with the same timestamp)
>  >>
>  >
>  > Are there ever identical rows?  If not, just make the whole row a primary
>  > key:
>  >
>  > CREATE TABLE bridge_table (
>  > timestamp INTEGER NOT NULL,
>  > hID INTEGER NOT NULL,
>  > sID INTEGER NOT NULL,
>  > pID INTEGER NOT NULL,
>  > *PRIMARY KEY (timestamp, hID, sID, pID)
>  > *);
>
>
> This is a good suggestion.

ummm no. The OP very clearly states that an INDEX is out of question
because of space constraints.


> Even if there are identical rows, you can
>  still use and index on all columns, without making it an primary key
>  (i.e. unique index).
>
>
>  CREATE TABLE bridge_table (
>  timestamp INTEGER NOT NULL,
>  hID INTEGER NOT NULL,
>  sID INTEGER NOT NULL,
>  pID INTEGER NOT NULL
>
> );
>  CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID);
>
>  This will double the size of the database file and double the row insert
>  time, but SQLite has an optimization that allows it to use the columns
>  stored in the index to supply the results of a query without looking at
>  the main table. In effect this index becomes your main table because it
>  stores all the data.
>
>  HTH
>
> Dennis Cote
>
>
>
>
>  ___
>  sqlite-users mailing list
>  sqlite-users@sqlite.org
>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread bartsmissaert
How about this:

select
*
from
table1
where
rowid >=
(select
rowid
from
table1
where
time_stamp = xxx
limit 1)
and
rowid <
(select
rowid
from
table1
where
time_stamp > xxx
limit 1)

RBS


> Donald,
>
>> To test this, I think you'd want to create a select for some of the most
>> recent data (i.e. data at the tail of the database), perhaps after
>> clearing cache.  I suspect this will take the full table scan time to
>> return any values.
>
>  I'd actually just thought of that.  All my test SELECTs happened
> to be using data close to the "front" of the file.  As suspected, if I try
> it on "later" data, I have to wait for the table scan, then I get the
> data.  I'd be erroneously assuming that SQLite was somehow searching for
> the first value quickly, then scanning the table.
>
>> Two thoughts:
>>
>>   1).  Easy.
>> How bad is the extra 0.8 GByte cost of the index?  At today's prices,
>> it's only about 20 cents on ordinary 5" drives.  (but maybe you're
>> programming a cellphone)
>
>  No, it's nothing terribly constrained.  I'm just trying to
> understand the mechanisms and do what I can to keep the size down where
> possible.  I was somewhat surprised to find that adding an index on a
> single INTEGER column nearly doubled the size of the database and wanted
> to figure out if there was a way around it, given that the column will
> always be sorted.  (And given my, perhaps erroneous understanding that
> creating an INDEX just makes sure that the column stays sorted so SQLite
> can search through it more intelligently)
>
>>   2).  Fancy.
>> You could create your own sparse index table mapping a ROWID to say,
>> every thousandth timestamp.  Then you could create upper and lower ROWID
>> bounds on any query based on timestamps.  Maybe you import the records
>> in batches already and can create the new table at the same time.
>
>  This is closer to what I'm probably going to do.  The data gets
> pulled in every 5 minutes, but between runs, very little actually changes.
> So the idea is to store only the changes along with a full dump say once
> or twice every day.  Then I can just query the values from  time> to  and compute the state of everything from
> that data.
>
>  Thanks,
>
>   Chris
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Dennis Cote
Jeffrey Rennie wrote:
> On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote:
> 
  inserted into the database in ascending order, and where there may be
>> as
  many as 500 hID entries for each timestamp.  After a while, this table
>>> Have you considered making timestamp a PRIMARY KEY?
>>>
>>> So, declare it as INTEGER PRIMARY KEY NOT NULL
>>Unfortunately, the timestamps aren't unique, so I can't use
>> PRIMARY KEY to solve the problem.  (Each run generates as many as 500
>> entries, all with the same timestamp)
>>
> 
> Are there ever identical rows?  If not, just make the whole row a primary
> key:
> 
> CREATE TABLE bridge_table (
> timestamp INTEGER NOT NULL,
> hID INTEGER NOT NULL,
> sID INTEGER NOT NULL,
> pID INTEGER NOT NULL,
> *PRIMARY KEY (timestamp, hID, sID, pID)
> *);

This is a good suggestion. Even if there are identical rows, you can 
still use and index on all columns, without making it an primary key 
(i.e. unique index).

CREATE TABLE bridge_table (
 timestamp INTEGER NOT NULL,
 hID INTEGER NOT NULL,
 sID INTEGER NOT NULL,
 pID INTEGER NOT NULL
);
CREATE INDEX bt_all ON bridge_table(timestamp, hID, sID, pID);

This will double the size of the database file and double the row insert 
time, but SQLite has an optimization that allows it to use the columns 
stored in the index to supply the results of a query without looking at 
the main table. In effect this index becomes your main table because it 
stores all the data.

HTH
Dennis Cote



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Jeffrey Rennie
On Mon, Apr 14, 2008 at 9:48 AM, Chris Tracy <[EMAIL PROTECTED]> wrote:

> >>  inserted into the database in ascending order, and where there may be
> as
> >>  many as 500 hID entries for each timestamp.  After a while, this table
>
> > Have you considered making timestamp a PRIMARY KEY?
> >
> > So, declare it as INTEGER PRIMARY KEY NOT NULL
>
>Unfortunately, the timestamps aren't unique, so I can't use
> PRIMARY KEY to solve the problem.  (Each run generates as many as 500
> entries, all with the same timestamp)
>

Are there ever identical rows?  If not, just make the whole row a primary
key:

CREATE TABLE bridge_table (
timestamp INTEGER NOT NULL,
hID INTEGER NOT NULL,
sID INTEGER NOT NULL,
pID INTEGER NOT NULL,
*PRIMARY KEY (timestamp, hID, sID, pID)
*);
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread P Kishor
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:
> >  If you need to do this from the shell, I'm not sure there is anything
>  >  you can do.  If you're doing this from the C API just look for timestamps
>  >  that are >= your target and terminate the query as soon as you find one
>  >  larger than the target timestamp (e.g. stop calling step() and call
>  >  reset() on the statement).  If the number of datapoints at a given
>  >  timestamp is known (because it is fixed, or because you stashed that
>  >  value in another table), you could also use a LIMIT clause.  That has
>  >  the added bonus of doing the right thing (even if it takes longer) if
>  >  the rows somehow get out-of-order.
>
>
> I'm actually using Perl's DBI interface.  Not sure that I can
>  easily do the exit when seeing a timestamp one bigger than the one I'm
>  after, but it's an intriguing idea I'd not considered before.  The LIMIT
>  I'd also considered, but since the data is variable, I'd abandoned it.
>  Hadn't considered storing the value in another table.  But since the whole
>  thing was based on an invalid premise (that SQLite was somehow magically
>  finding the initial value quickly, ignoring that it just happened to be
>  closer to the start of the db) I think I'll take another approach to
>  solving the problem.
>

heck, I didn't realize you were using Perl (I thought you were working
in some highly constrained environment with 2 bytes of memory, like
the Mars rover or something).

Just exit the loop.

my $timestamp = 1167615600;
while ($ary_ref = $sth->fetch) {
  last if $ary_ref->[0] > $timestamp;

  do other stuff...
}

what's the problem? It should be blindingly fast.


> Thanks for the input though,
>
>
> Chris
>
> ___
>  sqlite-users mailing list
>  sqlite-users@sqlite.org
>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread David Bicking
Can you create a second table holding the minimum rowid and maximum rowid from 
your main data table. You query this table, get the range or rowid, then you 
can query the main table WHERE rowid >= MinRowid AND rowid <= MaxRowid. That 
would than use the built in primary index.

David


--- On Mon, 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:

> From: Chris Tracy <[EMAIL PROTECTED]>
> Subject: Re: [sqlite] Implicit INDEX?
> To: "General Discussion of SQLite Database" 
> Date: Monday, April 14, 2008, 1:35 PM
> >  If you need to do this from the shell, I'm not
> sure there is anything
> >  you can do.  If you're doing this from the C API
> just look for timestamps
> >  that are >= your target and terminate the query as
> soon as you find one
> >  larger than the target timestamp (e.g. stop calling
> step() and call
> >  reset() on the statement).  If the number of
> datapoints at a given
> >  timestamp is known (because it is fixed, or because
> you stashed that
> >  value in another table), you could also use a LIMIT
> clause.  That has
> >  the added bonus of doing the right thing (even if it
> takes longer) if
> >  the rows somehow get out-of-order.
> 
>   I'm actually using Perl's DBI interface.  Not
> sure that I can 
> easily do the exit when seeing a timestamp one bigger than
> the one I'm 
> after, but it's an intriguing idea I'd not
> considered before.  The LIMIT 
> I'd also considered, but since the data is variable,
> I'd abandoned it. 
> Hadn't considered storing the value in another table. 
> But since the whole 
> thing was based on an invalid premise (that SQLite was
> somehow magically 
> finding the initial value quickly, ignoring that it just
> happened to be 
> closer to the start of the db) I think I'll take
> another approach to 
> solving the problem.
> 
>   Thanks for the input though,
> 
>   Chris
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
Donald,

> To test this, I think you'd want to create a select for some of the most
> recent data (i.e. data at the tail of the database), perhaps after
> clearing cache.  I suspect this will take the full table scan time to
> return any values.

 I'd actually just thought of that.  All my test SELECTs happened 
to be using data close to the "front" of the file.  As suspected, if I try 
it on "later" data, I have to wait for the table scan, then I get the 
data.  I'd be erroneously assuming that SQLite was somehow searching for 
the first value quickly, then scanning the table.

> Two thoughts:
>
>   1).  Easy.
> How bad is the extra 0.8 GByte cost of the index?  At today's prices,
> it's only about 20 cents on ordinary 5" drives.  (but maybe you're
> programming a cellphone)

 No, it's nothing terribly constrained.  I'm just trying to 
understand the mechanisms and do what I can to keep the size down where 
possible.  I was somewhat surprised to find that adding an index on a 
single INTEGER column nearly doubled the size of the database and wanted 
to figure out if there was a way around it, given that the column will 
always be sorted.  (And given my, perhaps erroneous understanding that 
creating an INDEX just makes sure that the column stays sorted so SQLite 
can search through it more intelligently)

>   2).  Fancy.
> You could create your own sparse index table mapping a ROWID to say,
> every thousandth timestamp.  Then you could create upper and lower ROWID
> bounds on any query based on timestamps.  Maybe you import the records
> in batches already and can create the new table at the same time.

 This is closer to what I'm probably going to do.  The data gets 
pulled in every 5 minutes, but between runs, very little actually changes. 
So the idea is to store only the changes along with a full dump say once 
or twice every day.  Then I can just query the values from  to  and compute the state of everything from 
that data.

 Thanks,

Chris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
>  If you need to do this from the shell, I'm not sure there is anything
>  you can do.  If you're doing this from the C API just look for timestamps
>  that are >= your target and terminate the query as soon as you find one
>  larger than the target timestamp (e.g. stop calling step() and call
>  reset() on the statement).  If the number of datapoints at a given
>  timestamp is known (because it is fixed, or because you stashed that
>  value in another table), you could also use a LIMIT clause.  That has
>  the added bonus of doing the right thing (even if it takes longer) if
>  the rows somehow get out-of-order.

I'm actually using Perl's DBI interface.  Not sure that I can 
easily do the exit when seeing a timestamp one bigger than the one I'm 
after, but it's an intriguing idea I'd not considered before.  The LIMIT 
I'd also considered, but since the data is variable, I'd abandoned it. 
Hadn't considered storing the value in another table.  But since the whole 
thing was based on an invalid premise (that SQLite was somehow magically 
finding the initial value quickly, ignoring that it just happened to be 
closer to the start of the db) I think I'll take another approach to 
solving the problem.

Thanks for the input though,

Chris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Jay A. Kreibich
On Mon, Apr 14, 2008 at 10:06:27AM -0700, Chris Tracy scratched on the wall:
> > Your first sentence above is that you "almost immediately get back all
> > the results you want" then you go on to say that the "actual SELECT
> > doesn't return for almost 13 additional seconds." So, what is it
> > exactly that you "almost immediately get back"?
> 
>   When I run the select inside the sqlite3 command line, I get back 
> all the results I'm after within a split second.  However, the actual 
> command itself doesn't return for 13 additional seconds, while the entire 
> table is scanned for other instances where timestamp equals the value I'm 
> after.  (No additional entries exist, so no additional data is ever 
> returned, but it always does this scan unless I've created an INDEX)

  I assume that only works if you are looking for data that was
  inserted near the "beginning" of the database (e.g. low timestamp
  values).  If you're looking for data that was more recently inserted
  (e.g at the "end" of the database), it is going to sit there for 13
  seconds and then display the values you're looking for... right?

  So some kind of short-cut solution will help save some time on
  average, but not for all cases.

>   So my question is, can I avoid the need to scan the entire table 
> for additional values where time =  without having to CREATE 
> INDEX (and use almost 100% more disk space) if I guarantee that all rows 
> are inserted with timestamp in ascending order?

  Technically no, as SQL doesn't consider rows to be ordered within a
  table.  In theory, the same query run two times can return the rows
  in a different order.

  On a more practical side, SQLite will generally return rows in the
  order of their ROWID, assuming no other ordering is in place.  Just
  be sure you understand how ROWIDs are assigned.



  If you need to do this from the shell, I'm not sure there is anything
  you can do.  If you're doing this from the C API just look for timestamps
  that are >= your target and terminate the query as soon as you find one
  larger than the target timestamp (e.g. stop calling step() and call
  reset() on the statement).  If the number of datapoints at a given
  timestamp is known (because it is fixed, or because you stashed that
  value in another table), you could also use a LIMIT clause.  That has
  the added bonus of doing the right thing (even if it takes longer) if
  the rows somehow get out-of-order.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Griggs, Donald
Hi, Chris,

Regarding:  "It seems to me (and I might be wrong) that since the values
are already in the database in ascending order by timestamp that SQLite
is able to find all the results quite quickly. ..."


To test this, I think you'd want to create a select for some of the most
recent data (i.e. data at the tail of the database), perhaps after
clearing cache.  I suspect this will take the full table scan time to
return any values.

Two thoughts:

   1).  Easy. 
How bad is the extra 0.8 GByte cost of the index?  At today's prices,
it's only about 20 cents on ordinary 5" drives.  (but maybe you're
programming a cellphone)

   2).  Fancy.
You could create your own sparse index table mapping a ROWID to say,
every thousandth timestamp.  Then you could create upper and lower ROWID
bounds on any query based on timestamps.  Maybe you import the records
in batches already and can create the new table at the same time.
  


This email and any attachments have been scanned for known viruses using 
multiple scanners. We believe that this email and any attachments are virus 
free, however the recipient must take full responsibility for virus checking. 
This email message is intended for the named recipient only. It may be 
privileged and/or confidential. If you are not the named recipient of this 
email please notify us immediately and do not copy it or use it for any 
purpose, nor disclose its contents to any other person.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
> a shot in the dark. Try
>
> SELECT * from bridge_table WHERE timestamp < 1167615601;
>
> although, it would probably not help because SQLite would still have
> to scan the table to ascertain there were no more rows that met the
> criterion. My hunch is that without giving the program some hint (aka,
> INDEX) you can't get what you want. But, others on this list likely
> have much better knowledge of the internals.

Unfortunately, that doesn't solve the issue of needing to scan all 
the records in the database (without an explicit INDEX), since SQLite 
doesn't realize and thus doesn't act on the fact that the data in the 
timestamp column is already sorted.

Chris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread P Kishor
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:
> > Your first sentence above is that you "almost immediately get back all
>  > the results you want" then you go on to say that the "actual SELECT
>  > doesn't return for almost 13 additional seconds." So, what is it
>  > exactly that you "almost immediately get back"?
>
>
> When I run the select inside the sqlite3 command line, I get back
>  all the results I'm after within a split second.  However, the actual
>  command itself doesn't return for 13 additional seconds, while the entire
>  table is scanned for other instances where timestamp equals the value I'm
>  after.  (No additional entries exist, so no additional data is ever
>  returned, but it always does this scan unless I've created an INDEX)
>
> To wit:
>
>  sqlite3 test.db
>  sqlite> SELECT * from bridge_table WHERE timestamp = 1167615600;
>  [within a split second I get several hundred values ending with...]
>  1167615600|494|8|2
>  1167615600|495|9|7
>  1167615600|496|5|21
>  1167615600|497|8|24
>  1167615600|498|6|46
>  1167615600|499|1|14
>  1167615600|500|9|31
>  [and here it sits for 13 more seconds, looking through the rest of the
>  table for any other values where time = 1167615600, but none exist, so
>  finally...]
>  sqlite>  [I get the prompt back once it finishes, having received no
> additional output, because there isn't any]
>
> It seems to me (and I might be wrong) that since the values are
>  already in the database in ascending order by timestamp that SQLite is
>  able to find all the results quite quickly.  However, it must then scan
>  through the rest of the database on the off chance that some other rows
>  exist where timestamp = 1167615600, because it doesn't know that they're
>  all stored in ascending order.
>
> So my question is, can I avoid the need to scan the entire table
>  for additional values where time =  without having to CREATE
>  INDEX (and use almost 100% more disk space) if I guarantee that all rows
>  are inserted with timestamp in ascending order?
>


a shot in the dark. Try

SELECT * from bridge_table WHERE timestamp < 1167615601;

although, it would probably not help because SQLite would still have
to scan the table to ascertain there were no more rows that met the
criterion. My hunch is that without giving the program some hint (aka,
INDEX) you can't get what you want. But, others on this list likely
have much better knowledge of the internals.

Good luck.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
> Your first sentence above is that you "almost immediately get back all
> the results you want" then you go on to say that the "actual SELECT
> doesn't return for almost 13 additional seconds." So, what is it
> exactly that you "almost immediately get back"?

When I run the select inside the sqlite3 command line, I get back 
all the results I'm after within a split second.  However, the actual 
command itself doesn't return for 13 additional seconds, while the entire 
table is scanned for other instances where timestamp equals the value I'm 
after.  (No additional entries exist, so no additional data is ever 
returned, but it always does this scan unless I've created an INDEX)

To wit:

sqlite3 test.db
sqlite> SELECT * from bridge_table WHERE timestamp = 1167615600;
[within a split second I get several hundred values ending with...]
1167615600|494|8|2
1167615600|495|9|7
1167615600|496|5|21
1167615600|497|8|24
1167615600|498|6|46
1167615600|499|1|14
1167615600|500|9|31
[and here it sits for 13 more seconds, looking through the rest of the 
table for any other values where time = 1167615600, but none exist, so 
finally...]
sqlite>  [I get the prompt back once it finishes, having received no
additional output, because there isn't any]

It seems to me (and I might be wrong) that since the values are 
already in the database in ascending order by timestamp that SQLite is 
able to find all the results quite quickly.  However, it must then scan 
through the rest of the database on the off chance that some other rows 
exist where timestamp = 1167615600, because it doesn't know that they're 
all stored in ascending order.

So my question is, can I avoid the need to scan the entire table 
for additional values where time =  without having to CREATE 
INDEX (and use almost 100% more disk space) if I guarantee that all rows 
are inserted with timestamp in ascending order?

Chris


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread P Kishor
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:
> Err,
>
> Make that example table data:
>
>  <...>
>
> 1167611400|89|7|34
>  1167611400|101|5|4
>  1167611400|147|9|14
>  1167611400|173|7|2
>  1167611400|195|8|1
>  1167611400|314|8|12
>  1167611400|347|6|48
>  1167611400|364|1|49
>
> 
>
> 1167615600|111|7|39
>  1167615600|155|2|8
>  1167615600|157|4|7
>  1167615600|220|4|47
>  1167615600|247|7|21
>  1167615600|261|8|30
>  1167615600|308|9|20
>
> <...>
>
> As I'd originally said, the timestamps are inserted in ascending
>  order.  (I just screwed up pasting my example values in the wrong order)
>
>


ok, but first, I don't really understand the following --

>SELECT * from bridge_table WHERE timestamp = ;
>
>and almost immediately get back all the results I want, since they're
>right next to each other in the db.  However, the actual SELECT doesn't
>return for almost 13 additional seconds, as the entire table is scanned
>for other rows where timestamp might be 

Your first sentence above is that you "almost immediately get back all
the results you want" then you go on to say that the "actual SELECT
doesn't return for almost 13 additional seconds." So, what is it
exactly that you "almost immediately get back"?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
Err,

Make that example table data:

<...>
1167611400|89|7|34
1167611400|101|5|4
1167611400|147|9|14
1167611400|173|7|2
1167611400|195|8|1
1167611400|314|8|12
1167611400|347|6|48
1167611400|364|1|49

1167615600|111|7|39
1167615600|155|2|8
1167615600|157|4|7
1167615600|220|4|47
1167615600|247|7|21
1167615600|261|8|30
1167615600|308|9|20
<...>

As I'd originally said, the timestamps are inserted in ascending 
order.  (I just screwed up pasting my example values in the wrong order)

Chris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread Chris Tracy
>>  inserted into the database in ascending order, and where there may be as
>>  many as 500 hID entries for each timestamp.  After a while, this table

> Have you considered making timestamp a PRIMARY KEY?
>
> So, declare it as INTEGER PRIMARY KEY NOT NULL

Unfortunately, the timestamps aren't unique, so I can't use 
PRIMARY KEY to solve the problem.  (Each run generates as many as 500 
entries, all with the same timestamp)

A simple example of the data in the table follows:

<...>
1167615600|111|7|39
1167615600|155|2|8
1167615600|157|4|7
1167615600|220|4|47
1167615600|247|7|21
1167615600|261|8|30
1167615600|308|9|20

1167611400|89|7|34
1167611400|101|5|4
1167611400|147|9|14
1167611400|173|7|2
1167611400|195|8|1
1167611400|314|8|12
1167611400|347|6|48
1167611400|364|1|49
<...>

Chris
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Implicit INDEX?

2008-04-14 Thread P Kishor
On 4/14/08, Chris Tracy <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm a relative newcommer to SQL and SQLite and have was is likely
>  a silly question.  However, I've not found an answer to it or even a
>  reference to anything similar in searching the list archives or the web.
>
> If I have a table laid out thusly:
>
>  CREATE TABLE bridge_table (
>  timestamp INTEGER NOT NULL,
>  hID INTEGER NOT NULL,
>  sID INTEGER NOT NULL,
>  pID INTEGER NOT NULL
>  );
>
>  where timestamp values (unix 32-bit integer timestamps) would always be
>  inserted into the database in ascending order, and where there may be as
>  many as 500 hID entries for each timestamp.  After a while, this table
>  will grow to contain between 30 and 50 million records.
>
> In doing some tests, I find that with the fully populated table, I
>  can run:
>
>  SELECT * from bridge_table WHERE timestamp = ;
>
>  and almost immediately get back all the results I want, since they're
>  right next to each other in the db.  However, the actual SELECT doesn't
>  return for almost 13 additional seconds, as the entire table is scanned
>  for other rows where timestamp might be .
>
> Of course, the simple answer is to add an index on timestamp,
>  which does what I want.  However, doing so increases the size of this
>  50 million row table from 1GB to 1.8GB.  It seems to me that if I could
>  convince SQLite that the rows would always be inserted in ascending order
>  by timestamp, that I could have a sort of implicit index without consuming
>  the additional 0.8GB.
>
> Is this even theoretically possible or am I missing something
>  obvious that would prevent an "implicit index" from working like I'm
>  suggesting?

Have you considered making timestamp a PRIMARY KEY?

So, declare it as INTEGER PRIMARY KEY NOT NULL

(the NOT NULL may be unnecessary at that point).

That should help.


>
> Thanks,
>
> Chris
>  ___
>  sqlite-users mailing list
>  sqlite-users@sqlite.org
>  http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users