RE: [sqlite] Clustered indicies Was: [sqlite] A proposal for SQLite version 3.0

Steve O'Hara Thu, 08 Apr 2004 01:51:20 -0700

Ben,

I think you're asking for is a mechanism to define the nature of the primary
key.
e.g. If you could define the primary key as some kind of function or to use
a specific integer field, then the table will be naturally ordered by that
field.

I don't know about other SQL engines, but our baby (BASIS) allows you to
create your primary key like this - this gives us tremendous performance
improvements for some apps.  For instances, we have some clients who store
newspaper articles from various publications in a table.  The natural way in
which users want to see these articles is sorted by the published date,
latest first - that's fine, but these tables contain 40 million articles and
their queries can produce pretty large sets.  We create a primary key based
on the published date which means we don't need to sort their queries.

Just a thought

Steve

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 08 April 2004 04:05
To: [EMAIL PROTECTED]
Subject: [sqlite] Clustered indicies Was: [sqlite] A proposal for SQLite
version 3.0

Jeff,

Jeff Pleimling <[EMAIL PROTECTED]>
08/04/2004 12:42 PM

        To:     [EMAIL PROTECTED]
        cc:
        Subject:        Re: [sqlite] A proposal for SQLite version 3.0

At 12:08 PM 4/8/2004 +1000, [EMAIL PROTECTED] wrote:
> I believe you're thinking of a 'clustered index'. This puts the data
> into the order of the index. There can be, of course, only one clustered
> index per table.
> Since at least some of the data in the table is moved around on every
insert,
> regular indexs need to take this into account (usually by indirection,
rather
> then modifying all of the indexes with each insert).

Ahh, I didn't think of that. I don't have any other indices on my table,
so this wouldn't be a problem for me... but I can see now how it would
harm the general case. I guess the simplest implementation would have a
proabition on having -any- other indicies on the table.

> >If a table could be ordered according to an index, rather than having
an
> >external index, I think it would significantly improve the time and
space
> >performance of my databases. I don't know whether my experience would
be
> >shared by other users. It it were something that could go into 3.0 it
> >would at least do me some good.
> Clustered indexes can really slow the performance for OLTP (On-Line
> Transaction Processing) and other systems where data is added/deleted in
a
> mixed fashion. Every time a record is inserted, data is possibly moved
on
> the disk (with page splits causing even more slowdowns).

Yes, that's what's happening already in the index whenever I do an insert.
My thinking was that maintaining an index-ordered table would be less work
overall than maintaining a table with an ordered index. I could be wrong
on that, but I'm not sure I see the flaw in my logic.

> If your system is entirely historic data, that would be great - but if
your
> system is inserting meter readings in (near) real-time, you'd probably
> get a big performance hit.

It's certainly real-time, with the occasional query.

> There are many pros and cons. A google search turns up articles (usually
for
> MS SQL Server) on both side - some people saying 'always' and some
'never'.

I can certainly see how the usefulness of this feature could be limited.

I guess the problem is primarily related to how the rowid is chosen. If I
could choose a rowid that would put the table in the correct order, and
renumber as required I might make some ground. I could order by ROWID,
then... although there may have to be some kind of code tweakage to allow
the where clauses to operate correctly.

Hmm... maybe something like this:
BEGIN TRANSACTION;
SELECT * FROM mytable WHERE <whatever is required to find the insertion
point>
-- if the insertion point is after current data:
INSERT INTO mytable VALUES (MAX(ROWID) + 10, <values>)
-- else if insertion point is between two values
INSERT INTO mytable VALUES ((<previous rowid> + <next rowid>)/2, <values>)
-- else rearrange contiguous values
UPDATE mytable SET ROWID = ROWID+1 WHERE ROWID >= <insertion point> AND
ROWID < <next free space>
INSERT INTO mytable VALUES (<insertion point>, <values>)
END TRANSACTION;

Perhaps the changes to sqlite could be as minimal as providing a facility
to say:
"I promise to keep these rows in an order consistent with this index"
so sqlite will use the index in queries.

Benjamin.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: [sqlite] Clustered indicies Was: [sqlite] A proposal for SQLite version 3.0

Reply via email to