Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?

2007-01-16 Thread Nicolas Williams
On Sat, Jan 13, 2007 at 06:35:20PM -0500, [EMAIL PROTECTED] wrote:
> I guess that's the trick, to have the "current" or at least "recent" 
> database and then the historical one.  As of now, the process of polling 
> the 17 machines takes about 40 seconds or so (when I first started running 
> the process minutely, it was 20, so you can see I have to do something 
> soon :))
> 
> So assuming the two-db model, what's the trick to it?  Here are some ideas 
> off the top of my head--can you (or any reader) please give me your 
> thoughts (be as brutal as you like--I'm under no illusion that I know what 
> I'm talking about):

Use UPDATE to update the records in the "current" database and either
separately INSERT into the historical DB or use an ON UPDATE trigger to
INSERT into the historical DB when entries in the current db are
updated.
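
A minimal sketch of that approach, with made-up table and column names (both
tables live in the same database file here, since as far as I know a trigger
body can't reach into a separately attached database):

CREATE TABLE current_stats (
    machine  TEXT PRIMARY KEY,
    load     REAL,
    ts       TEXT NOT NULL
);

CREATE TABLE history_stats (
    machine  TEXT NOT NULL,
    load     REAL,
    ts       TEXT NOT NULL,
    PRIMARY KEY (machine, ts)
);

-- copy every update of a current row into the history table
CREATE TRIGGER current_stats_au
AFTER UPDATE ON current_stats FOR EACH ROW
BEGIN
    INSERT INTO history_stats (machine, load, ts)
    VALUES (new.machine, new.load, new.ts);
END;

-- the poller then only ever runs one UPDATE per machine, e.g.:
UPDATE current_stats SET load = 0.42, ts = '2007-01-16 12:00:00'
 WHERE machine = 'm1';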




Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?

2007-01-14 Thread Kees Nuyt

Hi Jonathan,

Splitting the tables and automatically updating the history via
triggers isn't very complicated.

BEGIN TRANSACTION;

CREATE TABLE StatsCurrent (
Machine    VARCHAR(16) NOT NULL,
Load       REAL,
Scratch    CHAR(4),
Mem        INTEGER,
MemPctFree INTEGER,
Procs      INTEGER,
Users      INTEGER,
Timestamp  VARCHAR(20) not null,
Message    VARCHAR(160),
PRIMARY KEY (Machine)
);

CREATE TABLE StatsHistory (
Machine    VARCHAR(16) NOT NULL,
Load       REAL,
Scratch    CHAR(4),
Mem        INTEGER,
MemPctFree INTEGER,
Procs      INTEGER,
Users      INTEGER,
Timestamp  VARCHAR(20) not null,
Message    VARCHAR(160),
PRIMARY KEY (Machine,Timestamp)
);
CREATE INDEX TS ON StatsHistory (Timestamp);

CREATE TRIGGER insert_stats 
AFTER INSERT ON StatsCurrent FOR EACH ROW
BEGIN
INSERT INTO StatsHistory
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
(new.Machine, new.Load, new.Scratch, new.Mem,
new.MemPctFree, new.Procs, new.Users, new.Timestamp,
new.Message);
END;

CREATE VIEW show_stats AS
SELECT *
  FROM StatsCurrent
 ORDER BY Load, Mem*MemPctFree desc, Scratch desc;
COMMIT;

BEGIN TRANSACTION;
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m1', 0.11, 1010, 2000, 10, 45, 4, '2006-12-31 23:30:01',
'm1 sample 1');
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m2', 0.12, 1020, 2000, 11, 45, 4, '2006-12-31 23:30:02',
'm2 sample 1');
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m3', 0.13, 1030, 3000, 12, 45, 4, '2006-12-31 23:30:03',
'm3 sample 1');
COMMIT;

BEGIN TRANSACTION;
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m1', 0.16, 1030, 1000, 19, 45, 4, '2006-12-31 23:31:01',
'm1 sample 2');
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m2', 0.15, 1020, 2000, 18, 45, 4, '2006-12-31 23:31:02',
'm2 sample 2');
INSERT OR REPLACE into StatsCurrent
(Machine, Load, Scratch, Mem, MemPctFree, Procs, Users,
Timestamp, Message)
VALUES
('m3', 0.14, 1010, 3000, 17, 45, 4, '2006-12-31 23:31:03',
'm3 sample 2');
COMMIT;

SELECT * FROM show_stats;

m3 0.14 1010 3000 17 45 4 2006-12-31 23:31:03 m3 sample 2
m2 0.15 1020 2000 18 45 4 2006-12-31 23:31:02 m2 sample 2
m1 0.16 1030 1000 19 45 4 2006-12-31 23:31:01 m1 sample 2

(more or less tested, StatsHistory really gets all the rows)

HTH

On Sat, 13 Jan 2007 18:35:20 -0500, you wrote:

>First off, thanks for the help and sorry for the formatting of the 
>message.  I didn't know how it was going to turn out and I probably was 
>overly optimistic as well as too verbose.
>
>Secondly, as I feared, seems like it was an XY question, so sorry for that 
>as well.
>
>I'll address the two replies I can see so far, and some of the info in 
>each section will likely overlap.
>
>Nico:
>
>I guess that's the trick, to have the "current" or at least "recent" 
>database and then the historical one.  As of now, the process of polling 
>the 17 machines takes about 40 seconds or so (when I first started running 
>the process minutely, it was 20, so you can see I have to do something 
>soon :))
>
>So assuming the two-db model, what's the trick to it?  Here are some ideas 
>off the top of my head--can you (or any reader) please give me your 
>thoughts (be as brutal as you like--I'm under no illusion that I know what 
>I'm talking about):
>
>1)  The "current" table only ever has 17 rows.
>a)  Have some kind of thing built in to the script that runs
>minutely to copy the "current" data to the historical DB before kicking
>off the part that updates the current data.
>b)  Add a trigger to the DB where the SQLite engine takes care of
>the copy somehow--this would probably be more difficult since I don't know
>how to add a trigger and I am thinking that the historical database will
>be in a different file altogether.
>c)  Something I haven't thought of
>
>2)  The current table is only allowed to have a maximum of N rows.  Upon
>reaching this size, data are moved to the historical database and only the
>most recent observations for each machine are left in the current DB.  Not
>sure how I could do that.  Is there a way to do this within SQLite?
>
>3)  A job runs every night or week (at a time when people are least likely 
>to be using the page such as 3 am) that transfers the data from the 
>current DB to the historical, leaving only the most recent observation for 
>each machine.
>
>Jay:
>

Re: Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?

2007-01-14 Thread Miha Vrhovnik
I'm gonna cut all the content and say just one thing.
If Sqlite supported table partitioning this would be a piece of cake without
any complications.

http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html
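
Short of that, the nearest workaround in SQLite is probably the separate
history file already being discussed in this thread, stitched together with
ATTACH. A rough, untested sketch (the file path is made up, and it assumes
hist.stats already exists with the same columns as stats):

ATTACH DATABASE '/path/to/stats_history.db' AS hist;

BEGIN;
-- copy every row except the newest one per machine into the history file
INSERT INTO hist.stats
SELECT * FROM stats
 WHERE Timestamp < (SELECT max(Timestamp) FROM stats s2
                     WHERE s2.Machine = stats.Machine);
-- then drop those rows from the live table
DELETE FROM stats
 WHERE Timestamp < (SELECT max(Timestamp) FROM stats s2
                     WHERE s2.Machine = stats.Machine);
COMMIT;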




Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?

2007-01-14 Thread Jay Sprenkle

On 1/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


Jay:

The closer to real-time, the better.  The most often a cron job can run
under Linux is minutely, and minutely is pretty good.  I guess I could
have the summary process occur at the end of the script that polls the
machines.  It could generate static HTML, which would presumably make the
page load super fast.  However, under the current regime, the process of
creating that summary is going to take at least 10 seconds.  40 seconds
for polling + 10 seconds for summarizing=50 seconds, and that number is
only going to get bigger!  So I'll have to figure out a better table
structure anyway.


You don't have to run under cron for something like that. Loading and
unloading the program several times a minute is not very efficient anyway.
Just let it run continuously and use sleep() (or a timer) to yield your time
slice until
the next time you want to run.

What's the advantage of a database for this application? If all you need is
to load balance, it would seem simpler to just query each machine for its
load and react accordingly. I'm not sure if Perl supports SOAP
interfaces or serializing data over an http connection. You might look
into that for later.


Are indices something that only work if you create them BEFORE you
start adding data?

No. The index on stats.Timestamp should speed up finding
the record with max(Timestamp). It will speed up queries for existing data too.
It's like a table of contents for a book.

Here's what I would try:

1. Can you speed this up?
select Machine, max(Timestamp) as M from stats group by machine

If this is trying to get the machine with the latest time stamp then
perhaps this might be faster:
 select Machine, Timestamp as M from stats order by Timestamp desc limit 1
It gets one record instead of summarizing a lot of data.
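
(Untested sketch: if what's actually wanted is the newest row per machine,
rather than the single newest row overall, a correlated form might also be
worth timing against the join:)

select a.*
  from stats a
 where a.Timestamp = (select max(b.Timestamp)
                        from stats b
                       where b.Machine = a.Machine)
 order by a.load, a.Mem*a.MemPctFree desc, a.Scratch desc;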

Your code does a join of two tables on the machine column and timestamp:

select a.* from stats a, (select Machine, max(Timestamp) as M from stats
group by machine) b where a.machine=b.machine and a.timestamp=b.M order by
load, Mem*MemPctFree desc, Scratch desc;

Did you index both tables on ( machine, timestamp )?
The query has to match rows from both sides on those columns, so an index
on them will speed up the search on each side of the join.
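
For example, something like this (the index name is arbitrary):

CREATE INDEX idx_stats_machine_ts ON stats (Machine, Timestamp);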






Additional thoughts:

In general, I think splitting the tables up is the way to go.  Any further
comments/suggestions appreciated!

Jonathan








--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com


[sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?

2007-01-13 Thread Jonathan . W . Crane
First off, thanks for the help and sorry for the formatting of the 
message.  I didn't know how it was going to turn out and I probably was 
overly optimistic as well as too verbose.

Secondly, as I feared, seems like it was an XY question, so sorry for that 
as well.

I'll address the two replies I can see so far, and some of the info in 
each section will likely overlap.

Nico:

I guess that's the trick, to have the "current" or at least "recent" 
database and then the historical one.  As of now, the process of polling 
the 17 machines takes about 40 seconds or so (when I first started running 
the process minutely, it was 20, so you can see I have to do something 
soon :))

So assuming the two-db model, what's the trick to it?  Here are some ideas 
off the top of my head--can you (or any reader) please give me your 
thoughts (be as brutal as you like--I'm under no illusion that I know what 
I'm talking about):

1)  The "current" table only ever has 17 rows.
a)  Have some kind of thing built in to the script that runs
minutely to copy the "current" data to the historical DB before kicking
off the part that updates the current data.
b)  Add a trigger to the DB where the SQLite engine takes care of
the copy somehow--this would probably be more difficult since I don't know
how to add a trigger and I am thinking that the historical database will
be in a different file altogether.
c)  Something I haven't thought of

2)  The current table is only allowed to have a maximum of N rows.  Upon
reaching this size, data are moved to the historical database and only the
most recent observations for each machine are left in the current DB.  Not
sure how I could do that.  Is there a way to do this within SQLite?

3)  A job runs every night or week (at a time when people are least likely 
to be using the page such as 3 am) that transfers the data from the 
current DB to the historical, leaving only the most recent observation for 
each machine.

Jay:

The closer to real-time, the better.  The most often a cron job can run 
under Linux is minutely, and minutely is pretty good.  I guess I could 
have the summary process occur at the end of the script that polls the 
machines.  It could generate static HTML, which would presumably make the 
page load super fast.  However, under the current regime, the process of 
creating that summary is going to take at least 10 seconds.  40 seconds 
for polling + 10 seconds for summarizing=50 seconds, and that number is 
only going to get bigger!  So I'll have to figure out a better table 
structure anyway.

Additional thoughts:

In general, I think splitting the tables up is the way to go.  Any further 
comments/suggestions appreciated!

Jonathan





