Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
On Sat, Jan 13, 2007 at 06:35:20PM -0500, [EMAIL PROTECTED] wrote:
> I guess that's the trick, to have the "current" or at least "recent"
> database and then the historical one. As of now, the process of polling
> the 17 machines takes about 40 seconds or so (when I first started running
> the process minutely, it was 20, so you can see I have to do something
> soon :))
>
> So assuming the two-db model, what's the trick to it? Here are some ideas
> off the top of my head--can you (or any reader) please give me your
> thoughts (be as brutal as you like--I'm under no illusion that I know what
> I'm talking about):

Use UPDATE to update the records in the "current" database and either
separately INSERT into the historical DB, or use an ON UPDATE trigger to
INSERT into the historical DB when entries in the current DB are updated.
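In SQLite that trigger could look something like this (an untested sketch;
it assumes the StatsCurrent/StatsHistory schema that appears later in this
thread, with both tables in the same database file):

-- Untested sketch: copy the new values into the history table whenever
-- a row of the current table is updated in place.
CREATE TRIGGER archive_on_update AFTER UPDATE ON StatsCurrent
FOR EACH ROW
BEGIN
  INSERT INTO StatsHistory (Machine, Load, Scratch, Mem, MemPctFree,
                            Procs, Users, Timestamp, Message)
  VALUES (new.Machine, new.Load, new.Scratch, new.Mem, new.MemPctFree,
          new.Procs, new.Users, new.Timestamp, new.Message);
END;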
Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
Hi Jonathan,

Splitting, and automatic updating of the history via triggers, isn't very
complicated:

BEGIN TRANSACTION;

CREATE TABLE StatsCurrent (
  Machine    VARCHAR(16) NOT NULL,
  Load       REAL,
  Scratch    CHAR(4),
  Mem        INTEGER,
  MemPctFree INTEGER,
  Procs      INTEGER,
  Users      INTEGER,
  Timestamp  VARCHAR(20) NOT NULL,
  Message    VARCHAR(160),
  PRIMARY KEY (Machine)
);

CREATE TABLE StatsHistory (
  Machine    VARCHAR(16) NOT NULL,
  Load       REAL,
  Scratch    CHAR(4),
  Mem        INTEGER,
  MemPctFree INTEGER,
  Procs      INTEGER,
  Users      INTEGER,
  Timestamp  VARCHAR(20) NOT NULL,
  Message    VARCHAR(160),
  PRIMARY KEY (Machine, Timestamp)
);

CREATE INDEX TS ON StatsHistory (Timestamp);

CREATE TRIGGER insert_stats AFTER INSERT ON StatsCurrent
FOR EACH ROW
BEGIN
  INSERT INTO StatsHistory (Machine, Load, Scratch, Mem, MemPctFree,
                            Procs, Users, Timestamp, Message)
  VALUES (new.Machine, new.Load, new.Scratch, new.Mem, new.MemPctFree,
          new.Procs, new.Users, new.Timestamp, new.Message);
END;

CREATE VIEW show_stats AS
  SELECT * FROM StatsCurrent
  ORDER BY Load, Mem*MemPctFree DESC, Scratch DESC;

COMMIT;

BEGIN TRANSACTION;
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m1', 0.11, 1010, 2000, 10, 45, 4, '2006-12-31 23:30:01', 'm1 sample 1');
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m2', 0.12, 1020, 2000, 11, 45, 4, '2006-12-31 23:30:02', 'm2 sample 1');
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m3', 0.13, 1030, 3000, 12, 45, 4, '2006-12-31 23:30:03', 'm3 sample 1');
COMMIT;

BEGIN TRANSACTION;
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m1', 0.16, 1030, 1000, 19, 45, 4, '2006-12-31 23:31:01', 'm1 sample 2');
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m2', 0.15, 1020, 2000, 18, 45, 4, '2006-12-31 23:31:02', 'm2 sample 2');
INSERT OR REPLACE INTO StatsCurrent
  (Machine, Load, Scratch, Mem, MemPctFree, Procs, Users, Timestamp, Message)
  VALUES ('m3', 0.14, 1010, 3000, 17, 45, 4, '2006-12-31 23:31:03', 'm3 sample 2');
COMMIT;

SELECT * FROM show_stats;
m3  0.14  1010  3000  17  45  4  2006-12-31 23:31:03  m3 sample 2
m2  0.15  1020  2000  18  45  4  2006-12-31 23:31:02  m2 sample 2
m1  0.16  1030  1000  19  45  4  2006-12-31 23:31:01  m1 sample 2

(more or less tested; StatsHistory really gets all the rows)

HTH

On Sat, 13 Jan 2007 18:35:20 -0500, you wrote:
>First off, thanks for the help and sorry for the formatting of the
>message. I didn't know how it was going to turn out and I probably was
>overly optimistic as well as too verbose.
>
>Secondly, as I feared, seems like it was an XY question, so sorry for that
>as well.
>
>I'll address the two replies I can see so far, and some of the info in
>each section will likely overlap.
>
>Nico:
>
>I guess that's the trick, to have the "current" or at least "recent"
>database and then the historical one. As of now, the process of polling
>the 17 machines takes about 40 seconds or so (when I first started running
>the process minutely, it was 20, so you can see I have to do something
>soon :))
>
>So assuming the two-db model, what's the trick to it? Here are some ideas
>off the top of my head--can you (or any reader) please give me your
>thoughts (be as brutal as you like--I'm under no illusion that I know what
>I'm talking about):
>
>1) The "current" table only ever has 17 rows.
>       a) Have some kind of thing built in to the script that runs
>minutely to copy the "current" data to the historical DB before kicking
>off the part that updates the current data.
>       b) Add a trigger to the DB where the SQLite engine takes care of
>the copy somehow--this would probably be more difficult since I don't know
>how to add a trigger and I am thinking that the historical database will
>be in a different file altogether.
>       c) Something I haven't thought of
>
>2) The current table is only allowed to have a maximum of N rows. Upon
>reaching this size, data are moved to the historical database and only the
>most recent observations for each machine are left in the current DB. Not
>sure how I could do that. Is there a way to do this within SQLite?
>
>3) A job runs every night or week (at a time when people are least likely
>to be using the page, such as 3 am) that transfers the data from the
>current DB to the historical, leaving only the most recent observation for
>each machine.
>
>Jay:
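One note on your 1(b) and 3: as far as I know a trigger body can't reach
into a separately ATTACHed database file, so if the history lives in its
own file the copying has to be done by the script rather than by a trigger.
An untested sketch of what the minutely (or nightly) job could run; the
file path is hypothetical, and I'm assuming hist.StatsHistory has the same
columns as StatsCurrent:

-- Untested sketch: archive the current rows into a history database
-- kept in a separate file. '/path/to/history.db' is a made-up path.
ATTACH DATABASE '/path/to/history.db' AS hist;

BEGIN TRANSACTION;
INSERT INTO hist.StatsHistory SELECT * FROM main.StatsCurrent;
COMMIT;

DETACH DATABASE hist;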
Re: Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
I'm gonna cut all the content and say just one thing. If SQLite supported
table partitioning, this would be a piece of cake without any complications.

http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html
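Lacking native partitioning, the closest SQLite equivalent is doing it by
hand: one table per time range plus a UNION ALL view over them. A rough,
untested sketch (the table names and the trimmed-down column list are made
up for the example):

-- Untested sketch of hand-rolled partitioning: one table per month.
CREATE TABLE Stats_2006_12 (Machine TEXT, Load REAL, Timestamp TEXT);
CREATE TABLE Stats_2007_01 (Machine TEXT, Load REAL, Timestamp TEXT);

-- Queries that need the full history go through the view; the polling
-- script writes to whichever partition matches the current month.
CREATE VIEW StatsAll AS
  SELECT * FROM Stats_2006_12
  UNION ALL
  SELECT * FROM Stats_2007_01;

-- An old partition can be dumped to an archive file and dropped without
-- touching the live table.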
Re: [sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
On 1/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Jay:
>
> The closer to real-time, the better. The most often a cron job can run
> under Linux is minutely, and minutely is pretty good. I guess I could have
> the summary process occur at the end of the script that polls the
> machines. It could generate static HTML, which would presumably make the
> page load super fast. However, under the current regime, the process of
> creating that summary is going to take at least 10 seconds. 40 seconds for
> polling + 10 seconds for summarizing = 50 seconds, and that number is only
> going to get bigger! So I'll have to figure out a better table structure
> anyway.

You don't have to run under cron for something like that. Loading and
unloading the program several times a minute is not very efficient anyway.
Just let it run continuously and use sleep() (or a timer) to yield your
time slice until the next time you want to run.

What's the advantage of a database for this application? If all you need is
to load balance, it would seem simpler to just query each machine for its
load and react accordingly. I'm not sure if Perl supports SOAP interfaces
or serializing data over an http connection. You might look into that for
later.

> Are indices something that only work if you create them BEFORE you start
> adding data?

No. The index on stats.Timestamp should speed up finding the record with
max(Timestamp). It will speed up queries for existing data too. It's like a
table of contents for a book.

Here's what I would try:

1. Can you speed up this?

  select Machine, max(Timestamp) as M from stats group by machine

If this is trying to get the machine with the latest time stamp, then
perhaps this might be faster:

  select Machine, Timestamp as M from stats order by Timestamp desc limit 1

It gets one record instead of summarizing a lot of data.

2. Your code does a join of two tables on the machine column and timestamp:

  select a.*
    from stats a,
         (select Machine, max(Timestamp) as M from stats group by machine) b
   where a.machine = b.machine and a.timestamp = b.M
   order by load, Mem*MemPctFree desc, Scratch desc;

Did you index both tables on (machine, timestamp)? It's got to match them
up, so an index will speed up the search on both sides of the join.

> Additional thoughts: In general, I think splitting the tables up is the
> way to go.
>
> Any further comments/suggestions appreciated!
>
> Jonathan

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com
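For completeness, the compound index Jay is asking about would be created
along these lines (an untested sketch; the table and column names are taken
from the queries above):

-- Untested sketch: compound index covering both the per-machine
-- max(Timestamp) subquery and the outer join on (machine, timestamp).
CREATE INDEX stats_machine_ts ON stats (Machine, Timestamp);

With that in place, SQLite should be able to answer the GROUP BY subquery
and the join lookups from the index instead of scanning all 700k rows.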
[sqlite] Re: How to optimize a select that gets 17 rows from a 700k row DB (via perl using DBI)?
First off, thanks for the help and sorry for the formatting of the message.
I didn't know how it was going to turn out and I probably was overly
optimistic as well as too verbose.

Secondly, as I feared, seems like it was an XY question, so sorry for that
as well.

I'll address the two replies I can see so far, and some of the info in each
section will likely overlap.

Nico:

I guess that's the trick, to have the "current" or at least "recent"
database and then the historical one. As of now, the process of polling the
17 machines takes about 40 seconds or so (when I first started running the
process minutely, it was 20, so you can see I have to do something soon :))

So assuming the two-db model, what's the trick to it? Here are some ideas
off the top of my head--can you (or any reader) please give me your
thoughts (be as brutal as you like--I'm under no illusion that I know what
I'm talking about):

1) The "current" table only ever has 17 rows.
      a) Have some kind of thing built in to the script that runs minutely
to copy the "current" data to the historical DB before kicking off the part
that updates the current data.
      b) Add a trigger to the DB where the SQLite engine takes care of the
copy somehow--this would probably be more difficult since I don't know how
to add a trigger and I am thinking that the historical database will be in
a different file altogether.
      c) Something I haven't thought of

2) The current table is only allowed to have a maximum of N rows. Upon
reaching this size, data are moved to the historical database and only the
most recent observations for each machine are left in the current DB. Not
sure how I could do that. Is there a way to do this within SQLite?

3) A job runs every night or week (at a time when people are least likely
to be using the page, such as 3 am) that transfers the data from the
current DB to the historical, leaving only the most recent observation for
each machine.

Jay:

The closer to real-time, the better. The most often a cron job can run
under Linux is minutely, and minutely is pretty good. I guess I could have
the summary process occur at the end of the script that polls the machines.
It could generate static HTML, which would presumably make the page load
super fast. However, under the current regime, the process of creating that
summary is going to take at least 10 seconds. 40 seconds for polling + 10
seconds for summarizing = 50 seconds, and that number is only going to get
bigger! So I'll have to figure out a better table structure anyway.

Additional thoughts: In general, I think splitting the tables up is the way
to go.

Any further comments/suggestions appreciated!

Jonathan