Hi all.  I'll try to post all the data I can think of.  The basic issue is 
that I have a Perl CGI script that shows the most recent load stats for a 
bunch of servers we use at our site for long, compute-intensive programs. 
The environment is all RedHat Enterprise Linux 4 (2.6.9-42.26.ELsmp #1 
SMP i686 i686 i386 GNU/Linux); the SQLite version is 3.2.2.  As the number 
of records in my database has increased, the page load time has also 
increased, now to about 10 seconds, which is unacceptable, IMO.

The idea is to show users the least loaded machines to help them decide 
where to run their next program.  How best to do this is, of course, in 
and of itself a complicated issue; at my org. there is no other system in 
place, so this is how I do it, and I've gotten a big positive response 
from users.  The previous implementation used a CSV file, and gathering 
the data from the servers took long enough that the cron job could only 
run every five minutes.  Since upgrading to SQLite, the now-minutely cron 
job has never failed to complete in under one minute, as far as I know.

I'd like to focus on optimizing my current implementation first, and 
then, if relevant, talk about other, perhaps better ways to do it. 
Currently both the old, five-minutely CSV version and the current beta 
(SQLite) version run in parallel, and the CSV version is still more 
widely used since that page actually loads in a reasonable amount of 
time.  The CSV data are overwritten each time, so the cost of reading 
that data is constant.  The SQLite database is INSERTed into because I 
want to keep historical data; the rationale for this is explained later.

The current system uses a DB with a table called stats that has more or 
less minutely data on various conditions for ~17 so-called peak machines. 
The table can be described thus:

CREATE TABLE stats (
    Machine text not null,
    Load real,
    Scratch text(4),
    Mem int,
    MemPctFree int,
    Procs int,
    Users int,
    Timestamp text(20) not null,
    Message text
);

My CGI script fetches the most recent observation for each machine with a 
select statement of the form:

select a.*
from stats a,
     (select Machine, max(Timestamp) as M from stats group by Machine) b
where a.Machine = b.Machine and a.Timestamp = b.M
order by Load, Mem*MemPctFree desc, Scratch desc;

I floundered about trying to find a select statement that would simply 
give me the right answer; I suspect the main opportunity for optimization 
is in rewriting that statement. 
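For concreteness, here is a minimal, self-contained sketch of what I mean 
(written with Python's built-in sqlite3 purely for illustration; my actual 
CGI is Perl/DBI, and the data below are made up): the current 
join-against-GROUP-BY form next to an equivalent correlated-subquery form, 
which I believe an index on (Machine, Timestamp) can serve row by row.

```python
import sqlite3

# Toy reproduction of the stats table (illustrative data, not real measurements).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stats (Machine text not null, Load real,
                Scratch text, Mem int, MemPctFree int, Procs int,
                Users int, Timestamp text not null, Message text)""")
conn.execute("CREATE INDEX PKEY on stats (Machine, Timestamp)")
rows = [
    ("peaklx6", 0.0, "21G", 12, 96, 1, 1, "2007-08-01 10:28:35", None),
    ("peaklx6", 0.5, "21G", 12, 90, 2, 1, "2007-08-01 10:29:35", None),
    ("peaklx1", 0.0, "58G", 12, 88, 1, 1, "2007-08-01 10:29:03", None),
]
conn.executemany("INSERT INTO stats VALUES (?,?,?,?,?,?,?,?,?)", rows)

# Original form: join the table against a GROUP BY subquery.
q1 = """select a.* from stats a,
        (select Machine, max(Timestamp) as M from stats group by Machine) b
        where a.Machine=b.Machine and a.Timestamp=b.M
        order by Load, Mem*MemPctFree desc, Scratch desc"""

# Correlated-subquery form: keep a row only if no newer row exists for
# the same machine; the (Machine, Timestamp) index answers the inner
# max() probe directly.
q2 = """select a.* from stats a
        where a.Timestamp = (select max(Timestamp) from stats
                             where Machine = a.Machine)
        order by Load, Mem*MemPctFree desc, Scratch desc"""

latest1 = conn.execute(q1).fetchall()
latest2 = conn.execute(q2).fetchall()
assert latest1 == latest2   # both return exactly one newest row per machine
print(latest2)
```

Whether the second form is actually faster on 3.2.2 is exactly the kind of 
thing I'm hoping someone here can tell me.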

The CGI script uses DBI, and the exact statement is

my $aref = $dbh->selectall_arrayref($ss2);

where $ss2 is the select statement above.

The idea is that the least loaded machines come first in the list.  The 
usefulness of the particular stats reported is of course debatable, but 
again, let's address that separately.  Some of the machines have very 
large scratch partitions, which makes them particularly suitable for jobs 
that generate very large files.

A typical group of rows might look like this (the Message column is empty 
in every row):

Machine    Load  Scratch space  Total mem  Free mem  Running    User(s)  Timestamp
                 available      (GB)       (%)       processes
peaklx6    0     21G            12         96        1          1        [EMAIL PROTECTED]:28:35
peaklx1    0     58G            12         88        1          1        [EMAIL PROTECTED]:29:03
peaklx5    0     19G            12         86        1          1        [EMAIL PROTECTED]:28:33
peaklx4    0                    12         84        1          4        [EMAIL PROTECTED]:28:31
peaklx2    0     59G            12         65        1          2        [EMAIL PROTECTED]:29:05
peaklx21   0                    4          14        1          2        [EMAIL PROTECTED]:29:07
peaklx3    0.1                  12         78        1          3        [EMAIL PROTECTED]:28:27
fstlx1     0.2   20G            4          5         1          11       [EMAIL PROTECTED]:28:37
peaklx22   1                    4          53        2          0        [EMAIL PROTECTED]:29:09
peaklx24   2                    4          58        3          0        [EMAIL PROTECTED]:29:12
peaklx29   2                    4          40        3          3        [EMAIL PROTECTED]:28:25
peaklx23   2                    4          39        3          0        [EMAIL PROTECTED]:29:11
peaklx30   2                    4          29        3          0        [EMAIL PROTECTED]:28:29
peaklx28   2                    4          28        3          0        [EMAIL PROTECTED]:28:23
peaklx26   3                    12         69        4          0        [EMAIL PROTECTED]:29:17
peaklx25   4                    4          21        5          0        [EMAIL PROTECTED]:29:15
peaklx27   4                    4          17        5          2        [EMAIL PROTECTED]:28:21

The problem I am having is that the page load times are getting very, 
very long.  The table now has about 700,000 rows, and the select 
statement is taking about 10 seconds to execute (I benchmarked various 
parts of the program and it's not the DB connect or loading the modules 
or whatnot).  The DB file lives on an NFS-mounted share, but running the 
same query locally and interactively takes approximately the same amount 
of time.  The DB file itself is currently 89MB.  The web server (and all 
the servers in question) are dual-processor 2.8GHz Xeons with a minimum 
of 4GB of memory, all running the same OS.  Some have hyperthreading 
turned on and others don't.

I tried putting a copy of the DB file in the same directory as the script; 
no dice.

I tried timing the subquery and found that 'select Machine, 
max(Timestamp) as M from stats group by machine' takes about 2 seconds on 
its own, while the entire query takes about 10.  I tried adding several 
indices to the table:

CREATE INDEX PKEY on stats (Machine, Timestamp);
CREATE INDEX TS on stats (Timestamp);
CREATE INDEX MACH on stats (Machine);

and reran the same benchmarks after each index creation, but the times 
were unchanged.  Do indices only work if you create them BEFORE you start 
adding data?
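One thing I did find is that EXPLAIN QUERY PLAN will at least tell you 
whether a statement touches an index at all, even one created after the 
inserts.  A minimal check (sketched again with Python's sqlite3 for 
brevity; the plan's wording varies across SQLite versions, so 3.2.2 may 
print something different):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (Machine text, Timestamp text)")
conn.executemany("INSERT INTO stats VALUES (?, ?)",
                 [("peaklx6", "10:28"), ("peaklx1", "10:29")])
# Index created AFTER the rows were inserted, as in my setup.
conn.execute("CREATE INDEX PKEY on stats (Machine, Timestamp)")

# Ask the planner how it would run a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN select * from stats where Machine = 'peaklx6'"
).fetchall()
details = " ".join(str(row) for row in plan)
print(details)
assert "INDEX" in details.upper()   # the post-hoc index is used for the lookup
```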

I have tried changing the default cache size to an obscenely large 
700000000 with no change in my benchmarks.  I assume increasing page size 
won't make a difference if cache size is that big.

I tried the very dangerous PRAGMA synchronous = OFF; that setting doesn't 
seem to persist between sessions, though, and I can't figure out from 
within Perl whether it actually took effect.  In any case, it made no 
difference interactively.
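For what it's worth, a value-returning pragma can be read back from 
within the program to confirm it took effect.  Here is the idea in 
Python's sqlite3 (illustrative only; I believe the same 'PRAGMA 
synchronous;' query can be issued through DBI as an ordinary select):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA synchronous = OFF")
# Read the pragma back: 0 = OFF, 1 = NORMAL, 2 = FULL.
value = conn.execute("PRAGMA synchronous").fetchone()[0]
print(value)
assert value == 0   # the setting took effect for this connection
```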

Finally, PRAGMA temp_store = MEMORY; did not help either.

Anyway, I am a novice SQL programmer; I would describe myself as a 
rapidly improving, competent Perl programmer.  The only solution I can 
think of is creating a second table that holds only the most recent 
observation per machine, while keeping the big table for historical 
reasons.  (Eventually I hope to create charts from the historical data to 
show trends, giving my users, and perhaps eventually a sort of poor man's 
load-balancing program, more to go on when picking a machine for their 
next big job.)  I have also read about preparing a statement and then 
executing the resulting statement handle, but does that work with SQLite? 
I've had great luck just using fetchall_(array|hash)ref.
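To make the second-table idea concrete, here is a minimal sketch (Python 
sqlite3 again, with a pared-down version of my schema; column names are 
from my table but the helper function is just for illustration): the cron 
job appends every observation to stats for history, and also does an 
INSERT OR REPLACE into a latest table keyed on Machine, so the CGI's read 
becomes a trivial scan of ~17 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stats (Machine text not null, Load real,
                Timestamp text not null)""")            # historical, append-only
conn.execute("""CREATE TABLE latest (Machine text primary key, Load real,
                Timestamp text not null)""")            # one row per machine

def record(machine, load, ts):
    # Each minutely observation: append to history, upsert the snapshot.
    conn.execute("INSERT INTO stats VALUES (?,?,?)", (machine, load, ts))
    conn.execute("INSERT OR REPLACE INTO latest VALUES (?,?,?)",
                 (machine, load, ts))

record("peaklx6", 0.0, "10:28:35")
record("peaklx6", 0.5, "10:29:35")   # newer sample replaces the snapshot row
record("peaklx1", 0.0, "10:29:03")

# The CGI-side read is now a flat scan of one small table.
snapshot = conn.execute("SELECT * FROM latest ORDER BY Load").fetchall()
print(snapshot)
assert snapshot == [("peaklx1", 0.0, "10:29:03"), ("peaklx6", 0.5, "10:29:35")]
```

The history table still grows without bound, but the hot query no longer 
depends on its size.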

From what I read, I should be able to get better performance out of this 
app.  I have probably wasted a lot of time pursuing dead ends, but I 
figured I'd do my due diligence before emailing the list.  I must also 
admit I am pretty impressed that, despite my mucking around, the overall 
app remains as robust as ever (sure, every once in a while as I was 
building the indices and whatnot the updates failed, but for the most 
part she takes a licking and keeps on ticking).

So there it is; what next, my friends?

TIA,

Jonathan



