[PERFORM] nested query on last n rows of huge table

2006-02-28 Thread jcfischer
Hi list

I'm fairly new to Postgres so bear with me. Googling and searching the
list, I didn't find anything that resembled my problem.

I have a large table with ca. 10 million inserts per day (fairly simple
data: timestam, a couple of id's and a varchar message)

I run a query every couple of minutes that looks at the new entries
since the last run and retrieves them for further processing (using a
WHERE eventtime > '2006-02-24 14:00:00' ) to limit to the most recent
entries

These queries run around 40-50 seconds (largely due to some LIKE %msg%
threwn in for good measure). Postgres performs a seq table scan on
those queries :-(

My idea is to limit the search to only the last n entries because I
found that a

SELECT * from table ORDER eventtime DESC limit 1000

is very fast. Because the inserts are in chronolgical order, I can
store the sequential id of the highest row from the last query and
subtract that from the current high row count to determine that number.

Is there a way to limit the expensive query to only those last 1000 (or
whatever) results?

I have tried to nest SELECTS but my SQL-fu is to limited to get
anything through the SQL processor :-)

thanks
Jens-Christian Fischer


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] nested query on last n rows of huge table

2006-02-28 Thread jcfischer
sorry: Postgres 8.0.2 server. The EXPLAIN ANALYZE for the query looks
like this:

explain analyze select
syslog.logs.eventtime,assets.hosts.name,syslog.processes.name as
process
from
syslog.logs,assets.hosts,assets.ipaddrs,assets.macaddrs,syslog.processes
where msg like '%session opened for user root%'
and syslog.logs.assets_ipaddr_id = assets.ipaddrs.id
and assets.ipaddrs.macaddr_id = assets.macaddrs.id
and assets.macaddrs.host_id = assets.hosts.id
and syslog.processes.id = syslog.logs.process_id and
eventtime > timestamp '2006-02-24 15:05:00'

Nested Loop  (cost=0.00..328832.34 rows=2 width=254) (actual
time=49389.924..49494.665 rows=45 loops=1)
  ->  Nested Loop  (cost=0.00..328826.32 rows=1 width=90) (actual
time=49365.709..49434.500 rows=45 loops=1)
->  Nested Loop  (cost=0.00..328820.30 rows=1 width=90) (actual
time=49327.211..49360.043 rows=45 loops=1)
  ->  Nested Loop  (cost=0.00..328814.27 rows=1 width=90)
(actual time=49327.183..49344.281 rows=45 loops=1)
->  Seq Scan on logs  (cost=0.00..328809.04 rows=1
width=16) (actual time=49314.928..49331.451 rows=45 loops=1)
  Filter: (((msg)::text ~~ '%session opened for
user root%'::text) AND (eventtime > '2006-02-24 15:05:00'::timestamp
without time zone))
->  Index Scan using "pk_syslog.processes" on
processes  (cost=0.00..5.21 rows=1 width=82) (actual time=0.278..0.280
rows=1 loops=45)
  Index Cond: (processes.id =
"outer".process_id)
  ->  Index Scan using "pk_assets.ipaddrs" on ipaddrs
(cost=0.00..6.01 rows=1 width=8) (actual time=0.344..0.346 rows=1
loops=45)
Index Cond: ("outer".assets_ipaddr_id = ipaddrs.id)
->  Index Scan using "pk_assets.macaddrs" on macaddrs
(cost=0.00..6.01 rows=1 width=8) (actual time=1.648..1.650 rows=1
loops=45)
  Index Cond: ("outer".macaddr_id = macaddrs.id)
  ->  Index Scan using "pk_assets.hosts" on hosts  (cost=0.00..6.01
rows=1 width=172) (actual time=1.330..1.331 rows=1 loops=45)
Index Cond: ("outer".host_id = hosts.id)
Total runtime: 49494.830 ms


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster