[SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Jamie Tufnell
Hi,

I have a table that stores per-user histories of recently viewed items
and I'd like to limit the number of history items to <= 50 per user.
I'm considering doing this with a query run from cron every so often,
but I'm not happy with what I've come up with so far, and since it's
quite an active table I thought I'd ask here to see if there's a more
efficient way.

Right now the table structure is as follows...

user_item_history: id (PK), user_id (FK), item_id (FK), timestamp

For user_ids that have more than 50 rows, I want to keep the most
recent 50 and delete the rest.

The most obvious way of doing this for me is:

--
-- Get the user_ids with more than 50 history entries like this
--
SELECT user_id, count(*)
FROM user_item_history
GROUP BY user_id
HAVING count(*) > 50;

--
-- Then iterate the ids above (_user_id)
--
DELETE FROM user_item_history
WHERE user_id = _user_id AND id NOT IN (
    SELECT id FROM user_item_history
    WHERE user_id = _user_id
    ORDER BY timestamp DESC
    LIMIT 50);

I've left out the simple logic tying the above two queries together
for clarity..

I haven't actually tested this, but while I assume it would work, I
imagine there is a neater and possibly more efficient way of attacking
this.  I'm also open to different approaches to limiting the user's
history ... perhaps with table constraints so it can simply never
exceed 50 entries? But I'm not sure how to do this..

Any help would be greatly appreciated..

Thanks,
Jamie



Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread codeWarrior
How about using a trigger to call a stored procedure? [ON INSERT to
user_item_history DO ...]

and have your stored procedure count the records for that user and delete
the oldest record if necessary...

IF (SELECT count(*) FROM user_item_history WHERE user_id = NEW.user_id) >= 50 THEN

    -- DELETE THE OLDEST RECORD

END IF;

RETURN NEW;
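
That oldest-record DELETE could be as simple as something like this
(an untested sketch against your schema):

-- Remove the single oldest history row for the user just inserted
DELETE FROM user_item_history
WHERE id = (SELECT id FROM user_item_history
            WHERE user_id = NEW.user_id
            ORDER BY timestamp ASC
            LIMIT 1);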

""Jamie Tufnell"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> [snip]





Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Jamie Tufnell
Hi codeWarrior,

codeWarrior wrote:
> > For user_ids that have more than 50 rows, I want to keep the most
> > recent 50 and delete the rest.
> How about using a trigger to call a stored procedure ? [ON INSERT to
> user_item_history DO ...]

[snip]

Thanks for your input!  I've implemented this but I'm concerned about
performance.  As I mentioned it's frequently being added to and this
function will be called maybe a couple of times a second.  In my brief
initial tests it seems like this is running quite slowly...

Just to make sure I haven't done anything obviously wrong, I've
included my implementation below.. if you could look over it I'd
really appreciate it.

CREATE OR REPLACE FUNCTION user_item_history_limit() RETURNS TRIGGER
AS $_$
DECLARE
    threshold integer := 50;
    numrows integer;
BEGIN
    SELECT INTO numrows count(*) FROM user_item_history
        WHERE user_id = new.user_id;
    IF numrows > threshold THEN
        DELETE FROM user_item_history
        WHERE user_id = new.user_id AND id NOT IN (
            SELECT id FROM user_item_history
            WHERE user_id = new.user_id
            ORDER BY timestamp DESC LIMIT threshold);
        RAISE NOTICE '% rows exceeds threshold of % for user_id %; trimming..',
            numrows, threshold, new.user_id;
    END IF;
    RETURN new;
END;
$_$ LANGUAGE plpgsql;

CREATE TRIGGER user_item_history_limit AFTER INSERT ON user_item_history
    FOR EACH ROW EXECUTE PROCEDURE user_item_history_limit();

Any suggestions greatly appreciated!

Thanks again,
Jamie



Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Erik Jones


On Jan 8, 2008, at 8:24 AM, Jamie Tufnell wrote:


> [snip]
>
> For user_ids that have more than 50 rows, I want to keep the most
> recent 50 and delete the rest.


Create a row trigger that fires after insert containing something
along the lines of:

DELETE FROM user_item_history
WHERE id IN (SELECT id
             FROM user_item_history
             WHERE user_id = NEW.user_id
             ORDER BY timestamp DESC
             OFFSET 50);
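
Wrapped up in a trigger function, that might look something like this
(an untested sketch; the function and trigger names are just placeholders):

CREATE OR REPLACE FUNCTION user_item_history_fifo() RETURNS trigger AS $$
BEGIN
    -- Keep only the 50 newest rows for the user that was just inserted
    DELETE FROM user_item_history
    WHERE id IN (SELECT id
                 FROM user_item_history
                 WHERE user_id = NEW.user_id
                 ORDER BY timestamp DESC
                 OFFSET 50);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_item_history_fifo AFTER INSERT ON user_item_history
    FOR EACH ROW EXECUTE PROCEDURE user_item_history_fifo();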

Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com






Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread codeWarrior
Jamie:

I think you are probably having slowdown issues in your "DELETE FROM WHERE
NOT IN SELECT ORDER BY DESCENDING" construct -- that seems a bit convoluted
to me.

NOT IN is what is probably slowing you down the most.

ALSO: It looks to me like you have a column named "timestamp"??? This is
bad practice since "timestamp" is a reserved word... You really ought NOT to
use reserved words for column names... different debate.

Why bother deleting records anyway? Why not alter your query that tracks
the 50 records to LIMIT 50???

""Jamie Tufnell"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> [snip]





Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Jamie Tufnell
On 1/8/08, codeWarrior <[EMAIL PROTECTED]> wrote:
> Jamie:
>
> I think you are probably having slowdown issues in your "DELETE FROM WHERE
> NOT IN SELECT ORDER BY DESCENDING" construct -- that seems a bit convoluted
> to me

Hmm so rather than NOT IN ( .. LIMIT 50)  would you suggest IN ( ...
OFFSET 50) like in Erik's example?  Or something else entirely?

> ALSO: It looks to me like you have a column named "timestamp' ??? This is
> bad practice since "timestamp" is a reserved word... You really ought NOT to
> use reserved words for column names... different debate.

I do realize it would be better to use something else and thanks for
the tip :-)  This is an established database and "timestamp" has been
used in other tables which is why I stuck to it here.. one day when
time permits maybe I'll rename them all!

> Why bother deleting records anyway ? Why not alter your query that tracks
> the 50 records to LIMIT 50 ???

The read query does LIMIT 50, and the reason for deleting the rest of
the records is that they're not needed by the application and there
are loads of them being created all the time (currently several
million unnecessary rows) -- I imagine eventually this will slow
things down?

Do you think a regular batch process to delete rows might be more
appropriate than a trigger in this scenario?

Thanks,
Jamie



[SQL] Re: How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Steve Midgley
I think what you want is related to this post on how to create a FIFO 
queue in Postgres:


http://people.planetpostgresql.org/greg/index.php?/archives/89-Implementing-a-queue-in-SQL-Postgres-version.html

The major difference is that you want a FIFO queue per user_id, so the
trigger code would need to expire old records on a per-user_id basis
when calculating the "limit" subquery. His original code is this:

DELETE FROM q WHERE id NOT IN
    (SELECT id FROM q ORDER BY id DESC LIMIT maxrows);

And that subquery is where (I think!) you'd want to add "where user_id 
= [user_id]" - I'm not sure how you'll pass user_id into this function, 
maybe someone else can help with that?
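
(One possibility: in a per-row trigger the freshly inserted row is
available as NEW, so the per-user version of that statement might
become something like this -- an untested sketch against Jamie's table:)

DELETE FROM user_item_history WHERE id NOT IN
    (SELECT id FROM user_item_history
     WHERE user_id = NEW.user_id
     ORDER BY timestamp DESC LIMIT 50);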


Hopefully this is useful?

Steve

At 06:24 AM 1/8/2008, [EMAIL PROTECTED] wrote:

> [snip]


[SQL] trigger for TRUNCATE?

2008-01-08 Thread Gerardo Herzig
Hi all. According to the docs, TRUNCATE will not fire a DELETE trigger on
the table being truncated.

Is there any way to capture a TRUNCATE?

Thanks!
Gerardo



Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Jamie Tufnell
On 1/8/08, Erik Jones <[EMAIL PROTECTED]> wrote:
> > Hmm so rather than NOT IN ( .. LIMIT 50)  would you suggest IN ( ...
> > OFFSET 50) like in Erik's example?  Or something else entirely?
>
> Well, that would give you some gain.  Think about it like this:  once
> a given user's history records are at 50 and you insert a row, if you
> use the NOT IN clause you're comparing each of 51 rows to each of the
> 50 you want to keep to find the one that can go, while with the IN
> version you're comparing each of the 51 rows to the 1 that can go.  Now
> how much of a gain that will be I can't say, YMMV.  I don't remember
> you saying anything about it so I'll also go ahead and point out that
> you most likely will want an index on user_id if you don't already.

Thanks for the explanation Erik.  I did already have the index, but
I've reimplemented using IN/OFFSET instead of NOT IN/LIMIT and it does
indeed seem to be faster.
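
The reworked DELETE looks roughly like this (paraphrased from the
function I posted earlier, with Erik's IN/OFFSET change applied):

DELETE FROM user_item_history
WHERE user_id = new.user_id AND id IN (
    SELECT id FROM user_item_history
    WHERE user_id = new.user_id
    ORDER BY timestamp DESC OFFSET threshold);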

> > Do you think a regular batch process to delete rows might be more
> > appropriate than a trigger in this scenario?
>
> That depends on your usage pattern.  Assuming you aren't running user
> history report queries constantly that's probably what I'd do.  Also,
> if you're sure you won't need anything but the last 50 records per
> user, I'd definitely agree with cleaning out data that's not needed.

OK cool, thanks for your advice Erik.

Cheers,
Jamie



Re: [SQL] How to keep at-most N rows per group? periodic DELETEs or constraints or..?

2008-01-08 Thread Erik Jones

On Jan 8, 2008, at 11:41 AM, Jamie Tufnell wrote:

> On 1/8/08, codeWarrior <[EMAIL PROTECTED]> wrote:
>> Jamie:
>>
>> I think you are probably having slowdown issues in your "DELETE FROM
>> WHERE NOT IN SELECT ORDER BY DESCENDING" construct -- that seems a
>> bit convoluted to me.
>
> Hmm so rather than NOT IN ( .. LIMIT 50)  would you suggest IN ( ...
> OFFSET 50) like in Erik's example?  Or something else entirely?

Well, that would give you some gain.  Think about it like this:  once
a given user's history records are at 50 and you insert a row, if you
use the NOT IN clause you're comparing each of 51 rows to each of the
50 you want to keep to find the one that can go, while with the IN
version you're comparing each of the 51 rows to the 1 that can go.  Now
how much of a gain that will be I can't say, YMMV.  I don't remember
you saying anything about it so I'll also go ahead and point out that
you most likely will want an index on user_id if you don't already.

>> ALSO: It looks to me like you have a column named "timestamp"???
>> This is bad practice since "timestamp" is a reserved word... You
>> really ought NOT to use reserved words for column names... different
>> debate.
>
> I do realize it would be better to use something else and thanks for
> the tip :-)  This is an established database and "timestamp" has been
> used in other tables which is why I stuck to it here.. one day when
> time permits maybe I'll rename them all!
>
>> Why bother deleting records anyway? Why not alter your query that
>> tracks the 50 records to LIMIT 50???
>
> The read query does LIMIT 50, and the reason for deleting the rest of
> the records is that they're not needed by the application and there
> are loads of them being created all the time (currently several
> million unnecessary rows) -- I imagine eventually this will slow
> things down?
>
> Do you think a regular batch process to delete rows might be more
> appropriate than a trigger in this scenario?

That depends on your usage pattern.  Assuming you aren't running user
history report queries constantly that's probably what I'd do.  Also,
if you're sure you won't need anything but the last 50 records per
user, I'd definitely agree with cleaning out data that's not needed.
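
If you go the batch route, a cron-driven version might look something
like this (an untested sketch built from the queries earlier in the
thread; the function name is made up):

CREATE OR REPLACE FUNCTION trim_all_user_item_history() RETURNS void AS $$
DECLARE
    u record;
BEGIN
    -- Visit only the users that actually have more than 50 rows
    FOR u IN SELECT user_id FROM user_item_history
             GROUP BY user_id HAVING count(*) > 50 LOOP
        DELETE FROM user_item_history
        WHERE user_id = u.user_id AND id IN (
            SELECT id FROM user_item_history
            WHERE user_id = u.user_id
            ORDER BY timestamp DESC OFFSET 50);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- then schedule something like:  SELECT trim_all_user_item_history();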


Erik Jones







Re: [SQL] trigger for TRUNCATE?

2008-01-08 Thread Chris Browne
[EMAIL PROTECTED] (Gerardo Herzig) writes:
> Hi all. According to the docs, TRUNCATE will not fire a DELETE trigger
> on the table being truncated.
> Is there any way to capture a TRUNCATE?

I think there's some sort of "to do" on that...

It ought to be not *too* difficult (I imagine!) to be able to
associate a trigger with the TRUNCATE action, and therefore run some
stored function any time TRUNCATE takes place.

For the Slony-I replication system, it would be attractive for this to
lead to attaching two functions:
  - One function would return an exception so that TRUNCATE against
a subscriber node would fail...

  - Another would pretty much be as simple as submitting an event;
perform createEvent('_ourcluster', 'TRUNCATE_TABLE', table_id);

A new event, TRUNCATE_TABLE, would do a TRUNCATE against the
subscribers.

This represents a pretty easy enhancement, given the new kind of
trigger.
-- 
(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Frisbeetarianism: The belief that when  you die, your  soul goes up on
the roof and gets stuck...



Re: [SQL] trigger for TRUNCATE?

2008-01-08 Thread Pavel Stehule
Hello

Theoretically you can have a trigger on any statement, but I am not sure
about conformance with the standard. However, you can wrap the TRUNCATE
statement in a stored procedure, and then have that procedure perform
whatever other actions you need.
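
For example (an untested sketch; the function name and the logging are
just illustrations):

CREATE OR REPLACE FUNCTION truncate_table(tablename text) RETURNS void AS $$
BEGIN
    EXECUTE 'TRUNCATE TABLE ' || quote_ident(tablename);
    -- ... any other actions you want tied to truncation, e.g.:
    RAISE NOTICE 'table % was truncated', tablename;
END;
$$ LANGUAGE plpgsql;

-- and then always truncate via:
--   SELECT truncate_table('user_item_history');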

Regards
Pavel Stehule

On 08/01/2008, Chris Browne <[EMAIL PROTECTED]> wrote:
> [snip]
