Re: [GENERAL] Deletion Challenge

2015-12-15 Thread Berend Tober

Jim Nasby wrote:

On 12/9/15 7:59 PM, Berend Tober wrote:

This project is a game, btw, described at


You might be interested in https://schemaverse.com/



Schemaverse looks somewhat interesting. It and Fairwinds share PostgreSQL 
as a foundation, but they are very different games: while both have an 
aspect of conquest, one by battle and one by economic dominance, Fairwinds 
involves a different kind of strategy that entails a balance of cooperation 
and competition. Also, whereas in Schemaverse you get "free money" to build 
your spaceship just by joining the game, a fundamental feature of Fairwinds 
is wealth creation ... you have to make your own success out of nothing.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Deletion Challenge

2015-12-14 Thread Benjamin Smith
On Saturday, December 05, 2015 11:08:05 AM Berend Tober wrote:
> WITH max_click AS (
>   SELECT
>     cash_journal.fairian_id,
>     max(cash_journal.click) AS click
>   FROM cash_journal
>   GROUP BY cash_journal.fairian_id
> )
> delete from cash_journal j
>   using max_click b
>   where j.fairian_id = b.fairian_id
>   and j.click < b.click;
> 
> WITH max_journal_id AS (
>   SELECT
>     cash_journal.fairian_id,
>     cash_journal.click,
>     max(cash_journal.cash_journal_id) AS cash_journal_id
>   FROM cash_journal
>   GROUP BY cash_journal.fairian_id, cash_journal.click
> )
> delete from cash_journal j
>   using max_journal_id b
>   where j.fairian_id = b.fairian_id
>   and j.click = b.click
>   and j.cash_journal_id < b.cash_journal_id;

Although I couldn't be sure this would provide atomicity, I'd merge these 
into one query, like: 

WITH max_click AS (
  SELECT
    cash_journal.fairian_id,
    max(cash_journal.click) AS click
  FROM cash_journal
  GROUP BY cash_journal.fairian_id
),
max_journal_id AS (
  SELECT
    cash_journal.fairian_id,
    cash_journal.click,
    max(cash_journal.cash_journal_id) AS cash_journal_id
  FROM cash_journal
  GROUP BY cash_journal.fairian_id, cash_journal.click
),
delete_journal1 AS (
  delete from cash_journal j
    using max_click b
    where j.fairian_id = b.fairian_id
    and j.click < b.click
  returning *, 'journal1'::varchar AS source
),
delete_journal2 AS (
  delete from cash_journal j
    using max_journal_id b
    where j.fairian_id = b.fairian_id
    and j.click = b.click
    and j.cash_journal_id < b.cash_journal_id
  returning *, 'journal2'::varchar AS source
)
-- AND THEN TO FIND OUT WHAT HAPPENED
SELECT * FROM delete_journal1
UNION ALL
SELECT * FROM delete_journal2;
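A chained-CTE DELETE like this is PostgreSQL-specific. As a rough cross-check of the logic only (not of the atomicity question), here is a hedged Python/sqlite3 sketch that runs the two deletes inside one transaction against the sample data from the original post; the correlated EXISTS form is a stand-in for DELETE ... USING, which SQLite lacks:

```python
import sqlite3

# Sketch only: SQLite stand-in for the two-step PostgreSQL delete.
# (click, cash_journal_id, fairian_id) triples from the sample data.
rows = [(36, 3, 7), (36, 4, 8), (36, 5, 9), (36, 14, 18),
        (37, 5, 7), (37, 7, 8), (37, 9, 9),
        (411, 1, 25), (411, 2, 25),
        (412, 1, 7), (412, 2, 25),
        (413, 1, 25)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cash_journal (click INT, cash_journal_id INT, fairian_id INT)")
con.executemany("INSERT INTO cash_journal VALUES (?, ?, ?)", rows)

with con:  # both deletes commit (or roll back) together
    # Step 1: drop every row older than the player's newest click.
    con.execute("""DELETE FROM cash_journal
                   WHERE EXISTS (SELECT 1 FROM cash_journal b
                                 WHERE b.fairian_id = cash_journal.fairian_id
                                   AND b.click > cash_journal.click)""")
    # Step 2: within that click, keep only the highest cash_journal_id.
    con.execute("""DELETE FROM cash_journal
                   WHERE EXISTS (SELECT 1 FROM cash_journal b
                                 WHERE b.fairian_id = cash_journal.fairian_id
                                   AND b.click = cash_journal.click
                                   AND b.cash_journal_id > cash_journal.cash_journal_id)""")

remaining = con.execute(
    "SELECT fairian_id, click, cash_journal_id FROM cash_journal ORDER BY fairian_id").fetchall()
```

Exactly one row per player survives, matching the DELETE 7 result reported earlier in the thread.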






Re: [GENERAL] Deletion Challenge

2015-12-14 Thread Benjamin Smith
> test=> delete from cash_journal where ARRAY[click, cash_journal_id] NOT in
> (select max(ARRAY[click,cash_journal_id]) from cash_journal group by
> fairian_id);
> DELETE 7


For what it's worth, we've run into *severe* performance issues using IN() 
when there is a large number of values (e.g. more than 10,000) in conjunction 
with a complex query. Using a WITH prefix table (a CTE) and joining against 
that doesn't seem to carry anything like a similar performance penalty.
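To illustrate the workaround (a hypothetical, scaled-down sketch in Python/sqlite3 with only 500 values, not the 10,000-plus case described above): the same membership filter written as a literal IN list and as a join against a table of the values returns identical rows, so the join form can be swapped in when the list gets large:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])

wanted = list(range(0, 3500, 7))  # 500 values; imagine 10,000+

# Variant 1: one big literal IN (...) list.
marks = ",".join("?" * len(wanted))
n_in = con.execute(f"SELECT count(*) FROM t WHERE x IN ({marks})", wanted).fetchone()[0]

# Variant 2: load the same values into a table and join against it instead.
con.execute("CREATE TEMP TABLE w (x INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO w VALUES (?)", [(v,) for v in wanted])
n_join = con.execute("SELECT count(*) FROM t JOIN w USING (x)").fetchone()[0]
```

Both variants select the same 500 rows; in PostgreSQL the table/CTE form gives the planner a hashable relation to join instead of a giant expression list.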




Re: [GENERAL] Deletion Challenge

2015-12-14 Thread Jim Nasby

On 12/9/15 7:59 PM, Berend Tober wrote:

The issue is that I'd like the application (that is, the data base and
its stored procedures) to be robust enough to be a "long-running"
application, i.e. one that doesn't suffer gradual performance
degradation as time and the accumulated data increase.


In my experience, no such thing exists. This is one of the things that 
makes database development very different from other forms of 
programming. Once you write a program, it's basically always going to 
perform the same. Database performance slowly changes over time, not 
only due to different amounts of data, but also due to a different 
*composition* of the data stored.


Of course, it is wise to avoid things that will obviously hurt future 
performance, so it's good that you're thinking about this. But don't 
expect something you can "set and forget". :)



This question was sort of addressed at the "query tuning" aspect, and
I'm confident that partitioning would help.


Keep in mind that partitioning isn't a magic bullet either, though in 
this case I agree it would help. Sometimes something as simple as having 
"active" and "history" partitions is enough.
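A minimal sketch of that active/history idea, with loud assumptions: hypothetical table names, Python/sqlite3 standing in for PostgreSQL, and two plain tables standing in for partitions. Recent rows live in a small "active" table and a periodic job moves older clicks to "history" in one transaction:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cash_journal_active  (click INT, cash_journal_id INT, fairian_id INT);
    CREATE TABLE cash_journal_history (click INT, cash_journal_id INT, fairian_id INT);
""")
# Sample (click, cash_journal_id, fairian_id) data from the original post.
rows = [(36, 3, 7), (36, 4, 8), (36, 5, 9), (36, 14, 18),
        (37, 5, 7), (37, 7, 8), (37, 9, 9),
        (411, 1, 25), (411, 2, 25),
        (412, 1, 7), (412, 2, 25),
        (413, 1, 25)]
con.executemany("INSERT INTO cash_journal_active VALUES (?, ?, ?)", rows)

cutoff = 400  # hypothetical threshold: clicks below this are old enough to archive
with con:  # move-then-delete as one atomic step
    con.execute("INSERT INTO cash_journal_history "
                "SELECT * FROM cash_journal_active WHERE click < ?", (cutoff,))
    con.execute("DELETE FROM cash_journal_active WHERE click < ?", (cutoff,))

n_active = con.execute("SELECT count(*) FROM cash_journal_active").fetchone()[0]
n_history = con.execute("SELECT count(*) FROM cash_journal_history").fetchone()[0]
```

With real range partitioning the archive step becomes detaching or dropping a whole partition rather than a row-by-row DELETE, which is what makes purging low-cost.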



This project is a game, btw, described at


You might be interested in https://schemaverse.com/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Deletion Challenge

2015-12-10 Thread Berend Tober

Steve Crawford wrote:

The two general solutions are the "keep the last one" proposed by Adrian and
the "keep the last N" that I sent.

But it might be worth stepping back a bit. You said you are having
performance problems that you feel would be improved by removing only a
million rows which doesn't sound like that much to me. It's less than
half of what I *add* to just one of my tables every week and my database
is dwarfed by those of many of the participants on this list.

This suggests that there may be other issues such as tuning, indexing or
query optimization at play. Depending on your requirements, partitioning
might be useful. It wouldn't be last N but could easily be done to
partition by date-ranges which makes archiving and purging a low-cost
operation.

You might want to expand a bit on the core issue you are trying to solve.




I really appreciate the deep-dive.

I'm quite sure the performance issue is mostly a hardware limitation at 
this point. The application is in a developmental phase and running for 
test purposes on grossly inadequate hardware for production purposes ... 
think home laptop computer in one case.


The issue is that I'd like the application (that is, the data base and 
its stored procedures) to be robust enough to be a "long-running" 
application, i.e. one that doesn't suffer gradual performance 
degradation as time and the accumulated data increase. For the cash and 
similar journals, I'd like to retain some amount of history so that 
players can review "recent" transactions so as to understand and verify 
how the current balance was arrived at, but at some point let old 
transactions age-off and be deleted.


This question was sort of addressed at the "query tuning" aspect, and 
I'm confident that partitioning would help. But since this is just a 
game, retention of a full and auditable history is not really required: 
I have a lot of flexibility to determine what to keep and in fact am not 
exactly sure how much to keep. I know I need at least one row in order 
to retain the current balance, but I'm thinking something on the order 
of scores or hundreds, maybe a thousand transactions per player in each 
of several similar journals retained at any point in time would be 
sufficient.


This project is a game, btw, described at

https://github.com/bmtober/fairwinds

for those interested in the backstory.

I am eager to get some real-world experience with multiple players 
actually using the application and providing feedback, which is 
gradually happening by means of presentations I have made and am scheduled 
to make at local user groups. Eventually I want to raise some money to rent 
some virtual private server space and host it on a publicly-available 
site once obvious scalability issues like this are mitigated.







Re: [GENERAL] Deletion Challenge

2015-12-09 Thread Berend Tober

Steve Crawford wrote:

If I understand correctly the value of "click" always advances and within a 
"click" the "cash_journal_id" always advances - not necessarily by single 
steps - so within a fairian_id, ordering by "click" plus "cash_journal_id" 
would return the records in order, from which you want the most recent 5 for 
each fairian_id.

Typing without testing and ignoring performance optimizations, something 
along the lines of the following should work and covers the "last 5" issue 
as well.

with stuff_to_delete as (
    select fairian_id, click, cash_journal_id,
    rank() over (partition by fairian_id
                 order by click desc, cash_journal_id desc) as howold
    from cash_journal)
delete from cash_journal
using stuff_to_delete
where
cash_journal.fairian_id = stuff_to_delete.fairian_id
and cash_journal.click = stuff_to_delete.click
and cash_journal.cash_journal_id = stuff_to_delete.cash_journal_id
and stuff_to_delete.howold > 5;



Assessing without testing, I like that. Thanks!

Although the above is not the exactly the form I was using, an earlier iteration of a related 
problem employed window functions. But as the data set grew performance suffered, so if deletes were 
not done on a regular, continuing basis in order to keep the historical data set approximately 
"small", the process execution time using a windowing scheme eventually exceeded the extent of my 
patience.


That "non-scalable" situation is actually what motivated the deliberate de-normalization (of 
retaining the "running balance" in a separate column) and the desire to delete old data. The 
original implementation calculated the running balance on-the-fly, employing windowing per 
fairian_id, and those tallies of the net balance entailed increasingly lengthy execution times as 
the number of rows increased, hence I was motivated to retain only a relatively constant-sized 
per-farian history, and I dismissed the use of windowing for the delete problem since it was so 
problematic for the running-balance-without-delete problem.


Thanks for knocking some sense into me!






Re: [GENERAL] Deletion Challenge

2015-12-09 Thread Berend Tober

Adrian Klaver wrote:

On 12/05/2015 08:08 AM, Berend Tober wrote:

/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely affect performance,
and also becomes less relevant for retention.

...



test=> delete from cash_journal where ARRAY[click, cash_journal_id] NOT in 
(select max(ARRAY[click,cash_journal_id]) from cash_journal group by fairian_id);
DELETE 7

test=> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
 click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
   412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
    37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
    37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
    36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
   413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(5 rows)
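For readers puzzled by the ARRAY trick: max(ARRAY[click, cash_journal_id]) picks the lexicographically largest (click, id) pair per fairian_id. A small pure-Python re-computation of that keep-set over the sample data (a sketch of the semantics, not the SQL itself) reproduces the DELETE 7 result above:

```python
# (click, cash_journal_id, fairian_id) triples from the sample data.
rows = [(36, 3, 7), (36, 4, 8), (36, 5, 9), (36, 14, 18),
        (37, 5, 7), (37, 7, 8), (37, 9, 9),
        (411, 1, 25), (411, 2, 25),
        (412, 1, 7), (412, 2, 25),
        (413, 1, 25)]

# max(ARRAY[click, id]) GROUP BY fairian_id: largest pair, compared element-wise.
keep = {}
for click, jid, fid in rows:
    if fid not in keep or (click, jid) > keep[fid]:
        keep[fid] = (click, jid)

# Rows not matching their player's maximum pair are the ones the DELETE removes.
survivors = [r for r in rows if (r[0], r[1]) == keep[r[2]]]
deleted = len(rows) - len(survivors)
```

Seven rows fall out, leaving the five survivors shown in the psql output.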



Nice.

The idea of a NOT IN query had occurred to me briefly, but I failed to pursue it because at some 
point in the distant past I had gained the impression that NOT IN queries were not computationally 
efficient. During one round of testing I had like a million rows. I'll have to run some EXPLAIN 
query testing with a larger data sample for comparison. Thanks!






Re: [GENERAL] Deletion Challenge

2015-12-09 Thread David G. Johnston
On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver 
wrote:

> On 12/09/2015 12:24 AM, Berend Tober wrote:
>
>> Adrian Klaver wrote:
>>
>>> On 12/05/2015 08:08 AM, Berend Tober wrote:
>>>
 /*

 Deletion Challenge

 I want to delete all but the most recent transaction, per person, from a
 table that records a transaction history because at some point the
 transaction history grows large enough to adversely affect performance,
 and also becomes less relevant for retention.

 ...


>>> test=> delete from cash_journal where ARRAY[click, cash_journal_id]
>>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
>>> group by fairian_id);
>>> DELETE 7
>>>
>>> test=> SELECT * FROM cash_journal order by fairian_id, click,
>>> cash_journal_id;
>>>  click | cash_journal_id | fairian_id | debit | credit | balance |           description
>>> -------+-----------------+------------+-------+--------+---------+----------------------------------
>>>    412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
>>>     37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
>>>     37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
>>>     36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
>>>    413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
>>> (5 rows)
>>>
>>>
>> Nice.
>>
>> The idea of a NOT IN query had occurred to me briefly, but I failed to
>> pursue it because at some point in the distant past I had gained the
>> impression that NOT IN queries were not computationally efficient.
>> During one round of testing I had like a million rows. I'll have to run
>> some EXPLAIN query testing with a larger data sample for comparison.
>> Thanks!
>>
>
> Plan B:
>
> WITH d AS
> (SELECT * FROM
> cash_journal
> LEFT JOIN
> (SELECT
> MAX(ARRAY[click,cash_journal_id]) AS mx
> FROM
> cash_journal
> GROUP BY
> fairian_id)
> AS
> mxa
> ON
> mxa.mx=ARRAY[click, cash_journal_id]
> WHERE
> mx IS NULL)
> DELETE FROM
> cash_journal
> USING
> d
> WHERE
> d.click = cash_journal.click
> AND
> d.cash_journal_id = cash_journal.cash_journal_id;
>
>
Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT
DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY
click DESC, cash_journal_id" or something similar?  It doesn't seem like
you should need to introduce an array and an aggregate here.

It does have the negative property of only providing a single row; which
excludes using it for the "last 5" part, but I suspect it will be
considerably faster for the single version.

David J.


Re: [GENERAL] Deletion Challenge

2015-12-09 Thread Steve Crawford
The two general solutions are the "keep the last one" proposed by Adrian and
the "keep the last N" that I sent.

But it might be worth stepping back a bit. You said you are having
performance problems that you feel would be improved by removing only a
million rows which doesn't sound like that much to me. It's less than half
of what I *add* to just one of my tables every week and my database is
dwarfed by those of many of the participants on this list.

This suggests that there may be other issues such as tuning, indexing or
query optimization at play. Depending on your requirements, partitioning
might be useful. It wouldn't be last N but could easily be done to
partition by date-ranges which makes archiving and purging a low-cost
operation.

You might want to expand a bit on the core issue you are trying to solve.

Cheers,
Steve




Re: [GENERAL] Deletion Challenge

2015-12-09 Thread Adrian Klaver

On 12/09/2015 12:24 AM, Berend Tober wrote:

Adrian Klaver wrote:

On 12/05/2015 08:08 AM, Berend Tober wrote:

/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely affect performance,
and also becomes less relevant for retention.

...



test=> delete from cash_journal where ARRAY[click, cash_journal_id]
NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
group by fairian_id);
DELETE 7

test=> SELECT * FROM cash_journal order by fairian_id, click,
cash_journal_id;
 click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
   412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
    37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
    37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
    36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
   413 |               1 |         25 |       |    995 |       0 | Redeemed bond 7719a1c782a1ba
(5 rows)



Nice.

The idea of a NOT IN query had occurred to me briefly, but I failed to
pursue it because at some point in the distant past I had gained the
impression that NOT IN queries were not computationally efficient.
During one round of testing I had like a million rows. I'll have to run
some EXPLAIN query testing with a larger data sample for comparison.
Thanks!


Plan B:

WITH d AS
(SELECT * FROM
cash_journal
LEFT JOIN
(SELECT
MAX(ARRAY[click,cash_journal_id]) AS mx
FROM
cash_journal
GROUP BY
fairian_id)
AS
mxa
ON
mxa.mx=ARRAY[click, cash_journal_id]
WHERE
mx IS NULL)
DELETE FROM
cash_journal
USING
d
WHERE
d.click = cash_journal.click
AND
d.cash_journal_id = cash_journal.cash_journal_id;
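A hedged sqlite3 rendering of Plan B's anti-join shape, with two stand-ins called out loudly: SQLite has no arrays, so the (click, cash_journal_id) pair is packed into one integer via click * 1000 + cash_journal_id (valid only while cash_journal_id stays below 1000), and DELETE ... USING becomes DELETE ... WHERE rowid IN since SQLite lacks USING:

```python
import sqlite3

# Sample (click, cash_journal_id, fairian_id) data from the original post.
rows = [(36, 3, 7), (36, 4, 8), (36, 5, 9), (36, 14, 18),
        (37, 5, 7), (37, 7, 8), (37, 9, 9),
        (411, 1, 25), (411, 2, 25),
        (412, 1, 7), (412, 2, 25),
        (413, 1, 25)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cash_journal (click INT, cash_journal_id INT, fairian_id INT)")
con.executemany("INSERT INTO cash_journal VALUES (?, ?, ?)", rows)

# Anti-join: rows that fail to match their player's maximum pair get deleted.
con.execute("""
    DELETE FROM cash_journal
    WHERE rowid IN (
        SELECT j.rowid
        FROM cash_journal AS j
        LEFT JOIN (SELECT fairian_id,
                          max(click * 1000 + cash_journal_id) AS mx
                   FROM cash_journal
                   GROUP BY fairian_id) AS mxa
          ON mxa.fairian_id = j.fairian_id
         AND mxa.mx = j.click * 1000 + j.cash_journal_id
        WHERE mxa.mx IS NULL)""")

remaining = con.execute("SELECT count(*) FROM cash_journal").fetchone()[0]
```

As with the NOT IN version, one row per player survives.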


--
Adrian Klaver
adrian.kla...@aklaver.com




[GENERAL] Deletion Challenge

2015-12-08 Thread Berend Tober

/*

Deletion Challenge

I want to delete all but the most recent transaction, per person, from a
table that records a transaction history because at some point the
transaction history grows large enough to adversely affect performance,
and also becomes less relevant for retention.

I have devised a way to accomplish this, but it is a 'two-stage'
approach: that is, it requires two delete statements. I would like to
know if there is a way to do it in a single statement.

Bonus challenge: Same question, except preserving the most recent N, for
N > 1, rows for each person so that a short history is retained after
the deletion.

I have included below an annotated test case and my current solution for
the N = 1 case.

*/

DROP TABLE IF EXISTS cash_journal;


CREATE TABLE cash_journal (
click bigint NOT NULL,
cash_journal_id bigint NOT NULL,
fairian_id bigint NOT NULL,
debit double precision,
credit double precision,
balance real DEFAULT 0,
description text
);

COMMENT ON COLUMN cash_journal.click            IS 'Time of transaction.';
COMMENT ON COLUMN cash_journal.cash_journal_id  IS 'Sequence of transaction within current click.';
COMMENT ON COLUMN cash_journal.fairian_id       IS 'Fairian account affected.';
COMMENT ON COLUMN cash_journal.debit            IS 'Account balance increase amount.';
COMMENT ON COLUMN cash_journal.credit           IS 'Account balance decrease amount.';
COMMENT ON COLUMN cash_journal.balance          IS 'Account balance, per Fairian running total.';
COMMENT ON COLUMN cash_journal.description      IS 'Transaction description.';

/*

Below is some sample data, listed in the click/sequence order that the
data would actually be entered. That is, the 'click' column represents
advancing time, and within each click, transactions are sequenced by the
'cash_journal_id' column. Note there are some missing cash_journal_id
sequence numbers. This is an artifact of having presented here only
an illustrative sample. Generally, within each click, the sequence
would start at one and increment uniformly by one for each new row
in the same click, and then reset to one for the next click. The
missing increments in the sample data should not make any difference
in the solution.

The 'balance' column is a per-player running total, which is a
deliberate denormalization. It is calculated in a before insert trigger
by starting with the per-player previous balance, and then adding
the new row debit, if any, and subtracting the new row credit, if any.

Note, not all Fairians will have a transaction in every click, but any
number of Fairians may have multiple transactions in any click.

*/

copy cash_journal
(click,cash_journal_id,fairian_id,debit,credit,balance,description) from stdin;
36	3	7	0	0	0	Initial cash balance
36	4	8	0	0	0	Initial cash balance
36	5	9	0	0	0	Initial cash balance
36	14	18	0	0	0	initial cash balance
37	5	7	9	\N	9	Ratified contract fa35e192121eab
37	7	8	8	\N	8	Ratified contract f1abd670358e03
37	9	9	7	\N	7	Ratified contract 1574bddb75c78a
411	1	25	0	0	0	Initial cash balance
411	2	25	1000	\N	1000	Issued bond 7719a1c782a1ba
412	1	7	5	\N	14	Sold food quantity 7 units.
412	2	25	\N	5	995	Bought food quantity 7 units.
413	1	25	\N	995	0	Redeemed bond 7719a1c782a1ba
\.


SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;

/*

The sample starting data is shown here in order by Fairian so that it is
perhaps easier to see what is happening for each player. Note that the
result of the deletion should be the last row for each player.

 click | cash_journal_id | fairian_id | debit | credit | balance |           description
-------+-----------------+------------+-------+--------+---------+----------------------------------
    36 |               3 |          7 |     0 |      0 |       0 | Initial cash balance
    37 |               5 |          7 |     9 |        |       9 | Ratified contract fa35e192121eab
   412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
    36 |               4 |          8 |     0 |      0 |       0 | Initial cash balance
    37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
    36 |               5 |          9 |     0 |      0 |       0 | Initial cash balance
    37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
    36 |              14 |         18 |     0 |      0 |       0 | initial cash balance
   411 |               1 |         25 |     0 |      0 |       0 | Initial cash balance
   411 |

Re: [GENERAL] Deletion Challenge

2015-12-08 Thread Steve Crawford
If I understand correctly the value of "click" always advances and within a
"click" the "cash_journal_id" always advances - not necessarily by single
steps - so within a fairian_id, ordering by "click" plus "cash_journal_id"
would return the records in order, from which you want the most recent 5 for
each fairian_id.

Typing without testing and ignoring performance optimizations, something
along the lines of the following should work and covers the "last 5" issue
as well.

with stuff_to_delete as (
    select fairian_id, click, cash_journal_id,
    rank() over (partition by fairian_id
                 order by click desc, cash_journal_id desc) as howold
    from cash_journal)
delete from cash_journal
using stuff_to_delete
where
cash_journal.fairian_id = stuff_to_delete.fairian_id
and cash_journal.click = stuff_to_delete.click
and cash_journal.cash_journal_id = stuff_to_delete.cash_journal_id
and stuff_to_delete.howold > 5;
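Since the query above was typed without testing, here is a hedged check of the same rank()-based "keep the last N" pattern in Python/sqlite3 (window functions need SQLite 3.25+; N = 2 instead of 5 so the small sample from the original post actually loses rows):

```python
import sqlite3

# Sample (click, cash_journal_id, fairian_id) data from the original post.
rows = [(36, 3, 7), (36, 4, 8), (36, 5, 9), (36, 14, 18),
        (37, 5, 7), (37, 7, 8), (37, 9, 9),
        (411, 1, 25), (411, 2, 25),
        (412, 1, 7), (412, 2, 25),
        (413, 1, 25)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cash_journal (click INT, cash_journal_id INT, fairian_id INT)")
con.executemany("INSERT INTO cash_journal VALUES (?, ?, ?)", rows)

N = 2  # keep the most recent N rows per player
con.execute("""
    DELETE FROM cash_journal
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   rank() OVER (PARTITION BY fairian_id
                                ORDER BY click DESC, cash_journal_id DESC) AS howold
            FROM cash_journal)
        WHERE howold > ?)""", (N,))

per_player = con.execute(
    "SELECT fairian_id, count(*) FROM cash_journal "
    "GROUP BY fairian_id ORDER BY fairian_id").fetchall()
```

Players with more than N rows are trimmed down to N; players with fewer are untouched.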

Cheers,
Steve


On Sat, Dec 5, 2015 at 8:08 AM, Berend Tober  wrote:

> /*
>
> Deletion Challenge
>
> I want to delete all but the most recent transaction, per person, from a
> table that records a transaction history because at some point the
> transaction history grows large enough to adversely affect performance,
> and also becomes less relevant for retention.
>
> I have devised a way to accomplish this, but it is a 'two-stage'
> approach: that is, it requires two delete statements. I would like to
> know if there is a way to do it in a single statement.
>
> Bonus challenge: Same question, except preserving the most recent N, for
> N > 1, rows for each person so that a short history is retained after
> the deletion.
>
> I have included below an annotated test case and my current solution for
> the N = 1 case.
>
> */
>
> DROP TABLE IF EXISTS cash_journal;
>
>
> CREATE TABLE cash_journal (
> click bigint NOT NULL,
> cash_journal_id bigint NOT NULL,
> fairian_id bigint NOT NULL,
> debit double precision,
> credit double precision,
> balance real DEFAULT 0,
> description text
> );
>
> COMMENT ON COLUMN cash_journal.click            IS 'Time of transaction.';
> COMMENT ON COLUMN cash_journal.cash_journal_id  IS 'Sequence of transaction within current click.';
> COMMENT ON COLUMN cash_journal.fairian_id       IS 'Fairian account affected.';
> COMMENT ON COLUMN cash_journal.debit            IS 'Account balance increase amount.';
> COMMENT ON COLUMN cash_journal.credit           IS 'Account balance decrease amount.';
> COMMENT ON COLUMN cash_journal.balance          IS 'Account balance, per Fairian running total.';
> COMMENT ON COLUMN cash_journal.description      IS 'Transaction description.';
>
> /*
>
> Below is some sample data, listed in the click/sequence order that the
> data would actually be entered. That is, the 'click' column represents
> advancing time, and within each click, transactions are sequenced by the
> 'cash_journal_id' column. Note there are some missing cash_journal_id
> sequence numbers. This is an artifact of having presented here only
> an illustrative sample. Generally, within each click, the sequence
> would start at one and increment uniformly by one for each new row
> in the same click, and then reset to one for the next click. The
> missing increments in the sample data should not make any difference
> in the solution.
>
> The 'balance' column is a per-player running total, which is a
> deliberate denormalization. It is calculated in a before insert trigger
> by starting with the per-player previous balance, and then adding
> the new row debit, if any, and subtracting the new row credit, if any.
>
> Note, not all Fairians will have a transaction in every click, but any
> number of Fairians may have multiple transactions in any click.
>
> */
>
> copy cash_journal
> (click,cash_journal_id,fairian_id,debit,credit,balance,description) from
> stdin;
> 36	3	7	0	0	0	Initial cash balance
> 36	4	8	0	0	0	Initial cash balance
> 36	5	9	0	0	0	Initial cash balance
> 36	14	18	0	0	0	initial cash balance
> 37	5	7	9	\N	9	Ratified contract fa35e192121eab
> 37	7	8	8	\N	8	Ratified contract f1abd670358e03
> 37	9	9	7	\N	7	Ratified contract 1574bddb75c78a
> 411	1	25	0	0	0	Initial cash balance
> 411	2	25	1000	\N	1000	Issued bond 7719a1c782a1ba
> 412	1	7	5	\N	14	Sold food quantity 7 units.
> 412	2	25	\N	5	995	Bought food quantity 7 units.
> 413	1	25	\N	995	0	Redeemed bond 7719a1c782a1ba
> \.
>
>
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
>
> /*
>
> The sample starting data is shown here in order by Fairian so that it is
> perhaps easier to 

Re: [GENERAL] Deletion Challenge

2015-12-08 Thread Adrian Klaver
On 12/05/2015 08:08 AM, Berend Tober wrote:
> /*
> 
> Deletion Challenge
> 
> I want to delete all but the most recent transaction, per person, from a
> table that records a transaction history because at some point the
> transaction history grows large enough to adversely affect performance,
> and also becomes less relevant for retention.
> 
> I have devised a way to accomplish this, but it is a 'two-stage'
> approach: that is, it requires two delete statements. I would like to
> know if there is a way to do it in a single statement.
> 
> Bonus challenge: Same question, except preserving the most recent N, for
> N > 1, rows for each person so that a short history is retained after
> the deletion.
> 
> I have included below an annotated test case and my current solution for
> the N = 1 case.
> 
> */
> 
> DROP TABLE IF EXISTS cash_journal;
> 
> 
> CREATE TABLE cash_journal (
>  click bigint NOT NULL,
>  cash_journal_id bigint NOT NULL,
>  fairian_id bigint NOT NULL,
>  debit double precision,
>  credit double precision,
>  balance real DEFAULT 0,
>  description text
> );
> 
> COMMENT ON COLUMN cash_journal.click            IS 'Time of transaction.';
> COMMENT ON COLUMN cash_journal.cash_journal_id  IS 'Sequence of transaction within current click.';
> COMMENT ON COLUMN cash_journal.fairian_id       IS 'Fairian account affected.';
> COMMENT ON COLUMN cash_journal.debit            IS 'Account balance increase amount.';
> COMMENT ON COLUMN cash_journal.credit           IS 'Account balance decrease amount.';
> COMMENT ON COLUMN cash_journal.balance          IS 'Account balance, per Fairian running total.';
> COMMENT ON COLUMN cash_journal.description      IS 'Transaction description.';
> 
> /*
> 
> Below is some sample data, listed in the click/sequence order that the
> data would actually be entered. That is, the 'click' column represents
> advancing time, and within each click, transactions are sequenced by the
> 'cash_journal_id' column. Note there are some missing cash_journal_id
> sequence numbers. This is an artifact of having presented here only
> an illustrative sample. Generally, within each click, the sequence
> would start at one and increment uniformly by one for each new row
> in the same click, and then reset to one for the next click. The
> missing increments in the sample data should not make any difference
> in the solution.
> 
> The 'balance' column is a per-player running total, which is a
> deliberate denormalization. It is calculated in a before insert trigger
> by starting with the per-player previous balance, and then adding
> the new row debit, if any, and subtracting the new row credit, if any.
> 
> Note, not all Fairians will have a transaction in every click, but any
> number of Fairians may have multiple transactions in any click.
> 
> */
> 
> copy cash_journal
> (click,cash_journal_id,fairian_id,debit,credit,balance,description) from stdin;
> 36	3	7	0	0	0	Initial cash balance
> 36	4	8	0	0	0	Initial cash balance
> 36	5	9	0	0	0	Initial cash balance
> 36	14	18	0	0	0	initial cash balance
> 37	5	7	9	\N	9	Ratified contract fa35e192121eab
> 37	7	8	8	\N	8	Ratified contract f1abd670358e03
> 37	9	9	7	\N	7	Ratified contract 1574bddb75c78a
> 411	1	25	0	0	0	Initial cash balance
> 411	2	25	1000	\N	1000	Issued bond 7719a1c782a1ba
> 412	1	7	5	\N	14	Sold food quantity 7 units.
> 412	2	25	\N	5	995	Bought food quantity 7 units.
> 413	1	25	\N	995	0	Redeemed bond 7719a1c782a1ba
> \.
> 
> 
> SELECT * FROM cash_journal order by fairian_id, click, cash_journal_id;
> 
> /*
> 
> The sample starting data is shown here in order by Fairian so that it is
> perhaps easier to see what is happening for each player. Note that the
> result of the deletion should be the last row for each player.
> 
>  click | cash_journal_id | fairian_id | debit | credit | balance |           description
> -------+-----------------+------------+-------+--------+---------+----------------------------------
>     36 |               3 |          7 |     0 |      0 |       0 | Initial cash balance
>     37 |               5 |          7 |     9 |        |       9 | Ratified contract fa35e192121eab
>    412 |               1 |          7 |     5 |        |      14 | Sold food quantity 7 units.
>     36 |               4 |          8 |     0 |      0 |       0 | Initial cash balance
>     37 |               7 |          8 |     8 |        |       8 | Ratified contract f1abd670358e03
>     36 |               5 |          9 |     0 |      0 |       0 | Initial cash balance
>     37 |               9 |          9 |     7 |        |       7 | Ratified contract 1574bddb75c78a
>     36 |              14 |         18 |     0 |      0 |       0 | initial cash