On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 12/09/2015 12:24 AM, Berend Tober wrote:
>
>> Adrian Klaver wrote:
>>
>>> On 12/05/2015 08:08 AM, Berend Tober wrote:
>>>
>>>> /*
>>>>
>>>> Deletion Challenge
>>>>
>>>> I want to delete all but the most recent transaction, per person, from a
>>>> table that records a transaction history because at some point the
>>>> transaction history grows large enough to adversely effect performance,
>>>> and also becomes less relevant for retention.
>>>>
>>>> ...
>>>>
>>>>
>>> test=> delete from cash_journal where ARRAY[click, cash_journal_id]
>>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal
>>> group by fairian_id);
>>> DELETE 7
>>>
>>> test=> SELECT * FROM cash_journal order by fairian_id, click,
>>> cash_journal_id;
>>>   click | cash_journal_id | fairian_id | debit | credit | balance
>>> |           description
>>>
>>> -------+-----------------+------------+-------+--------+---------+----------------------------------
>>>
>>>     412 |               1 |          7 |     5 |        |      14 |
>>> Sold food quantity 7 units.
>>>      37 |               7 |          8 |     8 |        |       8 |
>>> Ratified contract f1abd670358e03
>>>      37 |               9 |          9 |     7 |        |       7 |
>>> Ratified contract 1574bddb75c78a
>>>      36 |              14 |         18 |     0 |      0 |       0 |
>>> initial cash balance
>>>     413 |               1 |         25 |       |    995 |       0 |
>>> Redeemed bond 7719a1c782a1ba
>>> (5 rows)
>>>
>>>
>> Nice.
>>
>> The idea of a NOT IN query had occurred to me briefly, but I failed to
>> pursue it because at some point in the distant past I had gained the
>> impression that NOT IN queries were not computationally efficient.
>> During one round of testing I had like a million rows. I'll have to run
>> some EXPLAIN query testing with a larger data sample for comparison.
>> Thanks!
>>
>
> Plan B:
>
> WITH d AS
>     (SELECT * FROM
>         cash_journal
>     LEFT JOIN
>         (SELECT
>             MAX(ARRAY[click,cash_journal_id]) AS mx
>         FROM
>             cash_journal
>         GROUP BY
>             fairian_id)
>         AS
>             mxa
>     ON
>         mxa.mx=ARRAY[click, cash_journal_id]
>     WHERE
>         mx IS NULL)
> DELETE FROM
>     cash_journal
> USING
>     d
> WHERE
>     d.click = cash_journal.click
> AND
>     d.cash_journal_id = cash_journal.cash_journal_id;
>
>
​Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT
DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY
click DESC, cash_journal_id" or something similar?  It doesn't seem like
you should need to introduce an array and an aggregate here.

​It does have the negative property of only providing a single row; which
excludes using it for the "last 5" part but I suspect it will be
considerably faster for the single version.

David J.

Reply via email to