I am afraid, this looks even uglier then your second solution, but should work, and be quicker...

-- You need this to avoid having to rescan the whole table for each customerid every time and resort the results
create index customer_txstamp_idx on bank_account (customer_id, ts);


select ba.* from bank_account ba where
transaction_id = (select transaction_id from bank_account where customer_id = ba.customer_id order by customer_id desc, ts desc limit 1);


Now, note that, if you have lots of different customers in that table, it will still take a while to fetch them all (although, it should still be a lot quicker then half an hour) - in that case, you may consider either getting them one-by-one (by adding ... and customer_id=? to the above query) or using cursors...

I hope, it helps...

Dima


Erik G. Burrows wrote:


It seems to me this is a simple problem, but the solution eludes me.

I have a table:

bank_account (
 transaction_id int not null serial,
 customer_id int not null references customer(id),
 ts timestamp not null default now(),
 amount float not null,
 balance float not null,
 primary key(transaction_id)
)

I need to get the most recent transaction for each customer. I need only
the transaction ID, but the entire row would be best.



I have two solutions, both of which are too slow for use in my
interactive web-based interface:

Solution1: Outer left self join:
SELECT
 ba1.*
FROM
 bank_account ba1
 LEFT OUTER JOIN bank_account ba2
   ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts
WHERE
 ba2.ts IS NULL;

This query works great on tables of less than a few thousand rows. For
my 300k row table, it takes several hours.


Solution2: max-concat trick SELECT split_part(max( extract(EPOCH from ts)::VARCHAR || ' ' || transaction_id::VARCHAR), ' ', 2)::INT FROM bank_account GROUP BY customer_id

This is an ugly and obviously inefficient solution, but it does the job
in about 1/2 hour. Still too long though.



I've been working on this problem for days, and consulting friends. No
elegant, fast solution is presenting itself. As I said, I feel I'm not
seeing the obvious solution in front of my face.

In the mean-time I can use this query to do the job on a per-customer
basis:

select
 *
from
 bank_account
where
 id = <the customer's ID>
 and ts = (select max(ts) from bank_account ba2 where ba2.customer_id =
bank_account.customer_id);


However, doing this for all 40,000 customers is not workable as a manual process. My last resort is to do it this way to pre-generate the report, but I'd far rather do it real-time.

Help! My brain hurts!






---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Reply via email to