-- You need this to avoid having to rescan the whole table for each customerid every time and resort the results
create index customer_txstamp_idx on bank_account (customer_id, ts);
select ba.* from bank_account ba where
transaction_id = (select transaction_id from bank_account where customer_id = ba.customer_id order by customer_id desc, ts desc limit 1);
Now, note that, if you have lots of different customers in that table, it will still take a while to fetch them all (although, it should still be a lot quicker then half an hour) - in that case, you may consider either getting them one-by-one (by adding ... and customer_id=? to the above query) or using cursors...
I hope, it helps...
Dima
Erik G. Burrows wrote:
It seems to me this is a simple problem, but the solution eludes me.
I have a table:
bank_account ( transaction_id int not null serial, customer_id int not null references customer(id), ts timestamp not null default now(), amount float not null, balance float not null, primary key(transaction_id) )
I need to get the most recent transaction for each customer. I need only the transaction ID, but the entire row would be best.
I have two solutions, both of which are too slow for use in my interactive web-based interface:
Solution1: Outer left self join: SELECT ba1.* FROM bank_account ba1 LEFT OUTER JOIN bank_account ba2 ON ba1.customer_id = ba2.customer_id AND ba1.ts < ba2.ts WHERE ba2.ts IS NULL;
This query works great on tables of less than a few thousand rows. For my 300k row table, it takes several hours.
Solution2: max-concat trick SELECT split_part(max( extract(EPOCH from ts)::VARCHAR || ' ' || transaction_id::VARCHAR), ' ', 2)::INT FROM bank_account GROUP BY customer_id
This is an ugly and obviously inefficient solution, but it does the job in about 1/2 hour. Still too long though.
I've been working on this problem for days, and consulting friends. No elegant, fast solution is presenting itself. As I said, I feel I'm not seeing the obvious solution in front of my face.
In the mean-time I can use this query to do the job on a per-customer basis:
select * from bank_account where id = <the customer's ID> and ts = (select max(ts) from bank_account ba2 where ba2.customer_id = bank_account.customer_id);
However, doing this for all 40,000 customers is not workable as a manual process. My last resort is to do it this way to pre-generate the report, but I'd far rather do it real-time.
Help! My brain hurts!
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html