Upping work_mem did roughly halve the time, but after thinking about Shaun's suggestion, I figured it's better to calculate this stuff once and then store it. So here is how the table looks now:
Table "public.stop_event" Column | Type | Modifiers ---------------------+-----------------------------+--------------------------------------------------------- stop_time | timestamp without time zone | not null stop | integer | not null bus | integer | not null direction | integer | not null route | integer | not null id | bigint | not null default nextval('stop_event_id_seq'::regclass) previous_stop_event | bigint | Indexes: "stop_event_pkey" PRIMARY KEY, btree (id) "stop_event_previous_stop_event_idx" btree (previous_stop_event) Foreign-key constraints: "stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES direction(id) "stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id) "stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id) "stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id) Referenced by: TABLE "stop_event" CONSTRAINT "stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id) previous_stop_event simply references the previous (by stop_time) stop event for the combination of stop, route, and direction. I have successfully populated this column for my existing test data. However, when I try to do a test self-join using it, Postgres does two seq scans across the whole table, even though I have indexes on both id and previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those indexes are not being used? Thank you again, -Eli On Thu, Aug 21, 2014 at 9:05 AM, Shaun Thomas <stho...@optionshouse.com> > wrote: > >> On 08/21/2014 08:29 AM, Eli Naeher wrote: >> >> With around 1.2 million rows, this takes 20 seconds to run. 1.2 million >>> rows is only about a week's worth of data, so I'd like to figure out a >>> way to make this faster. >>> >> >> Well, you'll probably be able to reduce the run time a bit, but even with >> really good hardware and all in-memory processing, you're not going to see >> significant run-time improvements with that many rows. This is one of the >> reasons reporting-specific structures, such as fact tables, were designed >> to address. >> >> Repeatedly processing the same week/month/year aggregate worth of several >> million rows will just increase linearly with each iteration as data size >> increases. You need to maintain up-to-date aggregates on the metrics you >> actually want to measure, so you're only reading the few hundred rows you >> introduce every update period. You can retrieve those kind of results in a >> few milliseconds. >> >> -- >> Shaun Thomas >> OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 >> 312-676-8870 >> stho...@optionshouse.com >> >> ______________________________________________ >> >> See http://www.peak6.com/email_disclaimer/ for terms and conditions >> related to this email >> > >