Oops, I forgot to include the test self-join query I'm using. It is simply:
SELECT se1.stop_time AS curr, se2.stop_time AS prev
FROM stop_event se1
JOIN stop_event se2 ON se1.previous_stop_event = se2.id;

On Thu, Aug 21, 2014 at 11:19 AM, Eli Naeher <enae...@gmail.com> wrote:

> Upping work_mem did roughly halve the time, but after thinking about
> Shaun's suggestion, I figured it's better to calculate this stuff once
> and then store it. So here is how the table looks now:
>
>                                Table "public.stop_event"
>        Column        |            Type             |                       Modifiers
> ---------------------+-----------------------------+----------------------------------------------------------
>  stop_time           | timestamp without time zone | not null
>  stop                | integer                     | not null
>  bus                 | integer                     | not null
>  direction           | integer                     | not null
>  route               | integer                     | not null
>  id                  | bigint                      | not null default nextval('stop_event_id_seq'::regclass)
>  previous_stop_event | bigint                      |
> Indexes:
>     "stop_event_pkey" PRIMARY KEY, btree (id)
>     "stop_event_previous_stop_event_idx" btree (previous_stop_event)
> Foreign-key constraints:
>     "stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES direction(id)
>     "stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id)
>     "stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id)
>     "stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id)
> Referenced by:
>     TABLE "stop_event" CONSTRAINT "stop_event_previous_stop_event_fkey" FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id)
>
> previous_stop_event simply references the previous (by stop_time) stop
> event for the same combination of stop, route, and direction. I have
> successfully populated this column for my existing test data. However,
> when I try to do a test self-join using it, Postgres does two seq scans
> across the whole table, even though I have indexes on both id and
> previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those
> indexes are not being used?
>
> Thank you again,
> -Eli
>
> On Thu, Aug 21, 2014 at 9:05 AM, Shaun Thomas <stho...@optionshouse.com> wrote:
>
>> On 08/21/2014 08:29 AM, Eli Naeher wrote:
>>
>>> With around 1.2 million rows, this takes 20 seconds to run. 1.2 million
>>> rows is only about a week's worth of data, so I'd like to figure out a
>>> way to make this faster.
>>
>> Well, you'll probably be able to reduce the run time a bit, but even
>> with really good hardware and all in-memory processing, you're not going
>> to see significant run-time improvements with that many rows. This is
>> one of the problems that reporting-specific structures, such as fact
>> tables, were designed to address.
>>
>> Repeatedly processing the same week/month/year aggregate worth of
>> several million rows will just increase linearly with each iteration as
>> data size increases. You need to maintain up-to-date aggregates on the
>> metrics you actually want to measure, so that you're only reading the
>> few hundred rows you introduce every update period. You can retrieve
>> that kind of result in a few milliseconds.
>>
>> --
>> Shaun Thomas
>> OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>> 312-676-8870
>> stho...@optionshouse.com
>>
>> ______________________________________________
>>
>> See http://www.peak6.com/email_disclaimer/ for terms and conditions
>> related to this email
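
For anyone who wants to reproduce the setup: one way to populate previous_stop_event is a window-function UPDATE along these lines (a sketch only, not necessarily the exact statement I ran):

UPDATE stop_event se
SET previous_stop_event = prev.prev_id
FROM (
    -- Sketch: assumes "previous" means the prior event (by stop_time)
    -- for the same stop/route/direction combination.
    SELECT id,
           lag(id) OVER (PARTITION BY stop, route, direction
                         ORDER BY stop_time) AS prev_id
    FROM stop_event
) prev
WHERE se.id = prev.id
  AND prev.prev_id IS NOT NULL;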
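
And to make Shaun's suggestion concrete for anyone following along, here is the rough shape of the kind of pre-aggregated rollup he is describing. The table and column names below are invented purely for illustration, and the real metrics would differ:

-- Hypothetical example only: a daily per-stop/route/direction rollup so
-- reports read a handful of summary rows instead of millions of raw rows.
CREATE TABLE stop_event_daily (
    day       date    NOT NULL,
    stop      integer NOT NULL,
    route     integer NOT NULL,
    direction integer NOT NULL,
    n_events  bigint  NOT NULL,
    PRIMARY KEY (day, stop, route, direction)
);

-- Run once per day (or per load) for the just-completed day:
INSERT INTO stop_event_daily (day, stop, route, direction, n_events)
SELECT stop_time::date, stop, route, direction, count(*)
FROM stop_event
WHERE stop_time >= current_date - 1
  AND stop_time <  current_date
GROUP BY stop_time::date, stop, route, direction;

Reports would then aggregate over the small summary table rather than re-scanning stop_event each time.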