Oops, I forgot to include the test self-join query I'm using. It is simply:

SELECT se1.stop_time AS curr, se2.stop_time AS prev
FROM stop_event se1
JOIN stop_event se2 ON se1.previous_stop_event = se2.id;



On Thu, Aug 21, 2014 at 11:19 AM, Eli Naeher <enae...@gmail.com> wrote:

> Upping work_mem did roughly halve the time, but after thinking about
> Shaun's suggestion, I figured it's better to calculate this stuff once and
> then store it. So here is how the table looks now:
>
>                                           Table "public.stop_event"
>        Column        |            Type             |
>  Modifiers
>
> ---------------------+-----------------------------+---------------------------------------------------------
>  stop_time           | timestamp without time zone | not null
>  stop                | integer                     | not null
>  bus                 | integer                     | not null
>  direction           | integer                     | not null
>  route               | integer                     | not null
>  id                  | bigint                      | not null default
> nextval('stop_event_id_seq'::regclass)
>  previous_stop_event | bigint                      |
> Indexes:
>     "stop_event_pkey" PRIMARY KEY, btree (id)
>     "stop_event_previous_stop_event_idx" btree (previous_stop_event)
> Foreign-key constraints:
>     "stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES
> direction(id)
>     "stop_event_previous_stop_event_fkey" FOREIGN KEY
> (previous_stop_event) REFERENCES stop_event(id)
>     "stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id)
>     "stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id)
> Referenced by:
>     TABLE "stop_event" CONSTRAINT "stop_event_previous_stop_event_fkey"
> FOREIGN KEY (previous_stop_event) REFERENCES stop_event(id)
>
> previous_stop_event simply references the previous (by stop_time) stop
> event for the combination of stop, route, and direction. I have
> successfully populated this column for my existing test data. However, when
> I try to do a test self-join using it, Postgres does two seq scans across
> the whole table, even though I have indexes on both id and
> previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those
> indexes are not being used?
>
> Thank you again,
> -Eli
>
> On Thu, Aug 21, 2014 at 9:05 AM, Shaun Thomas <stho...@optionshouse.com>
>> wrote:
>>
>>> On 08/21/2014 08:29 AM, Eli Naeher wrote:
>>>
>>>  With around 1.2 million rows, this takes 20 seconds to run. 1.2 million
>>>> rows is only about a week's worth of data, so I'd like to figure out a
>>>> way to make this faster.
>>>>
>>>
>>> Well, you'll probably be able to reduce the run time a bit, but even
>>> with really good hardware and all in-memory processing, you're not going to
>>> see significant run-time improvements with that many rows. This is one of
>>> the reasons reporting-specific structures, such as fact tables, were
>>> designed to address.
>>>
>>> Repeatedly processing the same week/month/year aggregate worth of
>>> several million rows will just increase linearly with each iteration as
>>> data size increases. You need to maintain up-to-date aggregates on the
>>> metrics you actually want to measure, so you're only reading the few
>>> hundred rows you introduce every update period. You can retrieve those kind
>>> of results in a few milliseconds.
>>>
>>> --
>>> Shaun Thomas
>>> OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
>>> 312-676-8870
>>> stho...@optionshouse.com
>>>
>>> ______________________________________________
>>>
>>> See http://www.peak6.com/email_disclaimer/ for terms and conditions
>>> related to this email
>>>
>>
>>
>

Reply via email to