[PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Jeff Adams - NOAA Affiliate
Greetings,



I have a large table (~90 million rows) containing vessel positions. In
addition to a column that contains the location information (the_geom), the
table also contains two columns that are used to uniquely identify the
vessel (mmsi and name) and a column containing the Unix time (epoch) at
which the position information was logged. I frequently need to assign
records to vessel transits. To do this, I currently create a CTE that uses
a Window function (partitioning the data by mmsi and name ordered by epoch)
to examine the time that has elapsed between successive position reports
for individual vessels. For every position record for a vessel (identified
by mmsi and name), if the time elapsed between the current position record
and the previous record (obtained with the lag function) is less than or
equal to 2 hours, I assign the record a value of 0 in a CTE column named
tr_index; if the time elapsed is greater than 2 hours, I assign it a value
of 1. I then use the CTE to generate transit numbers by summing the values
in the tr_index column across a window that also partitions the data by
mmsi and name and is ordered by epoch. This works, but is very slow (hours).
The table is indexed (a multi-column index on mmsi and name, plus an index
on epoch). Does anyone see a more efficient way to get what I am after,
namely an assignment of transit numbers to vessels' position records based
on whether the records were within two hours of each other? The SQL that I
used is provided below.
Any advice would be greatly appreciated...



WITH
cte_01 AS
(
SELECT
  a.id,
  a.mmsi,
  a.name,
  a.epoch,
  a.the_geom,
  a.active,
  CASE
    WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
    ELSE 0
  END AS tr_index
FROM table a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
)
SELECT
  a.id,
  a.mmsi,
  a.name,
  a.epoch,
  a.the_geom,
  1 + sum(a.tr_index) OVER w AS transit,
  a.active
FROM cte_01 a
WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
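
(Editor's aside, not part of the original message: the window's PARTITION BY
a.mmsi, a.name ORDER BY a.epoch can only avoid an explicit sort if an index
covers all three columns in that order; the indexes described above (mmsi plus
name, and epoch alone) do not qualify. A hedged sketch, with "positions"
standing in for the real table name:)

-- May let the planner feed the window functions from an ordered index scan
-- instead of sorting ~90 million rows.
CREATE INDEX positions_mmsi_name_epoch_idx
    ON positions (mmsi, name, epoch);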


-- 
Jeff


Re: [PERFORM] sniff test on some PG 8.4 numbers

2013-03-11 Thread Jon Nelson
On Sun, Mar 10, 2013 at 11:28 PM, Greg Smith wrote:
> On 3/10/13 9:18 PM, Jon Nelson wrote:
>
>> The following is with ext4, nobarrier, and noatime. As noted in the
>> original post, I have done a fair bit of system tuning. I have the
>> dirty_bytes and dirty_background_bytes set to 3GB and 2GB,
>> respectively.
>
>
> That's good, but be aware those values are still essentially unlimited write
> caches.  A server with 4 good but regular hard drives might do as little as
> 10MB/s of random writes on a real workload.  If 2GB of data ends up dirty,
> the flushing that happens at the end of a database checkpoint will need to
> clear all of that out of RAM.  When that happens, you're looking at a 3
> minute long cache flush to push out 2GB.  It's not unusual for pgbench tests
> to pause for over a minute straight when that happens.  With your setup,
> where checkpoints happen every 5 minutes, this is only happening once per
> test run.  The disruption isn't easily visible if you look at the average
> rate; it's outweighed by the periods where writes happen very fast because
> the cache isn't full yet.  You have to get pgbench to plot latency over time
> to see them and then analyze that data.  This problem is the main reason I
> put together the pgbench-tools set for running things, because once you get
> to processing the latency files and make graphs from them it starts to be a
> pain to look at the results.

I'll try to find time for this, but it may need to wait until the weekend again.

>> I built 9.2 and using 9.2 and the following pgbench invocation:
>>
>> pgbench  -j 8  -c 32 -M prepared -T 600
>>
>> transaction type: TPC-B (sort of)
>> scaling factor: 400
>
>
> I misread this completely in your message before; I thought you wrote 4000.
> A scaling factor of 400 is making a database that's 6GB in size.  Your test
> is basically seeing how fast the system memory and the RAID cache can move
> things around.  In that situation, your read and write numbers are
> reasonable.  They aren't actually telling you anything useful about the
> disks though, because they're barely involved here. You've sniffed the CPU,
> memory, and RAID controller and they smell fine.  You'll need at least an
> order of magnitude increase in scale to get a whiff of the disks.

LOL! Your phrasing is humorous and the information useful.

I ran for 8.0 hours and got this:

transaction type: TPC-B (sort of)
scaling factor: 400
query mode: prepared
number of clients: 32
number of threads: 8
duration: 28800 s
number of transactions actually processed: 609250619
tps = 21154.058025 (including connections establishing)
tps = 21154.075922 (excluding connections establishing)

> pgbench scale numbers give approximately 16MB per scale factor.  You don't
> actually stress the drives until that total number is at least 2X as big as
> RAM.  We had to raise the limit on the pgbench scales recently because it
> only goes up to ~20,000 on earlier versions, and that's not a big enough
> scale to test many servers now.
>
> On the select-only tests, much of the increase from ~100K to ~200K is
> probably going from 8.4 to 9.2.  There's two major and several minor tuning
> changes that make it much more efficient at that specific task.
>
>
>> These are the *only* changes I've made to the config file:
>>
>> shared_buffers = 32GB
>> wal_buffers = 16MB
>> checkpoint_segments = 1024
>
>
> Note that these are the only changes that actually impact pgbench results.
> The test doesn't stress very many parts of the system, such as the query
> optimizer.
>
> Also be aware these values may not be practical to use in production. You
> can expect bad latency issues due to having shared_buffers so large.  All
> that memory has to be reconciled and written to disk if it's been modified
> at each checkpoint, and 32GB of such work is a lot.  I have systems where we
> can't make shared_buffers any bigger than 4GB before checkpoint pauses get
> too bad.
>
> Similarly, setting checkpoint_segments to 1024 means that you might go
> through 16GB of writes before a checkpoint happens.  That's great for
> average performance...but when that checkpoint does hit, you're facing a
> large random I/O backlog.

I thought the bgwriter mitigated most of the problems here? Often I'll
see the actual checkpoints with 'sync' times typically below a few
seconds (when there is anything to do at all). I can't say I've seen
checkpoint pauses in my workloads.
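
(Editor's aside, not from the thread: one generic way to see how much of the
write traffic is being absorbed by checkpoints versus the background writer
and the backends themselves is the standard pg_stat_bgwriter view; sample it
twice and diff the counters to get rates.)

SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,   -- pages written during checkpoints
       buffers_clean,        -- pages written by the background writer
       buffers_backend       -- pages written directly by backends
  FROM pg_stat_bgwriter;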

> There's not much you can do about all this on the Linux side.  If you drop
> the dirty_* parameters too much, maintenance operations like VACUUM start to
> get slow.  Really all you can do is avoid setting shared_buffers and
> checkpoint_segments too high, so the checkpoint backlog never gets gigantic.
> The tuning you've done is using higher values than we normally recommend
> because it's not quite practical to deploy like that.  That and the very
> small database are probably why your numbers are so high.

Mostly I do data warehouse t

Re: [PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Pavel Stehule
Hello

you can try a procedural solution - use a cursor over the ordered data in
plpgsql and return a table

Regards

Pavel Stehule

2013/3/11 Jeff Adams - NOAA Affiliate :
> Greetings,
>
>
>
> I have a large table (~90 million rows) containing vessel positions. In
> addition to a column that contains the location information (the_geom), the
> table also contains two columns that are used to uniquely identify the
> vessel (mmsi and name) and a column containing the Unix time (epoch) at
> which the position information was logged. I frequently need to assign
> records to vessel transits. To do this, I currently create a CTE that uses a
> Window function (partitioning the data by mmsi and name ordered by epoch) to
> examine the time that has elapsed between successive position reports for
> individual vessels. For every position record for a vessel (as identified
> using mmsi and name), if the time elapsed between the current position
> record and the previous record (using the lag function) is less than or
> equal to 2 hours, I assign the record a value of 0 to a CTE column named
> tr_index. If the time elapsed is greater than 2 hours, I assign the record a
> value of 1 to the tr_index column. I then use the CTE to generate transit
> numbers by summing the values in the tr_index field across a Window that
> also partitions the data by mmsi and name and is ordered by epoch. This
> works, but is very slow (hours). The table is indexed (multi-column index on
> mmsi, name and index on epoch). Does anyone see a way to get what I am after
> in a more efficient manner. What I am after is an assignment of transit
> number to vessels' position records based on whether the records were within
> two hours of each other. The SQL that I used is provided below. Any advice
> would be greatly appreciated...
>
>
>
> WITH
>
> cte_01 AS
>
> (
>
> SELECT
>
> a.id,
>
> a.mmsi,
>
> a.name,
>
> a.epoch,
>
> a.the_geom
>
> CASE
>
>   WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
>
>   ELSE 0
>
> END AS tr_index
>
> FROM table a
>
> WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>
> )
>
>
>
>
>
> SELECT
>
> a.id,
>
> a.mmsi,
>
> a.name,
>
> a.epoch,
>
> a.the_geom,
>
> 1 + sum(a.tr_index) OVER w AS transit,
>
> a.active
>
> FROM cte_01 a
>
> WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>
>
>
> --
> Jeff




Re: [PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Jeff Adams - NOAA Affiliate
Pavel,

Thanks for the response. I have not yet had the opportunity to use cursors,
but am now curious. Could you perhaps provide a bit more detail as to what
the implementation of your suggested approach would look like?

On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule wrote:

> Hello
>
> you can try procedural solution - use a cursor over ordered data in
> plpgsql and returns table
>
> Regards
>
> Pavel Stehule
>
> 2013/3/11 Jeff Adams - NOAA Affiliate :
> > Greetings,
> >
> >
> >
> > I have a large table (~90 million rows) containing vessel positions. In
> > addition to a column that contains the location information (the_geom),
> the
> > table also contains two columns that are used to uniquely identify the
> > vessel (mmsi and name) and a column containing the Unix time (epoch) at
> > which the position information was logged. I frequently need to assign
> > records to vessel transits. To do this, I currently create a CTE that
> uses a
> > Window function (partitioning the data by mmsi and name ordered by
> epoch) to
> > examine the time that has elapsed between successive position reports for
> > individual vessels. For every position record for a vessel (as identified
> > using mmsi and name), if the time elapsed between the current position
> > record and the previous record (using the lag function) is less than or
> > equal to 2 hours, I assign the record a value of 0 to a CTE column named
> > tr_index. If the time elapsed is greater than 2 hours, I assign the
> record a
> > value of 1 to the tr_index column. I then use the CTE to generate transit
> > numbers by summing the values in the tr_index field across a Window that
> > also partitions the data by mmsi and name and is ordered by epoch. This
> > works, but is very slow (hours). The table is indexed (multi-column
> index on
> > mmsi, name and index on epoch). Does anyone see a way to get what I am
> after
> > in a more efficient manner. What I am after is an assignment of transit
> > number to vessels' position records based on whether the records were
> within
> > two hours of each other. The SQL that I used is provided below. Any
> advice
> > would be greatly appreciated...
> >
> >
> >
> > WITH
> >
> > cte_01 AS
> >
> > (
> >
> > SELECT
> >
> > a.id,
> >
> > a.mmsi,
> >
> > a.name,
> >
> > a.epoch,
> >
> > a.the_geom
> >
> > CASE
> >
> >   WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
> >
> >   ELSE 0
> >
> > END AS tr_index
> >
> > FROM table a
> >
> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >
> > )
> >
> >
> >
> >
> >
> > SELECT
> >
> > a.id,
> >
> > a.mmsi,
> >
> > a.name,
> >
> > a.epoch,
> >
> > a.the_geom,
> >
> > 1 + sum(a.tr_index) OVER w AS transit,
> >
> > a.active
> >
> > FROM cte_01 a
> >
> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >
> >
> >
> > --
> > Jeff
>



-- 
Jeffrey D. Adams
Contractor
OAI, Inc.
In support of:
National Marine Fisheries Service
Office of Protected Resources
1315 East West Hwy, Building SSMC3
Silver Spring, MD 20910-3282
phone: (301) 427-8434
fax: (301) 713-0376


Re: [PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Pavel Stehule
2013/3/11 Jeff Adams - NOAA Affiliate :
> Pavel,
>
> Thanks for the response. I have not yet had the opportunity to use cursors,
> but am now curious. Could you perhaps provide a bit more detail as to what
> the implementation of your suggested approach would look like?

an example:

$$
DECLARE
  r record;
  prev_r record;

BEGIN
  FOR r IN SELECT * FROM a ORDER BY epoch, mmsi
  LOOP
    IF prev_r IS NOT NULL THEN
      /* do some counting here:
         prev_r contains the previous row, r contains the current row */
      RETURN NEXT ..;  /* return data in the defined order */
    END IF;
    prev_r = r;
  END LOOP;
END;
$$

Probably the slow part of your query is the sorting - the first sort can be
accelerated by an index, but the second (over the CTE result) cannot - you
could try increasing work_mem?

Regards

Pavel
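
(Editor's note: a fuller sketch of the cursor-style approach described above,
filled out for this thread rather than taken from it. The loop assigns transit
numbers in one ordered pass; "vessel_positions" and the column types are
assumptions, while the column names and the 2-hour (7200 second) gap rule come
from the original post.)

CREATE OR REPLACE FUNCTION assign_transits()
RETURNS TABLE (id bigint, mmsi integer, name text, epoch bigint,
               transit integer)
LANGUAGE plpgsql AS
$$
DECLARE
  r           record;
  prev_mmsi   integer;
  prev_name   text;
  prev_epoch  bigint;
  cur_transit integer := 0;
BEGIN
  -- One ordered pass: all rows for a vessel arrive together, ordered by time,
  -- so "the previous row" plays the role of lag() in the window version.
  FOR r IN
    SELECT p.id, p.mmsi, p.name, p.epoch
      FROM vessel_positions p            -- placeholder table name
     ORDER BY p.mmsi, p.name, p.epoch
  LOOP
    IF prev_mmsi IS NULL
       OR r.mmsi IS DISTINCT FROM prev_mmsi
       OR r.name IS DISTINCT FROM prev_name THEN
      cur_transit := 1;                  -- first record of a new vessel
    ELSIF r.epoch - prev_epoch > 7200 THEN
      cur_transit := cur_transit + 1;    -- gap of more than 2 hours
    END IF;

    id      := r.id;
    mmsi    := r.mmsi;
    name    := r.name;
    epoch   := r.epoch;
    transit := cur_transit;
    RETURN NEXT;

    prev_mmsi  := r.mmsi;
    prev_name  := r.name;
    prev_epoch := r.epoch;
  END LOOP;
END;
$$;

-- SELECT * FROM assign_transits();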

>
>
> On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule 
> wrote:
>>
>> Hello
>>
>> you can try procedural solution - use a cursor over ordered data in
>> plpgsql and returns table
>>
>> Regards
>>
>> Pavel Stehule
>>
>> 2013/3/11 Jeff Adams - NOAA Affiliate :
>> > Greetings,
>> >
>> >
>> >
>> > I have a large table (~90 million rows) containing vessel positions. In
>> > addition to a column that contains the location information (the_geom),
>> > the
>> > table also contains two columns that are used to uniquely identify the
>> > vessel (mmsi and name) and a column containing the Unix time (epoch) at
>> > which the position information was logged. I frequently need to assign
>> > records to vessel transits. To do this, I currently create a CTE that
>> > uses a
>> > Window function (partitioning the data by mmsi and name ordered by
>> > epoch) to
>> > examine the time that has elapsed between successive position reports
>> > for
>> > individual vessels. For every position record for a vessel (as
>> > identified
>> > using mmsi and name), if the time elapsed between the current position
>> > record and the previous record (using the lag function) is less than or
>> > equal to 2 hours, I assign the record a value of 0 to a CTE column named
>> > tr_index. If the time elapsed is greater than 2 hours, I assign the
>> > record a
>> > value of 1 to the tr_index column. I then use the CTE to generate
>> > transit
>> > numbers by summing the values in the tr_index field across a Window that
>> > also partitions the data by mmsi and name and is ordered by epoch. This
>> > works, but is very slow (hours). The table is indexed (multi-column
>> > index on
>> > mmsi, name and index on epoch). Does anyone see a way to get what I am
>> > after
>> > in a more efficient manner. What I am after is an assignment of transit
>> > number to vessels' position records based on whether the records were
>> > within
>> > two hours of each other. The SQL that I used is provided below. Any
>> > advice
>> > would be greatly appreciated...
>> >
>> >
>> >
>> > WITH
>> >
>> > cte_01 AS
>> >
>> > (
>> >
>> > SELECT
>> >
>> > a.id,
>> >
>> > a.mmsi,
>> >
>> > a.name,
>> >
>> > a.epoch,
>> >
>> > a.the_geom
>> >
>> > CASE
>> >
>> >   WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
>> >
>> >   ELSE 0
>> >
>> > END AS tr_index
>> >
>> > FROM table a
>> >
>> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>> >
>> > )
>> >
>> >
>> >
>> >
>> >
>> > SELECT
>> >
>> > a.id,
>> >
>> > a.mmsi,
>> >
>> > a.name,
>> >
>> > a.epoch,
>> >
>> > a.the_geom,
>> >
>> > 1 + sum(a.tr_index) OVER w AS transit,
>> >
>> > a.active
>> >
>> > FROM cte_01 a
>> >
>> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
>> >
>> >
>> >
>> > --
>> > Jeff
>
>
>
>
> --
> Jeffrey D. Adams
> Contractor
> OAI, Inc.
> In support of:
> National Marine Fisheries Service
> Office of Protected Resources
> 1315 East West Hwy, Building SSMC3
> Silver Spring, MD 20910-3282
> phone: (301) 427-8434
> fax: (301) 713-0376




Re: [PERFORM] Large Table - Slow Window Functions (Better Approach?)

2013-03-11 Thread Jeff Adams - NOAA Affiliate
Thanks again. The sorting does appear to be the issue. I will test out your
cursor idea...

On Mon, Mar 11, 2013 at 11:34 AM, Pavel Stehule wrote:

> 2013/3/11 Jeff Adams - NOAA Affiliate :
> > Pavel,
> >
> > Thanks for the response. I have not yet had the opportunity to use
> cursors,
> > but am now curious. Could you perhaps provide a bit more detail as to
> what
> > the implementation of your suggested approach would look like?
>
> an example:
>
> $$
> DECLARE
>   r record;
>   prev_r record;
>
> BEGIN
>   FOR r IN SELECT * FROM a ORDER BY epoch, mmsi
>   LOOP
> IF prev_r IS NOT NULL THEN
>   /* do some counting */
>   prev_r contains previous row, r contains current row
>   do some
>   RETURN NEXT .. /* return data in defined order */
> END IF;
> prev_r = r;
>   END LOOP;
>
>
> Probably slow part of your query is sorting - first can be accelerated
> by index, but second (as CTE result cannot) - you can try increase
> work_mem ??
>
> Regards
>
> Pavel
>
> >
> >
> > On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule wrote:
> >>
> >> Hello
> >>
> >> you can try procedural solution - use a cursor over ordered data in
> >> plpgsql and returns table
> >>
> >> Regards
> >>
> >> Pavel Stehule
> >>
> >> 2013/3/11 Jeff Adams - NOAA Affiliate :
> >> > Greetings,
> >> >
> >> >
> >> >
> >> > I have a large table (~90 million rows) containing vessel positions.
> In
> >> > addition to a column that contains the location information
> (the_geom),
> >> > the
> >> > table also contains two columns that are used to uniquely identify the
> >> > vessel (mmsi and name) and a column containing the Unix time (epoch)
> at
> >> > which the position information was logged. I frequently need to assign
> >> > records to vessel transits. To do this, I currently create a CTE that
> >> > uses a
> >> > Window function (partitioning the data by mmsi and name ordered by
> >> > epoch) to
> >> > examine the time that has elapsed between successive position reports
> >> > for
> >> > individual vessels. For every position record for a vessel (as
> >> > identified
> >> > using mmsi and name), if the time elapsed between the current position
> >> > record and the previous record (using the lag function) is less than
> or
> >> > equal to 2 hours, I assign the record a value of 0 to a CTE column
> named
> >> > tr_index. If the time elapsed is greater than 2 hours, I assign the
> >> > record a
> >> > value of 1 to the tr_index column. I then use the CTE to generate
> >> > transit
> >> > numbers by summing the values in the tr_index field across a Window
> that
> >> > also partitions the data by mmsi and name and is ordered by epoch.
> This
> >> > works, but is very slow (hours). The table is indexed (multi-column
> >> > index on
> >> > mmsi, name and index on epoch). Does anyone see a way to get what I am
> >> > after
> >> > in a more efficient manner. What I am after is an assignment of
> transit
> >> > number to vessels' position records based on whether the records were
> >> > within
> >> > two hours of each other. The SQL that I used is provided below. Any
> >> > advice
> >> > would be greatly appreciated...
> >> >
> >> >
> >> >
> >> > WITH
> >> >
> >> > cte_01 AS
> >> >
> >> > (
> >> >
> >> > SELECT
> >> >
> >> > a.id,
> >> >
> >> > a.mmsi,
> >> >
> >> > a.name,
> >> >
> >> > a.epoch,
> >> >
> >> > a.the_geom
> >> >
> >> > CASE
> >> >
> >> >   WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1
> >> >
> >> >   ELSE 0
> >> >
> >> > END AS tr_index
> >> >
> >> > FROM table a
> >> >
> >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >> >
> >> > )
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > SELECT
> >> >
> >> > a.id,
> >> >
> >> > a.mmsi,
> >> >
> >> > a.name,
> >> >
> >> > a.epoch,
> >> >
> >> > a.the_geom,
> >> >
> >> > 1 + sum(a.tr_index) OVER w AS transit,
> >> >
> >> > a.active
> >> >
> >> > FROM cte_01 a
> >> >
> >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch)
> >> >
> >> >
> >> >
> >> > --
> >> > Jeff
>


[PERFORM] The dreaded semwait on FreeBSD

2013-03-11 Thread Benjamin Krajmalnik
I have a very heavy OLTP application in the field.

We have two PostgreSQL databases (9.0.x)/FreeBSD 8.1/amd64 - one is a
listener which just stores data digests, the other is the actual
database.

The digests from the listener are processed by multi-threaded daemons
and inserted into the main database.

Every so often, we see the queues on the listener grow and get
significantly behind.

Checking the system metrics, we do not see any issue with either memory,
CPU, or IO (using top and iostat), so it would appear we are being hit
by contention.  When data insertion rates slow down we can see a lot of
the postgresql processes in a semwait state.
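
(Editor's aside, not from the post: one generic cross-check from inside the
database is whether the stalled backends show up as waiting on heavyweight
locks; the column names below are the 9.0-era ones.)

SELECT procpid, usename, waiting, current_query
  FROM pg_stat_activity
 WHERE waiting;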

 

The data which is coming in from the daemons is inserted into a
temporary table, which fires off a trigger which in turn calls a (massive)
stored procedure that processes the data input.

We also have another daemon which runs in the background continuously
creating a materialized view of a dashboard.  If this daemon does not
run, the queues do not grow (or the rate of growth decreases if some
other heavy processing is going on such as a large report being
generated), so it appears this is one of the primary causes for the
contention which is causing the semwaits.

 

The data is inserted into 3 partitioned tables, each of which is fairly
wide.  The daemon which processes the dashboard looks at only 2 fields
(state and last value), using the Devices and Tests tables.  If I were
to create 2 more partitioned tables which hold only the columns in
question (as well as the columns necessary to associate the rows),
would this reduce the contention?

 

The partitioned tables are as such:

 

Devices ->  Tests -> Statistical info on tests

 

I was thinking of adding two more partitioned tables

 

Altdevices -> DashboardTests

 

That way, when we process the dashboard we do not touch any of the 3
primary tables, which are the ones which are constantly being pounded
on.

Looking at the stats on the server (Xact committed), we are processing
approximately 4000 transactions per second.
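
(Presumably derived from counters like the following, sampled at intervals and
differenced; an editor's sketch of the standard statistics view, not the
poster's actual query.)

-- Sample twice, subtract, and divide by the elapsed seconds to get TPS.
SELECT datname, xact_commit, xact_rollback
  FROM pg_stat_database;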

 

On another note, this setup is using streaming replication to a
secondary server which is used as a read only server for reporting.
Would accessing data from the secondary server somehow cause contention
on the primary server?   From the patterns of behavior, it would appear
so (when large reports are generated we are seeing some effect on the
data insertion rates).

 

Thanks in advance,

 

Benjamin

 



[PERFORM] Slow query when used in a view

2013-03-11 Thread Shaun Thomas

Hey everyone!

A developer was complaining about a view he created to abstract an added 
column in a left join. He was contemplating denormalizing the added 
value into the parent table and using a trigger to maintain it instead, 
and I obviously looked into the problem. I noticed the view was 
incurring a sequential scan despite an obvious index condition, but the
regular join version was not.


Curious, I whipped up this test case:

CREATE TABLE foo (id BIGINT, small_label VARCHAR);
INSERT INTO foo (id) VALUES (generate_series(1, 10000));
ALTER TABLE foo ADD CONSTRAINT pk_foo_id PRIMARY KEY (id);

CREATE TABLE bar (id BIGINT, foo_id BIGINT);

INSERT INTO bar (id, foo_id)
SELECT a, a % 10000
  FROM generate_series(1, 100000) a;

ALTER TABLE bar ADD CONSTRAINT pk_bar_id PRIMARY KEY (id);

CREATE TABLE tiny_foo (small_label VARCHAR NOT NULL PRIMARY KEY);
INSERT INTO tiny_foo (small_label)
VALUES ('yes'), ('we'), ('have'), ('no'), ('bananas');

UPDATE foo SET small_label = 'bananas' WHERE id=750;

ANALYZE foo;
ANALYZE bar;
ANALYZE tiny_foo;

CREATE VIEW v_slow_view AS
SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label
  FROM foo
  LEFT JOIN tiny_foo tf USING (small_label);


Now, this is with PostgreSQL 9.1.8, basically default everything in a 
base Ubuntu install. So, the good query plan using all tables directly:


SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label
  FROM bar
  LEFT JOIN foo ON (foo.id = bar.foo_id)
  LEFT JOIN tiny_foo tf USING (small_label)
 WHERE bar.id IN (750, 1750, 2750)
 ORDER BY bar.id;

does this:

Index Scan using pk_foo_id on foo  (cost=0.00..8.27 rows=1 width=16)
  Index Cond: (id = bar.foo_id)

The bad one using the view:

SELECT bar.*, sv.*
  FROM bar
  LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id)
 WHERE bar.id IN (750, 1750, 2750)
 ORDER BY bar.id;

Mysteriously, does this:

Seq Scan on foo  (cost=0.00..145.00 rows=1 width=16)

I'm... perplexed. This test case is way too shallow to be affected by 
join_collapse_limit and its ilk, so I'm not sure what's going on here. I 
sense an optimization fence, but I can't see where.


Thanks in advance!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] Slow query when used in a view

2013-03-11 Thread Tom Lane
Shaun Thomas writes:
> A developer was complaining about a view he created to abstract an added 
> column in a left join. ...
> Curious, I whipped up this test case:

> CREATE VIEW v_slow_view AS
> SELECT foo.*, tf.small_label IS NOT NULL AS has_small_label
>FROM foo
>LEFT JOIN tiny_foo tf USING (small_label);

> SELECT bar.*, foo.*, tf.small_label IS NOT NULL AS has_small_label
>FROM bar
>LEFT JOIN foo ON (foo.id = bar.foo_id)
>LEFT JOIN tiny_foo tf USING (small_label)
>   WHERE bar.id IN (750, 1750, 2750)
>   ORDER BY bar.id;

> SELECT bar.*, sv.*
>FROM bar
>LEFT JOIN v_slow_view sv ON (sv.id = bar.foo_id)
>   WHERE bar.id IN (750, 1750, 2750)
>   ORDER BY bar.id;

These queries are not actually equivalent.  In the first one, it is
impossible for "has_small_label" to read out as NULL: it will either be
true or false.  However, in the second one, the IS NOT NULL is evaluated
below the LEFT JOIN to "sv", and therefore it is required that the query
return NULL for "has_small_label" in any row where bar.foo_id lacks a
join partner.
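
(A small editorial illustration of the point, not part of the original reply;
it considers a bar row whose foo_id has no partner in foo.)

-- Joins written out: IS NOT NULL is evaluated after foo/tiny_foo have been
-- null-extended, so has_small_label comes out FALSE for the unmatched row.
SELECT bar.id, tf.small_label IS NOT NULL AS has_small_label
  FROM bar
  LEFT JOIN foo ON foo.id = bar.foo_id
  LEFT JOIN tiny_foo tf USING (small_label);

-- Through the view: the whole view row is null-extended by the outer LEFT
-- JOIN to sv, so has_small_label must come out NULL for the unmatched row.
SELECT bar.id, sv.has_small_label
  FROM bar
  LEFT JOIN v_slow_view sv ON sv.id = bar.foo_id;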

To implement that behavior correctly, we're forced to form the
foo-to-tiny_foo join first, then do the left join with bar (which'll
replace RHS columns by nulls where necessary).

And that means that you get the inefficient plan wherein the
foo-to-tiny_foo join is computed in its entirety.

9.2 does this case better, by virtue of the "parameterized plan" stuff,
which exists specifically to let us use nestloop-with-inner-indexscan
plans even when there are some join order restrictions complicating
matters.

regards, tom lane




[PERFORM] Slow concurrent processing

2013-03-11 Thread Misa Simic
Hi all,

We have one table with a list of "records for processing"...

We loop through that table and call one long-running function:

 do_the_math_for_record(record_id)

which uses various tables to select rows related to the input record_id, does
some calculations, and inserts the results into two tables...

We have also made one function, process_all_records(), which simply does:

SELECT do_the_math_for_record(record_id) FROM records_for_processing

When we run that function, it takes about 4 minutes...


There are about 300 rows in records_for_processing... We have logged the time
at the beginning of do_the_math and the time at its end, and noticed that
processing each row takes between 0.5 and 2 seconds...

so our do_the_math looks like:

PERFORM log_time(record_id, clock_timestamp(), 1);

PERFORM do_the_math_and_save_results(record_id);

PERFORM log_time(record_id, clock_timestamp(), 2);


Then we thought that if we took all the "records for processing" and processed
each one in a separate connection, it should finish faster...

but we got a worse result! (using 30 concurrent connections...) - about 7
minutes...

If we reduce the number of concurrent connections to 10, we get the result in
approximately the same time as sequential processing...

But if we replace do_the_math_and_save_results with pg_sleep(1), to simulate a
long-running function so that processing each row takes 1 second...

Sequential processing takes the expected 300 seconds!

Concurrent processing gets faster with a higher number of
concurrent connections - about 30 seconds with 30 connections! (much faster,
and expected...)

However, if we put our do_the_math_and_save_results back, we can't get
better results from concurrent processing...

With a higher number of concurrent connections the result is worse... We have
also noticed that for some records the difference between end_time and
start_time is even longer than 1 minute - but it is random, not always on the
same id... i.e. in one concurrent run a record takes 1 minute, in the next 1
second, but then some other record takes about 1 minute...

Any idea - why? :)

This says to me that there is a lock somewhere on some tables, so our
concurrent connections are probably waiting for the others to finish... but I
can't figure out what and why...
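
(Editor's sketch, not from the post: a generic way to check, while the slow
concurrent run is in progress, whether any connection is actually waiting on a
heavyweight lock.)

-- Lock requests that have not been granted; an empty result during the run
-- suggests the stalls are not table- or row-level lock waits.
SELECT locktype, relation::regclass AS relation, mode, pid, granted
  FROM pg_locks
 WHERE NOT granted;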

do_the_math_and_save_results selects data from 10 other tables, calculates
something, and inserts the results into other tables...

There are about 3 tracking tables (record_id plus other data) and about 7
settings tables which we join to the tracking tables to get all the info...
then we do the math with that info and insert the results...

We don't do any updates... (so there is no possibility of two connections
wanting to update the same row in the same table)

Data from the tracking tables should be separate sets of data for two
different record_ids...

(joined rows from the settings tables could be common to two sets of
different record_ids)

But even if they are the same set, SELECTs should not lock rows in the
tables...

There are places where we do:

INSERT INTO result_table (columns)
SELECT query (tracking and settings tables joined)

Is there a chance this takes a lock somewhere?

Can the above query be run "concurrently"?

Many thanks,

Misa