Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Marti Raudsepp
On Thu, Jun 16, 2011 at 21:36, Shaun Thomas stho...@peak6.com wrote:
 You can call that instead of max, and it'll be much faster. You can create
 an analog for min if you need it. So for this, you'd call:

Cool, I've needed this function sometimes but never bothered enough to
write it myself. Now I created a wiki snippet page for this handy
feature here:
https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table

With Jim Nasby's idea to use regclass instead of relation names, the
function is now half its length and probably more reliable. There's no
need to touch pg_class directly at all.

I also changed it to return bigint instead of integer, as that's more
versatile, and the performance loss is probably negligible.

Regards,
Marti

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Shaun Thomas

On 06/22/2011 04:55 AM, Marti Raudsepp wrote:


With Jim Nasby's idea to use regclass instead of relation names, the
function is now half its length and probably more reliable. There's no
need to touch pg_class directly at all.


Sadly until we upgrade to EDB 9.0, I have to use my function. :) EDB 8.3 
(which is really PostgreSQL 8.2) doesn't have a regclass-text 
conversion. But I'll bookmark the wiki page anyway, so I can update my 
function after upgrading. Heh.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Greg Smith

On 06/22/2011 05:55 AM, Marti Raudsepp wrote:

Now I created a wiki snippet page for this handy
feature here:
https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table
   


I just tweaked this a bit to document the version compatibility issues 
around it and make it easier to follow.  I think that's now the page we 
should point people toward when this pops up again.  Between that and my 
blog post I reference in it, they can find all the details and a 
workaround in one place.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Scott Marlowe
On Wed, Jun 22, 2011 at 7:12 AM, Shaun Thomas stho...@peak6.com wrote:
 On 06/22/2011 04:55 AM, Marti Raudsepp wrote:

 With Jim Nasby's idea to use regclass instead of relation names, the
 function is now half its length and probably more reliable. There's no
 need to touch pg_class directly at all.

 Sadly until we upgrade to EDB 9.0, I have to use my function. :) EDB 8.3
 (which is really PostgreSQL 8.2) doesn't have a regclass-text conversion.
 But I'll bookmark the wiki page anyway, so I can update my function after
 upgrading. Heh.


Given that many folks still run  9.0 in production, the wiki page
should really have a version of that function for older versions,
whether it's long or not.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Shaun Thomas

On 06/22/2011 01:12 PM, Scott Marlowe wrote:


Given that many folks still run  9.0 in production, the wiki page
should really have a version of that function for older versions,
whether it's long or not.


This version does work on anything 8.3 and above. I just lamented on 9.0 
because we decided to skip 8.4 in favor of 9.0. And as we use EDB 
instead of PostgreSQL directly, our 8.3 is actually 8.2. Got that? ;)


Sorry for the confusion.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-22 Thread Greg Smith

On 06/22/2011 02:12 PM, Scott Marlowe wrote:

Given that many folks still run  9.0 in production, the wiki page
should really have a version of that function for older versions,
whether it's long or not.
   


I updated the page already to be clear about what versions of PostgreSQL 
it works on, and it directs people to Shaun's original message if they 
are running 8.2.  The only people who might get confused now are the 
ones running EDB's versions, where the exact features you get in 
particular versions can be slightly different than the community 
version.  But that problem both exists in other parts of the wiki, and 
is a bit outside of its scope to try and address.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-21 Thread Shaun Thomas

On 06/17/2011 03:31 PM, Jim Nasby wrote:


c_parent_oid CONSTANT oid := (p_parent_schema || '.' ||
p_parent_table )::regclass;


Well isn't *that* a handy bit of magic. How did I not know about that? 
Thanks!


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-17 Thread Svetlin Manavski
Yes, confirmed that the problem is in the partitioned table.
Shaun, that solution is brilliant.
Thank you,
Svetlin Manavski

On Thu, Jun 16, 2011 at 7:36 PM, Shaun Thomas stho...@peak6.com wrote:

 On 06/16/2011 12:25 PM, Magnus Hagander wrote:

  PostgreSQL 9.0 is unable to use an index scan to find min/max on a
 partitioned table. 9.1, however, can do that.


 Unfortunately this is true. You can fake it this way though:

 /**
 * Return the Maximum INT Value for a Partitioned Table Column
 *
 * @param string  Name of Schema of the base partition table.
 * @param string  Name of the base partition table.
 * @param string  Name of column to search.
 */
 CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR,  VARCHAR, VARCHAR)
 RETURNS INT AS
 $$
 DECLARE

  sSchema ALIAS FOR $1;
  sTable ALIAS FOR $2;
  sColName ALIAS FOR $3;

  sChild VARCHAR;
  nMax INT;
  nTemp INT;
  nParent OID;

 BEGIN

  EXECUTE '
   SELECT max(' || sColName ||')
 FROM ONLY ' || sSchema || '.' || quote_ident(sTable)
  INTO nMax;

  SELECT INTO nParent t.oid
FROM pg_class t
JOIN pg_namespace n ON (t.relnamespace=n.oid)
   WHERE n.nspname = sSchema
 AND t.relname = sTable;

  FOR sChild IN
SELECT t.relname
  FROM pg_class t
  JOIN pg_inherits c ON (c.inhrelid=t.oid AND c.inhparent=nParent)
  LOOP
nTemp := utility.spc_max_part_int(sSchema, sChild, sColName);
nMax := greatest(nTemp, nMax);
  END LOOP;

  RETURN nMax;

 END;
 $$ LANGUAGE plpgsql STABLE;


 You can call that instead of max, and it'll be much faster. You can create
 an analog for min if you need it. So for this, you'd call:

 SELECT spc_max_part_int('appqosdata', 'tcpsessions', 'id');

 Someone probably has a better solution. :)

 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
 312-676-8870
 stho...@peak6.com

 __

 See  http://www.peak6.com/email_disclaimer.php
 for terms and conditions related to this email



Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-17 Thread Greg Smith

On 06/17/2011 08:43 AM, Shaun Thomas wrote:
It's a bit of a hack, but it's worked fine for us while we wait for 
the planner to catch up. :)


Right.  In situations where people can modify their application to 
redirect MIN/MAX() calls over to directly query the individual 
partitions, that's a great workaround.  Your function is the slickest 
such solution I've seen for that, so filing it away in case this pops up 
in that situation.


But if you can't touch the application code and just need it to work as 
desired, you either need to use PostgreSQL 9.1 (not yet released) or 
figure out how to backport that fix into an earlier version (not easy).  
A babbled a bit about this specific case at 
http://blog.2ndquadrant.com/en/2011/06/max-partitioning-with-min-pain.html 
if anyone wants more information, or a specific simple test case to play 
with.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-17 Thread Jim Nasby
On Jun 16, 2011, at 1:36 PM, Shaun Thomas wrote:
 /**
 * Return the Maximum INT Value for a Partitioned Table Column
 *
 * @param string  Name of Schema of the base partition table.
 * @param string  Name of the base partition table.
 * @param string  Name of column to search.
 */
 CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR,  VARCHAR, VARCHAR)
 RETURNS INT AS
 $$
 DECLARE
 snip
  SELECT INTO nParent t.oid
FROM pg_class t
JOIN pg_namespace n ON (t.relnamespace=n.oid)
   WHERE n.nspname = sSchema
 AND t.relname = sTable;

FWIW, instead of that, I would do this:

CREATE FUNCTION ...(
  p_parent_schema text
  , p_parent_table text
) ...
DECLARE
  c_parent_oid CONSTANT oid := (p_parent_schema || '.' || p_parent_table 
)::regclass;

... or ...

CREATE FUNCTION(
  p_parent text
)
DECLARE
  c_parent_oid CONSTANT oid := p_parent::regclass;


Advantages:

- ::regclass is search_path-aware, so you're not forced into providing a schema 
if you don't want to
- it will throw an error if it doesn't find a regclass entry
- you can cast the oid back to text: EXECUTE 'SELECT max(' ... 'FROM ' || 
c_parent_oid::regclass
- you can also query directly with the OID: SELECT relkind = 't' AS is_table 
FROM pg_class WHERE oid = c_parent_oid
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-16 Thread Magnus Hagander
On Thu, Jun 16, 2011 at 15:55, Svetlin Manavski
svetlin.manav...@gmail.com wrote:
 Hi everybody,

 I am running PostgreSQL 9.0 which performs well in most of the cases. I
 would skip all the parameters if these are not necessary.
 I need to frequently (every min) get the max value of the primary key column
 on some tables, like this case which works perfectly well:
 explain analyze select max(id) from appqosdata.tcpsessions;
 ---
  Result (cost=0.49..0.50 rows=1 width=0) (actual time=45.316..45.317 rows=1
 loops=1) InitPlan 1 (returns $0)
 - Limit (cost=0.00..0.49 rows=1 width=8) (actual time=45.302..45.303 rows=1
 loops=1)
   - Index Scan Backward using idx_tcpsessions_id on tcpsessions
 (cost=0.00..6633362.76 rows=13459023 width=8) (actual time=45.296..45.296
 rows=1 loops=1)
 Index Cond: (id IS NOT NULL)
 Total runtime: 45.399 ms

 But I have the following similar case which surprises me quite a lot:
 explain analyze select max(createdtime) from appqosdata.tcpsessiondata;
 ---
 Aggregate (cost=1123868.30..1123868.31 rows=1 width=8) (actual
 time=376932.636..376932.637 rows=1 loops=1)
 - Append (cost=0.00..965113.04 rows=63502104 width=8) (actual
 time=0.020..304844.944 rows=63501281 loops=1)
 - Seq Scan on tcpsessiondata (cost=0.00..12.80 rows=780 width=8) (actual
 time=0.002..0.002 rows=0 loops=1)
 - Seq Scan on tcpsessiondata_default tcpsessiondata (cost=0.00..965100.24
 rows=63501324 width=8) (actual time=0.015..173159.505 rows=63501281 loops=1)
 Total runtime: 376980.975 ms

 I have the following table definitions:
 CREATE TABLE appqosdata.tcpsessiondata_default
 (
  Primary key(createdtime), --bigint
 check (sessionid = 0),

  Foreign key(detectorid, sessionid) References
 appqosdata.tcpsessions(detectorid,id)

 ) inherits (appqosdata.tcpsessiondata);
 CREATE TABLE appqosdata.tcpsessions
 (
 detectorid smallint not null default(0) references appqosdata.detectors(id),
 id bigint not null,
  ...
 primary key(detectorid, id)
 );

 As you can see I have tens of millions of rows in both tables which would be
 ten times more in production. So seq scan is not acceptable at all to get
 one single value.
 Why that difference and what can I do to make the first query use its index
 on the primary key.

Looks like the first table is not partitioned, but the second one is?

PostgreSQL 9.0 is unable to use an index scan to find min/max on a
partitioned table. 9.1, however, can do that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan in the case of max() on the primary key column

2011-06-16 Thread Shaun Thomas

On 06/16/2011 12:25 PM, Magnus Hagander wrote:


PostgreSQL 9.0 is unable to use an index scan to find min/max on a
partitioned table. 9.1, however, can do that.


Unfortunately this is true. You can fake it this way though:

/**
* Return the Maximum INT Value for a Partitioned Table Column
*
* @param string  Name of Schema of the base partition table.
* @param string  Name of the base partition table.
* @param string  Name of column to search.
*/
CREATE OR REPLACE FUNCTION spc_max_part_int(VARCHAR,  VARCHAR, VARCHAR)
RETURNS INT AS
$$
DECLARE

  sSchema ALIAS FOR $1;
  sTable ALIAS FOR $2;
  sColName ALIAS FOR $3;

  sChild VARCHAR;
  nMax INT;
  nTemp INT;
  nParent OID;

BEGIN

  EXECUTE '
   SELECT max(' || sColName ||')
 FROM ONLY ' || sSchema || '.' || quote_ident(sTable)
  INTO nMax;

  SELECT INTO nParent t.oid
FROM pg_class t
JOIN pg_namespace n ON (t.relnamespace=n.oid)
   WHERE n.nspname = sSchema
 AND t.relname = sTable;

  FOR sChild IN
SELECT t.relname
  FROM pg_class t
  JOIN pg_inherits c ON (c.inhrelid=t.oid AND c.inhparent=nParent)
  LOOP
nTemp := utility.spc_max_part_int(sSchema, sChild, sColName);
nMax := greatest(nTemp, nMax);
  END LOOP;

  RETURN nMax;

END;
$$ LANGUAGE plpgsql STABLE;


You can call that instead of max, and it'll be much faster. You can 
create an analog for min if you need it. So for this, you'd call:


SELECT spc_max_part_int('appqosdata', 'tcpsessions', 'id');

Someone probably has a better solution. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance