Re: [HACKERS] libpq support for arrays and composites

2008-06-09 Thread Andrew Dunstan



Andrew Chernow wrote:


libpqtypes already implemented this.  It is a different approach but 
provides the same functionality, with the addition of being able to 
handle every data type.  libpqtypes uses the PGresult API for 
composites and arrays, rather than adding a new set of functions.





Yes, I thought you'd say that :-)

This has some significant limitations - for example (quoting from your docs)

  Arrays  are only handled using binary format.  This means that any type
  used as an array element must be put and gotten in binary format.  If a
  user-defined  type  does  not implement a send and recv function in the
  backend, it can not be used as an array element.

That makes it quite useless for my intended purpose. 


I also am not particularly enamoured of the libpqtypes way of doing things, 
which feels rather foreign to me.

Lastly, the idea is to provide extra facilities to libpq clients without 
requiring any extra library.

cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq support for arrays and composites

2008-06-09 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Lastly, the idea is to provide extra facilities to libpq clients
 without requiring any extra library.

Or more to the point, without requiring boatloads of new code that
only some libpq users would have any use for.

To my mind, the point of the present proposal is to provide some
client-side code that understands how to invert the data
formatting/escaping rules implemented by array_out, record_out,
and perhaps array_in/record_in.  We can make that happen without
taking a quantum jump in libpq's API complexity --- and offhand
it seems that Andrew D's proposal is at about the right level of
complication.  libpqtypes has its place also, but I think it's
addressing a different level of problem complexity.
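
For reference, the text representations that such client-side code would
have to invert look roughly like this (abbreviated psql examples, assuming
default settings):

  SELECT ARRAY['a', 'b,c', NULL];   -- array_out yields: {a,"b,c",NULL}
  SELECT ROW(42, 'hello world');    -- record_out yields: (42,"hello world")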

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automating our version-stamping a bit better

2008-06-09 Thread Heikki Linnakangas

Andrew Dunstan wrote:

Tom Lane wrote:


I'm tempted to suggest letting the script invoke autoconf, too,
but that would require standardizing where to find the correct
version of autoconf for each branch; so it might not be such a
great idea.   


Unfortunately that's true. Maybe we could agree on using an alias for 
the right version of autoconf, but it seems likely to be error prone.


Or we could sidestep the issue by not running autoconf, and instead 
search-and-replace the version strings in configure directly with the Perl 
script.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq support for arrays and composites

2008-06-09 Thread Andrew Chernow

That makes it quite useless for my intended purpose.

I found no more use cases for text results after libpqtypes started to 
take shape, even though libpqtypes supports all data types in text & 
binary, excluding arrays and composites.  Because of this, adding a text 
parser for arrays and composites felt like a lot of work for little 
gain.  libpqtypes is really designed to be a binary interface.  The text 
support offered allows existing applications to use the new interface 
with results generated by PQexec(), meaning you can use PQgetf() without 
having to change code to use PQputf().


If you take another glance at libpqtypes, you may see that result format 
decisions are pretty well abstracted and there really is no need for 
text results anymore (okay, I'll categorize that as an opinion).


 I also am not particularly enamoured of the libpqtypes way of doing
 things, which feels rather foreign to me.

Not sure we can fix this issue.  We made every attempt to keep things 
familiar ... printf/scanf style.  It's a new approach for libpq but an 
old one for C hackers.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: add window function to 8.4

2008-06-09 Thread H . Harada
This topic has been discussed on this list and many users expect
PostgreSQL to implement it.
I'd like to work on this feature and hope that we can include it in 8.4.

Former discussions are here:

http://archives.postgresql.org/pgsql-hackers/2004-11/msg01093.php

http://archives.postgresql.org/pgsql-hackers/2007-01/msg00861.php


How it works and examples:
 SELECT dept, empno,
 RANK() OVER(PARTITION BY dept ORDER BY age) as age_rank,
 RANK() OVER(PARTITION BY dept ORDER BY salary) as salary_rank,
 SUM(salary) OVER(PARTITION BY dept ORDER BY age) as run_total
 FROM employees ORDER BY 1, 3, 4;

 dept empno age_rank salary_rank run_total
 ENG  2 12   4
 ENG  1 21   9
 QA   3 12   3
 QA   4 21   65000
 (ref.: http://www.gavinsherry.org/talks/window_talk.pdf)


My current idea and concept:
- add window functions and treat them specially, much like aggregate
functions and SETOF functions.
- some features may be dropped in the first release, with support
added later.
- getting it specified and working properly is the primary goal;
performance optimization is secondary.


From my survey of the web and the list archives, the points are:
- what is a window function: rank(), dense_rank(), lead() and others?
  First of all, we should define window functions much as we define
aggregate functions. In my opinion, there are two types of functions in
an OVER () call context. One of them is an aggregate, and the other is
a window (ranking) function. sum() in a query like

  SELECT empno, sum(salary) OVER (PARTITION BY depno) FROM empsalary;

 is obviously an aggregate function. This type of function can be used
as it is now; only the executor will change its behavior.
 The current pgsql feature set lacks window functions like rank(). This
type of function must 1) have a context, as SETOF functions do, 2)
return values until the executor says DONE, rather than the function
itself saying DONE as in a SETOF function, and 3) know about other
tuples (mainly via the ORDER BY clause), as rank() doesn't take any
arguments but knows the ranking boundary.  I propose that pgsql get a
new function system for these, called window functions.
 Once we can define window functions, users gain extensibility for this
type of function.

- do we really need the FRAME clause?
 From my survey, Oracle and DB2 have the FRAME clause

  SELECT empno, sum(salary) OVER (ORDER BY empno ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW) FROM empsalary;

 while MS SQL Server doesn't (note that the text from ROWS to
CURRENT ROW is called the FRAME clause). Implementing the FRAME clause
is much more complicated than supporting only the PARTITION and ORDER
clauses, because we need random access to the result tuples. Though we
will eventually be expected to support the FRAME clause, it could be
put off for the first release. Even if rank() supports no FRAME clause
(only PARTITION and ORDER), it is still very useful, certainly more
than what we have now.

- execution order
 In SQL:2003, the execution order is defined as

 where -> group by -> having -> (windowing) * N -> order by (outer,
currently existing one)
 where windowing is
 partition by -> order by -> (framing) * N

 But Oracle seems to use another order:

 (windowing) * N -> where -> group by ... and so on.

 Which is better for us? With Oracle's order you can say
  SELECT empno, rank() OVER (PARTITION BY depno ORDER BY salary) AS
topsalary FROM empsalary
  WHERE topsalary < 3
 to get the top 3 people taking the highest salary. In the SQL standard,
you have to nest the query (as sketched below).
 I insist the first (standard) one is better, because we may want to use
the result of a normal aggregate in the OVER clause.
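
 A purely illustrative sketch of that nested form, reusing the empsalary
example from above:

  SELECT empno, topsalary
  FROM (SELECT empno,
               rank() OVER (PARTITION BY depno ORDER BY salary) AS topsalary
        FROM empsalary) AS ranked
  WHERE topsalary < 3;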

- plan and node
 Currently in my mind the execution strategy could be:

 1. Where & GroupBy & Having
 |
 2. SortBy partitionClause, orderByClause
 |
 3. Window
   foreach partition:
 if not there_is_frame():
   aggvalue = null
   foreach row in partition:
 aggvalue = agg_trans_func(aggvalue)
   aggvalue = agg_final_func(aggvalue)

 foreach row in partition:
   if has frame clause:
 aggvalue = null
 frame = make_frame()
 foreach row_in_frame:
   aggvalue = aggregate_trans_func(aggvalue)
 aggvalue = aggregate_final_func(aggvalue)

   set aggvalue to row
   val = window_func()
   set val to row
   goto 2. if another window remained
 |
 4. SortBy ORDER BY clause (outer) & Limit
 |
 5. Output

 This pseudo code is quite simple and stupid. We may optimize it by
splitting tasks with MergeJoin, etc., or think about step 2, collecting
identical PARTITION clauses to reduce sort operations, or using
HashAgg to create the partitions. But let's be stupid at first.
Optimization is secondary.


References:
 description by Stefan DeBloch
http://wwwdvs.informatik.uni-kl.de/courses/NEDM/WS0607/Vorlesungsunterlagen/NEDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf
 via Wikipedia[Select (SQL)] http://en.wikipedia.org/wiki/Select_(SQL)


 BTW, what about Bizgres 

Re: [HACKERS] TODO, FAQs to Wiki?

2008-06-09 Thread Magnus Hagander
Alvaro Herrera wrote:
 Actually, now that I try it, it seems that the MediaWiki markup is not
 completely helpful here -- right now, on some items we have a one-line
 header and then possibly a longer description, and it seems the only
 way to do that in MediaWiki is like this:
 
 * Set proper permissions on non-system schemas during db creation<br> 
 Currently all schemas are owned by the super-user because they are copied 
 from the template1 database.  However, since all objects are inherited from 
 the template database, it is not clear that setting schemas to the db owner 
 is correct.
 
 Note the dumb <br> thing in the middle.
 
 Personally I find that ugly enough as to be unacceptable; what do others
 think?

How about using definition lists?
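
For example, the same TODO item written as a MediaWiki definition list
(just a sketch of the markup):

; Set proper permissions on non-system schemas during db creation
: Currently all schemas are owned by the super-user because they are copied from the template1 database.  However, since all objects are inherited from the template database, it is not clear that setting schemas to the db owner is correct.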


//Magnus



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Decibel!

On Jun 7, 2008, at 2:00 PM, Andrew Dunstan wrote:
If we go down this road then I would far rather we tried to devise  
some safe (or semi-safe) way of doing it instead of simply  
providing expert (a.k.a. footgun) mode.


For instance, I'm wondering if we could do something with checksums  
of the input lines or something else that would make this difficult  
to do in circumstances other than pg_restore.



Yes, but that provides no help at all outside of pg_dump. Being able  
to add a FK with NO CHECK would be tremendously useful outside of  
pg_dump. Actually, in the interest of stating the problem and not the  
solution, what we need is a way to add FKs that doesn't lock  
everything up to perform the key checks. Perhaps there is some semi-safe 
way that the constraint could be added and the checks done in  
the background...


As for the footgun aspect, are we the enterprise-class OSS database  
or the one that caters to noobs who will go out of their way  
to make life hard on themselves? I'm all in favor of not adding  
footguns that don't have value, but this one holds a lot of value for  
anyone trying to maintain a large database in a 24/7 environment. To  
put this in perspective, the amount of revenue we would lose from  
adding just one FK to one of our larger tables would more than cover  
paying someone to develop this feature.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] proposal: add window function to 8.4

2008-06-09 Thread Decibel!

On Jun 9, 2008, at 7:32 AM, H.Harada wrote:

This topic has been discussed on this list and many users expect
PostgreSQL to implement it.
I'd like to work on this feature and hope that we can include it in  
8.4.



I can't really comment on the technical aspects of your proposal, but  
yes, please, windowing functions would be great to have even if not  
fully implemented.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






[HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Hi there,

I'm currently looking at a bug report in PostGIS where we are getting 
extremely long index scan times querying an index in one case, but the 
same scan can take much less time if the input geometry is calculated as 
the result of another function.


First of all, I include the EXPLAIN ANALYZE of the basic query which 
looks like this:



postgis=# explain analyze select count(*) from geography where centroid 
 && (select the_geom from geography where id=69495);


QUERY PLAN
-
 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
time=2691.783..2691.784 rows=1 loops=1)

   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
(actual time=60.987..61.373 rows=1 loops=1)

   Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography 
(cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 
rows=32880 loops=1)

     Index Cond: (centroid && $0)
     Filter: (centroid && $0)
 Total runtime: 2692.288 ms
(8 rows)


The only real thing to know about the query is that the id field within 
the geography table is a primary key, and hence only a single geometry 
is being returned from within the subselect. Note that most of the time 
is disappearing into the index scan.


Where things start getting really strange is when we add an extra 
function called force_2d() into the mix. All this function does is scan 
through the single geometry returned from the subselect and remove any 
3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this 
query looks like this:



postgis=# explain analyze select count(*) from geography where centroid 
 && (select force_2d(the_geom) from geography where id=69495);


QUERY PLAN

 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
time=343.004..343.005 rows=1 loops=1)

   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
(actual time=48.714..49.016 rows=1 loops=1)

   Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography 
(cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 
loops=1)

     Index Cond: (centroid && $0)
     Filter: (centroid && $0)
 Total runtime: 343.084 ms
(8 rows)


So by adding in an extra function around the subselect result, we have 
speeded up the index lookup by several orders of magnitude, and the 
speedup appears to be coming from somewhere within the index scan?! I've 
spent a little bit of time playing with this and it seems that even writing a 
function that simply returns a copy of its input geometry is enough. Here is my 
test function below:



Datum LWGEOM_mcatest(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(LWGEOM_mcatest);
Datum LWGEOM_mcatest(PG_FUNCTION_ARGS)
{
PG_LWGEOM *pgl = (PG_LWGEOM *)
 PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
void *mem;

/* Copy somewhere else */
mem = palloc(VARSIZE(pgl));
memcpy(mem, pgl, VARSIZE(pgl)-VARHDRSZ);

PG_RETURN_POINTER(mem);
}


CREATE OR REPLACE FUNCTION mcatest(geometry)
RETURNS geometry
AS '$libdir/lwpostgis','LWGEOM_mcatest'
LANGUAGE 'C';


And then here is the resulting EXPLAIN ANALYZE:

postgis=# explain analyze select count(*) from geography where centroid 
 && (select mcatest(the_geom) from geography where id=69495);


QUERY PLAN

 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
time=283.126..283.127 rows=1 loops=1)

   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
(actual time=48.712..49.040 rows=1 loops=1)

   Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography 
(cost=0.00..8.28 rows=1 width=0) (actual time=49.321..215.524 rows=32880 
loops=1)

     Index Cond: (centroid && $0)
     Filter: (centroid && $0)
 Total runtime: 283.221 ms
(8 rows)


Unfortunately I can't seem to work out why the extra time is 
disappearing into the index scan when my extra mcatest() function is not 
present, especially as sprof doesn't seem to want to run at the moment 
:(  I'm wondering if it's related to excess TOAST/palloc/pfree activity 
somewhere in the code, but I'd definitely appreciate any pointers.


All these tests were done using PostgreSQL 8.3.1 and the latest PostGIS SVN.


Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Decibel! wrote:



Yes, but that provides no help at all outside of pg_dump. Being able 
to add a FK with NO CHECK would be tremendously useful outside of 
pg_dump. Actually, in the interest of stating the problem and not the 
solution, what we need is a way to add FKs that doesn't lock 
everything up to perform the key checks. Perhaps there is some 
semi-safe way that the constraint could be added and the checks done 
in the background...


I had some thoughts along the same lines.

But how do you propose to recover when the check fails? What should 
pg_restore do if the dump is corrupt causing an FK check to fail?


I suppose we could have some sort of marking for FK constraints along 
the lines of {checked, unchecked, invalid}.




As for the footgun aspect, are we the enterprise-class OSS database or 
the one that caters to noobs who will go out of their way to 
make life hard on themselves? 


We are the database that tries very hard to keep its promises. If you 
want to change or relax those promises then the implications need to be 
very very clear.


I'm all in favor of not adding footguns that don't have value, but 
this one holds a lot of value for anyone trying to maintain a large 
database in a 24/7 environment. To put this in perspective, the amount 
of revenue we would loose from adding just one FK to one of our larger 
tables would more than cover paying someone to develop this feature.





Come up with a good proposal and I'm your man :-)  I haven't seen one yet.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread A. Kretschmer
On Mon, 09.06.2008 at 14:18:50 +0100, Mark Cave-Ayland wrote the following:
 Hi there,
 
 I'm currently looking at a bug report in PostGIS where we are getting 
 extremely long index scan times querying an index in one case, but the 
 same scan can take much less time if the input geometry is calculated as 
 the result of another function.
 
 First of all, I include the EXPLAIN ANALYZE of the basic query which 
 looks like this:
 
 
 postgis=# explain analyze select count(*) from geography where centroid 
  && (select the_geom from geography where id=69495);
 
 QUERY PLAN
 -
  Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
 time=2691.783..2691.784 rows=1 loops=1)
InitPlan
      ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
 (actual time=60.987..61.373 rows=1 loops=1)
Filter: (id = 69495::numeric)
    ->  Index Scan using geography_geom_centroid_idx on geography 
 (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 
 rows=32880 loops=1)
      Index Cond: (centroid && $0)
      Filter: (centroid && $0)
  Total runtime: 2692.288 ms
 (8 rows)
 


There is a BIG difference between the estimated rows and the real rows (1 vs.
32880). Why? Do you have recent statistics?


 
 The only real thing to know about the query is that the id field within 
 the geography table is a primary key, and hence only a single geometry 

Sure? I can't believe this, given the rows=32880.



 is being returned from within the subselect. Note that most of the time 
 is disappearing into the index scan.
 
 Where things start getting really strange is when we add an extra 
 function called force_2d() into the mix. All this function does is scan 
 through the single geometry returned from the subselect and remove any 
 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this 
 query looks like this:
 
 
 postgis=# explain analyze select count(*) from geography where centroid 
  && (select force_2d(the_geom) from geography where id=69495);
 
 QUERY PLAN
 
  Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
 time=343.004..343.005 rows=1 loops=1)
InitPlan
      ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
 (actual time=48.714..49.016 rows=1 loops=1)
Filter: (id = 69495::numeric)
    ->  Index Scan using geography_geom_centroid_idx on geography 
 (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 
 loops=1)
      Index Cond: (centroid && $0)
      Filter: (centroid && $0)
  Total runtime: 343.084 ms
 (8 rows)
 
 
 So by adding in an extra function around the subselect result, we have 
 speeded up the index lookup by several orders of magnitude, and the 

Wild guess: you have a big cache.


But i'm not a PostGIS-Expert.



Regards, Andreas
-- 
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 14:18 +0100, Mark Cave-Ayland wrote:
 Unfortunately I can't seem to work out why the extra time is 
 disappearing into the index scan when my extra mcatest() function is
 not present,

Hmmm, perhaps implicit casting?

Try this to see if it works better also

select count(*) from geography where centroid 
 && (select the_geom::geometry from geography where id=69495);

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Simon Riggs wrote:


Hmmm, perhaps implicit casting?

Try this to see if it works better also

select count(*) from geography where centroid 
 && (select the_geom::geometry from geography where id=69495);



Hi Simon,

Unfortunately that seems to take the slow runtime path too. I did 
initially think about casting being involved (since the underlying index 
storage type is actually box2d rather than geometry); however, my 
mcatest() function is also declared as returning geometry.


Interestingly enough, forcing a cast to box2d instead of geometry seems to 
take the faster path, i.e.:



postgis=# explain analyze select count(*) from geography where centroid
 && (select the_geom::box2d from geography where id=69495);

QUERY PLAN

 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual 
time=376.033..376.034 rows=1 loops=1)

   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) 
(actual time=42.853..43.051 rows=1 loops=1)

   Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography 
(cost=0.00..8.28 rows=1 width=0) (actual time=43.218..286.535 rows=32880 
loops=1)

     Index Cond: (centroid && ($0)::geometry)
     Filter: (centroid && ($0)::geometry)
 Total runtime: 376.117 ms
(8 rows)


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 Actually, in the interest of stating the problem and not the  
 solution, what we need is a way to add FKs that doesn't lock  
 everything up to perform the key checks.

Ah, finally a useful comment.  I think it might be possible to do an
add FK concurrently type of command that would take exclusive lock
for just long enough to add the triggers, then scan the tables with just
AccessShareLock to see if the existing rows meet the constraint, and
if so finally mark the constraint valid.  Meanwhile the constraint
would be enforced against newly-added rows by the triggers, so nothing
gets missed.  You'd still get a small hiccup in system performance
from the transient exclusive lock, but nothing like as bad as it is
now.  Would that solve your problem?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Potential deadlock with auto-analyze

2008-06-09 Thread Tom Lane
It says here that CVS HEAD has a deadlock risk:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=chinchilla&dt=2008-06-09%2008:16:01
The problem manifested while attempting to drop a GIN index (though
I doubt GIN is specially at fault):

[484ceb40.47da:169] ERROR:  deadlock detected
[484ceb40.47da:170] DETAIL:  Process 18394 waits for AccessExclusiveLock on 
relation 41759 of database 41648; blocked by process 18401.
Process 18401 waits for AccessShareLock on relation 41766 of database 
41648; blocked by process 18394.
Process 18394: DROP INDEX text_idx;
Process 18401: autovacuum: ANALYZE public.test__int
[484ceb40.47da:171] HINT:  See server log for query details.
[484ceb40.47da:172] STATEMENT:  DROP INDEX text_idx;

We need to fix that --- having DDL randomly fail is not going to go
over well with autovacuum users.  I don't have time to look into it
just now, but I rather suspect that what's happening is that the
DROP tries to lock the index first and the table second, since it
can't find out which table is involved without looking at the index.
So it can deadlock against anything going in the more-normal direction.

If that's the right diagnosis, we probably ought to try to fix DROP
INDEX to not do that, since it could deadlock against any number of
things not just autovac.  But there's another question here: can we
make autovac lose the deadlock war, instead of the user process?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
 Decibel! [EMAIL PROTECTED] writes:
  Actually, in the interest of stating the problem and not the  
  solution, what we need is a way to add FKs that doesn't lock  
  everything up to perform the key checks.
 
 Ah, finally a useful comment.  I think it might be possible to do an
 add FK concurrently type of command that would take exclusive lock
 for just long enough to add the triggers, then scan the tables with just
 AccessShareLock to see if the existing rows meet the constraint, and
 if so finally mark the constraint valid.  Meanwhile the constraint
 would be enforced against newly-added rows by the triggers, so nothing
 gets missed.  You'd still get a small hiccup in system performance
 from the transient exclusive lock, but nothing like as bad as it is
 now.  Would that solve your problem?

That's good, but it doesn't solve the original user complaint about
needing to re-run many, many large queries to which we already know the
answer.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-09 Thread Jeff Davis
On Sun, 2008-06-08 at 19:03 -0400, Tom Lane wrote:
 Your argument seems to consider only columns having a normal
 distribution.  How badly does it fall apart for non-normal
 distributions?  (For instance, Zipfian distributions seem to be pretty
 common in database work, from what I've seen.)
 

If using "Idea 1: keep an array of stadistinct values that correspond to
each bucket size", I would expect it to still be a better estimate than
the current one, because it keeps a separate ndistinct for each histogram
bucket.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Simon Riggs wrote:

On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  

Decibel! [EMAIL PROTECTED] writes:

Actually, in the interest of stating the problem and not the  
solution, what we need is a way to add FKs that doesn't lock  
everything up to perform the key checks.
  

Ah, finally a useful comment.  I think it might be possible to do an
add FK concurrently type of command that would take exclusive lock
for just long enough to add the triggers, then scan the tables with just
AccessShareLock to see if the existing rows meet the constraint, and
if so finally mark the constraint valid.  Meanwhile the constraint
would be enforced against newly-added rows by the triggers, so nothing
gets missed.  You'd still get a small hiccup in system performance
from the transient exclusive lock, but nothing like as bad as it is
now.  Would that solve your problem?



That's good, but it doesn't solve the original user complaint about
needing to re-run many, many large queries to which we already know the
answer.

  


But we don't know it for dead sure, we only think we do. What if the 
data for one or other of the tables is corrupted? We'll end up with data 
we believe is consistent but in fact is not, ISTM. If you can somehow 
guarantee the integrity of data in both tables then we might be 
justified in assuming that the FK constraint will be consistent - that's 
why I suggested some sort of checksum mechanism might serve the purpose.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential deadlock with auto-analyze

2008-06-09 Thread Alvaro Herrera
Tom Lane wrote:

 If that's the right diagnosis, we probably ought to try to fix DROP
 INDEX to not do that, since it could deadlock against any number of
 things not just autovac.  But there's another question here: can we
 make autovac lose the deadlock war, instead of the user process?

Hmm, I thought autovacuum was already supposed to lose on deadlock.
Perhaps it's only considering VACUUM though and not analyze ... [checks]
nope, that doesn't seem to be it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
 Ah, finally a useful comment.  I think it might be possible to do an
 add FK concurrently type of command that would take exclusive lock

 That's good, but it doesn't solve the original user complaint about
 needing to re-run many, many large queries to which we already know the
 answer.

No, we are running a large query to which the user *thinks* he knows the
answer.  There are any number of reasons why he might be wrong.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 11:23 -0400, Andrew Dunstan wrote:
 
 Simon Riggs wrote:
  On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:

  Decibel! [EMAIL PROTECTED] writes:
  
  Actually, in the interest of stating the problem and not the  
  solution, what we need is a way to add FKs that doesn't lock  
  everything up to perform the key checks.

  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
  for just long enough to add the triggers, then scan the tables with just
  AccessShareLock to see if the existing rows meet the constraint, and
  if so finally mark the constraint valid.  Meanwhile the constraint
  would be enforced against newly-added rows by the triggers, so nothing
  gets missed.  You'd still get a small hiccup in system performance
  from the transient exclusive lock, but nothing like as bad as it is
  now.  Would that solve your problem?
  
 
  That's good, but it doesn't solve the original user complaint about
  needing to re-run many, many large queries to which we already know the
  answer.
 
 But we don't know it for dead sure, we only think we do. What if the 
 data for one or other of the tables is corrupted? We'll end up with data 
 we believe is consistent but in fact is not, ISTM. If you can somehow 
 guarantee the integrity of data in both tables then we might be 
 justified in assuming that the FK constraint will be consistent - that's 
 why I suggested some sort of checksum mechanism might serve the purpose.

Agreed.

Can we get COPY to output the checksum of its output as part of the
command tag? How else can we return the checksum? In $file.cksum for any
given output file?

We can then supply the checksum explicitly when we reload, with a
CHECKSUM option on COPY.
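
Purely to make the idea concrete (hypothetical syntax, not an existing
COPY option):

  COPY mytable TO '/backup/mytable.dat';
  -- the command tag (or a $file.cksum sidecar) would carry the checksum
  COPY mytable FROM '/backup/mytable.dat' WITH CHECKSUM '...';
  -- the reload would fail if the recomputed checksum doesn't match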

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
 
  That's good, but it doesn't solve the original user complaint about
  needing to re-run many, many large queries to which we already know the
  answer.
 
 No, we are running a large query to which the user *thinks* he knows the
 answer.  There are any number of reasons why he might be wrong.

Of course. I should have said "to which we already know the answer" to
indicate I'm passing on others' criticisms of us.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Tom Lane
Mark Cave-Ayland [EMAIL PROTECTED] writes:
 So by adding in an extra function around the subselect result, we have 
 speeded up the index lookup by several orders of magnitude, and the 
 speedup appears to be coming from somewhere within the index scan?!

Is the value you are fetching from the geography table large enough to
be toasted?  I'm thinking you might be looking at the cost of repeated
de-toastings.

BTW, that mcatest function is buggy --- it's not copying the last
4 bytes of the source value.  I don't know enough about PostGIS
data types to know what effect that would have.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Tom Lane wrote:


Is the value you are fetching from the geography table large enough to
be toasted?  I'm thinking you might be looking at the cost of repeated
de-toastings.


Yeah, it's a fairly large geometry field so it will definitely be 
getting toasted. So is it the case that, with the mcatest function in place, 
we're effectively caching the de-TOASTed value?



BTW, that mcatest function is buggy --- it's not copying the last
4 bytes of the source value.  I don't know enough about PostGIS
data types to know what effect that would have.


It would probably lose that last point in the coordinate sequence, but 
nothing major. I've removed the -VARHDRSZ part just to check and it 
doesn't make any difference.



Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
 No, we are running a large query to which the user *thinks* he knows the
 answer.  There are any number of reasons why he might be wrong.

 Of course. I should have said "to which we already know the answer" to
 indicate I'm passing on others' criticisms of us.

[ shrug... ]  We don't know the answer either, and anyone who says
we do is merely betraying his ignorance of the number of ways to load
a foot-gun.

I don't have any confidence in the checksum proposal either, as it's
still naively assuming that changes in the data are the only possible
problem.  Consider that you are loading the data into a new database,
which might be running under a different locale setting, might contain
a new implementation of a datatype with subtly (or not so subtly)
different semantics, or might just already contain data in the target
tables.  pg_dump scripts are not nearly bright enough to defend against
these types of threats.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Tom Lane
Mark Cave-Ayland [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Is the value you are fetching from the geography table large enough to
 be toasted?  I'm thinking you might be looking at the cost of repeated
 de-toastings.

 Yeah, it's a fairly large geometry field so it will definitely be 
 getting toasted. So is it a case of with the mcatest function in place, 
 we're effectively caching the de-TOASTED value?

Well, yeah, because the first thing it does is pg_detoast_datum.
Just as a cross-check, try changing it to copy the value without
forcibly detoasting --- I'll bet it's still slow then.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Filip Rembiałkowski
2008/6/9 Simon Riggs [EMAIL PROTECTED]:


 On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  Decibel! [EMAIL PROTECTED] writes:
   Actually, in the interest of stating the problem and not the
   solution, what we need is a way to add FKs that doesn't lock
   everything up to perform the key checks.
 
  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
  for just long enough to add the triggers, then scan the tables with just
  AccessShareLock to see if the existing rows meet the constraint, and
  if so finally mark the constraint valid.  Meanwhile the constraint
  would be enforced against newly-added rows by the triggers, so nothing
  gets missed.  You'd still get a small hiccup in system performance
  from the transient exclusive lock, but nothing like as bad as it is
  now.  Would that solve your problem?

 That's good, but it doesn't solve the original user complaint about
 needing to re-run many, many large queries to which we already know the
 answer.


Just a guess, but maybe a "create FK concurrently" feature combined with
the synchronized scan feature _does_ resolve the original problem.

If you run many "create FK concurrently" commands one after another, wouldn't
the seq scan be reused?

 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Filip Rembiałkowski


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Simon Riggs wrote:
But we don't know it for dead sure, we only think we do. What if the 
data for one or other of the tables is corrupted? We'll end up with data 
we believe is consistent but in fact is not, ISTM. If you can somehow 
guarantee the integrity of data in both tables then we might be 
justified in assuming that the FK constraint will be consistent - that's 
why I suggested some sort of checksum mechanism might serve the purpose.



Agreed.

Can we get COPY to output the checksum of its output as part of the
command tag? How else can we return the checksum? In $file.cksum for any
given output file?
  


It seems a reasonable idea to use the command tag, unless that's going 
to break lots of stuff. I think the only thing we can usefully checksum 
is the output lines in the client encoding.



We can then supply the checksum explicitly when we reload, with a
CHECKSUM option on COPY.

  


We need rather more than this to make sure your facility isn't abused. 
That's the part that I haven't been able to think of a good answer for 
(yet).


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: GiST constraints

2008-06-09 Thread Jeff Davis

I would like to consider adding constraints to GiST indexes. I think it
is possible to add constraints that are more sophisticated than just
UNIQUE. My use case is a non-overlapping constraint, but I think it's
possible for this to be more general.

The idea is to make an array in shared memory with size equal to
max_connections. Each element of the array would hold the oid of an
index, and a tid.

When inserting into a GiST index, take out a lock on the array, and scan
it for other tuples being concurrently inserted into the same index, and
execute some function to determine if any tuples conflict (using the tid
stored in the array). Then, check the visibility of the tuple at that
tid. If the conflicting tuple is live, release the lock on the array and
abort. If the tuple is dead, set the array entry to be invalid, make an
entry in the array, and release the lock. If the tuple has an xmin or
xmax that's still in progress, release the lock on the array, block
waiting on the appropriate xid, and then try again. If no conflicts
exist in the array, make an entry in the array, and release the lock.

Then, search the GiST index using the same function to determine if
conflicts exist in the index. If conflicts exist in the index, check the
visibility information for the tuple and proceed, wait or abort (in the
same way as above). If no conflicts exist, insert.

This should work fine for multi-column indexes where the constraints for
each column are different. For instance, unique and non-overlapping
could be mixed.
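
To make the use case concrete, a purely hypothetical sketch (the table, the
box-typed column and the constraint declaration are all illustrative, not
proposed syntax):

  CREATE TABLE reservation (room integer, during box);
  CREATE INDEX reservation_during_idx ON reservation USING gist (during);
  -- desired behaviour: reject any new row whose 'during' overlaps (&&)
  -- an existing row's 'during' for the same room, enforced through the
  -- GiST index much as UNIQUE is enforced through a btree today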

I spoke about this idea with several people at EAST and PGCon. In
particular, Teodor had the good idea to store the tid in the array,
rather than the value, to make the idea more general to types of
different sizes.

Thoughts, ideas?

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: new contrib module - session variables

2008-06-09 Thread Pavel Stehule
Hello all,

Session variables have been a missing feature for a long time. Currently
nobody is working on a package or SQL/PSM module implementation, so there
is a gap. Currently the best known solutions are session variables based
on plperl hash tables or Joe Conway's C implementation.
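
For context, the plperl hash-table approach mentioned above looks roughly
like this (a minimal sketch using plperl's per-session %_SHARED hash; the
function names are only illustrative):

CREATE OR REPLACE FUNCTION set_var(name text, val text) RETURNS text AS $$
    # store the value in plperl's session-lifetime shared hash
    $_SHARED{$_[0]} = $_[1];
    return $_[1];
$$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_var(name text) RETURNS text AS $$
    # read it back later in the same session
    return $_SHARED{$_[0]};
$$ LANGUAGE plperl;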

I propose including Joe's code in the contrib set:
http://www.joeconway.com/sessfunc.tar.gz It's important mainly for
Windows users, who have problems compiling it themselves.

Objections?

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 11:49 -0400, Tom Lane wrote:
 Mark Cave-Ayland [EMAIL PROTECTED] writes:
  So by adding in an extra function around the subselect result, we have 
  speeded up the index lookup by several orders of magnitude, and the 
  speedup appears to be coming from somewhere within the index scan?!
 
 Is the value you are fetching from the geography table large enough to
 be toasted?  I'm thinking you might be looking at the cost of repeated
 de-toastings.

So you are saying it is de-toasted 32880 times, in this case? If not,
where are the repeated de-toastings happening?

Sounds like we need some stats on how well toast is working for us.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Robert Treat
On Monday 09 June 2008 11:59:27 Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
  No, we are running a large query to which the user *thinks* he knows the
  answer.  There are any number of reasons why he might be wrong.
 
  Of course. I should have said "to which we already know the answer" to
  indicate I'm passing on others' criticisms of us.

 [ shrug... ]  We don't know the answer either, and anyone who says
 we do is merely betraying his ignorance of the number of ways to load
 a foot-gun.


I think the more realistic scenario (based on the FK idea) is that you want to 
prevent any future rows from coming in without validating the FK, and you're 
willing to clean up any violators after the fact, since you can make that 
an out-of-the-critical-path operation.

If you extend this to a more general "create constraint concurrently" (to 
handle normal constraints, not-null constraints, etc.), it would certainly 
be a big win, and I think most would see it as a reasonable compromise. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: new contrib module - session variables

2008-06-09 Thread Andrew Dunstan



Pavel Stehule wrote:

Hello all,

Session variables have been a missing feature for a long time. Currently
nobody is working on a package or SQL/PSM module implementation, so there
is a gap. Currently the best known solutions are session variables based
on plperl hash tables or Joe Conway's C implementation.

I propose including Joe's code in the contrib set:
http://www.joeconway.com/sessfunc.tar.gz It's important mainly for
Windows users, who have problems compiling it themselves.

Objections?

  


The URL fails, so I can't comment on Joe's work. But I seriously doubt 
that any contrib module is really the right way to go about doing 
session variables. If we're going to have them then they need to be 
properly builtin, and available to all PLs.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 11:49 -0400, Tom Lane wrote:
 Is the value you are fetching from the geography table large enough to
 be toasted?  I'm thinking you might be looking at the cost of repeated
 de-toastings.

 So you are saying it is de-toasted 32880 times, in this case? If not,
 where are the repeated de-toastings happening?

Inside the index support functions.  I'm thinking we could fix this by
forcibly detoasting values passed as index scan keys, but it's not quite
clear where's the best place to do that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: GiST constraints

2008-06-09 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 I would like to consider adding constraints to GiST indexes. I think it
 is possible to add constraints that are more sophisticated than just
 UNIQUE. My use case is a non-overlapping constraint, but I think it's
 possible for this to be more general.

I would like to see something that replaces the current btree-only kluge
for UNIQUE, if we're going to try to do something general.  IOW, don't
think of this as GiST-specific.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: new contrib module - session variables

2008-06-09 Thread Pavel Stehule
2008/6/9 Andrew Dunstan [EMAIL PROTECTED]:


 Pavel Stehule wrote:

 Hello all,

  Session variables have been a missing feature for a long time. Currently
  nobody is working on a package or SQL/PSM module implementation, so there
  is a gap. Currently the best known solutions are session variables based
  on plperl hash tables or Joe Conway's C implementation.

  I propose including Joe's code in the contrib set:
  http://www.joeconway.com/sessfunc.tar.gz It's important mainly for
  Windows users, who have problems compiling it themselves.

 Objections?



 The URL fails, so I can't comment on Joe's work. But I seriously doubt that
 any contrib module is really the right way to go about doing session
 variables. If we're going to have them then they need to be properly
 builtin, and available to all PLs.


Probably Joe's server is down at the moment :(.

Joe's implementation was simple: a hash table stored in the session context.

Pavel


 cheers

 andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-09 Thread Nathan Boley
 Your argument seems to consider only columns having a normal
 distribution.

My example was based upon normally distributed data because people
usually know what they are and they are reasonably common.

 How badly does it fall apart for non-normal distributions?

This should work to the extent that there is a correlation between the
number of distinct values in a histogram bin and the 'size' of the
bin. I know that this isn't a very good answer, but it's a hard
problem in the general case.

However, for any non-uniform probability distribution there exists a
measure under which there is a non-zero correlation between these. I
wrote up a hand-wavy semi-formal argument at the end, but it's not
tough to see intuitively.

Just think about the graph of the cdf. The histogram boundaries are
horizontal lines that are equally spaced from top to bottom. If we
rescale the x-axis such that there is a fixed distance between each distinct
value, and define the distance as ( ndistinct in a given
interval )/( total ndistinct ), then the only place where this doesn't
hold is when the CDF is f(x) = x, aka the distribution is uniform. And even
then we just get a 0 coefficient, which is exactly what we are always
assuming now.

Obviously we run into problems when
a) we have a poor estimate for ndistinct - but then we have worse problems
b) our length measure doesn't correspond well with ndistinct in an interval

expanding on b)...

Your mention of Zipfian distributions is a particularly good example of
where this could be poor. Right now (someone correct me if I'm wrong)
words are sorted alphabetically. However, if we wanted this
estimator to do a good job, we would sort them by their length or,
better yet, by frequency in the English language (which is highly
correlated with length).

 (For instance, Zipfian distributions seem to be pretty common in database 
 work, from what I've seen.)

This should work well for any power law distribution. If people are
curious, I can rerun my previous example using a power law
distribution instead of normal.

However, the easy way of thinking about all of this is that we're
building a linear model between ndistinct and histogram bucket width.
It's intuitive to expect there to be a correlation between the two,
and so the model should have some predictive power.

-Nathan


somewhat formal - probably will be difficult to read without some
basic probability theory
To see this, first note that we can alternatively define the uniform
distribution on [a,b] as the distribution whose CDF is a straight line
that passes through both (a,0) and (b,1) ( just integrate the PDF ).
So any non-uniform distribution will have a CDF with slope that is
both below and above 1/(b-a) at some set of points, implying the
existence of an interval [i1, i2] such that
( CDF(i2) - CDF(i1) ) < ( i2 - i1 )/(b-a). Then, under the constraints
of the probability measure, there exists a second disjoint interval
such that ( CDF(i2') - CDF(i1') ) > ( i2' - i1' )/(b-a). In other words,

Next, assume that the number of potential distinct values in our
interval scales linearly with the length of the interval. Although it
seems as if this assumption could be somewhat burdensome, there always
exists a measure under which this is true for sets with a defined
order relation.  ( As remarked earlier by Tom, we are already making
this assumption ). To see this, consider defining the length(i1, i2)
as ( the number of distinct values in [i1, i2] )/( total num distinct
values ), where the distinct values counted are those in the set
{ v | v >= i1 and v <= i2 }.

Next, note that the joint distribution of identically distributed,
independent random variables is multinomial with cell probabilities
given by the value of the pdf at each distinct point.  Next, I'll
state without proof that for an IID RV  the expected number of
distinct values is maximized for a uniform distribution ( this is
pretty easy to see: think about the binomial case. Do you want your
cell probabilities to be ( 1.0, 0 ) or ( 0.5, 0.5 ) )

Finally, note that the number of expected distinct values decreases
faster than linearly in the length of the interval. This is pretty
clear when we consider the sparse case. As the number of potential
entries ( in this case, the interval length) approaches infinity, the
probability of a new entry being distinct approaches 1. This means
that,  in this limit, every new entry ends up being distinct, aka the
number of distinct values scales linearly in the number of new
entries. As the interval shrinks, new entries have some probability of
being repeats. As the interval shrinks to 0, there is a zero
probability of new entries being unique. Since,

a) there doesn't exists a linear relationship that contains the two
boundary points
b) the multinomial distribution of the PDF is continuous
c) the relationship is clearly decreasing

we can surmise that it is sub-linear.

Therefore, we have two intervals with sub- and super-linear slopes that
cancel one another.
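
(As a quick numeric sanity check of the sub-linearity claim - a sketch
only: for n IID draws from a uniform distribution over d values, the
expected number of distinct values is d * (1 - (1 - 1/d)^n). For
example, with d = 500:

SELECT n,
       round(500 * (1 - power(1 - 1.0/500, n))) AS expected_distinct
FROM generate_series(500, 5000, 500) AS n;

The result climbs quickly to roughly 316 after the first 500 draws and
then flattens out toward the ceiling of 500 distinct values, rather than
growing linearly.)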

Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 12:37 -0400, Robert Treat wrote:
 On Monday 09 June 2008 11:59:27 Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
   No, we are running a large query to which the user *thinks* he knows the
   answer.  There are any number of reasons why he might be wrong.
  
   Of course. I should have said to which we already know the answer to
   indicate I'm passing on others' criticisms of us.
 
  [ shrug... ]  We don't know the answer either, and anyone who says
  we do is merely betraying his ignorance of the number of ways to load
  a foot-gun.
 
 
 I think the more realistic scenario (based on the FK idea) is that you want 
 to 
 prevent any future rows from coming without validating the FK, and you're 
 willing to clean up any violators after the fact, since you can make that 
 an out of the critical path operation.
 
 if you extend this to a more general create constraint concurrently (to 
 handle normal constraint, not null constraints, etc...), it would certainly 
 be a big win, and i think most would see it as a reasonable compromise. 

Agreed. I think the out of the critical path action is more likely to
be the intended path rather than the never check at all route.

If we break down the action into two parts.

ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
which holds exclusive lock, but only momentarily
After this runs any new data is validated at moment of data change, but
the older data has yet to be validated.

ALTER TABLE ... VALIDATE CONSTRAINT foo
which runs lengthy check, though only grabs lock as last part of action

This way we can add them concurrently if we choose, by running one
command after the other, or we can run just the first part for now and
run the second at a more convenient moment.
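
To make the intended workflow concrete (table and constraint names below
are made up, and this is the proposed syntax, not anything that exists
today):

-- during the restore window: cheap, exclusive lock held only briefly
ALTER TABLE order_line
  ADD CONSTRAINT order_line_order_fk
  FOREIGN KEY (order_id) REFERENCES orders (order_id) NOVALIDATE;

-- later, off the critical path: scans and validates the existing data
ALTER TABLE order_line VALIDATE CONSTRAINT order_line_order_fk;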

A full set of checks on a large, complex database can easily take
hours or even days.

We should allow this. It's not a footgun, it's an honest attempt by
people to add RI checks to their database. The only other alternative
for some people is to not add FKs at all, which is also a footgun, but
we don't seem bothered that they might take that option.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Proposal: GiST constraints

2008-06-09 Thread Jeff Davis
On Mon, 2008-06-09 at 13:28 -0400, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  I would like to consider adding constraints to GiST indexes. I think it
  is possible to add constraints that are more sophisticated than just
  UNIQUE. My use case is a non-overlapping constraint, but I think it's
  possible for this to be more general.
 
 I would like to see something that replaces the current btree-only kluge
 for UNIQUE, if we're going to try to do something general.  IOW, don't
 think of this as GiST-specific.
 

I was concerned that the BTree kludge would outperform what I am
suggesting for the case of UNIQUE, and might therefore still be
necessary. 

My proposal requires an extra index lookup, because in GiST a
conflicting tuple isn't necessarily found near the place it might be
inserted. Maybe that cost is not too high, because for the case of BTree
UNIQUE it would just be accessing the same pages twice (once to test for
conflicts, and once to insert). 

I'm not sure exactly what you have in mind when you say "kludge". My
proposal doesn't solve the problem of "update foo set a = a + 1", in
which the UNIQUE constraint may fail when it should succeed. I don't see
how that problem can be solved without deferring the constraint checking
until the end of the statement, which sounds costly.

Regards,
Jeff Davis




Re: [HACKERS] proposal: new contrib module - session variables

2008-06-09 Thread Joe Conway

Pavel Stehule wrote:

The URL fails, so I can't comment on Joe's work. But I seriously doubt that
any contrib module is really the right way to go about doing session
variables. If we're going to have them then they need to be properly
builtin, and available to all PLs.


probably Joe's server is out currently :(.

Joe's implementation was simply a hash table stored in the session context.


Sorry -- I forgot to restart apache after upgrading my kernel yesterday :-(

I don't have a strong opinion either way regarding including this in 
contrib or elsewhere, but if there is sufficient interest I'll find the 
time to update it for cvs head as well as ensure it represents the 
latest and greatest that I have available. I've used and improved it in 
minor ways since that tarball was put up.


Joe



Re: [HACKERS] math error or rounding problem Money type

2008-06-09 Thread Jan Urbański

Gregory Stark wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


AFAIK (dimly recalling numerical analysis courses at university) SUM and ROUND
can *never* be commuted. In general the recommended approach is to round as
late as possible and as few times as possible - so your 1st query is the
correct or best way to go.


Justin, isn't your problem related precisely to what Tom said?

Now, when you're casting to Money, you're doing a cast chain like 
original_type -> text -> money (that's from your trailbalance view). I 
suspect the original_type is NUMERIC (and I think it's a very good type 
to keep your monetary data in).
My guess: what happens is that you have numbers with more than 6 
fractional digits in your original table, and they're kept as NUMERIC 
values. If you round them to the 6th fractional digit *before* summing 
them up, you can indeed get different results from what you'd get if 
you'd rounded them *after* doing the sum.


Compare:

=# select round(0.0000004 + 0.0000004, 6) ;
  round
----------
 0.000001
(1 row)

=# select round(0.0000004, 6) + round(0.0000004, 6) ;
 ?column?
----------
 0.000000

Do you see what (could've) happened? The first query is computed like this:
round(0.0000004 + 0.0000004, 6) = round(0.0000008, 6) = 0.000001
whereas the second one is more like:
round(0.0000004, 6) + round(0.0000004, 6) = 0.000000 + 0.000000 = 0.000000

Fractional parts that have been thrown away by the rounding may, when 
added up, become fractional parts that get significant when you're 
calculating the rounded value of the sum.


So yes, probably the way to go is to do *all* computations in NUMERIC and 
only cast when you're about to generate a report or present the data to 
the end user. Otherwise you risk losing some cents like that (and you 
need to be aware that a cast to MONEY *is* in fact a truncation, and you 
will not get mathematically correct results).
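
For example (ledger here is just a made-up table with an amount column
of type numeric):

-- round once, at report time: sub-cent fractions still take part in the sum
SELECT round(sum(amount), 2) AS balance FROM ledger;

-- rounding each row first silently throws those fractions away
SELECT sum(round(amount, 2)) AS balance FROM ledger;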


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin



Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Alvaro Herrera
Simon Riggs wrote:

 If we break down the action into two parts.
 
 ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
 which holds exclusive lock, but only momentarily
 After this runs any new data is validated at moment of data change, but
 the older data has yet to be validated.
 
 ALTER TABLE ... VALIDATE CONSTRAINT foo
 which runs lengthy check, though only grabs lock as last part of action

The problem I see with this approach in general (two-phase FK creation)
is that you have to keep the same transaction for the first and second
command, but you really want concurrent backends to see the tuple for
the not-yet-validated constraint row.

Another benefit that could arise from this is that the hypothetical
VALIDATE CONSTRAINT step could validate more than one constraint at a
time, possibly processing all the constraints with a single table scan.

Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
action.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Overhauling GUCS

2008-06-09 Thread Josh Berkus

Tom,


Actually, the reason it's still 10 is that the effort expended to get it
changed has been *ZERO*.  I keep asking for someone to make some
measurements, do some benchmarking, anything to make a plausible case
for a specific higher value as being a reasonable place to set it.
The silence has been deafening.

Not surprising really. It is a simple adjustment to make and it also is
easy to spot when it's a problem. However it is not trivial to test for
(in terms of time and effort). I know 10 is wrong and so do you. If you
don't, I am curious why I see so many posts from you saying, "Your
estimates are off, what is your default_statistics_target?" with yet
even more responses saying, "Uhh 10." 


I tried (back in 7.4) to do some systematic testing of this.  The 
problem is that the cases where higher d_s_t are required are 
specifically ones with complex, unbalanced data distributions and/or 
very large databases.  This makes test cases extremely difficult and 
time-consuming to generate; further, I found that the test cases I had 
from my clients' databases were not portable (in addition to being 
confidential).


Also, I'd actually assert that 10 seems to be perfectly adequate for 
the majority of users.  That is, the number of users where I've 
recommended increasing d_s_t for the whole database is smaller than the 
number where I don't, and of course we never hear from most users at 
all.  So I'm pretty happy recommending "Leave the default.  If you 
encounter problem queries, increase it to 100, and analyse the database. 
 If you're running a data warehouse, increase it to 1000."


Where analyze does systematically fall down is with databases over 500GB 
in size, but that's not a function of d_s_t but rather of our tiny 
sample size.


--Josh



Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-09 Thread Zdenek Kotala

Alvaro Herrera wrote:

Hi,

We've been making noises about dealing with TOAST tables as separate
entities in autovacuum for some time now.  So here's a proposal:


Maybe a dumb idea - what about making a queue of toast pointers ready for vacuum 
and removing these toast items directly from the toast table and index?


Zdenek



Re: [HACKERS] Overhauling GUCS

2008-06-09 Thread Gregory Stark
Josh Berkus [EMAIL PROTECTED] writes:

 Where analyze does systematically fall down is with databases over 500GB in
 size, but that's not a function of d_s_t but rather of our tiny sample size.

Speak to the statisticians. Our sample size is calculated using the same
theory behind polls which sample 600 people to learn what 250 million people
are going to do on election day. You do NOT need (significantly) larger
samples for larger populations.

In fact where those polls have difficulty is the same place we have some
problems. For *smaller* populations like individual congressional races you
need to have nearly the same 600 sample for each of those small races. That
adds up to a lot more than 600 total. In our case it means when queries cover
a range much less than a whole bucket then the confidence interval increases
too.

Also, our estimates for n_distinct are very unreliable. The math behind
sampling for statistics just doesn't work the same way for properties like
n_distinct. For that Josh is right, we *would* need a sample size proportional
to the whole data set which would practically require us to scan the whole
table (and have a technique for summarizing the results in a nearly constant
sized data structure).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 The problem I see with this approach in general (two-phase FK creation)
 is that you have to keep the same transaction for the first and second
 command, but you really want concurrent backends to see the tuple for
 the not-yet-validated constraint row.

Do you? It seems like having a constraint which is enforced on any new
operations but which doesn't guarantee that existing records satisfy it is a
useful feature in itself -- separating the two concepts "this property is true
for all records" and "any action taken must leave the record with this
property".

ISTM you can validate an invalid constraint using any snapshot taken at any
time >= the original snapshot. As long as the constraint is being enforced for
all transactions which start after the validating snapshot's xmin then when
it's done it can know the constraint is valid.

Taking a lock on the table to create the constraint certainly leaves that
property fulfilled. Actually it seems we could not take any lock and just
check when it comes time to do the validation that the snapshot's xmin is >=
the xmin on the constraint. I'm starting to get leery of all these tightly
argued bits of logic though. Each one on its own is safe but the resulting
system is getting to be quite complex.

 Another benefit that could arise from this is that the hypothetical
 VALIDATE CONSTRAINT step could validate more than one constraint at a
 time, possibly processing all the constraints with a single table scan.

Interesting.

 Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
 action.

I don't really like this, at least not as the only option, because as I said
above and Robert Treat also said, it could be useful to have the constraint in
place for new operations but check it for the existing data at some later
date. (Or even never)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Overhauling GUCS

2008-06-09 Thread Hakan Kocaman
On 6/9/08, Gregory Stark [EMAIL PROTECTED] wrote:

 Josh Berkus [EMAIL PROTECTED] writes:

  Where analyze does systematically fall down is with databases over 500GB
 in
  size, but that's not a function of d_s_t but rather of our tiny sample
 size.


 n_distinct. For that Josh is right, we *would* need a sample size
 proportional
 to the whole data set which would practically require us to scan the whole
 table (and have a technique for summarizing the results in a nearly
 constant
 sized data structure).


Hi,
is this (summarizing results in a constant sized data structure) something
which could be achieved by Bloom-Filters?
http://archives.postgresql.org/pgsql-general/2008-06/msg00076.php

Kind regards
Hakan Kocaman


Re: [HACKERS] Overhauling GUCS

2008-06-09 Thread Gregory Stark

Hakan Kocaman [EMAIL PROTECTED] writes:

 On 6/9/08, Gregory Stark [EMAIL PROTECTED] wrote:

 n_distinct. For that Josh is right, we *would* need a sample size
 proportional to the whole data set which would practically require us to
 scan the whole table (and have a technique for summarizing the results in a
 nearly constant sized data structure).

 is this (summarizing results in a constant sized data structure) something
 which could be achived by Bloom-Filters ?

Uhm, it would be a bit of a strange application of them but actually it seems
to me that would be a possible approach. It would need a formula for
estimating the number of distinct values given the number of bits set in the
bloom filter. That should be a tractable combinatorics problem (in fact it's
pretty similar to the combinatorics I posted a while back about getting all
the drives in a raid array busy). And if you have a dynamic structure where
the filter size grows then it would overestimate because extra copied bits
would be set.
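
For what it's worth, one standard approximation for that formula is
n ~ -(m/k) * ln(1 - x/m), where m is the filter size in bits, k the
number of hash functions and x the number of bits observed set; as a
quick sketch (the numbers here are arbitrary):

SELECT -(m / k) * ln(1 - x / m) AS estimated_ndistinct
FROM (VALUES (1048576::float8, 4::float8, 262144::float8)) AS f(m, k, x);

Whether the error bars on that are good enough for planner statistics is
of course another question.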

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Overhauling GUCS

2008-06-09 Thread Josh Berkus
Greg,

 Speak to the statisticians. Our sample size is calculated using the same
 theory behind polls which sample 600 people to learn what 250 million
 people are going to do on election day. You do NOT need (significantly)
 larger samples for larger populations.

Your analogy is bad.  For elections, the voters have only a few choices.  
In a 300 million row table, there could be 300 million different values, 
and the histogram becomes less accurate for every order of magnitude 
smaller than 300 million it is.

 Also, our estimates for n_distinct are very unreliable. The math behind
 sampling for statistics just doesn't work the same way for properties
 like n_distinct. For that Josh is right, we *would* need a sample size
 proportional to the whole data set which would practically require us to
 scan the whole table (and have a technique for summarizing the results
 in a nearly constant sized data structure).

Actually, a number of papers have shown block-based algorithms which can 
arrive at reasonably confident estimates (between 50% and 250% of the actual 
value) based on scanning only 5% of *blocks*.  Simon did some work on this a 
couple years ago, but he and I had difficulty convincing -hackers that a 
genuine problem existed.

You're correct that we'd need to change pg_statistic, though.  For one 
thing, we need to separate the sample size from the histogram size.

Also, we seem to be getting pretty far away from the original GUC 
discussion.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 14:07 -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
 
  If we break down the action into two parts.
  
  ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
  which holds exclusive lock, but only momentarily
  After this runs any new data is validated at moment of data change, but
  the older data has yet to be validated.
  
  ALTER TABLE ... VALIDATE CONSTRAINT foo
  which runs lengthy check, though only grabs lock as last part of action
 
 The problem I see with this approach in general (two-phase FK creation)
 is that you have to keep the same transaction for the first and second
 command, but you really want concurrent backends to see the tuple for
 the not-yet-validated constraint row.

Well, they *must* be in separate transactions if we are to avoid holding
an AccessExclusiveLock while we perform the check. Plus the whole idea
is to perform the second part at some other non-critical time, though we
all agree that never performing the check at all is foolhardy.

Maybe we say that you can defer the check, but after a while autovacuum
runs it for you if you haven't done so. It would certainly be useful to
run the VALIDATE part as a background task with vacuum wait enabled.

 Another benefit that could arise from this is that the hypothetical
 VALIDATE CONSTRAINT step could validate more than one constraint at a
 time, possibly processing all the constraints with a single table scan.

Good thought, though not as useful for FK checks.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Alvaro Herrera
Simon Riggs wrote:

 Maybe we say that you can defer the check, but after a while autovacuum
 runs it for you if you haven't done so. It would certainly be useful to
 run the VALIDATE part as a background task with vacuum wait enabled.

It would be useful if there was anywhere to report the error to, or an
action that could be taken automatically.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Proposal: GiST constraints

2008-06-09 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 13:28 -0400, Tom Lane wrote:
 I would like to see something that replaces the current btree-only kluge
 for UNIQUE, if we're going to try to do something general.  IOW, don't
 think of this as GiST-specific.

 I'm not sure exactly what you have in mind when you say kludge.

Well, there are at least three things not to like about the btree UNIQUE
implementation:

1. It's btree-specific and can't be shared by other index AMs that might
wish to implement constraints.

2. It involves the index AM reaching into the heap, which is at the
least a serious failure of modularity.

3. There's no way to implement a deferred uniqueness check, nor even to
handle the within-statement conflict problem.

It looks to me like the same knocks can be laid on your proposal.

Now admittedly I don't have a solution that addresses these objections
(much less one that does it without losing any performance) but I'm
hesitant to see us building new features in this area without any idea
how we will fix these things --- especially #3, which is a SQL-spec
violation as well as a frequent user complaint.  I'd like to have at
least a design plan for fixing these things, so we know whether we are
painting ourselves (further) into a corner.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Bruce Momjian
Gurjeet Singh wrote:
 On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
 
  But since you mention it: one of the plausible answers for fixing the
  vacuum problem for read-only slaves is to have the slaves push an xmin
  back upstream to the master to prevent premature vacuuming.  The current
  design of pg_standby is utterly incapable of handling that requirement.
  So there might be an implementation dependency there, depending on how
  we want to solve that problem.
 
 
 I think it would be best to not make the slave interfere with the master's
 operations; that's only going to increase the operational complexity of such
 a solution.
 
 There could be multiple slaves following a master, some serving

For the slave to not interfere with the master at all, we would need to
delay application of WAL files on each slave until visibility on that
slave allows the WAL to be applied, but in that case we would have
long-running transactions delay data visibility of all slave sessions.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Bruce Momjian
Andreas 'ads' Scherbaum wrote:
 On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:
 
  On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
  
   Then you ship 16 MB binary stuff every 30 second or every minute but
   you only have some kbyte real data in the logfile.
  
  Not if you use pg_clearxlogtail ( 
  http://www.2ndquadrant.com/replication.htm ), which got lost in the giant 
  March commitfest queue but should probably wander into contrib as part of 
  8.4.
 
 Yes, this topic was discussed several times in the past but to
 solve this it needs a patch/solution which is integrated into PG
 itself, not contrib.

Agreed.  I realize why we are not zeroing those bytes (for performance),
but can't we have the archiver zero those bytes before calling the
'archive_command'?

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Alvaro Herrera
Bruce Momjian wrote:

 Agreed.  I realize why we are not zeroing those bytes (for performance),
 but can't we have the archiver zero those bytes before calling the
 'archive_command'?

Perhaps make the zeroing user-settable.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Gurjeet Singh wrote:
 There could be multiple slaves following a master, some serving

 For the slave to not interfere with the master at all, we would need to
 delay application of WAL files on each slave until visibility on that
 slave allows the WAL to be applied, but in that case we would have
 long-running transactions delay data visibility of all slave sessions.

Right, but you could segregate out long-running queries to one slave
server that could be further behind than the others.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Agreed.  I realize why we are not zeroing those bytes (for performance),
 but can't we have the archiver zero those bytes before calling the
 'archive_command'?

The archiver doesn't know any more about where the end-of-data is than
the archive_command does.  Moreover, the archiver doesn't know whether
the archive_command cares.  I think the separate module is a fine
solution.

It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented.  So I see absolutely no
point in spending time integrating pg_clearxlogtail now.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Greg Smith

On Mon, 9 Jun 2008, Tom Lane wrote:


It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented.  So I see absolutely no
point in spending time integrating pg_clearxlogtail now.


There are remote replication scenarios over a WAN (mainly aimed at 
disaster recovery) that want to keep a fairly updated database without 
putting too much traffic over the link.  People in that category really 
want zeroed tail+compressed archives, but probably not the extra overhead 
that comes with shipping smaller packets in a real-time implementation.
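
(For that scenario the usual recipe, sketched here from memory so check
the pg_clearxlogtail docs for the exact invocation, is to chain it with
a compressor in archive_command, assuming it acts as a stdin-to-stdout
filter:

archive_command = 'pg_clearxlogtail < %p | gzip > /archive/%f.gz'

which gets you the zeroed tail and the compression in one step.)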


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] a question about exec_simple_query()

2008-06-09 Thread 汪琦
Hi, everyone:

In function exec_simple_query(), why do we run a simple query 
wrapped in a portal?

For instance:

version 8.3.0, postgres.c, Line 908

/*
 * Create unnamed portal to run the query or queries in. If 
there
 * already is one, silently drop it.
 */
portal = CreatePortal("", true, true);

..


In other words, what's the benefit of using a portal to run a 
simple query?


Thanks for your help!  :)






Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Koichi Suzuki
Just for information.

In terms of archive compression, I have an archive log compression tool,
which can be found at http://pgfoundry.org/projects/pglesslog/

This feature is also included in NTT's synchronized log shipping
replication presented in the last PGCon.

2008/6/10 Greg Smith [EMAIL PROTECTED]:
 On Mon, 9 Jun 2008, Tom Lane wrote:

 It should also be pointed out that the whole thing becomes uninteresting
 if we get real-time log shipping implemented.  So I see absolutely no
 point in spending time integrating pg_clearxlogtail now.

 There are remote replication scenarios over a WAN (mainly aimed at disaster
 recovery) that want to keep a fairly updated database without putting too
 much traffic over the link.  People in that category really want zeroed
 tail+compressed archives, but probably not the extra overhead that comes
 with shipping smaller packets in a real-time implementation.

 --
 * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD





-- 
--
Koichi Suzuki



Re: [HACKERS] Proposal: GiST constraints

2008-06-09 Thread Jeff Davis
On Mon, 2008-06-09 at 21:00 -0400, Tom Lane wrote:
 1. It's btree-specific and can't be shared by other index AMs that might
 wish to implement constraints.
 

This can be solved by my proposal, but I just don't know how it would
apply to something like GIN, for instance. It could replace the unique
constraint for BTree, but I'm not sure it would perform as well. It's
not that my proposal is GiST-specific, it's just that GiST is the only use
case I can think of where it is an improvement.

 2. It involves the index AM reaching into the heap, which is at the
 least a serious failure of modularity.

We need to reach into the heap for visibility information, if we're to
implement any constraints at all. Also, we have to test against values
being inserted by other concurrent transactions, and those values can
be variable in size. What other mechanism do we have to share those
variable-sized values among several backends?

 3. There's no way to implement a deferred uniqueness check, nor even to
 handle the within-statement conflict problem.

This is the big one.

 Now admittedly I don't have a solution that addresses these objections
 (much less one that does it without losing any performance) but I'm
 hesitant to see us building new features in this area without any idea
 how we will fix these things --- especially #3, which is a SQL-spec
 violation as well as a frequent user complaint.  I'd like to have at
 least a design plan for fixing these things, so we know whether we are
 painting ourselves (further) into a corner.

I'll see if I can come up with something. I agree that's an important
problem to solve. Does anyone know how other DBMSs do this? I found this
thread from the TODO:

http://archives.postgresql.org/pgsql-hackers/2006-09/msg01458.php

Regards,
Jeff Davis




Re: [HACKERS] a question about exec_simple_query()

2008-06-09 Thread Tom Lane
汪琦 [EMAIL PROTECTED] writes:
   In other words, what's the benefit of using a portal to run a 
 simple query?

er, because it doesn't work otherwise?

regards, tom lane
