Re: [HACKERS] Prefered Types

2011-05-04 Thread Зотов Роман

On 04.05.2011 0:00, Alvaro Herrera wrote:

Excerpts from Зотов Роман's message of mar may 03 16:31:31 -0300 2011:


but here we can see a problem: with
F(smallint)
F(integer)
and a call like F(float),
I would not like that call to fail.

I think this particular example would be a mistake, because that cast
would be lossy, so you want to invoke it only when the user explicitly
selects it.  The other way around would be fine, I think, that is,
F(float)
F(float8)
and the call is F(int)
As I see it, I should not have to write separate functions with float, string,
and many other argument types when my function takes an INT argument... and if
the caller does not want to think about types, he cannot use your strong typing.
Why should this not work the same way as assignment? Why are implicit and
assignment casts different?

I know only implicit and explicit casts, and I think implicit should equal assignment.




PS: This patch is needed because in any case we must calculate type preference
more smartly.  Yes, this patch is 1/10 of the full solution, but it's a first step!

Well, if the other 9/10 were clear, there would be no discussion.  The
problem is that the missing bits have not been designed and thus we
don't know if this 1/10 will be useful to them.  We need to find a
complete design before committing to any initial portion which may turn
out to be bogus down the road.

Yes, but as long as you think that "update table1 set IntField = FloatField"
is valid

while "select FuncWithIntArg(FloatArg)" is not,
you have no problem with the current solution, because it works the same way :)
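The asymmetry being described can be reproduced in a short session; a sketch
(the table and function names are invented for illustration, and the exact
error text varies by PostgreSQL version):

```sql
-- Assignment context: float -> int is allowed via an assignment cast.
CREATE TABLE table1 (intfield int, floatfield float8);
INSERT INTO table1 VALUES (0, 1.5);
UPDATE table1 SET intfield = floatfield;  -- works; the value is rounded

-- Expression context: only implicit casts are considered, so the same
-- conversion is rejected when resolving a function call.
CREATE FUNCTION func_with_int_arg(i int) RETURNS int
    LANGUAGE sql AS $$ SELECT i $$;
SELECT func_with_int_arg(floatfield) FROM table1;
-- ERROR:  function func_with_int_arg(double precision) does not exist
```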

--
With best regards,
Roman Vladimirovich Zotov
Head of the Development Department
ZAO NPO Konsultant
Ivanovo, Palekhskaya St. 10
tel./fax: (4932) 41-01-21
mailto: zo...@oe-it.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Alexander Korotkov
I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum
procedure on GiST doesn't have any storage utilization guarantee. For example,
if only one live item remains on some page, then only that one item will be
left on the page. I.e., there is no index reorganization during vacuum. If
there aren't many inserts into such pages in the future, they will stay bloated.


With best regards,
Alexander Korotkov.


[HACKERS] Pull up aggregate subquery

2011-05-04 Thread Hitoshi Harada
I sometimes wonder if we could pull up aggregate subqueries in the optimizer.

My typical problem is:

Consider two relations, medium size M and large size L. L has
reference column to M's primary key. Say,

create table size_m as select i as id, repeat(i::text, i % 100) as
val from generate_series(1, 2) i;
create table size_l as select i as id, m.id as m_id,
repeat(i::text, i % 100) as val from generate_series(1, 10) i,
size_m m where i.i / 10 = m.id;

Now, you want to query M under some condition, joined with an aggregate
over L grouped by M's primary key.

select m_id, sum_len from size_m m inner join(select m_id,
sum(length(val)) as sum_len from size_l group by m_id)l on m.id =
l.m_id where val = '1';

The generated plan is:

   QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=36116.92..38339.67 rows=50 width=235) (actual
time=440.679..1039.964 rows=1 loops=1)
   Join Filter: (m.id = size_l.m_id)
   ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1 width=227)
(actual time=0.021..16.698 rows=1 loops=1)
 Filter: (val = '1'::text)
   ->  GroupAggregate  (cost=36116.92..37217.09 rows=10026 width=248)
(actual time=440.651..1013.704 rows=1 loops=1)
 ->  Sort  (cost=36116.92..36366.90 rows=1 width=248)
(actual time=440.619..593.062 rows=1 loops=1)
   Sort Key: size_l.m_id
   Sort Method: external sort  Disk: 25736kB
   ->  Seq Scan on size_l  (cost=0.00..4565.91 rows=1
width=248) (actual time=0.011..138.243 rows=1 loops=1)
 Total runtime: 1044.039 ms
(10 rows)

Note that most of the result of the aggregate is discarded by the join with M,
because M produces a small output after the filter on M.val. If we can
filter M first and filter L by M's result before running the
aggregate, the response may get dramatically faster.

If you filter by M.id instead of M.val the optimizer is smart enough
to push down the condition to L, which is filtered before aggregate.

select m_id, sum_len from size_m m inner join(select m_id,
sum(length(val)) as sum_len from size_l group by m_id)l on m.id =
l.m_id where id = 1;

 QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..5713.02 rows=2 width=235) (actual
time=72.121..82.364 rows=1 loops=1)
   ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1 width=227)
(actual time=0.028..10.252 rows=1 loops=1)
 Filter: (id = 1)
   ->  GroupAggregate  (cost=0.00..4815.98 rows=2 width=248) (actual
time=72.065..72.067 rows=1 loops=1)
 ->  Seq Scan on size_l  (cost=0.00..4815.89 rows=10
width=248) (actual time=0.051..71.968 rows=10 loops=1)
   Filter: (m_id = 1)
 Total runtime: 82.525 ms
(7 rows)

This seems to be the benefit of EquivalenceClass: 1 = M.id = L.m_id is
implied, and the optimizer adds the implicit constant filter L.m_id = 1 to
the plan tree. In contrast, the former case of M.val = '1' doesn't
imply any condition on L.m_id. That's fair enough.

However, I think we can filter L by L.m_id before the aggregate, because
L.m_id is part of the GROUP BY clause as well as the join condition, and
M.id is unique in M. In such cases, the query can be transformed into
something like:
GroupAggregate
  -> NestLoop (L.m_id = M.id)
    -> SeqScan L
    -> SeqScan M (filter: M.val = '1')
This transformation can be done by pulling up the aggregate Query in
pull_up_subqueries(). Currently the optimizer doesn't pull up any
query that contains aggregates, but as shown above, in some cases we
can. Attached is a WIP proof-of-concept patch to do it. I know it
breaks general queries, but it performs the transformation described
above. I suppose the missing piece is adding the condition for when it
is safe to pull up the aggregate, beyond how it is done in the patch.
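For comparison, the pulled-up form can also be written by hand today; a
sketch of the rewrite the patch aims to perform automatically, using the
size_m/size_l tables defined above:

```sql
-- Join first, aggregate afterwards: M is filtered before L is scanned.
-- Valid here because m.id is unique and l.m_id is both the join key
-- and the grouping key.
SELECT l.m_id, sum(length(l.val)) AS sum_len
FROM size_m m
JOIN size_l l ON m.id = l.m_id
WHERE m.val = '1'
GROUP BY l.m_id;
```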

db1=# explain select m_id, sum_len from size_m m inner join(select
m_id, sum(length(val)) as sum_len from size_l group by m_id)l on m.id
= l.m_id where val = '1';
   QUERY PLAN
------------------------------------------------------------------------
 GroupAggregate  (cost=6712.96..6713.16 rows=10 width=471) (actual
time=125.496..125.499 rows=1 loops=1)
   ->  Sort  (cost=6712.96..6712.99 rows=10 width=471) (actual
time=125.228..125.288 rows=10 loops=1)
 Sort Key: size_l.m_id
 Sort Method: quicksort  Memory: 25kB
 ->  Nested Loop  (cost=0.00..6712.80 rows=10 width=471)
(actual time=0.142..125.089 rows=10 loops=1)
   Join Filter: (m.id = size_l.m_id)
   ->  Seq Scan on size_m m  (cost=0.00..897.00 rows=1
width=227) (actual time=0.037..8.956 rows=1 loops=1)
 Filter: (val = '1'::text)
   ->  Seq Scan on size_l  

Re: [HACKERS] Predicate locking

2011-05-04 Thread Kevin Grittner
 Greg Smith  wrote:
 
 My point was just that no one has really done that here yet:
 provided an example showing SSI serialization working as a
 substitute for predicate locking in this sort of use case. I trust
 that the theory is sound here, and yet I'd still like to see that
 demonstrated.
 
Fair enough.  You can find examples where there are no false
positives or false negatives in the src/test/isolation directory in
any checkout of the source code.  Dan will be presenting the results
of a set of DBT-2 runs at PGCon, which might help.  I've been
gradually building up a set of examples at:
 
http://wiki.postgresql.org/wiki/SSI
 
That set is incomplete so far due to a scarcity of round tuits, but
if you want to suggest any particular tests, or add any (it's a
Wiki), I welcome the input.
 
With all that going on, and having mentioned that Wiki page on this
thread, I didn't think posting examples to this list was useful, but
could be persuaded otherwise.
 
-Kevin



Re: [HACKERS] full_page_writes makes no difference?

2011-05-04 Thread Markus Wanner
Hi,

On 05/04/2011 03:46 AM, Tian Luo wrote:
 Whether I turn full_page_writes on or off, I always
 observe 8192-byte writes of log data for simple write operations
 (write/update).

How did you measure that?  A single transaction doing a single write, I
guess.  Ever tried multiple transactions with a simple write operation
each and checking how much WAL that spits out per transaction?

As I understand it, dirty blocks are written to disk as soon as
feasible.  After all, that helps crash recovery.  With a basically idle
system, as soon as feasible might be pretty soon.  However, put your
(disk sub-) system under load and as soon as feasible might take awhile.

 But according to the document, when this is off, it could speed up
 operations but may cause problems during recovery. So, I guess this is
 because it writes less when the option is turned off. However, this
 contradicts my observations 

I think you didn't trigger the savings.  It's about writing full pages
on the first write to a block after a checkpoint.  Did you monitor
checkpoint times of Postgres in your tests?

 If I am not missing anything, I find that the writes of log data go
 through function XLogWrite in source file
 backend/access/transam/xlog.c.
 
 In this file, log data are written with the following code:
 
 from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
 nbytes = npages * (Size) XLOG_BLCKSZ;
 if (write(openLogFile, from, nbytes) != nbytes)
 {
  ...
 }
 
 So, nbytes should always be multiples of XLOG_BLCKSZ, which in the
 default case, is 8192.

That observation seems correct.

Regards

Markus Wanner



Re: [HACKERS] full_page_writes makes no difference?

2011-05-04 Thread Pavan Deolasee
On Wed, May 4, 2011 at 7:16 AM, Tian Luo jackro...@gmail.com wrote:

 Hi guys,

 Whether I turn full_page_writes on or off, I always
 observe 8192-byte writes of log data for simple write operations
 (write/update).


Not sure how you measured it, but ISTM that the correct GUC to play with is
fsync. If that's turned off, the WAL buffers won't be fsynced to the disk
at every commit. But that would mean reduced reliability in case of database
crash.



 But according to the document, when this is off, it could speed up
 operations but may cause problems during recovery. So, I guess this is
 because it writes less when the option is turned off. However, this
 contradicts my observations 


When full_page_writes is turned off, the full page won't be backed up in the
WAL record after the first modification after a checkpoint. So yes, it can
reduce the amount of WAL written to the disk.
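One way to see this effect from SQL is to compare WAL positions around the
first update after a checkpoint, with full_page_writes on and then off. A
sketch; pg_current_xlog_location() is the function name in this era's
releases, and the table and column names are invented:

```sql
CHECKPOINT;
SELECT pg_current_xlog_location();    -- note the WAL position
UPDATE t SET v = v + 1 WHERE id = 1;  -- first touch of the page after checkpoint
SELECT pg_current_xlog_location();    -- the difference is the WAL generated
-- With full_page_writes = on, the first modification of a page after a
-- checkpoint logs a full page image (~8 kB); with it off, only the much
-- smaller tuple-level record is written.
```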

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] full_page_writes makes no difference?

2011-05-04 Thread Pavan Deolasee
On Wed, May 4, 2011 at 5:46 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:



 On Wed, May 4, 2011 at 7:16 AM, Tian Luo jackro...@gmail.com wrote:

 Hi guys,

 Whether I turn full_page_writes on or off, I always
 observe 8192-byte writes of log data for simple write operations
 (write/update).


 Not sure how you measured it, but ISTM that the correct GUC to play with is
 fsync. If that's turned off, the WAL buffers won't be fsynced to the disk
 at every commit. But that would mean reduced reliability in case of database
 crash.



And I should have added that post 8.3, we also have a user-settable
parameter called synchronous_commit. Normally, database must write WAL up to
the commit record to the stable storage when a transaction commits to ensure
that there is no data loss in case of database crash. But if
synchronous_commit is turned off, the database might delay writing the WAL
buffers to the disk, thus reducing write activity, but at an increased risk
of data loss.

http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] branching for 9.2devel

2011-05-04 Thread Jim Nasby
On May 3, 2011, at 11:10 PM, Andrew Dunstan wrote:
 On 05/03/2011 09:53 PM, David Blewett wrote:
 On Tue, May 3, 2011 at 9:51 PM, David Blewettda...@dawninglight.net  wrote:
 This seems like a pretty good idea, but maybe it'd be easiest to take
 it a step further and add an automatic pgindent-ified patch is
 created when a patch is added to the commitfest app?
 That should read: ... but maybe it'd be easiest to take it a step
 further and have an additional, automatically created patch file that
 is run through pgindent when a patch is added to the commitfest app.
 
 
 You can't indent patches, only patched files. And that's the problem with 
 this happy scheme. For it to work at all sanely we'd need to keep the 
 committed code that the patch is to be applied against strictly pgindent 
 clean, presumably via some automated process such as a commit hook. That's 
 been suggested in the past, but hasn't met with universal approval, IIRC.

What if this hypothetical tool pulled the latest source, made a copy of that 
source, applied the patch to the copy, ran pgindent on the original AND the copy, 
and then diffed the two? I think that would give you a properly indented patch. The 
context lines in the patch would have the wrong indentation, but I think patch 
is pretty smart about dealing with that (or at least can be told to ignore 
whitespace differences).
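The workflow described above might look roughly like this. An untested
sketch only: it assumes a git checkout, a working pgindent setup (the exact
pgindent invocation and typedefs handling vary by branch), and a hypothetical
patch file named new-feature.patch:

```shell
#!/bin/sh
# Produce a pgindent-clean version of a patch by indenting both trees
# and diffing the results.
set -e
git clone git://git.postgresql.org/git/postgresql.git orig
cp -r orig patched
(cd patched && patch -p1 < ../new-feature.patch)
# Run pgindent over both trees so the final diff reflects only the patch.
(cd orig    && src/tools/pgindent/pgindent)
(cd patched && src/tools/pgindent/pgindent)
diff -urN orig patched > new-feature.indented.patch
```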
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Prefered Types

2011-05-04 Thread Jim Nasby
On May 4, 2011, at 12:00 AM, Зотов Роман wrote:
 F(smallint)
 F(integer)
 and a call like F(float),
 I would not like that call to fail.
 I think this particular example would be a mistake, because that cast
 would be lossy, so you want to invoke it only when the user explicitly
 selects it.  The other way around would be fine, I think, that is,
 F(float)
 F(float8)
 and the call is F(int)
 As I see it, I should not have to write separate functions with float, string, 
 and many other argument types when my function takes an INT argument... and if 
 the caller does not want to think about types, he cannot use your strong typing.
 Why should this not work the same way as assignment? Why are implicit and 
 assignment casts different?
 I know only implicit and explicit casts, and I think implicit should equal assignment.

I was ready to educate Зотов on why that was a bad idea until I read...

 Yes, but as long as you think that "update table1 set IntField = FloatField" is valid
 while "select FuncWithIntArg(FloatArg)" is not,
 you have no problem with the current solution, because it works the same way :)

Sure enough, this works: UPDATE ... SET int_field = ( SELECT float_field FROM 
... );

Which begs the question... why do we allow on assignment casting of a float to 
an int? I would think that should be explicit only...

If we are going to allow assignment casting of float to int, then I'm 
hard-pressed to see why we wouldn't allow you to call an int function with a 
float value, assuming there wasn't a more suitable cast available.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Prefered Types

2011-05-04 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 Which begs the question... why do we allow on assignment casting of a float 
 to an int?

Because the SQL standard requires it.

In any case, the user's intent in such a case is perfectly clear.
The reasons for not allowing assignment casts to happen in expression
contexts are (1) it would often provide us with too many ways to resolve
an operator or function call, leading to ambiguous operator failures;
(2) it would be surprising for the parser to silently choose a cast that
causes information loss.  Neither of these arguments applies to "assign
this value to an integer column".

It's true that if you have exactly one function named f, and it takes an
int, then f(float) could be considered clear and unambiguous.  But
designing this behavior around only the easy cases is not going to work.
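A sketch of the kind of ambiguity that honoring assignment casts in
expression contexts would create (hypothetical functions; the behavior is
as described in this thread rather than tested):

```sql
CREATE FUNCTION f(i smallint) RETURNS text LANGUAGE sql AS $$ SELECT 'small' $$;
CREATE FUNCTION f(i integer)  RETURNS text LANGUAGE sql AS $$ SELECT 'int' $$;

-- Today this fails cleanly: float8 has no implicit cast to either type.
-- If assignment casts were also considered, float8 -> smallint and
-- float8 -> integer would both become candidates, and the parser would
-- have to raise an "ambiguous function" error instead -- no better for
-- the user, and a silent lossy choice would be worse.
SELECT f(1.5::float8);
```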

regards, tom lane



Re: [HACKERS] branching for 9.2devel

2011-05-04 Thread Robert Haas
On Wed, May 4, 2011 at 12:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mind you, I've read more than enough horribly-formatted patches to wish
 that we could do something about this.  But I doubt that a mechanical
 reformatting pass before reviewing will be a net plus.

It wouldn't hurt to have the option.

It would also be nice if we could come to some conclusions on how to
handle $SUBJECT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Mitsuru IWASAKI
Hi,

I am working on a new feature, `Buffer Cache Hibernation', which enables
postgres to keep a higher cache hit ratio even when it has just started.

Postgres usually starts with a ZERO buffer cache.  By saving the buffer
cache data structures into hibernation files just before shutdown, and
loading them at startup, postgres can start operating with the saved
buffer cache in the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, the buffer cache data structures (such as BufferDescriptors,
  BufferBlocks and StrategyControl) are saved into hibernation files.
- At startup, the buffer cache data structures are loaded from the hibernation
  files and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
  in postgresql.conf.

Any comments are welcome, and I would very much appreciate the patch
being merged into the source tree.

Have fun and thanks!



Re: [HACKERS] DLL export with mingw-w64: currently a no-op

2011-05-04 Thread Andrew Dunstan



On 05/04/2011 01:25 AM, Johann 'Myrkraverk' Oskarsson wrote:
On Tue, 03 May 2011 12:40:28 -, Andrew Dunstan 
and...@dunslane.net wrote:


Our Makefiles use dlltool and dllwrap to create DLLs. If you used our 
recommended build method pgxs would do lots of the work for you.


I'm not sure why you asked the mingw-w64 team about building a 
Postgres extension - they are unlikely to know anything much about 
our build system.


However, they do know about the mingw build tools.  In particular, from:
  http://oldwiki.mingw.org/index.php/dllwrap

dllwrap is a tool to build DLLs. It seems to be deprecated in favour of
gcc's -shared option, but some projects still use it. SQLite, for example.

Armed with this information, it may be prudent to review the DLL build
process in PGXS.

For the record: I originally asked the mingw-w64 team for help to link
against the 64bit JVM.DLL for which there was no corresponding .def file.


Google cache:
  
http://webcache.googleusercontent.com/search?q=cache:tZkweT_cNQQJ:oldwiki.mingw.org/index.php/dllwrap+mingw+dllwrap&cd=1&hl=is&ct=clnk&gl=is&client=opera&source=www.google.is



OK, but if we want to get rid of using dllwrap we'd need a complete 
patch for it. Just changing the definition of the macro without changing 
the rest isn't going to cut it, I think. But it might well be worth doing.


cheers

andrew



Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum
 procedure on GiST doesn't have any storage utilization guarantee. For example,
 if only one live item remains on some page, then only that one item will be
 left on the page. I.e., there is no index reorganization during vacuum. If
 there aren't many inserts into such pages in the future, they will stay bloated.

Possibly, but the same is true of btree indexes, and we very seldom see
cases where that's a serious issue.  In any case, this is all just
speculation without evidence --- we need to see actual data to figure
out what's going on.
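To gather that kind of data, a simple before/after comparison is a start.
A sketch: pg_relation_size() and REINDEX are standard, but the index name
here is hypothetical:

```sql
-- Size of the possibly-bloated index
SELECT pg_size_pretty(pg_relation_size('my_intarray_gist_idx'));
-- Rebuild it from scratch and compare
REINDEX INDEX my_intarray_gist_idx;
SELECT pg_size_pretty(pg_relation_size('my_intarray_gist_idx'));
-- A large drop after REINDEX quantifies how much bloat vacuum left behind.
```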

regards, tom lane



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Andrew Dunstan



On 05/04/2011 10:10 AM, Mitsuru IWASAKI wrote:

Hi,

I am working on a new feature, `Buffer Cache Hibernation', which enables
postgres to keep a higher cache hit ratio even when it has just started.

Postgres usually starts with a ZERO buffer cache.  By saving the buffer
cache data structures into hibernation files just before shutdown, and
loading them at startup, postgres can start operating with the saved
buffer cache in the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, the buffer cache data structures (such as BufferDescriptors,
   BufferBlocks and StrategyControl) are saved into hibernation files.
- At startup, the buffer cache data structures are loaded from the hibernation
   files and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
   in postgresql.conf.

Any comments are welcome, and I would very much appreciate the patch
being merged into the source tree.




That sounds cool.

Please a) make sure your patch is up to date against the latest source 
in git, and b) submit it to the next commitfest at 
https://commitfest.postgresql.org/action/commitfest_view?id=10


We don't backport features, and 9.1 is closed for features now, so the 
earliest release this could be used in is 9.2.


cheers

andrew



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Greg Stark
On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI iwas...@jp.freebsd.org wrote:
 Postgres usually starts with a ZERO buffer cache.  By saving the buffer
 cache data structures into hibernation files just before shutdown, and
 loading them at startup, postgres can start operating with the saved
 buffer cache in the same condition as just before the last shutdown.

Offhand this seems pretty handy for benchmarks where it would help get
reproducible results.


-- 
greg



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Tom Lane
Mitsuru IWASAKI iwas...@jp.freebsd.org writes:
 Postgres usually starts with a ZERO buffer cache.  By saving the buffer
 cache data structures into hibernation files just before shutdown, and
 loading them at startup, postgres can start operating with the saved
 buffer cache in the same condition as just before the last shutdown.

This seems like a lot of complication for rather dubious gain.  What
happens when the DBA changes the shared_buffers setting, for instance?
How do you protect against the cached buffers getting out-of-sync with
the actual disk files (especially during recovery scenarios)?  What
about crash-induced corruption in the cache file itself (consider the
not-unlikely possibility that init will kill the database before it's
had time to dump all the buffers during a system shutdown)?  Do you have
any proof that writing out a few GB of buffers and then reading them
back in is actually much cheaper than letting the database re-read the
data from the disk files?

regards, tom lane



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié may 04 12:44:36 -0300 2011:

 This seems like a lot of complication for rather dubious gain.  What
 happens when the DBA changes the shared_buffers setting, for instance?
 How do you protect against the cached buffers getting out-of-sync with
 the actual disk files (especially during recovery scenarios)?  What
 about crash-induced corruption in the cache file itself (consider the
 not-unlikely possibility that init will kill the database before it's
 had time to dump all the buffers during a system shutdown)?  Do you have
 any proof that writing out a few GB of buffers and then reading them
 back in is actually much cheaper than letting the database re-read the
 data from the disk files?

I thought the idea wasn't to copy the entire buffer but only a
descriptor, so that the buffer would be loaded from the original page.

If shared_buffers changes, there's no problem.  If the new setting is
smaller, then the last pages would just not be copied, and would have
to be read from disk the first time they are accessed.  If the new
setting is larger, then the last few buffers would remain unused until
requested.
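The descriptor-only variant can already be approximated from SQL with the
contrib module pg_buffercache, which exposes exactly the metadata (relation,
block number) such a hibernation file would need to record. A sketch —
warming would mean re-reading those blocks from the relations, not restoring
raw pages:

```sql
CREATE EXTENSION pg_buffercache;  -- contrib; CREATE EXTENSION is 9.1+ syntax

-- Which relations currently occupy the buffer cache, and how much of it
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```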

As for gain, I have heard of test setups requiring hours of runtime in
order to prime the buffer cache.

Crash safety would have to be researched, sure.  Maybe only do it on a
clean shutdown.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] full_page_writes makes no difference?

2011-05-04 Thread Jeff Janes
On Tue, May 3, 2011 at 6:46 PM, Tian Luo jackro...@gmail.com wrote:
 Hi guys,

 Whether I turn full_page_writes on or off, I always
 observe 8192-byte writes of log data for simple write operations
 (write/update).

 But according to the document, when this is off, it could speed up
 operations but may cause problems during recovery. So, I guess this is
 because it writes less when the option is turned off. However, this
 contradicts my observations 

 If I am not missing anything, I find that the writes of log data go
 through function XLogWrite in source file
 backend/access/transam/xlog.c.

 In this file, log data are written with the following code:

 from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
 nbytes = npages * (Size) XLOG_BLCKSZ;
 if (write(openLogFile, from, nbytes) != nbytes)
 {
  ...
 }

 So, nbytes should always be multiples of XLOG_BLCKSZ, which in the
 default case, is 8192.

 My question is: if it always writes full pages whether
 full_page_writes is on or off, what is the difference?

The "full pages" refers to the shared_buffers pages, not the xlog pages.

The question it answers is: does the full shared_buffers page get injected
into the xlog, or just a diff of it?

If you look at the offset of the xlog write, you would see that it is
writing 8192 bytes to the same offset over and over again.

In my hands using pgbench -T 300 -c 1, I get about 16 transactions
each with a 8192 xlog write to the same offset before moving to the
next xlog block.

But immediately after a checkpoint, I get only 1 or 2 writes to the
same offset before moving to the next one, due to full page writes
taking up so much more room in the xlog stream.

Cheers,

Jeff



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Dickson S. Guedes
2011/5/4 Greg Stark gsst...@mit.edu:
 On Wed, May 4, 2011 at 3:10 PM, Mitsuru IWASAKI iwas...@jp.freebsd.org 
 wrote:
 Postgres usually starts with a ZERO buffer cache.  By saving the buffer
 cache data structures into hibernation files just before shutdown, and
 loading them at startup, postgres can start operating with the saved
 buffer cache in the same condition as just before the last shutdown.

 Offhand this seems pretty handy for benchmarks where it would help get
 reproducible results.

It could have an option to force this or not at postgres startup.  That
could help in benchmark scenarios.

-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



Re: [HACKERS] Pull up aggregate subquery

2011-05-04 Thread Tom Lane
Hitoshi Harada umi.tan...@gmail.com writes:
 I sometimes wonder if we could pull up aggregate query in the optimizer.

I don't have time to look at this right now, but please add to the
upcoming commitfest.

regards, tom lane



Re: [HACKERS] Extension Packaging

2011-05-04 Thread David E. Wheeler
Hey folks,

I'd kind of like to get this issue nailed down soon so I can update the PGXN 
HOWTO and illustrate a generally agreed-upon best practice for extension 
developers. How *do* we want people to use versions in their extensions?

Thanks,

David

On Apr 28, 2011, at 2:16 PM, David E. Wheeler wrote:

 On Apr 28, 2011, at 7:04 AM, Tom Lane wrote:
 
 I think what we're discussing here is bug-fix revisions that don't
 affect the SQL declarations for the extension.  Presumably, that means a
 change in the C code, so the shared library is the right place to keep
 the revision number.  A version number in the control file seems to
 carry a nontrivial risk of being out of sync with the actual code in the
 shared library.
 
 But that's exactly where it is stored right now.
 
 What's not clear to me is whether to just suggest that extension authors
 who care about this should provide a foo_version() function, or to try
 to standardize it a bit more than that.
 
 Please, if those are the choices, go with the latter. If you leave it to 
 extension authors, they'll all have different names and different return 
 types, and will thus be worthless to most folks wanting a generalized way to 
 see what versions of extensions they have installed. Hell, I already regret 
 that pgtap_version() returns NUMERIC. Which reminds me, I might change it in 
 a future version. Then it's *really* inconsistent, isn't it?
 
 One point worth thinking about is that not all extensions will have
 a shared library at all --- SQL-only extensions have been mentioned
 several times as an important use case.  For those, there's no such
 thing as an update that doesn't change the script file, and we shouldn't
 try to impose a requirement of providing a lower-level revision number.
 
 No, but there are new releases without code changes. I've been making 
 releases that tweak documentation and the Makefile (for 9.1 support) but not 
 the code. Should the extension in this case get a new version or not?
 
 Look, I read this thread this morning carefully, but I have to say I don't 
 really understand it. Considering that there was consensus on not requiring 
 any format, meaning, or mandated sort ordering of versions, there's suddenly 
 quite a lot of discussion of the meaning and format, if not sort ordering.
 
 So maybe it's half-assed. Maybe the version can be anything but the revision 
 must be an integer. Maybe there's a `pg_extension_version($extension_name)` 
 function that returns ARRAY[$version, $revision], and the revision is set in 
 the control file but not included in the version or in the upgrade file 
 names. I think I can live with that. But, hell, you're halfway to mandating 
 the meaning by doing this. Will we have to go the rest of the way in the 
 future?
 
 Best,
 
 David
 
 
 
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Greg Stark
On Wed, May 4, 2011 at 4:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Do you have
 any proof that writing out a few GB of buffers and then reading them
 back in is actually much cheaper than letting the database re-read the
 data from the disk files?

I believe he's just writing out the meta data. Ie, which blocks to
re-read from the disk files.

-- 
greg



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Greg Smith

Alvaro Herrera wrote:

As for gain, I have heard of test setups requiring hours of runtime in
order to prime the buffer cache.
  


And production ones too.  I have multiple customers where a server 
restart is almost a planned multi-hour downtime.  The system may be back 
up, but for a couple of hours performance is so terrible it's barely 
usable.  You can watch the MB/s ramp up as the more random data fills in 
over time; getting that taken care of in a larger block more amenable to 
elevator sorting would be a huge help.


I never bothered with this particular idea though because shared_buffers 
is only a portion of the important data.  Cedric's pgfincore code digs 
into the OS cache, too, which can then save enough to be really useful 
here.  And that's already got a snapshot/restore feature.  The slides at 
http://www.pgcon.org/2010/schedule/events/261.en.html have a useful intro 
to that; pages 30 through 34 are the neat ones.  That provides some 
other neat APIs for preloading popular data into cache too.  I'd rather 
work on getting something like that into core than add 
something that targets just shared_buffers.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Jeff Janes
On Wed, May 4, 2011 at 7:10 AM, Mitsuru IWASAKI iwas...@jp.freebsd.org wrote:
 Hi,

 I am working on new feature `Buffer Cache Hibernation' which enables
 postgres to keep higher cache hit ratio even just started.

 Postgres usually starts with ZERO buffer cache.  By saving the buffer
 cache data structure into hibernation files just before shutdown, and
 loading them at startup, postgres can start operations with the saved
 buffer cache as the same condition as just before the last shutdown.

 Here is the patch for 9.0.3 (also tested on 8.4.7)
 http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

 The patch includes the following.
 - At shutdown, buffer cache data structure (such as BufferDescriptors,
  BufferBlocks and StrategyControl) is saved into hibernation files.
 - At startup, buffer cache data structure is loaded from hibernation
  files and buffer lookup hashtable is setup based on buffer descriptors.
 - Above functions are enabled by specifying 
 `enable_buffer_cache_hibernation=on'
  in postgresql.conf.

 Any comments are welcome and I would very much appreciate merging the
 patch in source tree.

 Have fun and thanks!

It applies and builds against head with offsets and some fuzz.  It
fails make check, but apparently only because
src/test/regress/expected/rangefuncs.out needs to be updated to
include the new setting.  (Although all the other enable% settings
are for the planner, so making a new setting with that prefix that
does something else might be undesirable)

I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
provides similar functionality.  Are you familiar with that?  If so,
could you contrast your approach with that one?

Cheers,

Jeff



[HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera
Hello,

A customer came to us with this request: a way to store any data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.

The main idea is to be able to store column values in an audit table
like this:

change_time timestamptz
table_name  name
column_name name
old_value   variant
new_value   variant
So per-column changes, which is much more convenient than the regular
idea of storing the whole NEW and/or OLD record(s).

Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

My thought was that a variant type would store the datum as
typid + data so that it would be possible to identify the datatype
stored in each column/row and interpret adequately, calling the
appropriate output function etc.  On input it would be limited to come
only from inside the system, not from the outside world, as that would
have obvious security implications; so it'd be similar to pg_node_tree
in that regard.

Now this has obvious limitations: first, any query that tries to extract
data would need to include a cast of the variant value to the
appropriate type, so that the type can be resolved early.  Thus,
trying to extract rows of different types would be forbidden.

Also, there would be a security problem with a datum storing something
whose datatype later changes (consider a user-defined record type or
things like that).  My first reaction was to do something like 
CREATE TYPE foo VARIANT OF (int, text, timestamptz);
and then you could declare old_value with type foo, which would only
allow values of the declared types.  This makes it easy to catalogue
used types in any variant, and thus easy to restrict modifying or
dropping types that are used in some variant.  However, this idea was
rejected by the customer as unusable: you would have to
remember to edit the variant to add the new type any time you added a new
column to a table, which would be cumbersome.

What the customer suggested was to have a new fork, which stores
type OIDs of datatypes used in the variant.  Then when a type is to be
altered, all forks would be scanned to determine if the type is used,
and raise an error if so.  I rejected that idea as unworkable.

However, as a middle ground we agreed that we could allow a declared
variant to store any pinned type without restrictions; those can't be
changed or dropped so there's no safety concern.  Storing other types
(user-defined types, records and so on) would require some safety net.

Before spending too much time exploring a detailed design, I'd like to
hear some opinions on the basic idea.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Josh Berkus
All,

I thought that Dimitri had already implemented this using Fincore.  It's
linux-only, but that should work well enough to test the general concept.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Kevin Grittner
Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 
 A customer came to us with this request: a way to store any data
 in a column.
 
+1
 
More than once (usually in replication, interface, or audit
situations) I've had to create a table with one column each of a
number of different data types, only one of which will be used in
each row.  It would make life much simpler for programming such
things if an unknown type could be stored.  I could cast
the types in the rows based on the related identifier column, but it
would be even nicer if they could be read as unknown and pg_typeof or
something similar could be used to control flow based on the type.
 
As one example, only *last night* I created the following table as
part of the development of our shop's next generation of software. 
(Yes I know about Slony, Londiste, and others -- we need to roll
something out which integrates with existing systems, without
production disruption, over the next few years.  This does require a
custom solution.)
 
          Table "public.DbTranOpValue"
     Column     |       Type        | Modifiers
----------------+-------------------+-----------
 countyNo       | "CountyNoT"       | not null
 backendPid     | integer           | not null
 tranStart      | "TimestampT"      | not null
 logRecordSeqNo | "LogRecordSeqNoT" | not null
 columnName     | "ColumnNameT"     | not null
 isAfter        | "BooleanT"        | not null
 textValue      | text              |
 intValue       | bigint            |
 numericValue   | numeric           |
 binaryValue    | bytea             |
 booleanValue   | "BooleanT"        |
 dateValue      | "DateT"           |
 timeValue      | "TimeT"           |
 timestampValue | "TimestampT"      |
Indexes:
    "DbTranOpValue_pkey" PRIMARY KEY, btree
      (backendPid, tranStart, logRecordSeqNo, columnName)
 
I would much rather have had a value column of unknown type.
 
-Kevin



Re: [HACKERS] branching for 9.2devel

2011-05-04 Thread Josh Berkus

 You can't indent patches, only patched files. And that's the problem 
 with this happy scheme. For it to work at all sanely we'd need to keep 
 the committed code that the patch is to be applied against strictly 
 pgindent clean, presumably via some automated process such as a commit 
 hook. That's been suggested in the past, but hasn't met with universal 
 approval, IIRC.

Well, there is another solution to this, which is to use Git branches
and forks instead of mailing around patches.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] branching for 9.2devel

2011-05-04 Thread Magnus Hagander
On Wed, May 4, 2011 at 19:21, Josh Berkus j...@agliodbs.com wrote:

 You can't indent patches, only patched files. And that's the problem
 with this happy scheme. For it to work at all sanely we'd need to keep
 the committed code that the patch is to be applied against strictly
 pgindent clean, presumably via some automated process such as a commit
 hook. That's been suggested in the past, but hasn't met with universal
 approval, IIRC.

 Well, there is another solution to this, which is to use Git branches
 and forks instead of mailing around patches.

That makes no difference to this problem, really. If the committer (or
reviewer) has to reindent it anyway, you can just as well do "git
checkout work && patch -p1 < /where/ever && pgindent && git diff" as
"git remote add somewhere && git fetch somewhere && git checkout work
--track somewhere/something && pgindent && git diff".

There are some reasons why using git branches and forks is nice to
work with, but they don't solve this problem.

Or are you saying there should be an automated service where you
registered your git url + branch and then it would pull that branch,
run pgindent for you, and then republish it somewhere? Not sure how
big a win that is in the end, plus it's going to fail as soon as you
get a conflict anywhere anyway...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] branching for 9.2devel

2011-05-04 Thread David Blewett
On Wed, May 4, 2011 at 1:21 PM, Josh Berkus j...@agliodbs.com wrote:

 You can't indent patches, only patched files. And that's the problem
 with this happy scheme. For it to work at all sanely we'd need to keep
 the committed code that the patch is to be applied against strictly
 pgindent clean, presumably via some automated process such as a commit
 hook. That's been suggested in the past, but hasn't met with universal
 approval, IIRC.

 Well, there is another solution to this, which is to use Git branches
 and forks instead of mailing around patches.

Shouldn't it be as simple as keeping a git clone of trunk up to date,
applying the patch, running pgindent and emitting the resulting diff?
Once it's been generated, just run git reset --hard to clean out all
local changes.
-- 
Thanks,

David Blewett



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.

Yes, it seems rather messy.

 The main idea is to be able to store column values in an audit table
 like this:
 old_value variant
 new_value variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.

If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
it could perhaps be used in pg_statistic, in place of the rather ugly
anyarray hack that's there now.  But I note that nothing above the level
of C code can do anything very useful with the contents of pg_statistic,
and I'm not sure that having an official type would change that.

 However, as a middle ground we agreed that we could allow a declared
 variant to store any pinned type without restrictions; those can't be
 changed or dropped so there's no safety concern.

If you're going to accept that there are restrictions, I don't see that
there is a good argument against your thought of a declared union
type.  At least then it's clear what the restrictions are.  I am firmly
against exposing the notion of pinned types in any user-visible SQL
semantics, and even more firmly against the idea of creating fundamental
functionality that only works for built-in types and can't be extended
to add-on types.

regards, tom lane



Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 (c) we *think* that other columns of the table, including other indexed
 columns, are changed much more frequently than the intarray column is.
 Currently doing analysis on that.

Yeah, I noticed that your statistics for the table showed far more
updates than insertions or deletions.  If the intarray itself didn't
change often, that would result in lots of duplicate entries being made
in the index.  They'd get cleaned by vacuum eventually, but maybe not
fast enough to avoid the one-live-tuple-per-page syndrome that Alexander
was speculating about.

regards, tom lane



Re: [HACKERS] Unfriendly handling of pg_hba SSL options with SSL off

2011-05-04 Thread Peter Eisentraut
On mån, 2011-04-25 at 19:18 -0400, Tom Lane wrote:
 Hm, does that mean we have consensus on treating it as an error?

Regarding the patch you committed: I would avoid hardcoding
postgresql.conf in error or hint messages, since we don't know whether
that's the actual name of the file.  No other message has that file name
hardcoded.





Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Can you provide the data in the column that's indexed?

 Attached.  This is for the index which is 90% free space.

I tried loading this data in fresh, and then creating a gist__intbig_ops
index on it.  I got these pgstattuple numbers (in 8.4.8):

table_len  | 8806400
tuple_count| 15005
tuple_len  | 4081360
tuple_percent  | 46.35
dead_tuple_count   | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 4088100
free_percent   | 46.42

On the other hand, loading the data with a pre-existing empty index gave

table_len  | 7798784
tuple_count| 15005
tuple_len  | 4081360
tuple_percent  | 52.33
dead_tuple_count   | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 3183672
free_percent   | 40.82

Neither of those numbers are great, and it's a bit surprising that
CREATE INDEX produces a result notably worse than incremental loading;
but still a darn sight better than 90% free space.  So I think probably
the update pattern has a lot to do with this.

 * If you didn't notice earlier, it's a partial index.  The two columns
 which determine the partial index change more often than the intarray
 column.

Yeah, with only about half of the table actually indexed, since you
showed only 7786 index entries in your results.  But unless there's
reason to think the indexed and unindexed entries are substantially
different in the intarray column, this is probably not very relevant.

 * We've also determined some other unusual patterns from watching the
 application:

 (a) the listings table is a very wide table, with about 60 columns

 (b) whenever the table gets updated, the application code updates these
 60 columns in 4 sections.  So there's 4 updates to the same row, in a
 single transaction.

Hmm.  That is going to lead to four dead index entries for every live
one (unless some of the updates are HOT, which won't happen if you're
changing any indexed columns).  VACUUM will get back the space
eventually, but not before you've caused some index bloat.

I tried doing something similar to my test table:

contrib_regression=# alter table listings add column junk int;
ALTER TABLE
contrib_regression=# create index li on listings(junk);
CREATE INDEX
contrib_regression=# begin;
BEGIN
contrib_regression=# update listings set junk=1;
UPDATE 15005
contrib_regression=# update listings set junk=2;
UPDATE 15005
contrib_regression=# update listings set junk=3;
UPDATE 15005
contrib_regression=# update listings set junk=4;
UPDATE 15005
contrib_regression=# commit;
COMMIT
contrib_regression=# vacuum listings;
VACUUM

and then got these pgstattuple numbers:

table_len  | 39460864
tuple_count| 15005
tuple_len  | 4081360
tuple_percent  | 10.34
dead_tuple_count   | 0
dead_tuple_len | 0
dead_tuple_percent | 0
free_space | 32923872
free_percent   | 83.43

which is up in the same ballpark with your problem.  Now probably your
client's app is not updating all rows at once, but still this is a
pretty wasteful update pattern.  Is there a reason not to update all
the columns in a single update?

If you can't change the app, I'd suggest more aggressive autovacuuming
as the least painful fix.

regards, tom lane



Re: [HACKERS] Unfriendly handling of pg_hba SSL options with SSL off

2011-05-04 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2011-04-25 at 19:18 -0400, Tom Lane wrote:
 Hm, does that mean we have consensus on treating it as an error?

 Regarding the patch you committed: I would avoid hardcoding
 postgresql.conf in error or hint messages, since we don't know whether
 that's the actual name of the file.  No other message has that file name
 hardcoded.

Fair enough --- do you have a proposal for alternate wording?

regards, tom lane



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Darren Duncan

Alvaro Herrera wrote:

A customer came to us with this request: a way to store any data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.


I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
say, could have more specific examples of "allow any number or date here but 
nothing else".  If VARIANT is supported, unions in general ought to be also.


The most effective way of supporting VARIANT or union types in general is having 
an implementation where in the general case each value in the database knows its 
own data type rather than the data type being provided by a context such as what 
table column it is in.  For example, if rather than storing a data value 
directly we store a 2-attribute struct naming a data type and pointing to or 
holding the data value.


See how SQLite works as an example of how VARIANTs or unions could work, 
although that on its own would need to be made more comprehensive for Pg.


I claim ignorance as to how Pg currently implements these matters.

Where VARIANT/union types are supported by default, declaring more specific 
types is just a type constraint and an optimization.


Of course, when we know the type of a column/etc isn't going to be VARIANT or 
some other union type, then a simple optimization allows us to just store the 
value and have its type provided by context rather than the struct.


-- Darren Duncan



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Merlin Moncure
On Wed, May 4, 2011 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.

 Yes, it seems rather messy.

 The main idea is to be able to store column values in an audit table
 like this:
 old_value     variant
 new_value     variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

The problem with text is that for composite/table types you lose the
ability to cast back when the structure changes.   Most of the
auditing implementations I've done cast new/old to text and drop it
directly into a single column on the audit record.  It works ok, but
you have to messily update the text later when the table changes.  For
non-variant composites you can add columns down the line and it works
ok in dependent records without too much fuss.

I think though that getting this to work such that type dependency is
via row/column instead of just column is going to be tough.  Outside
of composites, I don't see much advantage vs the text route,
performance maybe?

merlin



Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Josh Berkus
On 5/4/11 11:29 AM, Tom Lane wrote:
 which is up in the same ballpark with your problem.  Now probably your
 client's app is not updating all rows at once, but still this is a
 pretty wasteful update pattern.  Is there a reason not to update all
 the columns in a single update?

Yeah, really crappy application code.  Discussing it with app developers
now ...

 If you can't change the app, I'd suggest more aggressive autovacuuming
 as the least painful fix.

Will test that.  It's not clear that vacuuming is helping at all.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Extreme bloating of intarray GiST indexes

2011-05-04 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 5/4/11 11:29 AM, Tom Lane wrote:
 If you can't change the app, I'd suggest more aggressive autovacuuming
 as the least painful fix.

 Will test that.  It's not clear that vacuuming is helping at all.

Well, you realize of course that you need a REINDEX to get the index
size back down to a sane range.  Autovacuum may or may not be able to
keep it from creeping back up ... but vacuum definitely won't remove
existing bloat.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Message style cleanup

2011-05-04 Thread Peter Eisentraut
On ons, 2011-05-04 at 20:32 +0200, Magnus Hagander wrote:
 Is this part really correct?
 
 -   fprintf(stderr, _("%s: final receive failed: %s\n"),
 +   fprintf(stderr, _("%s: final receive failed: %s"),
 
 all other msgs have \n.. I see you removed the final dot in many of
 the other messages, did you just accidentally grab one too many?
 (There are a couple of other examples of removing the \n though, so
 perhaps it was intentional? But they're all preceding
 disconnect_and_exit() so I don't see why it shouldn't be consistent?)

The result of PQerrorMessage() already ends in \n.

With that in mind, maybe I missed something?





Re: [HACKERS] [COMMITTERS] pgsql: Message style cleanup

2011-05-04 Thread Magnus Hagander
On Wed, May 4, 2011 at 21:09, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2011-05-04 at 20:32 +0200, Magnus Hagander wrote:
 Is this part really correct?

 -       fprintf(stderr, _("%s: final receive failed: %s\n"),
 +       fprintf(stderr, _("%s: final receive failed: %s"),

 all other msgs have \n.. I see you removed the final dot in many of
 the other messages, did you just accidentally grab one too many?
 (There are a couple of other examples of removing the \n though, so
 perhaps it was intentional? But they're all preceding
 disconnect_and_exit() so I don't see why it shouldn't be consistent?)

 The result of PQerrorMessage() already ends in \n.

 With that in mind, maybe I missed something?

Oh, d'uh. I'm not paying attention. Sorry about the noise.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Prefered Types

2011-05-04 Thread Robert Haas
On Tue, May 3, 2011 at 3:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 The interesting discussion is what happens next.  To me, this is all
 related to this previous discussion:
 http://archives.postgresql.org/pgsql-hackers/2010-09/msg00232.php

 Yeah, there doesn't seem like much point unless we have a clear idea
 what we're going to do with the change.

 BTW, it occurs to me to wonder whether, instead of making types be more
 or less preferred, we should attack the issue from a different direction
 and assign preferred-ness ratings to casts.  That seems to be more or
 less the direction that Robert was considering in the above-linked
 thread.  I'm not sure it's better than putting the ratings on types ---
 in particular, neither viewpoint seems to offer a really clean answer
 about what to do when trying to resolve a multiple-argument function
 in which one possible resolution offers a more-preferred conversion for
 one argument but a less-preferred conversion for another one.  But it's
 an alternative we ought to think about before betting all the chips on
 generalizing typispreferred.

 Personally I've always felt that the typispreferred mechanism was a bit
 of a wart; changing it from a bool to an int won't improve that, it'll
 just make it a more complicated wart.  Casts have already got a
 standards-blessed notion that some are more equal than others, so
 maybe attaching preferredness ratings to them will be less of a wart.
 Not sure about it though.

I think this is a pretty good analysis.  One of the big, fat problems
with typispreferred is that it totally falls apart when more than two
types are involved.  For example, given a call f(int2), we can't
decide between f(int4) and f(int8), but it seems pretty clear (to me,
at least) that we should prefer to promote as little as possible and
should therefore pick f(int4).  The problem is less acute with
string-like data types because there are only two typcategory-S data
types that get much use: text and varchar.  But add a third type to
the mix (varchar2...) or start playing around with functions that are
defined for name and bpchar but not text or some such thing, and
things get sticky.
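
The "promote as little as possible" intuition can be sketched abstractly: given the argument type and the candidate signatures, pick the candidate the fewest implicit-promotion steps away. This is a toy model with a hypothetical linear chain, not PostgreSQL's actual resolution code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical implicit-promotion chain (one category, linear). */
static const char *chain[] = { "int2", "int4", "int8", "numeric" };
#define CHAINLEN 4

static int rank(const char *t)
{
    for (int i = 0; i < CHAINLEN; i++)
        if (strcmp(chain[i], t) == 0) return i;
    return -1;
}

/* Steps from arg type up to candidate type; -1 if it would be lossy. */
static int promotion_cost(const char *arg, const char *cand)
{
    int a = rank(arg), c = rank(cand);
    if (a < 0 || c < 0 || c < a) return -1;   /* no implicit downcast */
    return c - a;
}

/* Pick the candidate with the smallest promotion cost. */
static const char *resolve(const char *arg, const char **cands, int n)
{
    const char *best = NULL;
    int bestcost = -1;
    for (int i = 0; i < n; i++) {
        int cost = promotion_cost(arg, cands[i]);
        if (cost >= 0 && (best == NULL || cost < bestcost)) {
            best = cands[i];
            bestcost = cost;
        }
    }
    return best;
}

int main(void)
{
    const char *cands[] = { "int4", "int8" };
    const char *picked = resolve("int2", cands, 2);
    assert(picked && strcmp(picked, "int4") == 0);  /* least promotion wins */
    printf("f(int2) resolves to f(%s)\n", picked);
    return 0;
}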

Generalizing typispreferred to an integer definitely helps with these
cases, assuming anyway that you are dealing mostly with built-in
types, or that the extensions you are using can somehow agree among
themselves on reasonable weighting values.  But it is not a perfect
solution either, because it can really only handle pretty linear
topologies.  It's reasonable to suppose that the integer types are
ordered int2 -> int4 -> int8 -> numeric and that the floating point types
are ordered float4 -> float8 (-> numeric?), but I think the two
hierarchies are pretty much incomparable, and an integer
typispreferred won't handle that very well, unless we make the two
groups separate categories, but arguably numeric belongs in both
groups so that doesn't really seem to work very well either.
Certainly from a theoretical perspective there's no reason why you
couldn't have A -> B -> X and C -> D -> X, with A-C, A-D, B-C, and B-D
incomparable.  It almost feels like you need a graph to model it
properly, which perhaps argues for your idea of attaching weights to
the casts.
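The graph-with-weights idea sketched above can be made concrete. Below is a minimal, hypothetical model (the type names and weights are made up for illustration, not PostgreSQL's actual catalog data): implicit casts form a weighted directed graph, and the "distance" between two types is the cheapest chain of implicit casts, found here with Dijkstra's algorithm. Incomparable hierarchies simply have no path.

```python
# Hypothetical sketch: model implicit casts as a weighted directed graph and
# define the "distance" from one type to another as the cheapest cast path.
import heapq

CASTS = {  # source -> [(target, weight)]; illustrative values only
    "int2": [("int4", 1)],
    "int4": [("int8", 1)],
    "int8": [("numeric", 1)],
    "float4": [("float8", 1)],
    "float8": [("numeric", 1)],
}

def cast_distance(source, target):
    """Cheapest chain of implicit casts from source to target (Dijkstra)."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, t = heapq.heappop(queue)
        if t == target:
            return d
        if d > dist.get(t, float("inf")):
            continue
        for nxt, w in CASTS.get(t, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return None  # no implicit path: the hierarchies are incomparable

print(cast_distance("int2", "numeric"))  # 3, via int4 and int8
print(cast_distance("float4", "int8"))   # None: incomparable hierarchies
```

Note that in this model numeric can sit at the join of both hierarchies without forcing a linear ordering on the integer and float groups, which is exactly what a single integer typispreferred rating cannot express.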

But there are some problems with that, too.  In particular, it would
be nice to be able to hook in new types with a minimum of fuss.  For
example, say we add a new string type, like citext, via an extension.
Right now, we need to add casts not only from citext to text, but also
from citext to all the things to which text has casts, if we really
want citext to behave like text.  That solution works OK for the first
extension type we load in, but as soon as you add any nonstandard
casts from text to other things (perhaps yet another extension type of
some kind), it starts to get a bit leaky.  In some sense it feels like
it'd be nice to be able to walk the graph - if an implicit cast from
A to B is OK, and an implicit cast from B to C is OK, perhaps an
implicit cast from A to C is also OK.  But that seems awfully
expensive to do at runtime, and it'd introduce some strange behavior
particularly with the way we have the reg* -> oid and oid -> reg*
casts set up.

select a.castsource::regtype, a.casttarget::regtype,
b.casttarget::regtype from pg_cast a, pg_cast b where a.casttarget =
b.castsource and a.castcontext = 'i' and b.castcontext = 'i' and not
exists (select 1 from pg_cast x where x.castsource = a.castsource and
x.casttarget = b.casttarget and x.castcontext = 'i') and a.castsource
<> b.casttarget;

It's not clear to me whether in any of this there is a solution to the
problem of int2 being a second-class citizen.  Perhaps we could add
casts from int4 and int8 back to int2, and make it less-preferred than
all of the other integer types, but I'm not sure what else that would
break.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Andrew Dunstan



On 05/04/2011 01:36 PM, Tom Lane wrote:



The main idea is to be able to store column values in an audit table
like this:
old_value   variant
new_value   variant
Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.



Yeah, I've been down this road once or twice, and I think that's the $64 
question.


I wrote a custom audit app two or three years ago. After several 
iterations the customer and I found that using an hstore for the old/new 
(or old record / changeset, which is what we actually use) was the most 
suitable for our use.


I think if we did this we'd need to add some sort of is_type()  and 
typeof() functions for variant objects.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-04 Thread Robert Haas
On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 ... on a toy table with contrived values.  How different is this
 from the often-asked question about why a query against a four-line
 table is not using the index they expect, and how can we expect it
 to scale if it doesn't?  I agree that it's not unreasonable for
 someone to ask either question.  If my response falls short, I'm
 game to try again.

I guess what surprises me about this a bit is that we have to
predicate-lock the whole table even if we're not actually looking at
all the rows.  I can sort of see why that's necessary, but I'm a bit
fuzzy on the details, and it does seem a little unfortunate in this
instance...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 3, 2011 at 10:07 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 ... on a toy table with contrived values.  How different is this
 from the often-asked question about why a query against a
 four-line table is not using the index they expect, and how can
 we expect it to scale if it doesn't?  I agree that it's not
 unreasonable for someone to ask either question.  If my response
 falls short, I'm game to try again.
 
 I guess what surprises me about this a bit is that we have to
 predicate-lock the whole table even if we're not actually looking
 at all the rows.  I can sort of see why that's necessary, but I'm
 a bit fuzzy on the details, and it does seem a little unfortunate
 in this instance...
 
Well, as far as I can tell, every production-quality database with
predicate locking models the predicates based on the rows actually
accessed.  Until now, that has been every popular SQL database
except PostgreSQL and Oracle.  That makes predicate locking
sensitive to the plan chosen.  It was because of this that I thought
it might be wise to include a bump to the seq_page_cost and/or
cpu_tuple_cost for plans inside a serializable transaction.  This
would encourage indexed access rather than a table scan at an
earlier threshold, thereby reducing false positive serialization
failures.  At the time the suggestion got a rather cool reception. 
Is it time to reconsider that?
 
On the other hand, as a shop where we're probably going to set
default_transaction_isolation = serializable in our postgresql.conf
files and include trigger checks that we're running at that level,
we can just boost those globally.  That may also work for others.
 
Once I wrap up these changes to our replication system I'm in the
middle of coding, I'll see about getting all our development
machines onto 9.1beta with default serialization and see how much
trouble our apps have.  Even on our development machines we run with
a copy of real data from a circuit court county database.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prefered Types

2011-05-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... One of the big, fat problems
 with typispreferred is that it totally falls apart when more than two
 types are involved.  For example, given a call f(int2), we can't
 decide between f(int4) and f(int8), but it seems pretty clear (to me,
 at least) that we should prefer to promote as little as possible and
 should therefore pick f(int4).

Yeah.  If your mental model is one of least promotion, you really
cannot express that at all with a preferred type concept, even if the
ratings are integers and not bools.  On the other hand, it does seem
possible to attach a cost or distance metric to casts and get some
reasonably intuitive behavior that way.  If you check the archives I
believe we've discussed this before using the distance terminology.
It still falls down though on the question of what to prefer when there
are several combinations of multiple casts to choose between.  And as
you say it's not entirely clear how well either approach generalizes to
after-the-fact insertion of new types/casts in the hierarchy.

Perhaps it would be adequate to allow automatic resolution of an
overloading conflict only when one of the available alternatives
dominates all others, ie, none of the argument positions requires a
longer distance cast than is used in that position by any other
available alternative.  I'm just throwing that out as a possibility,
I haven't tried it.
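The dominance rule proposed above can be sketched in a few lines. This is an illustrative model only (the per-argument cast distances are made-up inputs, not computed from any real catalog): a candidate wins only if, at every argument position, it needs a cast no longer than any other candidate's cast at that position.

```python
# Sketch of the "dominates all others" resolution rule, given assumed
# per-argument cast distances (lower is better; values are illustrative).

def dominates(a, b):
    """a dominates b if no argument position needs a longer cast than b's."""
    return all(x <= y for x, y in zip(a, b))

def resolve(candidates):
    """Index of a candidate dominating all others, or None (ambiguous).

    candidates: list of tuples of cast distances, one entry per argument.
    """
    for i, c in enumerate(candidates):
        if all(dominates(c, d) for j, d in enumerate(candidates) if j != i):
            return i
    return None  # no dominant candidate -> report an ambiguity error

# Two candidates where the first is no worse at every position:
print(resolve([(1, 1), (1, 2)]))  # 0
# Two candidates that are incomparable (each better at one position):
print(resolve([(1, 1), (2, 0)]))  # None
```

The second call shows the deliberate conservatism of the rule: when one resolution offers a cheaper cast for one argument but a more expensive one for another, no candidate dominates and the call is rejected as ambiguous rather than resolved arbitrarily.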

 It's not clear to me whether in any of this there is a solution to the
 problem of int2 being a second-class citizen.

I've always felt that the basic problem int2 has got is that the parser
initially types integer-looking constants as int4 or larger, even if
they'd fit in int2.  If it typed them as int2 then the unintuitive
behaviors would largely go away, without any need for allowing implicit
down-casting from int4 to int2.  I actually tried that once, probably
close to ten years ago, and it blew up real good because many cases that
formerly were considered an exact match no longer were, and the parser
started making some pretty surprising (or at least not backwards
compatible) resolution choices.  Maybe with a more controllable
type-promotion mechanism we could get better results there.

BTW, not to rain on the parade or anything, but I'll bet that
rejiggering anything at all here will result in whining that puts the
8.3-era removal of a few implicit casts to shame.  If the new behavior
is really significantly better *for users* then we can probably
withstand the complaints; but if it's just marginal improvements or just
improves life for hypothetical future extensions, it's not going to fly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2011: Fast GiST index build

2011-05-04 Thread Alexander Korotkov
While studying the existing GiST code I ran into a question.
The gistFindCorrectParent function has a branch with the following comment:
 /*
 * awful!!, we need search tree to find parent ... , but before we
 * should release all old parent
 */
Can you provide an example of a case where this branch is taken?


With best regards,
Alexander Korotkov.


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Merlin Moncure
On Wed, May 4, 2011 at 2:55 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 05/04/2011 01:36 PM, Tom Lane wrote:

 The main idea is to be able to store column values in an audit table
 like this:
 old_value       variant
 new_value       variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.


 Yeah, I've been down this road once or twice, and I think that's the $64
 question.

 I wrote a custom audit app two or three years ago. After several iterations
 the customer and I found that using an hstore for the old/new (or old record
 / changeset, which is what we actually use) was the most suitable for our
 use.

yeah -- +1 on that method.  I think it's really the right way to go
with the recent hstore enhancements.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-04 Thread Simon Riggs
On Wed, May 4, 2011 at 3:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jaime Casanova ja...@2ndquadrant.com writes:
 I want to propose the addition of a new field in IDENTIFY_SYSTEM:
 xlogversion, which will carry XLOG_PAGE_MAGIC from primary.
 The idea of sending that info is to allow us to know if the xlog page
 version of two different major versions are compatible or not.
 Currently pg_upgrade requires the primary to be taken down,

 That's *intentional*.

 The notion of WAL-shipping-replication compatibility between two
 different major versions is insane on its face.  They will not have
 compatible system catalog contents.  You might get perfect replication
 of the master's catalogs, but the slave wouldn't be able to interpret
 them.

That's exactly how hard in-place upgrade was to begin with.

Considering how valuable this would be, it seems worth pursuing.

 The reason we have XLOG_PAGE_MAGIC is really more the opposite: to
 prevent people from trying to recover across a minor version update in
 which we had to break XLOG compatibility.  I don't recall right now
 if that's ever actually happened, but it definitely could.

If that is true, then allowing this patch will allow us to detect that
incompatibility when the standby connects to the master, and explain
the issue in a useful error message. Otherwise we will just barf on
the magic value.

Having access to these details might make it possible to upgrade. They
could be inferred, but it would be better to have the full data so we
can take an informed decision about whether or not it is possible.

So even if people don't believe in the rationale behind the patch,
would allowing it harm anything at this point?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Predicate locking

2011-05-04 Thread Kevin Grittner
I wrote:
 
 On the other hand, as a shop where we're probably going to set
 default_transaction_isolation = serializable in our
 postgresql.conf files and include trigger checks that we're
 running at that level, we can just boost those globally.  That may
 also work for others.
 
Just as a quick experiment I took Greg's example and tried it with
different costs, and thereby eliminated the false positives for this
particular example, all the way down to a 5-row table:
 
set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
drop table t;
create table t (id bigint, value bigint);
insert into t(id,value) (select s,1 from generate_series(1,5) as s);
create index t_idx on t(id);
begin transaction;
set transaction isolation level serializable;
select * from t where id = 2;
insert into t (id, value) values (-2, 1);

Execute this on the second client:

set random_page_cost = 0.2;
set cpu_tuple_cost = 0.05;
begin transaction;
set transaction isolation level serializable;
select * from t where id = 3;
insert into t (id, value) values (-3, 0);
commit;
 
Then go back to the first client and commit -- no problem.
 
I make no representation that these are great numbers for any
particular workload; it's just meant as a quick illustration that
these behaviors are tunable.  With serializable transactions, it
probably is reasonable to figure that the cost of a sequential scan
or of reading a tuple includes the cost of some percentage of
transactions being rolled back and restarted.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:

  The main idea is to be able to store column values in an audit table
  like this:
  old_value   variant
  new_value   variant
  Currently, they use text for old_value and new_value, but this is, of
  course, not very satisfactory.
 
 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

Storage efficiency.  These people have really huge databases; small
changes in how tight things are packed makes a large difference for
them.  (For example, we developed a type to store SHA-2 digests in a
more compact way than bytea mainly because of this reason.  Also, at
some time they also wanted to apply compression to hstore keys and
values.)

As someone commented downthread, they also want to have things such as a
typeof operator.  It could be used in (say) a plpgsql function to
choose different branches of code.

Things go wild when you think about using VARIANT as a function
parameter type.  Given how complex are our current function resolution
rules I'm not really sold on making this work at all, so initially I'm
aiming at just raising an error in that case.

 If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
 it could perhaps be used in pg_statistic, in place of the rather ugly
 anyarray hack that's there now.  But I note that nothing above the level
 of C code can do anything very useful with the contents of pg_statistic,
 and I'm not sure that having an official type would change that.

Well, we could get rid of ugly hacks that are in various places in the
backend code to cope with this.  Not sure how useful it'd be for doing
things at the SQL level.

  However, as a middle ground we agreed that we could allow a declared
  variant to store any pinned type without restrictions; those can't be
  changed or dropped so there's no safety concern.
 
 If you're going to accept that there are restrictions, I don't see that
 there is a good argument against your thought of a declared union
 type.  At least then it's clear what the restrictions are.  I am firmly
 against exposing the notion of pinned types in any user-visible SQL
 semantics, and even more firmly against the idea of creating fundamental
 functionality that only works for built-in types and can't be extended
 to add-on types.

The argument was it's too cumbersome to use (which makes sense: users
are certain to forget to add the new type to the declared union when
they add a new column to the table, possibly causing a run-time error if
the audit trigger is in place and attempts to load a new datum into the
log table.)

I understand the reluctance to use pinned-ness in a user-visible way.
Back to the drawing board.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

 Storage efficiency.  These people have really huge databases; small
 changes in how tight things are packed makes a large difference for
 them.  (For example, we developed a type to store SHA-2 digests in a
 more compact way than bytea mainly because of this reason.  Also, at
 some time they also wanted to apply compression to hstore keys and
 values.)

Hmm.  The prototypical case for this would probably be a 4-byte int,
which if you add an OID to it so you can resolve the type is going to
take 8 bytes, plus you are going to need a length word because there is
really no alternative to the VARIANT type being varlena overall, which
makes it 9 bytes if you're lucky on alignment and up to 16 if you're
not.  That is not shorter than the average length of the text
representation of an int.  The numbers don't seem a lot better for
8-byte quantities like int8, float8, or timestamp.  It might be
marginally worthwhile for timestamp, but surely this is a huge amount of
effort to substitute for thinking of a more compact text representation
for timestamps.
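The back-of-envelope arithmetic above can be sketched as follows. This is a simplified model of a hypothetical VARIANT layout, not PostgreSQL's actual on-disk format: a varlena header (1 byte for short values, 4 bytes otherwise), a 4-byte type OID so the value can be resolved at runtime, the datum itself, and possible alignment padding.

```python
# Simplified size model for a hypothetical VARIANT datum; the layout details
# here are assumptions for illustration, not an actual on-disk format.

def variant_size(datum_bytes, short_header=True, align_pad=0):
    header = 1 if short_header else 4   # varlena length word
    oid = 4                             # type OID stored alongside the value
    return header + oid + datum_bytes + align_pad

# A 4-byte int stored as VARIANT:
print(variant_size(4))            # 9 bytes in the lucky short-header case
print(variant_size(4, False, 4))  # 16 bytes with a full header and padding
# versus its text representation, which averages fewer than 9 characters.
```

This makes Tom's point concrete: for small fixed-width types the OID and header overhead eats most or all of the savings over a text representation.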

Pardon me for being unconvinced.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DLL export with mingw-w64: currently a no-op

2011-05-04 Thread Johann 'Myrkraverk' Oskarsson
On Wed, 04 May 2011 15:11:57 -, Andrew Dunstan and...@dunslane.net  
wrote:



On 05/04/2011 01:25 AM, Johann 'Myrkraverk' Oskarsson wrote:
On Tue, 03 May 2011 12:40:28 -, Andrew Dunstan  
and...@dunslane.net wrote:


Our Makefiles use dlltool and dllwrap to create DLLs. If you used our  
recommended build method pgxs would do lots of the work for you.




dllwrap is a tool to build DLLs. It seems to be deprecated in favour of
gcc -shared option, but some projects still use it. SQLite, for  
example.


Armed with this information, it may be prudent to review the DLL build
process in PGXS.


OK, but if we want to get rid of using dllwrap we'd need a complete  
patch for it. Just changing the definition of the macro without changing  
the rest isn't going to cut it, I think. But it might well be worth  
doing.


You're right.  And now that I know a little bit more about how to build  
DLLs

and what's involved I can take a look at this, sometime in the not-too-
distant future.


Until next time, have fun!

--
  Johann Oskarsson   http://www.2ndquadrant.com/   |[]
  PostgreSQL Development, 24x7 Support, Training and Services  --+--
 |
  Blog: http://my.opera.com/myrkraverk/blog/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread David E. Wheeler
On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:

 As someone commented downthread, they also want to have things such as a
 typeof operator.  It could be used in (say) a plpgsql function to
 choose different branches of code.

FWIW, pg_typeof(any) has been in core since 9.0.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Andrew Dunstan



On 05/04/2011 07:05 PM, Tom Lane wrote:

Alvaro Herreraalvhe...@commandprompt.com  writes:

Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.

Storage efficiency.  These people have really huge databases; small
changes in how tight things are packed makes a large difference for
them.  (For example, we developed a type to store SHA-2 digests in a
more compact way than bytea mainly because of this reason.  Also, at
some time they also wanted to apply compression to hstore keys and
values.)

Hmm.  The prototypical case for this would probably be a 4-byte int,
which if you add an OID to it so you can resolve the type is going to
take 8 bytes, plus you are going to need a length word because there is
really no alternative to the VARIANT type being varlena overall, which
makes it 9 bytes if you're lucky on alignment and up to 16 if you're
not.  That is not shorter than the average length of the text
representation of an int.  The numbers don't seem a lot better for
8-byte quantities like int8, float8, or timestamp.  It might be
marginally worthwhile for timestamp, but surely this is a huge amount of
effort to substitute for thinking of a more compact text representation
for timestamps.

Pardon me for being unconvinced.




I'm far from convinced that storing deltas per column rather than per 
record is a win anyway. I don't have hard numbers to hand, but my vague 
recollection is that my tests showed it to be a design that used more space.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Tom Lane
While looking at the grammar's operator-precedence declarations in
connection with a recent pgsql-docs question, it struck me that this
declaration is a foot-gun waiting to go off:

%nonassoc   IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS 
NULL, etc */

The only terminal that we actually need to attach precedence to for
IS NULL and related productions is IS.  The others are just listed
there to save attaching explicit %prec declarations to those productions.
This seems like a bad idea, because attaching a precedence to a terminal
symbol that doesn't absolutely have to have one is just asking for
trouble: it can cause bison to accept grammars that would better have
provoked a shift/reduce error, and to silently resolve the ambiguity in
a way that you maybe didn't expect.

So I thought to myself that it'd be better to remove the unnecessary
precedence markings, and tried, with the attached patch.  And behold,
I got a shift/reduce conflict:

state 2788

  1569 b_expr: b_expr qual_Op . b_expr
  1571   | b_expr qual_Op .

NULL_P shift, and go to state 1010
NULL_P[reduce using rule 1571 (b_expr)]

So the god of unintended consequences has been here already.  What this
is telling us is that in examples such as

CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);

it is not immediately clear to bison whether to shift upon seeing the
NULL (which leads to a parse tree that says %% is an infix operator with
arguments 10 and NULL), or to reduce (which leads to a parse tree that
says that %% is a postfix operator with argument 10, and NULL is a
column declaration constraint separate from the DEFAULT expression).

If you try the experiment, you find out that the first interpretation is
preferred by the current grammar:

ERROR:  operator does not exist: integer %% unknown

Now, this is probably a good thing, because NULL as a column declaration
constraint is not SQL standard (only NOT NULL is), so we're resolving
the ambiguity in a way that's more likely to be SQL-compatible.  But it
could be surprising to somebody who expected the other behavior,
especially since this seemingly-closely-related command is parsed the
other way:

CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
ERROR:  operator does not exist: integer %%

And the reason for that difference in behavior is that NOT has a
declared precedence lower than POSTFIXOP, whereas NULL has a declared
precedence that's higher.  That comparison determines how bison resolves
the shift/reduce conflict.

The fact that this behavior falls out of a precedence declaration that's
seemingly quite unrelated, and was surely not designed with this case in
mind, is a perfect example of why I say that precedence declarations are
hazardous.

So I'd still like to get rid of the precedence markings for TRUE_P,
FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
reason not to.  (It looks like we could avoid marking ZONE, too.)  But
I would be happier if we could also not mark NULL, because that's surely
used in a lot of other places, and could easily bite us a lot harder
than this.  Can anyone think of an alternative way to resolve this
particular conflict without the blunt instrument of a precedence marking?

regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 933a1a2..2fb0418 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*** static void SplitColQualList(List *qualL
*** 614,620 
  %left		Op OPERATOR		/* multi-character ops and user-defined operators */
  %nonassoc	NOTNULL
  %nonassoc	ISNULL
! %nonassoc	IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
  %left		'+' '-'
  %left		'*' '/' '%'
  %left		'^'
--- 614,620 
  %left		Op OPERATOR		/* multi-character ops and user-defined operators */
  %nonassoc	NOTNULL
  %nonassoc	ISNULL
! %nonassoc	IS				/* sets precedence for IS NULL, etc */
  %left		'+' '-'
  %left		'*' '/' '%'
  %left		'^'
*** a_expr:		c_expr	{ $$ = $1; }
*** 9887,9893 
  			 *	a ISNULL
  			 *	a NOTNULL
  			 */
! 			| a_expr IS NULL_P
  {
  	NullTest *n = makeNode(NullTest);
  	n->arg = (Expr *) $1;
--- 9887,9893 
  			 *	a ISNULL
  			 *	a NOTNULL
  			 */
! 			| a_expr IS NULL_P			%prec IS
  {
  	NullTest *n = makeNode(NullTest);
  	n->arg = (Expr *) $1;
*** a_expr:		c_expr	{ $$ = $1; }
*** 9901,9907 
  	n->nulltesttype = IS_NULL;
  	$$ = (Node *)n;
  }
! 			| a_expr IS NOT NULL_P
  {
  	NullTest *n = makeNode(NullTest);
  	n->arg = (Expr *) $1;
--- 9901,9907 
  	n-nulltesttype = IS_NULL;
  	$$ = (Node *)n;
  }
! 			| a_expr IS NOT NULL_P		%prec IS
  {
  	NullTest *n = makeNode(NullTest);
  	n->arg = (Expr *) $1;
*** a_expr:		c_expr	{ $$ = $1; }
*** 9919,9960 
  {
  	$$ = (Node 

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:
 As someone commented downthread, they also want to have things such as a
 typeof operator.  It could be used in (say) a plpgsql function to
 choose different branches of code.

 FWIW, pg_typeof(any) has been in core since 9.0.

But note that pg_typeof() would surely say VARIANT if applied to such
a datatype.  You'd need some other, VARIANT-specific function that knew
enough to dig into the value at runtime and get the OID therein.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Andrew Dunstan



On 05/04/2011 07:39 PM, Tom Lane wrote:

While looking at the grammar's operator-precedence declarations in
connection with a recent pgsql-docs question, it struck me that this
declaration is a foot-gun waiting to go off:

%nonassoc   IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS 
NULL, etc */

The only terminal that we actually need to attach precedence to for
IS NULL and related productions is IS.  The others are just listed
there to save attaching explicit %prec declarations to those productions.
This seems like a bad idea, because attaching a precedence to a terminal
symbol that doesn't absolutely have to have one is just asking for
trouble: it can cause bison to accept grammars that would better have
provoked a shift/reduce error, and to silently resolve the ambiguity in
a way that you maybe didn't expect.

So I thought to myself that it'd be better to remove the unnecessary
precedence markings, and tried, with the attached patch.  And behold,
I got a shift/reduce conflict:

state 2788

   1569 b_expr: b_expr qual_Op . b_expr
   1571   | b_expr qual_Op .

 NULL_P shift, and go to state 1010
 NULL_P[reduce using rule 1571 (b_expr)]

So the god of unintended consequences has been here already.  What this
is telling us is that in examples such as

CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);

it is not immediately clear to bison whether to shift upon seeing the
NULL (which leads to a parse tree that says %% is an infix operator with
arguments 10 and NULL), or to reduce (which leads to a parse tree that
says that %% is a postfix operator with argument 10, and NULL is a
column declaration constraint separate from the DEFAULT expression).

If you try the experiment, you find out that the first interpretation is
preferred by the current grammar:

ERROR:  operator does not exist: integer %% unknown


Yeah, IIRC the default action for a shift/reduce conflict is to shift, 
as it's usually the right thing to do.



Now, this is probably a good thing, because NULL as a column declaration
constraint is not SQL standard (only NOT NULL is), so we're resolving
the ambiguity in a way that's more likely to be SQL-compatible.  But it
could be surprising to somebody who expected the other behavior,
especially since this seemingly-closely-related command is parsed the
other way:

CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
ERROR:  operator does not exist: integer %%

And the reason for that difference in behavior is that NOT has a
declared precedence lower than POSTFIXOP, whereas NULL has a declared
precedence that's higher.  That comparison determines how bison resolves
the shift/reduce conflict.

The fact that this behavior falls out of a precedence declaration that's
seemingly quite unrelated, and was surely not designed with this case in
mind, is a perfect example of why I say that precedence declarations are
hazardous.

So I'd still like to get rid of the precedence markings for TRUE_P,
FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
reason not to.  (It looks like we could avoid marking ZONE, too.)  But
I would be happier if we could also not mark NULL, because that's surely
used in a lot of other places, and could easily bite us a lot harder
than this.  Can anyone think of an alternative way to resolve this
particular conflict without the blunt instrument of a precedence marking?




My bison-fu is a bit rusty, but years ago I could do this stuff in my 
sleep. I'll be surprised if there isn't a way.


If we do need a precedence setting for NULL_P, then I think it should 
probably be on its own and not sharing one with IS.


If you don't solve it soon I'll take a look after I clear a couple of 
higher priority items from my list.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié may 04 20:05:54 -0300 2011:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
  Just out of curiosity, what actual functionality gain would ensue over
  just using text?  It seems like doing anything useful with the audit
  table contents would still require casting the column to text, or the
  moral equivalent of that.
 
  Storage efficiency.  These people have really huge databases; small
  changes in how tight things are packed makes a large difference for
  them.  (For example, we developed a type to store SHA-2 digests in a
  more compact way than bytea mainly because of this reason.  Also, at
  some time they also wanted to apply compression to hstore keys and
  values.)
 
 Hmm.  The prototypical case for this would probably be a 4-byte int,
 which if you add an OID to it so you can resolve the type is going to
 take 8 bytes, plus you are going to need a length word because there is
 really no alternative to the VARIANT type being varlena overall, which
 makes it 9 bytes if you're lucky on alignment and up to 16 if you're
 not.
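The byte counts above can be sketched concretely. The following is a hypothetical helper, not PostgreSQL's actual varlena layout: assume a 1-byte short varlena header plus a 4-byte type OID plus the payload, rounded up to whatever alignment the surrounding tuple imposes.

```c
#include <assert.h>

/*
 * Hypothetical size of a VARIANT datum wrapping a payload of
 * payload_len bytes: 1-byte short varlena header + 4-byte type OID
 * + payload, rounded up to "align" (a power of two).
 * Illustrative only; not PostgreSQL's real on-disk format.
 */
int
variant_size(int payload_len, int align)
{
	int		raw = 1 + 4 + payload_len;	/* header + OID + payload */

	return (raw + align - 1) & ~(align - 1);
}
```

Under these assumptions a 4-byte int comes out to `variant_size(4, 1)` = 9 bytes with lucky alignment and `variant_size(4, 8)` = 16 bytes without, matching the figures above; an 8-byte timestamp would be `variant_size(8, 8)` = 16 versus 8 bytes stored plain.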

Hmm, I was hoping that we could skip making it varlena at least in some
cases ... perhaps that's hopeless, in which case we'll have to reassess
the whole idea.  Still there's the new functionality argument, though.

As a followup idea, there is also a desire to store records as records
and not as text representations of them (given differing record types,
of course), for which this would be more worthwhile.

 It might be
 marginally worthwhile for timestamp, but surely this is a huge amount of
 effort to substitute for thinking of a more compact text representation
 for timestamps.

Surely if it's just for timestamps it'd be a waste of time.

 Pardon me for being unconvinced.

:-)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 05/04/2011 07:39 PM, Tom Lane wrote:
 If you try the experiment, you find out that the first interpretation is
 preferred by the current grammar:
 ERROR:  operator does not exist: integer %% unknown

 Yeah, IIRC the default action for a shift/reduce conflict is to shift, 
 as it's usually the right thing to do.

Well, there's nothing default about it: we've got precedence
declarations that specifically tell it to do that.  I'm just disturbed
because this isn't what those precedences were meant to do.

 I would be happier if we could also not mark NULL, because that's surely
 used in a lot of other places, and could easily bite us a lot harder
 than this.  Can anyone think of an alternative way to resolve this
 particular conflict without the blunt instrument of a precedence marking?

 My bison-fu is a bit rusty, but years ago I could do this stuff in my 
 sleep. I'll be surprised if there isn't a way.

 If we do need a precedence setting for NULL_P, then I think it should 
 probably be on its own and not sharing one with IS.

Yeah, I was thinking that too.  If we put %prec on the IS [NOT] NULL
productions then there is no need for NULL_P to have exactly its current
precedence; anything above POSTFIXOP would preserve the current behavior
in the DEFAULT ... NULL case.  (And if we decided we wanted to flip that
behavior, anything below POSTFIXOP would do that.)
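A sketch of the change under discussion (simplified; production shapes and the surviving precedence level are assumptions, not the final patch): move the precedence decision onto the productions via %prec, so that only IS needs a declared token precedence.

```
/* Before: every token on the line acquires a precedence as a side effect. */
%nonassoc   IS NULL_P TRUE_P FALSE_P UNKNOWN

/* After: only IS is marked; the productions state their precedence explicitly. */
%nonassoc   IS

a_expr:   a_expr IS NULL_P          %prec IS   { ... }
        | a_expr IS NOT NULL_P      %prec IS   { ... }
```

With this shape, NULL_P itself carries no precedence, so it can no longer silently resolve unrelated shift/reduce conflicts such as the DEFAULT ... NULL case.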

But it would still make life safer for future grammar hacking if it
didn't have precedence at all.

BTW, I wonder why NOTNULL and ISNULL have their own precedence levels,
rather than being made to act exactly like IS [NOT] NULL ...

regards, tom lane



Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 As a followup idea there exists the desire to store records as records
 and not text representation of same (given differing record types, of
 course), for which it'd be more worthwhile.

Maybe.  The conventional wisdom is that text representation of data is
more compact than PG's internal representation by a significant factor
--- our FAQ says up to 5x, in fact.  I know that that's including row
overhead and indexes and so on, but I still don't find it to be a given
that you're going to win on space with this sort of trick.

Another point here is that once the values in question get to be any
significant number of bytes apiece, both text and the proposed VARIANT
representation could be subject to compression, which would greatly
reduce any nominal advantage of one over the other.  Any
representational inefficiency in either would just be grist for the
compressor.

regards, tom lane



Re: [HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Greg Stark
On Thu, May 5, 2011 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 So I'd still like to get rid of the precedence markings for TRUE_P,
 FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
 reason not to.  (It looks like we could avoid marking ZONE, too.)  But
 I would be happier if we could also not mark NULL, because that's surely
 used in a lot of other places, and could easily bite us a lot harder
 than this.  Can anyone think of an alternative way to resolve this
 particular conflict without the blunt instrument of a precedence marking?


Isn't there already some gadget which forces postfix operators to be
discouraged compared to some other interpretation in other cases? That
would be the opposite of the current interpretation though which you
said you preferred.

-- 
greg



[HACKERS] patch: fix race in SSI's CheckTargetForConflictsIn

2011-05-04 Thread Dan Ports
While running some benchmarks to test SSI performance, I found a race
condition that's capable of causing a segfault. A patch is attached.

The bug is in CheckTargetForConflictsIn, which scans the list of SIREAD
locks on a lock target when it's modified. There's an optimization in
there where the writing transaction will remove a SIREAD lock that it
holds itself, because it's being replaced with a (stronger) write lock.
To do that, it needs to drop its shared lwlocks and reacquire them in
exclusive mode. The existing code deals with concurrent modifications
in that interval by redoing checks. However, it misses the case where
some other transaction removes all remaining locks on the target, and
proceeds to remove the lock target itself.

The attached patch fixes this by deferring the SIREAD lock removal
until the end of the function. At that point, there isn't any need to
worry about concurrent updates to the target's lock list. The resulting
code is also simpler.
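The deferred-removal pattern described above can be shown with a standalone sketch (hypothetical names, an ordinary singly linked list instead of shared-memory queues): remember our own entry during the scan, and unlink it only after the loop, when no traversal state depends on it and no locks need to be dropped mid-scan.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Lock
{
	int			owner;
	struct Lock *next;
} Lock;

/*
 * Scan the lock list; remember (but do not yet remove) the entry
 * owned by "me".  Removing inside the loop would require releasing
 * and reacquiring locks mid-scan -- the window where the race lived.
 * Deferring the unlink until after the loop closes that window.
 */
Lock *
scan_then_remove_mine(Lock **head, int me)
{
	Lock	   *mine = NULL;

	for (Lock *l = *head; l != NULL; l = l->next)
	{
		if (l->owner == me)
			mine = l;			/* defer removal until after the scan */
		/* ... conflict checks against other owners would go here ... */
	}

	if (mine != NULL)
	{
		/* Safe now: unlink the remembered entry. */
		Lock	  **pp = head;

		while (*pp != mine)
			pp = &(*pp)->next;
		*pp = mine->next;
		mine->next = NULL;
	}
	return mine;
}
```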

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 3309d07..c274bca 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3905,6 +3905,8 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 	LWLockId	partitionLock;
 	PREDICATELOCKTARGET *target;
 	PREDICATELOCK *predlock;
+	PREDICATELOCK *mypredlock = NULL;
+	PREDICATELOCKTAG mypredlocktag;
 
 	Assert(MySerializableXact != InvalidSerializableXact);
 
@@ -3950,139 +3952,21 @@ CheckTargetForConflictsIn(PREDICATELOCKTARGETTAG *targettag)
 		if (sxact == MySerializableXact)
 		{
 			/*
-			 * If we're getting a write lock on the tuple and we're not in a
-			 * subtransaction, we don't need a predicate (SIREAD) lock.  We
-			 * can't use this optimization within a subtransaction because the
-			 * subtransaction could be rolled back, and we would be left
-			 * without any lock at the top level.
+			 * If we're getting a write lock on a tuple, we don't need
+			 * a predicate (SIREAD) lock on the same tuple. We can
+			 * safely remove our SIREAD lock, but we'll defer doing so
+			 * until after the loop because that requires upgrading to
+			 * an exclusive partition lock.
 			 *
-			 * At this point our transaction already has an ExclusiveRowLock
-			 * on the relation, so we are OK to drop the predicate lock on the
-			 * tuple, if found, without fearing that another write against the
-			 * tuple will occur before the MVCC information makes it to the
-			 * buffer.
+			 * We can't use this optimization within a subtransaction
+			 * because the subtransaction could roll back, and we
+			 * would be left without any lock at the top level.
 			 */
 			if (!IsSubTransaction() &&
 				GET_PREDICATELOCKTARGETTAG_OFFSET(*targettag))
 			{
-uint32		predlockhashcode;
-PREDICATELOCKTARGET *rmtarget = NULL;
-PREDICATELOCK *rmpredlock;
-LOCALPREDICATELOCK *locallock,
-		   *rmlocallock;
-
-/*
- * This is a tuple on which we have a tuple predicate lock. We
- * only have shared LW locks now; release those, and get
- * exclusive locks only while we modify things.
- */
-LWLockRelease(SerializableXactHashLock);
-LWLockRelease(partitionLock);
-LWLockAcquire(SerializablePredicateLockListLock, LW_SHARED);
-LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-LWLockAcquire(SerializableXactHashLock, LW_EXCLUSIVE);
-
-/*
- * Remove the predicate lock from shared memory, if it wasn't
- * removed while the locks were released.  One way that could
- * happen is from autovacuum cleaning up an index.
- */
-predlockhashcode = PredicateLockHashCodeFromTargetHashCode
-	(&(predlock->tag), targettaghash);
-rmpredlock = (PREDICATELOCK *)
-	hash_search_with_hash_value(PredicateLockHash,
-		&(predlock->tag),
-		predlockhashcode,
-		HASH_FIND, NULL);
-if (rmpredlock)
-{
-	Assert(rmpredlock == predlock);
-
-	SHMQueueDelete(&(predlock->targetLink));
-	SHMQueueDelete(&(predlock->xactLink));
-
-	rmpredlock = (PREDICATELOCK *)
-		hash_search_with_hash_value(PredicateLockHash,
-			&(predlock->tag),
-			predlockhashcode,
-			HASH_REMOVE, NULL);
-	Assert(rmpredlock == predlock);
-
-	RemoveTargetIfNoLongerUsed(target, targettaghash);
-
-	LWLockRelease(SerializableXactHashLock);
-	LWLockRelease(partitionLock);
-	LWLockRelease(SerializablePredicateLockListLock);
-
-	locallock = (LOCALPREDICATELOCK *)
-		hash_search_with_hash_value(LocalPredicateLockHash,
-	targettag, targettaghash,
-	HASH_FIND, NULL);
-
-	/*
-	 * Remove entry in local lock table if it exists and has
-	 * no children. It's OK if it doesn't exist; that means
-	 * the lock was transferred to a new target by a different
-	 * backend.
-	 */
-	if (locallock != NULL)
-		

[HACKERS] Enhancing online recovery in SR mode

2011-05-04 Thread Tatsuo Ishii
Hi,

I have just committed changes to enhance online recovery in streaming
replication mode. Previously, pgpool-II had to restart its child
processes to recognize a failback node. Now pgpool-II is a little
smarter and does not need to restart the children, which means that
client sessions started before online recovery can continue while
online recovery is in progress.

The trick is a local backend status cache in each pgpool child: the
process copies the backend status into local memory and continues to
disregard the failback node until its session ends. Once the session
ends, the process exits with status 1, and the pgpool-II main process
starts a new child, which will recognize the failback node.

Note that this enhancement is applied to pcp_attach_node as well.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Enhancing online recovery in SR mode

2011-05-04 Thread Tatsuo Ishii
Oops. This should have been posted to pgpool-hackers. Sorry for noise.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

 Hi,
 
 I have just committed changes to enhance online recovery in streaming
 replication mode. Previously, pgpool-II had to restart its child
 processes to recognize a failback node. Now pgpool-II is a little
 smarter and does not need to restart the children, which means that
 client sessions started before online recovery can continue while
 online recovery is in progress.
 
 The trick is a local backend status cache in each pgpool child: the
 process copies the backend status into local memory and continues to
 disregard the failback node until its session ends. Once the session
 ends, the process exits with status 1, and the pgpool-II main process
 starts a new child, which will recognize the failback node.
 
 Note that this enhancement is applied to pcp_attach_node as well.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp
 



Re: [HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 Isn't there already some gadget which forces postfix operators to be
 discouraged compared to some other interpretation in other cases?

Yeah.  I'm not unhappy with the current grammar's behavior in this case.
What's bothering me is that the implementation seems likely to create
surprising/unexpected behaviors after future grammar changes.

regards, tom lane



Re: [HACKERS] Some surprising precedence behavior in PG's grammar

2011-05-04 Thread Greg Stark
On Thu, May 5, 2011 at 4:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Greg Stark gsst...@mit.edu writes:
 Isn't there already some gadget which forces postfix operators to be
 discouraged compared to some other interpretation in other cases?

 Yeah.  I'm not unhappy with the current grammar's behavior in this case.
 What's bothering me is that the implementation seems likely to create
 surprising/unexpected behaviors after future grammar changes.

I do wonder how much we really gain from having postfix operators.
Other than ! I've never seen one, and anyone who wanted one could just
as easily use a prefix operator. In practice, most unary operators are
just special cases of binary operators, and once you have the binary
operator it's often clearer to just use that.

A *lot* of the grammar conflicts we've had to worry about would go
away if we didn't have postfix operators.
-- 
greg
