Re: [HACKERS] Extensions makefiles - coverage

2013-09-24 Thread Ronan Dunklau
On Sunday 22 September 2013 01:34:53 Peter Eisentraut wrote:
 On Thu, 2013-07-25 at 17:07 +0200, Ronan Dunklau wrote:
  I am using approximately the layout that was proposed here:
  http://www.postgresql.org/message-id/51bb1b6e.2070...@dunslane.net
  It looks like everything is hard-coded to take the source and the
  gcda, gcno files in the base directory, but these files lie in a src
  directory with the proposed layout.
 
 The PostgreSQL build system isn't going to work very well if you build
 files outside of the current directory.  If you want to put your source
 files into a src/ subdirectory, then your top-level makefile should do a
 $(MAKE) -C src, and you need to have a second makefile in the src
 directory.  If you do that, then the existing coverage targets will work
 alright, I think.

The PGXS build system allows for the definition of an OBJS variable, which 
works fine with almost every other make target.

Maybe we need to take a step back, and think about what kind of extension 
layouts we want to support ?

At the time of this writing, the HOW TO on http://manager.pgxn.org/howto 
strongly encourages putting all C files in a src directory.

As a result, many extensions on pgxn use this layout. It would be great not to 
have to change them to measure code coverage.



Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Andres Freund
On 2013-09-23 23:12:53 -0400, Robert Haas wrote:
 What exactly is the purpose of this tool?  My impression is that the
 output of logical replication is a series of function calls to a
 logical replication plugin, but does that plugin necessarily have to
 produce an output format that gets streamed to a client via a tool
 like this?

There needs to be a client acking the reception of the data in some
form. There are currently two output methods, SQL and walstreamer, but
there could easily be more; it's basically two functions you have to
write.

There are several reasons I think the tool is useful, starting with the
fact that it makes the initial use of the feature easier. Writing a
client for CopyBoth messages wrapping 'w' style binary messages, with the
correct select() loop isn't exactly trivial. I also think it's actually
useful in real scenarios where you want to ship the data to a
remote system for auditing purposes.
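(For the curious, here is a minimal sketch of what the receiving side has
to do. It assumes libpq and a connection that is already in CopyBoth mode,
e.g. a replication connection on which the decoding stream has been
started, and it only prints the payload of 'w' messages; keepalive
handling, feedback/ack messages and error reporting - the parts that make
a real client non-trivial - are left out.)

/*
 * Minimal sketch of a CopyBoth consumer using libpq. Assumes "conn" is
 * already in CopyBoth mode; a real client also has to send standby
 * status/feedback messages and handle 'k' keepalives.
 */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

static void
stream_changes(PGconn *conn, FILE *out)
{
	int			sock = PQsocket(conn);

	for (;;)
	{
		char	   *copybuf = NULL;
		int			len = PQgetCopyData(conn, &copybuf, 1);	/* async mode */

		if (len == 0)
		{
			/* nothing buffered: wait for the socket, then read more input */
			fd_set		input_mask;

			FD_ZERO(&input_mask);
			FD_SET(sock, &input_mask);
			if (select(sock + 1, &input_mask, NULL, NULL, NULL) < 0)
				break;
			if (!PQconsumeInput(conn))
				break;			/* connection is broken */
			continue;
		}
		if (len < 0)
			break;				/* -1 = COPY finished, -2 = error */

		/* 'w' messages: 1-byte tag, two 8-byte LSNs, 8-byte timestamp, data */
		if (copybuf[0] == 'w' && len > 25)
			fwrite(copybuf + 25, 1, len - 25, out);

		PQfreemem(copybuf);
	}
}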

 For example, for replication, I'd think you might want the
 plugin to connect to a remote database and directly shove the data in;

That sounds like a bad idea to me. If you pull the data from the remote
side, you get the data in a streaming fashion and the latency sensitive
part of issuing statements to your local database is done locally.
Doing things synchronously like that also makes it way harder to use
synchronous_commit = off on the remote side, which is a tremendous
efficiency win.

If somebody needs something like this, e.g. because they want to
replicate into hundreds of shards depending on some key or such, the
question I don't have a good answer for is how to actually initiate the
streaming. Somebody would need to start the logical decoding.

 for materialized views, we might like to push the changes into delta
 relations within the source database.

Yes, that's not a bad use case and I think the only thing missing to use
output plugins that way is a convenient function to tell up to where
data has been received (aka synced to disk, aka applied).

  In either case, there's no
 particular need for any sort of client at all, and in fact it would be
 much better if none were required.  The existence of a tool like
 pg_receivellog seems to presuppose that the goal is to spit out logical
 change records as text, but I'm not sure that's actually going to be a
 very common thing to want to do...

It doesn't really rely on anything being text - I've used it with a
binary plugin without problems. Obviously you might not want to use -f -
but an actual file instead...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] record identical operator

2013-09-24 Thread Andres Freund
On 2013-09-23 21:21:53 -0400, Stephen Frost wrote:
 Here's an example to illustrate what I'm talking about when it comes
 down to "you can't claim that you'll always produce exactly what the
 query will", with these types:
 
 =# table citext_table;
  id | name  
 ----+-------
   1 | one
   3 | three
   5 | 
   4 | Three
   2 | Three
 (5 rows)
 
 =# create MATERIALIZED VIEW citext_matview AS select name from citext_table 
 where id > 0 group by name;
 SELECT 3
 
 =# table citext_matview;
  name  
 -------
  
  one
  three
 (3 rows)

I don't really understand why you have a problem with this specific
thing here. Kevin's goal seems to be to make materialized views behave
consistently with the way a plain view would if the matview had just
been refreshed fully, concurrently or incrementally, and there had
been no further data changes.
SELECT * FROM table WHERE candidate_key IN (...);
used in a view or run directly currently guarantees that you get the
original casing for citext. And if you e.g. have some additional
expensive joins, such a matview can very well make sense.

What does it matter that ... GROUP BY citext_col; doesn't return the
same casing consistently? You're aggregating, not accessing the original
data. If you need to separate the different casings now, cast to text.

Now, I can perfectly understand having a bit of architectural qualms
about Kevin's approach of things on the SQL level. But this doesn't seem
to be the thread to discuss that. FWIW, I can't foresee any realistic
approach that actually won't do such comparisons (although not
necessarily through C).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread David Rowley
On Tue, Sep 24, 2013 at 5:16 AM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Sep 20, 2013 at 11:28 PM, David Rowley dgrowle...@gmail.com
 wrote:
  I put the above results into the attached spreadsheet. On my Intel i5
  laptop I'm seeing a slow down of about 1 second per million queries for
  the longer log_line_prefix and about 1 second per 5 million queries with
  the shorter log_line_prefix.
 
  In the attached spreadsheet I've mocked up some very rough estimates on
  the performance cost of this change. I think a while ago I read about a
  benchmark Robert ran on a 64 core machine which ran around 400k
  transactions per second. I've included some workings in the spreadsheet
  to show how this change would affect that benchmark, but I have assumed
  that the hardware would only be able to execute the log_line_prefix
  function at the same speed as my i5 laptop. With these very worst case
  estimates my calculations say that the performance hit is 0.6% with the
  log_line_prefix which contains all of the variables and 0.11% for the
  shorter log_line_prefix. I would imagine that the hardware that
  performed 400k TPS would be able to run log_line_prefix faster than my
  3 year old i5 laptop, so this is likely quite a big over-estimation of
  the hit.

 Well, on those tests, I was hardly logging anything, so it's hard to
 really draw any conclusions that way.

 I think the real concern with this patch, performance-wise, is what
 happens in environments where there is a lot of logging going on.
 We've had previous reports of people who can't even enable the logging
 they want because the performance cost is unacceptably high.  That's
 why we added that logging hook in 9.2 or 9.3, so that people can write
 alternative loggers that just throw away messages if the recipient
 can't eat them fast enough.

 I wouldn't be keen to accept a 25% performance hit on logging overall,
 but log_line_prefix() is only a small part of that cost.

 So... I guess the question that I'd ask is, if you write a PL/pgsql
 function that does RAISE NOTICE in a loop a large number of times, can
 you measure any difference in how fast that function executes on the
 patched and unpatched code?  If so, how much?



Ok, I've been doing a bit of benchmarking on this. To mock up a really fast
I/O system I created a RAM disk which I will ask Postgres to send the log
files to.

mkdir /tmp/ramdisk; chmod 777 /tmp/ramdisk
mount -t tmpfs -o size=256M tmpfs /tmp/ramdisk/

I created the following function in plpgsql

create function stresslog(n int) returns int as $$
begin
while n > 0 loop
raise notice '%', n;
n := n - 1;
end loop;
return 0;
end;
$$ language plpgsql;

I was running postgresql with
david@ubuntu64:~/9.4/bin$ ./pg_ctl start -D /home/david/9.4/data -l
/tmp/ramdisk/pg.log

I ran the following command 5 times for each version.
client_min_messages is set to warning and log_min_messages is set to notice

postgres=# select stresslog(10);
 stresslog
---
 0
(1 row)

I do see a 15-18% slow down with the patched version, so perhaps I'll need
to look to see if I can speed it up a bit, although I do feel this
benchmark is not quite a normal workload.


Please see below the results, or if you want to play about with them a bit,
please use the attached spreadsheet.


* Round 1

Patched: log_line_prefix = %a %u %d %r %h %p %t %m %i %e %c %l %s %v %x 

Time: 1822.105 ms
Time: 1762.529 ms
Time: 1839.724 ms
Time: 1837.372 ms
Time: 1813.240 ms

unpatched: log_line_prefix = %a %u %d %r %h %p %t %m %i %e %c %l %s %v %x 

Time: 1519.032 ms
Time: 1528.760 ms
Time: 1484.074 ms
Time: 1552.786 ms
Time: 1569.410 ms

* Round 2

patched:  log_line_prefix = %a %u %d %e 

Time: 625.969 ms
Time: 668.444 ms
Time: 648.310 ms
Time: 663.655 ms
Time: 715.397 ms



unpatched:  log_line_prefix = %a %u %d %e 

Time: 598.146 ms
Time: 518.827 ms
Time: 532.858 ms
Time: 529.584 ms
Time: 537.276 ms


Regards

David Rowley


log_line_prefix_benchmark_stresslog_v0.4.xls
Description: MS-Excel spreadsheet



Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - visibility semantics

2013-09-24 Thread Andres Freund
Hi,

Various messages are discussing semantics around visibility. I by now
have a hard time keeping track. So let's keep the discussion of the
desired semantics to this thread.

There have been some remarks about serialization failures in read
committed transactions. I agree, those shouldn't occur. But I don't
actually think they are so much of a problem if we follow the path set
by existing uses of the EPQ logic. The scenario described seems to be an
UPSERT conflicting with a row it cannot see in the original snapshot of
the query.
In that case I think we just have to follow the example laid by
ExecUpdate, ExecDelete and heap_lock_tuple. Use the EPQ machinery (or an
alternative approach with similar enough semantics) to get a new
snapshot and follow the ctid chain. When we've found the end of the
chain we try to update that tuple.
That surely isn't free of surprising semantics, but it would follow existing
semantics, which everybody writing concurrent applications in read
committed should (but doesn't) know. Adding a different set of semantics
seems like a bad idea.
Robert seems to have been the primary sceptic around this; what scenario
are you actually concerned about?

There are some scenarios that this doesn't trivially answer. But I'd like to
understand the primary concerns first.

Regards,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] record identical operator

2013-09-24 Thread Hannu Krosing
On 09/23/2013 10:38 PM, Stephen Frost wrote:

 and heavily caveated.
 I'm not sure what caveats would be needed.  It seems to me that a
 clear description of what it does would suffice.  Like all the
 other non-default opclasses in core, it will be non-default because
 it is less frequently useful.
 "This will claim things are different, even when they aren't different
 when cast to text, or possibly even when extracted in binary mode;
 ensure this is really what you want" is a pretty big caveat, imv.
Yes, it should be documented that it tests for sameness and gives
no guarantees that lack of sameness means "different" (as
determined by some other operator).

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-24 Thread Bernd Helmle



--On 13. September 2013 20:17:19 -0400 Robert Haas robertmh...@gmail.com 
wrote:



You're missing the point.  Peter wasn't worried that your patch throws
an error; he's concerned about the fact that it doesn't.

In PostgreSQL, you can only create the following view because test1
has a primary key over column a:

= create table test1 (a int constraint pk primary key, b text);
= create view test2 as select a, b from test1 group by a;
= alter table test1 drop constraint pk;

 The reason is that, if the primary key weren't there, it would be
 ambiguous which row should be returned from among multiple values where
a is equal and b is not.  If you can disable the constraint, then you
can create precisely that problem.


Hmm, not sure I understand this argument either: this patch doesn't allow 
disabling a primary key. It only supports FKs and CHECK constraints 
explicitly.


--
Thanks

Bernd




Re: [HACKERS] Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

2013-09-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
 boundaries. That means that on x86-64 (64byte cachelines, 24bytes
 unpadded lwlock) two lwlocks share a cacheline.

Yup.

 In my benchmarks changing the padding to 64byte increases performance in
 workloads with contended lwlocks considerably.

At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.

I think that rather than using a blunt instrument like that, we ought to
see if we can identify pairs of hot LWLocks and make sure they're not
adjacent.

regards, tom lane




Re: [HACKERS] Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

2013-09-24 Thread Andres Freund
On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
  boundaries. That means that on x86-64 (64byte cachelines, 24bytes
  unpadded lwlock) two lwlocks share a cacheline.

  In my benchmarks changing the padding to 64byte increases performance in
  workloads with contended lwlocks considerably.
 
 At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.

 I think that rather than using a blunt instrument like that, we ought to
 see if we can identify pairs of hot LWLocks and make sure they're not
 adjacent.

That's a good point. What about making all but the shared buffer lwlocks
64 bytes? It seems hard to analyze the interactions between all the locks
and keep that analysis maintained.
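For readers without the source at hand, the arrangement being discussed is
roughly the padded union sketched below. The LWLock body here is only a
stand-in for the real struct in lwlock.c, so treat the whole thing as an
approximation rather than the literal code; bumping the pad to a full cache
line is what the benchmark above effectively did.

/*
 * Stand-in for the real struct in src/backend/storage/lmgr/lwlock.c; the
 * real one holds a spinlock, flags, shared/exclusive holder counts and
 * wait-queue pointers, around 24 bytes on 64-bit builds.
 */
typedef struct LWLock
{
	char		opaque[24];
} LWLock;

/*
 * Roughly the existing scheme: pad to 16 or 32 bytes, so two locks can
 * still end up sharing one 64-byte cache line.
 */
#define LWLOCK_PADDED_SIZE	(sizeof(LWLock) <= 16 ? 16 : 32)

typedef union LWLockPadded
{
	LWLock		lock;
	char		pad[LWLOCK_PADDED_SIZE];
} LWLockPadded;

/*
 * The variant being benchmarked: pad some or all locks to a full cache
 * line so that contended locks never share one.
 */
#define LWLOCK_CACHELINE_SIZE	64

typedef union LWLockCachelinePadded
{
	LWLock		lock;
	char		pad[LWLOCK_CACHELINE_SIZE];
} LWLockCachelinePadded;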

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 Your example demonstrates that if a given query can generate two
 different outputs, A and B, based on the same input data, then the
 contents of the materialized view cannot be equal to A and also
 equal to B.  Well, no duh.

Two different outputs based on what *plan* is chosen.

 Of course, you don't need citext, or any other data type with a loose
 notion of equality, to generate that sort of problem:
[...]
 rhaas=# set datestyle = 'german';
 SET

I'm talking about *planner differences* changing the results.  If we've
got a ton of cases where a different plan means different output, then
we've got some serious problems.  I'd argue that it's pretty darn clear
that datestyle is going to be a *slightly* different animal.  My example
doesn't *require* changing any GUCs, it was just expedient for
illustration.

 But I'm still wondering what this is intended to prove.  

These types are, to those that use them at least, a known quantity wrt
what you get when using them in GROUP BYs, JOINs, etc.  You're trying
to 'fix' something that isn't really broken and you're only doing it
half-baked anyway because your 'fix' isn't going to actually make these
types produce consistent results.

 There are an
 infinite number of ways to write queries that produce different
 results, and I think we all know that materialized views aren't going
 to hold up very well if given such queries.  That seems a poor excuse
 for not fixing the cases that can be made to work.

New operators are not without cost, I don't think it's a good idea to
expose our internal binary representations of data out to the SQL level,
and the justification for adding them is a case that they don't actually
solve.

Thanks,

Stephen




Re: [HACKERS] all_visible replay aborting due to uninitialized pages

2013-09-24 Thread Andres Freund
On 2013-09-23 14:41:16 +0300, Heikki Linnakangas wrote:
 On 06.06.2013 17:22, Robert Haas wrote:
 On Thu, May 30, 2013 at 2:29 AM, Andres Freund and...@2ndquadrant.com
 wrote:
 Yeah, I think it's fine.  The patch also looks fine, although I think
 the comments could use a bit of tidying.  I guess we need to
 back-patch this all the way back to 8.4?  It will require some
 adjustments for the older branches.
 
 I think 9.2 is actually far enough back and it should apply there. Before
 that we only logged the unsetting of all_visible via
 heap_(insert|update|delete)'s WAL records, not the setting, as far as I can
 tell. So I don't immediately see a danger < 9.2.
 
 OK.  I have committed this.  For 9.2, I had to backport
 log_newpage_buffer() and use XLByteEQ rather than ==.
 
 I'm afraid this patch was a few bricks shy of a load. The
 log_newpage_buffer() function asserts that:
 
  /* We should be in a critical section. */
  Assert(CritSectionCount > 0);
 
 But the call in vacuumlazy.c is not inside a critical section. Also, the
 comments in log_newpage_buffer() say that the caller should mark the buffer
 dirty *before* calling log_newpage_buffer(), but in vacuumlazy.c, it's
 marked dirty afterwards. I'm not sure what consequences that might have, but
 at least it contradicts the comment.
 
 (spotted this while working on a patch, and ran into the assertion on crash
 recovery)

What about the attached patches (one for 9.3 and master, the other for
9.2)? I've tested that I can trigger the assert before and not after by
inserting faults...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 4fe13d241062c3aa47ddfe3cfb967d80809aa826 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 24 Sep 2013 11:58:34 +0200
Subject: [PATCH] Use critical section when ensuring empty pages are
 initialized during vacuum.

a6370fd9 tried to fix the problem that replay of XLOG_HEAP2_VISIBLE
records can fail when uninitialized pages are referenced during
replay. Unfortunately the patch/I missed the fact that log_newpage()
should be used inside a critical section, leading to an assertion
failure during WAL replay when assertions are enabled, while working
otherwise.

Also fix the issue that pages should be marked dirty before calling
log_newpage_buffer() (check src/backend/access/transam/README for
reasons).

Both issues noticed by Heikki
---
 src/backend/commands/vacuumlazy.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index bb4e03e..af7cd59 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -663,6 +663,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			/* empty pages are always all-visible */
 			if (!PageIsAllVisible(page))
 			{
+START_CRIT_SECTION();
+
+/* dirty page before any possible XLogInsert()s */
+MarkBufferDirty(buf);
+
 /*
  * It's possible that another backend has extended the heap,
  * initialized the page, and then failed to WAL-log the page
@@ -682,9 +687,9 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	log_newpage_buffer(buf);
 
 PageSetAllVisible(page);
-MarkBufferDirty(buf);
 visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
   vmbuffer, InvalidTransactionId);
+END_CRIT_SECTION();
 			}
 
 			UnlockReleaseBuffer(buf);
-- 
1.8.4.21.g992c386.dirty

From 2aee405f62207bd7691fd13225ed5be62cc1033a Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 24 Sep 2013 11:58:34 +0200
Subject: [PATCH] Use critical section when ensuring empty pages are
 initialized during vacuum.

a6370fd9 tried to fix the problem that replay of XLOG_HEAP2_VISIBLE
records can fail when uninitialized pages are referenced during
replay. Unfortunately the patch/I missed the fact that log_newpage()
should be used inside a critical section, leading to an assertion
failure during WAL replay when assertions are enabled, while working
otherwise.

Also fix the issue that pages should be marked dirty before calling
log_newpage_buffer() (check src/backend/access/transam/README for
reasons).

Both issues noticed by Heikki
---
 src/backend/commands/vacuumlazy.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index ff8764b..87757aa 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -650,6 +650,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 			/* empty pages are always all-visible */
 			if (!PageIsAllVisible(page))
 			{
+START_CRIT_SECTION();
+
+/* dirty page before any possible XLogInsert()s */
+MarkBufferDirty(buf);
+
 /*
  * It's possible that another backend has extended the heap,
  * initialized the page, and then failed to WAL-log the 

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-24 Thread MauMau

From: Peter Eisentraut pete...@gmx.net

That assumes that the conversion client encoding -> server encoding ->
NCHAR encoding is not lossy.


Yes, so Tatsuo san suggested restricting the server encoding -> NCHAR encoding 
combinations to those with lossless conversion.




I thought one main point of this exercise
was to avoid these conversions and be able to go straight from client
encoding into NCHAR.


It's slightly different.  Please see the following excerpt:

http://www.postgresql.org/message-id/B1A7485194DE4FDAB8FA781AFB570079@maumau

4. I guess some users really want to continue to use ShiftJIS or EUC_JP for
database encoding, and use NCHAR for a limited set of columns to store
international text in Unicode:
- to avoid code conversion between the server and the client for performance
- because ShiftJIS and EUC_JP require less storage (2 bytes for
most Kanji) than UTF-8 (3 bytes)
This use case is described in chapter 6 of Oracle Database Globalization
Support Guide.



Regards
MauMau





Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-24 Thread Vesa-Matti J Kari

Hello,

On Mon, 23 Sep 2013, Stephen Frost wrote:

 I've now committed a fix for this issue.

I cloned the 9.4devel branch and linked my authmilter and a test program
(based on Heikki's earlier design) against the libpq that comes with it.

After hours of pretty extensive stress testing using 2, 3, 4, 5, 6,
50, 100 and 500 parallel threads, I have observed no deadlocks. Everything
runs smoothly.

Many thanks to all who contributed to the fix.

Regards,
vmk
-- 

   Tietotekniikkakeskus / Helsingin yliopisto
 IT department / University of Helsinki





Re: [HACKERS] record identical operator

2013-09-24 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 Skipping out on much of the side-discussion to try and drive at
 the heart of this..

 Robert Haas (robertmh...@gmail.com) wrote:
 I would suggest that you read the referenced papers for details
 as to how the algorithm works.  To make a long story short, they
 do need to keep track of what's changed, and how.

 I suppose it's good to know that I wasn't completely
 misunderstanding the discussion in Ottawa.

 However, that still seems largely orthogonal to the present
 discussion.

 It *solves* this issue, from where I'm sitting, without any
 binary operators at all.

 [ argument that the only way we should ever do REFRESH is by
 using captured deltas, through incremental maintenance techniques
 ]

That would ensure that a query could not be used to define a
matview until we had implemented incremental maintenance for
queries of that type and complexity.  I expect that coming *close*
to covering all useful queries that way will take five to ten
years, if all goes well.  The approach I'm taking makes all queries
available *now*, with improvements in how many can be maintained
incrementally improving over time.  This is the route every other
database I've looked at has taken (for good reason, I think).

Ultimately, even when we have incremental maintenance supported for
all queries that can be used to define a matview, I think there
will be demand for a REFRESH command that re-runs the base query.
Not only does that fit the workload for some matviews, but consider
this, from the paper I cited[1]:

| Recomputing the view from scratch is too wasteful in most cases.
| Using the heuristic of inertia (only a part of the view changes
| in response to changes in the base relations), it is often
| cheaper to compute only the changes in the view.  We stress that
| the above is only a heuristic.  For example, if an entire base
| relation is deleted, it may be cheaper to recompute a view that
| depends on the deleted relation (if the new view will quickly
| evaluate to an empty relation) than to compute the changes to the
| view.

What we're talking about is a performance heuristic -- not
something more profound than that.  The route I'm taking is to get
it *working correctly now* using the simple technique, and then
embarking on the long journey of optimizing progressively more
cases.

What your argument boils down to IMV is essentially a case of
premature optimization.  You have yet to show any case where the
existing patch does not yield correct results, or show that there
is a better way to get to the point this patch takes us.

 [ later post shows a query that does not produce deterministic
 results ]

Sure, two direct runs of that same query, or two runs through a
regular view, could show different results (considering
synchronized scans, among other things).  I don't see what that
proves.  Of course a refresh of a matview is only going to produce
one of those and then will not produce a different result until it
is refreshed or (once we add incremental maintenance) something
changes in the underlying data.

Nobody ever claimed that a query which does not produce consistent
results would somehow produce them with this patch.  There are
queries using citext, numeric, and other types which *do* provide
consistent results which are consistently produced by a straight
query, a simple view, or a non-concurrent refresh of a materialized
view; this patch will cause a concurrent refresh to produce the
same results as those, rather than something different.  Period.
That's the point, and the whole point.  You have not shown that it
doesn't.  You have not shown why adding a 12th non-default opclass
is a particular problem here (although we have a consensus to use
different operators, to reserve this operator namespace for other
things).  You have not made any case at all for why people should
wait for incremental maintenance to be mature (a project which will
take years) before being able to use materialized views with
concurrent refreshes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] A. Gupta, I. S. Mumick, and V. S. Subrahmanian.  Maintaining
Views Incrementally.  In SIGMOD 1993, pages 157-167.
http://dl.acm.org/citation.cfm?id=170066




Re: [HACKERS] Strange hanging bug in a simple milter

2013-09-24 Thread Stephen Frost
* Vesa-Matti J Kari (vmk...@cc.helsinki.fi) wrote:
 Many thanks to all who contributed to the fix.

Great!  Thanks for the report and the testing.

Stephen




Re: [HACKERS] 9.3 Json Array's

2013-09-24 Thread Andrew Dunstan


On 09/24/2013 12:59 AM, Chris Travers wrote:


I am still in the process of wrapping my head around the current JSON 
logic.  I hope to produce a proof of concept that can later be turned 
into a patch.  See my previous post on this topic.  Again 
collaboration is welcome.






Feel free to ask questions.

The heart of the API is the event handlers defined in this struct in 
include/utils/jsonapi.h:


   typedef struct JsonSemAction
   {
        void   *semstate;
        json_struct_action object_start;
        json_struct_action object_end;
        json_struct_action array_start;
        json_struct_action array_end;
        json_ofield_action object_field_start;
        json_ofield_action object_field_end;
        json_aelem_action array_element_start;
        json_aelem_action array_element_end;
        json_scalar_action scalar;
   } JsonSemAction;


Basically there is a handler for the start and end of each non-scalar 
structural element in JSON, plus a handler for scalars.
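To give a feel for how the handlers get wired up, here is a rough,
untested sketch that just tracks how deeply arrays nest. The handler
signatures and the makeJsonLexContext()/pg_parse_json() calls are assumed
from the same header, so double-check them against your tree.

#include "postgres.h"
#include "utils/jsonapi.h"

/* Sketch only: walk a json value and record the maximum array nesting depth. */
typedef struct ArrayDepthState
{
	int			depth;
	int			maxdepth;
} ArrayDepthState;

static void
array_start_cb(void *state)
{
	ArrayDepthState *s = (ArrayDepthState *) state;

	if (++s->depth > s->maxdepth)
		s->maxdepth = s->depth;
}

static void
array_end_cb(void *state)
{
	((ArrayDepthState *) state)->depth--;
}

static int
json_array_depth(text *json)
{
	ArrayDepthState state = {0, 0};
	JsonLexContext *lex = makeJsonLexContext(json, true);
	JsonSemAction sem;

	memset(&sem, 0, sizeof(sem));	/* handlers left NULL are simply skipped */
	sem.semstate = (void *) &state;
	sem.array_start = array_start_cb;
	sem.array_end = array_end_cb;

	pg_parse_json(lex, &sem);
	return state.maxdepth;
}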


There are several problems that will be posed by processing nested 
arrays and objects, including:


 * in effect you would need to construct a stack of state that could be
   pushed and popped
 * JSON arrays aren't a very good match for SQL arrays - they are
   unidimensional and heterogeneous.


I'm not saying this can't be done - it will just take a bit of effort.

cheers

andrew





Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-09-24 Thread Andrew Dunstan


On 09/23/2013 12:15 PM, Cédric Villemain wrote:


I'm working on it. It appears to have a slight problem or two I want
to fix at the same time, rather than backpatch something broken.

Would you mind sharing the problems you are facing?
You've noticed the problem about installdirs, I suppose. The attached
patch is the fix currently applied at Debian.



I will when I'm sure it's not a case of pilot error on my part.

cheers

andrew






Re: [HACKERS] record identical operator

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 7:14 AM, Stephen Frost sfr...@snowman.net wrote:
 Of course, you don't need citext, or any other data type with a loose
 notion of equality, to generate that sort of problem:
 [...]
 rhaas=# set datestyle = 'german';
 SET

 I'm talking about *planner differences* changing the results.  If we've
 got a ton of cases where a different plan means different output, then
 we've got some serious problems.  I'd argue that it's pretty darn clear
 that datestyle is going to be a *slightly* different animal.  My example
 doesn't *require* changing any GUCs, it was just expedient for
 illustration.

I'm completely confused, here.  What's so special about planner
differences?  Obviously, there are tons of queries that can produce
different results based on planner differences, but materialized views
didn't create that problem and they aren't responsible for solving it.

Also, even restricting ourselves to planner differences, there's no
particular link between the behavior of the type's equality operator
and whether or not the query always produces the same results.  For
example, let's consider good old text.

rhaas=# create table tag_data (id integer, tag text, userid text,
primary key (id, tag));
CREATE TABLE
rhaas=# insert into tag_data values (1, 'foo', 'tom'), (1, 'bar',
'dick'), (2, 'baz', 'harry');
INSERT 0 3
rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from
tag_data group by 1;
 id |       tags        
----+-------------------
  1 | foo:tom, bar:dick
  2 | baz:harry
(2 rows)

rhaas=# set enable_seqscan=false;
SET
rhaas=# select id, string_agg(tag||':'||userid, ', ') tags from
tag_data group by 1;
 id |       tags        
----+-------------------
  1 | bar:dick, foo:tom
  2 | baz:harry
(2 rows)

Now texteq() is just a glorified version of memcmp(), so what does
this complaint possibly have to do with Kevin's patch, or even
materialized views in general?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cube extension point support // GSoC'13

2013-09-24 Thread Stas Kelvich
Hello

There is a new version of the patch. I have separated the ordering operators into 
a different patch (https://commitfest.postgresql.org/action/patch_view?id=1243), fixed 
formatting issues and implemented backward compatibility with old-style points 
in cube_is_point() and cube_out().

Also, comparing the output files I've discovered that these four files are a combination 
of two types of differing behavior:

1) SELECT '-1e-700'::cube AS cube;
can be (0) or (-0)

2) The number of zeros in the exponent of a floating point value, i.e. SELECT '1e27'::cube AS 
cube;
 can be (1e+027) or (1e+27)

On my system (OS X) it is the second option in both situations. I've also tested it 
on FreeBSD 9.0 and Ubuntu 12.04 with the same results. So are there any ideas 
on how I can reproduce such results?

Stas.



points.diff
Description: Binary data


On Sep 16, 2013, at 10:48 AM, Heikki Linnakangas wrote:

 On 12.07.2013 14:57, Stas Kelvich wrote:
 Hello.
 
 here is a patch adding to cube extension support for compressed 
 representation of point cubes. If cube is a point, i.e. has coincident lower 
 left and upper right corners, then only one corner is stored. The first bit of 
 the cube header indicates whether the cube is point or not. Few moments:
 
 * Patch preserves binary compatibility with old indices
 * All functions that create cubes from user input, check whether it is a 
 point or not
 * All internal functions that can return cubes takes care of all cases where 
 a cube might become a point
 
 Great!
 
 cube_is_point() needs to still handle old-style points. An NDBOX without the 
 point-flag set, where the ll and ur coordinates for each dimension are the 
 same, still needs to be considered a point. Even if you are careful to never 
 construct such structs in the code, they can be present on-disk if you have 
 upgraded from an earlier version with pg_upgrade. Same in cube_out().
 
 * Added tests for checking correct point behavior
 
 You'll need to adjust all the expected output files, not only cube_1.out.
 
 Also this patch includes adapted Alexander Korotkov's patch with kNN-based 
 ordering operator, which he wrote for postgresql-9.0beta1 with knngist 
 patch. More info there 
 http://www.postgresql.org/message-id/aanlktimhfaq6hcibrnk0tlcqmiyhywhwaq2zd87wb...@mail.gmail.com
 
 To make review easier, it would be better to keep that as a separate patch, 
 actually. Could you split it up again, please?
 
 - Heikki




Re: [HACKERS] 9.3 Json Array's

2013-09-24 Thread Chris Travers


 On 24 September 2013 at 13:46 Andrew Dunstan and...@dunslane.net wrote:
 
 
 Feel free to ask questions.
 
 The heart of the API is the event handlers defined in this stuct in
 include/utils/jsonapi.h:
 
 typedef struct JsonSemAction
 {
  void   *semstate;
  json_struct_action object_start;
  json_struct_action object_end;
  json_struct_action array_start;
  json_struct_action array_end;
  json_ofield_action object_field_start;
  json_ofield_action object_field_end;
  json_aelem_action array_element_start;
  json_aelem_action array_element_end;
  json_scalar_action scalar;
 } JsonSemAction;
 
 
 Basically there is a handler for the start and end of each non-scalar
 structural element in JSON, plus a handler for scalars.
 
 There are several problems that will be posed by processing nested
 arrays and objects, including:
 
   * in effect you would need to construct a stack of state that could be
 pushed and popped

True.

   * JSON arrays aren't a very good match for SQL arrays - they are
 unidimensional and heterogenous.

This is true, but I think one would have to start with an assumption that the
data is valid for an SQL type and then check again once one gets it done.
JSON is a pretty flexible format which makes it a poor match in many cases
for SQL types generally.  But I think the example so far (with
json_populate_recordset) is a good one, namely a best effort conversion.

 
 
 I'm not saying this can't be done - it will just take a bit of effort.

Yeah, looking through the code, I think it will be more work than I originally
thought but that just means it will take longer.
 
 cheers
 
 andrew
 
 
 
Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] Reasoning behind LWLOCK_PADDED_SIZE/increase it to a full cacheline

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 6:48 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-24 12:39:39 +0200, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  So, what we do is we guarantee that LWLocks are aligned to 16 or 32byte
  boundaries. That means that on x86-64 (64byte cachelines, 24bytes
  unpadded lwlock) two lwlocks share a cacheline.

  In my benchmarks changing the padding to 64byte increases performance in
  workloads with contended lwlocks considerably.

 At a huge cost in RAM.  Remember we make two LWLocks per shared buffer.

 I think that rather than using a blunt instrument like that, we ought to
 see if we can identify pairs of hot LWLocks and make sure they're not
 adjacent.

 That's a good point. What about making all but the shared buffer lwlocks
 64bytes? It seems hard to analyze the interactions between all the locks
 and keep it maintained.

I think somebody had a patch a few years ago that made it so that the
LWLocks didn't have to be in a single array, but could instead be
anywhere in shared memory.  Internally, lwlock.c held onto LWLock
pointers instead of LWLockIds.  That idea seems like it might be worth
revisiting, in terms of giving us more options as to how LWLocks can
be laid out in shared memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 5:04 AM, David Rowley dgrowle...@gmail.com wrote:
 So... I guess the question that I'd ask is, if you write a PL/pgsql
 function that does RAISE NOTICE in a loop a large number of times, can
 you measure any difference in how fast that function executes on the
 patch and unpatched code?  If so, how much?
 I do see a 15-18% slow down with the patched version, so perhaps I'll need
 to look to see if I can speed it up a bit, although I do feel this benchmark
 is not quite a normal workload.

Ouch!  That's pretty painful.  I agree that's not a normal workload,
but I don't think it's an entirely unfair benchmark, either.  There
certainly are people who suffer because of the cost of logging as
things are; for example, log_min_duration_statement is commonly used
and can produce massive amounts of output on a busy system.

I wouldn't mind too much if the slowdown you are seeing only occurred
when the feature is actually used, but taking a 15-18% hit on logging
even when the new formatting features aren't being used seems too
expensive to me.
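To be clear about what the new code path has to do even in the unused
case: the formatting feature presumably means every '%' escape now gets
checked for an optional printf-style field width (something like %-10a)
before the value is appended. The snippet below is only a rough
illustration of that idea with made-up names, not the actual patch; the
real work happens in log_line_prefix() in elog.c.

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Rough illustration only: append the value for one log_line_prefix
 * escape, honouring an optional printf-style field width such as "%-10a".
 * padding == 0 means no width was given, so the common case can take the
 * cheaper unformatted path.
 */
static void
append_prefix_value(StringInfo buf, const char *value, int padding)
{
	if (padding != 0)
		appendStringInfo(buf, "%*s", padding, value);	/* negative width pads on the right */
	else
		appendStringInfoString(buf, value);
}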

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Assertions in PL/PgSQL

2013-09-24 Thread Robert Haas
On Mon, Sep 23, 2013 at 5:48 AM, Amit Khandekar
amit.khande...@enterprisedb.com wrote:
 The assert levels sound a bit like a user might be confused by these
 levels being present at both places: In the RAISE syntax itself, and the
 assert GUC level. But I like the syntax. How about keeping the ASSERT
 keyword optional? When we have WHEN, we anyway mean that we are asserting
 that this condition must be true. So something like this:

 RAISE [ level ] 'format' [, expression [, ... ]] [ USING option =
 expression [, ... ] ];
 RAISE [ level ] condition_name [ USING option = expression [, ... ] ];
 RAISE [ level ] SQLSTATE 'sqlstate' [ USING option = expression [, ... ]
 ];
 RAISE [ level ] USING option = expression [, ... ];
 RAISE [ ASSERT ] WHEN bool_expression;
 RAISE ;


 I don't think it is a good idea. The WHEN clause should be independent of
 the exception level.


 I am ok with generalizing the WHEN clause across all levels. The main
 proposal was for adding assertion support, so we can keep the WHEN
 generalization as nice-to-have stuff and do it only if it comes as a
 natural extension in the assertion support patch.

I think that's right: ISTM that at this point there are two different
proposals here.

1. Allowing ASSERT as an argument to RAISE.

2. Allowing RAISE to have a WHEN clause.

Those two things are logically separate.  We could do either one
without doing the other one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 That's the point, and the whole point.  You have not shown that it
 doesn't.  You have not shown why adding a 12th non-default opclass
 is a particular problem here (although we have a consensus to use
 different operators, to reserve this operator namespace for other
 things).  

We need justification to add operators, imv, especially ones that expose
our internal binary representation of data.  I worry that adding these
will come back to bite us later and that we're making promises we won't
be able to keep.

If these inconsistencies in what happens with these data types are an
issue then REFRESH can be handled as a wholesale DELETE/INSERT.  Trying
to do this incremental-but-not-really maintenance where the whole query
is run but we try to skimp on what's actually getting updated in the
matview is a premature optimization, imv, and one which may be less
performant and more painful, with more gotchas and challenges for our
users, to deal with in the long run.

Thanks,

Stephen




Re: [HACKERS] SSL renegotiation

2013-09-24 Thread Robert Haas
On Mon, Sep 23, 2013 at 4:51 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Here's an updated version; this mainly simplifies code, per comments
 from Andres (things were a bit too baroque in places due to the way the
 code had evolved, and I hadn't gone over it to simplify it).

 The only behavior change is that the renegotiation is requested 1kB
 before the limit is hit: the raise to 1% of the configured limit was
 removed.

What basis do we have for thinking that 1kB is definitely enough to
avoid spurious disconnects?

(I have a bad feeling that you're going to say something along the
lines of "well, we tried it a bunch of times, and...")

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] record identical operator

2013-09-24 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 We need justification to add operators, imv, especially ones that
 expose our internal binary representation of data.

I believe I have done so.

 I worry that adding these will come back to bite us later

How?

 and that we're making promises we won't be able to keep.

The promise that a concurrent refresh will produce the same set of
rows as a non-concurrent one?

 If these inconsistencies in what happens with these data types
 are an issue then REFRESH can be handled as a wholesale
 DELETE/INSERT.

I have real-life experience with handling faux materialized views
by creating tables (at Wisconsin Courts).  We initially did it the
way you describe, but the run time was excessive (in Milwaukee
County the overnight run did not always complete before the start
of business hours the next day).  Switching to logic similar to
what I've implemented here, it completed an order of magnitude
faster, and generated a small fraction of the WAL and logical
replication data that the technique you describe did.  Performing
preliminary steps in temporary tables to minimize the changes
needed to permanent tables seems beneficial enough on the face of
it that I think the burden of proof should be on someone arguing
that deleting all rows and re-inserting (in the same transaction)
is, in general, a superior approach to finding the differences and
applying only those.

 Trying to do this incremental-but-not-really maintenance where
 the whole query is run but we try to skimp on what's actually
 getting updated in the matview is a premature optimization, imv,
 and one which may be less performant and more painful, with more
 gotchas and challenges for our users, to deal with in the long
 run.

I have the evidence of a ten-fold performance improvement plus
minimized WAL and replication work on my side.  What evidence do
you have to back your assertions?  (Don't forget to work in bloat
and vacuum truncation issues to the costs of your proposal.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improving avg performance for numeric

2013-09-24 Thread Robert Haas
On Mon, Sep 23, 2013 at 4:15 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Seems ready for committer to me. I'll wait a few days for others to
 comment, and then I'll update the commitfest page.

Some thoughts:

The documentation doesn't reflect the restriction to type internal.
On a related note, why restrict this to type internal?

Formatting fixes are needed:

+   if (aggtransspace > 0)
+   {
+   costs->transitionSpace += aggtransspace;
+   }

Project style is not to use curly-braces for single statements.  Also,
the changes to numeric.c add blank lines before and after function
header comments, which is not the usual style.

!   if (state == NULL)
!   PG_RETURN_NULL();
!   else
!   PG_RETURN_POINTER(state);

I think this should just say PG_RETURN_POINTER(state).  PG_RETURN_NULL
is for returning an SQL NULL, not (void *) 0.  Is there some reason
why we need an SQL NULL here, rather than a NULL pointer?

On the whole this looks fairly solid on a first read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-24 Thread Robert Haas
On Mon, Sep 23, 2013 at 7:05 PM, Peter Geoghegan p...@heroku.com wrote:
 It suits my purposes to have the value locks be held for only an
 instant, because:

 [ detailed explanation ]

I don't really disagree with any of that.  TBH, I think the question
of how long value locks (as you call them) are held is going to boil
down to a question of how they end up being implemented.   As I
mentioned to you at PG Open (going through the details here for those
following along at home), we could optimistically insert the new heap
tuple, then go add index entries for it, and if we find a conflict,
then instead of erroring out, we mark the tuple we were inserting dead
and go try to update the conflicting tuple instead.  In that
implementation, if we find that we have to wait for some other
transaction along the way, it's probably not worth reversing out the
index entries already inserted, because getting them into the index in
the first place was a WAL-logged operation, and therefore relatively
expensive, and IMHO it's most likely better to just hope things work
out than to risk having to redo all of that.

On the other hand, if the locks are strictly in-memory, then the cost
of releasing them all before we go to wait, and of reacquiring them
after we finish waiting, is pretty low.  There might be some
modularity issues to work through there, but they might not turn out
to be very painful, and the advantages you mention are certainly worth
accruing if it turns out to be fairly straightforward.

Personally, I think that trying to keep it all in-memory is going to
be hard.  The problem is that we can't de-optimize regular inserts or
updates to any significant degree to cater to this feature - because
as valuable as this feature is, the number of times it gets used is
still going to be a whole lot smaller than the number of times it
doesn't get used.  Also, I tend to think that we might want to define
the operation as a REPLACE-type operation with respect to a certain
set of key columns; and so we'll do the insert-or-update behavior with
respect only to the index on those columns and let the chips fall
where they may with respect to any others.  In that case this all
becomes much less urgent.

 Even *considering* this is largely academic, though, because without
 some kind of miracle guaranteeing serial ordering, a miracle that
 doesn't allow for serialization failures and also doesn't seriously
 slow down, for example, updates (by making them care about value locks
 *before* they do anything, or in the case of HOT updates *at all*),
 all of this is _impossible_. So, I say let's just do the
 actually-serial-ordering for value lock acquisition with serialization
 failures where we're > read committed. I've seriously considered what
 it would take to do it any other way so things would work how you and
 Andres expect for read committed, and it makes my head hurt, because
 apart from seeming unnecessary to me, it also seems completely
 hopeless.

 Am I being too negative here? Well, I guess that's possible. The fact
 is that it's really hard to judge, because all of this is really hard
 to reason about. That's what I really don't like about it.

Suppose we define the operation as REPLACE rather than INSERT...ON
DUPLICATE KEY LOCK FOR UPDATE.  Then we could do something like this:

1. Try to insert a tuple.  If no unique index conflicts occur, stop.
2. Note the identity of the conflicting tuple and mark the inserted
heap tuple dead.
3. If the conflicting tuple's inserting transaction is still in
progress, wait for the inserting transaction to end.
4. If the conflicting tuple is dead (e.g. because the inserter
aborted), start over.
5. If the conflicting tuple's key columns no longer match the key
columns of the REPLACE operation, start over.
6. If the conflicting tuple has a valid xmax, wait for the deleting or
locking transaction to end.  If xmax is still valid, follow the CTID
chain to the updated tuple, let that be the new conflicting tuple, and
resume from step 5.
7. Update the tuple, even though it may be invisible to our snapshot
(a deliberate MVCC violation!).

While this behavior is admittedly wonky from an MVCC perspective, I
suspect that it would make a lot of people happy.

 I respectfully
 suggest that that exact definition of upsert isn't a useful one,
 because other snapshot isolation/MVCC systems operating within the
 same constraints must have the same issues, and yet they manage to
 implement something that could be called upsert that people seem happy
 with.

 Yeah.  I wonder how they do that.

 My guess is that they have some fancy snapshot type that is used by
 the equivalent of ModifyTable subplans, that is appropriately paranoid
 about the Halloween problem and so on. How that actually might work is
 far from clear, but it's a question that I have begun to consider. As
 I said, a concern is that it would be in tension with the generalized,
 composable syntax, where we don't explicitly have a special 

Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  I worry that adding these will come back to bite us later
 
 How?

User misuse is certainly one consideration, but I wonder what's going to
happen if we change our internal representation of data (eg: numerics
get changed again), or when incremental matview maintenance happens and
we start looking at subsets of rows instead of the entire query.  Will
the first update of a matview after a change to numeric's internal data
structure cause the entire thing to be rewritten?

  and that we're making promises we won't be able to keep.
 
 The promise that a concurrent refresh will produce the same set of
 rows as a non-concurrent one?

The promise that we'll always return the binary representation of the
data that we saw last.  When greatest(x,y) comes back 'false' for a
MAX(), we then have to go check "well, does the type consider them
equal?", because, if the type considers them equal, we then have to
decide if we should replace x with y anyway, because it's different
at a binary level.  That's what we're saying we'll always do now.

We're also saying that we'll replace things based on plan differences
rather than based on if the rows underneath actually changed at all.
We could end up with material differences in the result of matviews
updated through incremental REFRESH and matviews updated through
actual incremental mainteance- and people may *care* about those
because we've told them (or they discover) they can depend on these
types of changes to be reflected in the result.

  Trying to do this incremental-but-not-really maintenance where
  the whole query is run but we try to skimp on what's actually
  getting updated in the matview is a premature optimization, imv,
  and one which may be less performant and more painful, with more
  gotchas and challenges for our users, to deal with in the long
  run.
 
 I have the evidence of a ten-fold performance improvement plus
 minimized WAL and replication work on my side.  What evidence do
 you have to back your assertions?  (Don't forget to work in bloat
 and vacuum truncation issues to the costs of your proposal.)

I don't doubt that there are cases in both directions and I'm not trying
to argue that it'd always be faster, but I doubt it's always slower.
I'm surprised that you had a case where the query was apparently quite
fast yet the data set hardly changed and resulted in a very large result
but I don't doubt that it happened.  What I was trying to get at is
really that the delete/insert approach would be good enough in very many
cases and it wouldn't have what look, to me anyway, like some pretty ugly
warts around these cases.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 5:58 AM, Bernd Helmle maili...@oopsware.de wrote:
 --On 13. September 2013 20:17:19 -0400 Robert Haas robertmh...@gmail.com
 wrote:
 You're missing the point.  Peter wasn't worried that your patch throws
 an error; he's concerned about the fact that it doesn't.

 In PostgreSQL, you can only create the following view because test1
 has a primary key over column a:

 = create table test1 (a int constraint pk primary key, b text);
 = create view test2 as select a, b from test1 group by a;
 = alter table test1 drop constraint pk;

 The reason is that, if the primary key weren't there, it would be
 ambiguous which row should be returned as among multiple values where
 a is equal and b is not.  If you can disable the constraint, then you
 can create precisely that problem.

  Hmm, not sure I understand this argument either: this patch doesn't allow
 disabling a primary key. It only supports FKs and CHECK constraints
 explicitly.

Well, that is certainly one way of skating around the specific concern
Peter raised.
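
To spell out the ambiguity, continuing the test1/test2 example above
(hypothetically, if the PK could be disabled rather than dropped):

= insert into test1 values (1, 'foo'), (1, 'bar');  -- nothing stops this any more
= select * from test2;                              -- which b comes back for a = 1?

The stored view is still allowed to reference b without aggregating it, but the
answer is no longer well-defined.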

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - visibility semantics

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 5:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 Various messages are discussing semantics around visibility. I by now
 have a hard time keeping track. So let's keep the discussion of the
 desired semantics to this thread.

 There have been some remarks about serialization failures in read
 committed transactions. I agree, those shouldn't occur. But I don't
 actually think they are so much of a problem if we follow the path set
 by existing uses of the EPQ logic. The scenario described seems to be an
 UPSERT conflicting with a row it cannot see in the original snapshot of
 the query.
 In that case I think we just have to follow the example laid by
 ExecUpdate, ExecDelete and heap_lock_tuple. Use the EPQ machinery (or an
 alternative approach with similar enough semantics) to get a new
 snapshot and follow the ctid chain. When we've found the end of the
 chain we try to update that tuple.
  That surely isn't free of surprising semantics, but it would follow existing
 semantics. Which everybody writing concurrent applications in read
 committed should (but doesn't) know. Adding a different set of semantics
 seems like a bad idea.
 Robert seems to have been the primary sceptic around this, what scenario
 are you actually concerned about?

I'm not skeptical about offering it as an option; in fact, I just
suggested basically the same thing on the other thread, before reading
this.  Nonetheless it IS an MVCC violation; the chances that someone
will be able to demonstrate serialization anomalies that can't occur
today with this new facility seem very high to me.  I feel it's
perfectly fine to respond to that by saying: yep, we know that's
possible, if it's a concern in your environment then don't use this
feature.  But it should be clearly documented.

I do think that it will be easier to get this to work if we
define the operation as REPLACE, bundling all of the magic inside a
single SQL command.  If the user issues an INSERT first and then must
try an UPDATE afterwards if the INSERT doesn't actually insert, then
you're going to have problems if the UPDATE can't see the tuple with
which the INSERT conflicted, and you're going to need some kind of a
loop in case the UPDATE itself fails.  Even if we can work out all the
details, a single command that does insert-or-update seems like it
will be easier to use and more efficient.  You might also want to
insert multiple tuples using INSERT ... VALUES (...), (...), (...);
figuring out which ones were inserted and which ones must now be
updated seems like a chore better avoided.
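
For reference, the kind of loop I mean is the familiar plpgsql workaround
(along the lines of the merge_db example in the docs; the table and function
names here are made up):

create function merge_counter(k text, v int) returns void as $$
begin
    loop
        update counters set value = v where key = k;
        if found then
            return;
        end if;
        -- Not there yet, so try to insert.  A concurrent session may beat us
        -- to it, in which case we get a unique_violation and retry the UPDATE.
        begin
            insert into counters (key, value) values (k, v);
            return;
        exception when unique_violation then
            -- do nothing; loop around and try the UPDATE again
        end;
    end loop;
end;
$$ language plpgsql;

A single REPLACE-style command would let users skip writing that boilerplate
entirely.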

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 4:15 AM, Andres Freund and...@2ndquadrant.com wrote:
 There needs to be a client acking the reception of the data in some
 form. There's currently two output methods, SQL and walstreamer, but
 there easily could be further, it's basically two functions you have
 write.

 There are several reasons I think the tool is useful, starting with the
 fact that it makes the initial use of the feature easier. Writing a
 client for CopyBoth messages wrapping 'w' style binary messages, with the
 correct select() loop isn't exactly trivial. I also think it's actually
 useful in real scenarios where you want to ship the data to a
 remote system for auditing purposes.

I have two basic points here:

- Requiring a client is a short-sighted design.  There's no reason we
shouldn't *support* having a client, but IMHO it shouldn't be the only
way to use the feature.

- Suppose that you use pg_receivellog (or whatever we decide to call
it) to suck down logical replication messages.  What exactly are you
going to do with that data once you've got it?  In the case of
pg_receivexlog it's quite obvious what you will do with the received
files: you'll store them in archive of some kind and maybe eventually
use them for archive recovery, streaming replication, or PITR.  But
the answer here is a lot less obvious, at least to me.

 For example, for replication, I'd think you might want the
 plugin to connect to a remote database and directly shove the data in;

 That sounds like a bad idea to me. If you pull the data from the remote
 side, you get the data in a streaming fashion and the latency sensitive
 part of issuing statements to your local database is done locally.
 Doing things synchronously like that also makes it way harder to use
 synchronous_commit = off on the remote side, which is a tremendous
 efficiency win.

This sounds like the voice of experience talking, so I won't argue too
much, but I don't think it's central to my point.  And anyhow, even if
it is a bad idea, that doesn't mean someone won't want to do it.  :-)

 If somebody needs something like this, e.g. because they want to
 replicate into hundreds of shards depending on some key or such, the
 question I don't know is how to actually initiate the
 streaming. Somebody would need to start the logical decoding.

Sounds like a job for a background worker.  It would be pretty swell
if you could write a background worker that connects to a logical
replication slot and then does whatever.

 for materialized views, we might like to push the changes into delta
 relations within the source database.

 Yes, that's not a bad usecase and I think the only thing missing to use
 output plugins that way is a convenient function to tell up to where
 data has been received (aka synced to disk, aka applied).

Yes.  It feels to me (and I only work here) like the job of the output
plugin ought to be to put the data somewhere, and the replication code
shouldn't make too many assumptions about where it's actually going.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] trivial one-off memory leak in guc-file.l ParseConfigFile

2013-09-24 Thread Robert Haas
On Sun, Sep 22, 2013 at 3:40 PM, didier did...@gmail.com wrote:
 fix a small memory leak in guc-file.l ParseConfigFile

  AbsoluteConfigLocation() returns a strdup'd string but it's never freed or
  referenced outside ParseConfigFile

 Courtesy Valgrind and Noah Misch MEMPOOL work.

I'd like to look at this, but I haven't got time right now.  Could you
add it to the next CommitFest so it doesn't get forgotten about?

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API

2013-09-24 Thread Albe Laurenz
Abhijit Menon-Sen wrote:
 I read through the patch, and it looks sensible.

Thanks for the thorough review!

 I would have preferred the ldap_simple_bind_s() call in the HAVE_LIBLDAP
 branch to not be inside an else {} (the if block above returns if there
 is an error anyway), but that's a minor point.

I agree, changed.

 I tested the code as follows:
 
 1. Built the patched source --with-ldap
 2. Set up ~/.pg_service.conf:
 
 [foo]
 
 ldap://localhost:3343/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*)
 
 ldap://localhost:3443/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*)
 
 3. iptables -A INPUT -p tcp -d 127.0.0.1 --dport 3343 -j DROP
 4. netcat -l 127.0.0.1 3343 ; netcat -l 127.0.0.1 3443
 5. PGSERVICE=foo bin/psql
 
 psql did connect to localhost:3443 after a few seconds of trying to
 connect to :3343 and failing. (I tried without the iptables rule, so
 I know that it does try to connect to both.)
 
 This doesn't seem to handle timeouts in the sense of a server that
 doesn't respond after you connect (or perhaps the timeout was long
 enough that it outlasted my patience), but that's not the fault of
 this patch, anyway.

That's a bug.
I thought that setting LDAP_OPT_NETWORK_TIMEOUT would also
take care of an unresponsive server, but it seems that I was wrong.

The attached patch uses ldap_simple_bind / ldap_result to handle
this kind of timeout (like the original code).

 I can't say anything about the patch on Windows, but since Magnus seemed
 to think it was OK, I'm marking this ready for committer.

The Windows part should handle all kinds of timeouts the same.

Yours,
Laurenz Albe


ldap-bug-3.patch
Description: ldap-bug-3.patch



Re: [HACKERS] record identical operator

2013-09-24 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  I worry that adding these will come back to bite us later

 How?

 User misuse is certainly one consideration,

I think that one has been talked to death already, with the
consensus that we should use different operator names and document
them as a result.  Any comparison operator can be misused.  Andres
said he has seen the operators from text_pattern_ops used in
production; but we have not removed, for example, ~=~ from our
text operators as a result.

 but I wonder what's going to
 happen if we change our internal representation of data (eg: numerics
 get changed again), or when incremental matview maintenance happens and
 we start looking at subsets of rows instead of the entire query.  Will
 the first update of a matview after a change to numeric's internal data
 structure cause the entire thing to be rewritten?

Something like that could happen, if the newly generated values,
for example, all took less space than the old.  On the first
REFRESH on the new version of the software it might delete and
insert all rows; then it would stay with the new representation.  I
have trouble seeing that as a problem, since presumably we only
come up with a new representation because it is smaller, faster, or
cures some bug.

 and that we're making promises we won't be able to keep.

 The promise that a concurrent refresh will produce the same set of
 rows as a non-concurrent one?

 The promise that we'll always return the binary representation of the
 data that we saw last.  When greatest(x,y) comes back 'false' for a
 MAX(), we then have to go check well, does the type consider them
 equal?, because, if the type considers them equal, we then have to
 decide if we should replace x with y anyway, because it's different
 at a binary level.  That's what we're saying we'll always do now.

I'm having a little trouble following that.  The query runs as it
always has, with all the old definitions of comparisons.  After it
is done, we check whether the rows are the same.  The operation of
MAX() will not be affected in any way.  If MAX() returns a value
which is not the same as what the matview has, the matview will be
modified to match what MAX() returned.

 We're also saying that we'll replace things based on plan differences
 rather than based on if the rows underneath actually changed at all.

Only if the user uses a query which does not produce deterministic
results.
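
For instance (with made-up table and column names), a DISTINCT ON query with no
ORDER BY is exactly that sort of query; which row survives from each group is
unpredictable and can change from one plan, and hence one refresh, to the next:

CREATE MATERIALIZED VIEW latest_order AS
  SELECT DISTINCT ON (customer_id) customer_id, order_id, ordered_at
  FROM orders;  -- no ORDER BY, so which row is kept per customer is arbitrary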

 We could end up with material differences in the result of matviews
 updated through incremental REFRESH and matviews updated through
 actual incremental mainteance- and people may *care* about those
 because we've told them (or they discover) they can depend on these
 types of changes to be reflected in the result.

Perhaps we should document the recommendation that people not
create materialized views with non-deterministic results, but I
don't think that should be a hard restriction.  For example, I
could see someone creating a materialized view to pick a random
sample from a jury pool to be on the jury panel for today (from
which juries are selected on that day), and not want that to be
predictable and not want it to change until the next refresh of the
panel matview.  (That would not be my first design choice, but it
would not be a horrid choice if there were sufficient logging of
the REFRESH statements in a place the user could not modify.)

 Trying to do this incremental-but-not-really maintenance where
 the whole query is run but we try to skimp on what's actually
 getting updated in the matview is a premature optimization, imv,
 and one which may be less performant and more painful, with more
 gotchas and challenges for our users, to deal with in the long
 run.

 I have the evidence of a ten-fold performance improvement plus
 minimized WAL and replication work on my side.  What evidence do
 you have to back your assertions?  (Don't forget to work in bloat
 and vacuum truncation issues to the costs of your proposal.)

 I don't doubt that there are cases in both directions and I'm not trying
 to argue that it'd always be faster, but I doubt it's always slower.

Sure.  We provide a way to support those cases, although it
involves blocking.  So far, even the tests I expected to be faster
with heap replacement have come out marginally slower that way than
with CONCURRENTLY, due to index activity; but I have no doubt cases
can be constructed that go the other way.

 I'm surprised that you had a case where the query was apparently quite
 fast yet the data set hardly changed and resulted in a very large result
 but I don't doubt that it happened.

The history of all events in the county (tens of millions of
records in Milwaukee County) need to be scanned to generate the
status of cases and the date of last activity, as well as scanning
all future calendar events to see what is scheduled.  This is
compared to tables setting 

Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Andres Freund
On 2013-09-24 11:04:06 -0400, Robert Haas wrote:
 - Requiring a client is a short-sighted design.  There's no reason we
 shouldn't *support* having a client, but IMHO it shouldn't be the only
 way to use the feature.

There really aren't many limitations preventing you from doing anything
else.

 - Suppose that you use pg_receivellog (or whatever we decide to call
 it) to suck down logical replication messages.  What exactly are you
 going to do with that data once you've got it?  In the case of
 pg_receivexlog it's quite obvious what you will do with the received
 files: you'll store them in archive of some kind and maybe eventually
 use them for archive recovery, streaming replication, or PITR.  But
 the answer here is a lot less obvious, at least to me.

Well, it's not like it's going to be the only client. But it's a useful
one. I don't see this as an argument against pg_receivelogical? Most
sites don't use pg_receivexlog either.
Not having a consumer of the walsender interface included sounds like a
bad idea to me, even if it were only useful for testing. Now, you could
argue it should be in /contrib - and I wouldn't argue against that
except it shares code with the rest of src/bin/pg_basebackup.

  If somebody needs something like this, e.g. because they want to
  replicate into hundreds of shards depending on some key or such, the
  question I don't know is how to actually initiate the
  streaming. Somebody would need to start the logical decoding.

 Sounds like a job for a background worker.  It would be pretty swell
 if you could write a background worker that connects to a logical
 replication slot and then does whatever.

That's already possible. In that case you don't have to connect to a
walsender, although doing so would give you some parallelism, one
decoding the data, the other processing it ;).

There's one use case I do not foresee decoupling from the walsender
interface this release though - synchronous logical replication. There
are currently no code changes required to make sync rep work for this,
and decoupling sync rep from walsender is too much to bite off in one
go.

  for materialized views, we might like to push the changes into delta
  relations within the source database.
 
  Yes, that's not a bad usecase and I think the only thing missing to use
  output plugins that way is a convenient function to tell up to where
  data has been received (aka synced to disk, aka applied).
 
 Yes.  It feels to me (and I only work here) like the job of the output
 plugin ought to be to put the data somewhere, and the replication code
 shouldn't make too many assumptions about where it's actually going.

The output plugin just has two functions it calls to send out data,
'prepare_write' and 'write'. The callsite has to provide those
callbacks. Two are included: walsender and an SQL SRF.

Check the 'test_logical_decoding' commit; it includes the SQL consumer.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Cube extension types support

2013-09-24 Thread Stas Kelvich
Hello, hackers.

In this patch I've implemented support for different storage types for cubes. 
Now it supports float8, float4, int4, int2, and int1. The type is stored in the 
header of each cube, one for all coordinates, so for cubes with int1 coordinates 
it can save up to 8x disk space. Typed cubes can be created in two ways:

1) Via cube_suffix() functions, where the suffix can be f4, i4, i2, or i1, and 
the arguments are one or two numbers or arrays, i.e.

# select cube_i1(1,2) as c;
     c      
------------
 (1),(2):i1

# select cube_f4(array[1,2,3], array[5,6,7]) as c;
           c            
------------------------
 (1, 2, 3),(5, 6, 7):f4

2) Via a modifier at the end of the string that will be cast to cube, i.e.

# select '(1,2,3):i2'::cube as c;
  c   
--
 (1, 2, 3):i2

When no modifier is given, float8 will be used by default. Old-style cubes 
without a type in the header are also treated as float8 for backward compatibility.

This patch changes a lot of things in the code, so it interferes with other patches 
to the cube extension. I will update this patch if other patches are 
accepted for commit.



types_cumulative.diff
Description: Binary data



Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  The promise that we'll always return the binary representation of the
  data that we saw last.  When greatest(x,y) comes back 'false' for a
  MAX(), we then have to go check well, does the type consider them
  equal?, because, if the type considers them equal, we then have to
  decide if we should replace x with y anyway, because it's different
  at a binary level.  That's what we're saying we'll always do now.
 
 I'm having a little trouble following that.  

You're looking at it from the perspective of what's committed today, I
think.

 The query runs as it
 always has, with all the old definitions of comparisons.  After it
 is done, we check whether the rows are the same.  The operation of
 MAX() will not be affected in any way.  If MAX() returns a value
 which is not the same as what the matview has, the matview will be
 modified to match what MAX() returned.

I'm referring to a case where we're doing incremental maintenance of the
matview (in the far distant future..) and all we've got is the current
MAX value and the value from the row being updated.  We're going to
update that MAX on cases where the values are
equal-but-binary-different.

  We're also saying that we'll replace things based on plan differences
  rather than based on if the rows underneath actually changed at all.
 
 Only if the user uses a query which does not produce deterministic
 results.

Which is apt to happen..

  We could end up with material differences in the result of matviews
  updated through incremental REFRESH and matviews updated through
  actual incremental mainteance- and people may *care* about those
  because we've told them (or they discover) they can depend on these
  types of changes to be reflected in the result.
 
 Perhaps we should document the recommendation that people not
 create materialized views with non-deterministic results, but I
 don't think that should be a hard restriction.  

I wasn't suggesting a hard restriction, but I was postulating about how
the behavior may change in the future (or we may need to do very hacky
things to prevent it from changing) and how these decisions may impact
that.

  What I was trying to get at is really that the delete/insert
  approach would be good enough in very many cases and it wouldn't
  have what look, to me anyway, as some pretty ugly warts around
  these cases.
 
 I guess we could add a DELETE everything and INSERT the new
 version of everything option for REFRESH in addition to what is
 there now, but I would be very reluctant to use it as a
 replacement.

It wouldn't address my concerns anyway, which are around these binary
operators and the update-in-place approach being risky and setting us up
for problems down the road.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Steve Singer

On 09/24/2013 11:21 AM, Andres Freund wrote:

Not having a consumer of the walsender interface included sounds like a
bad idea to me, even if it were only useful for testing. Now, you could
argue it should be in /contrib - and I wouldn't argue against that
except it shares code with the rest of src/bin/pg_basebackup.


+1 on pg_receivellog (or whatever better name we pick) being somewhere.
I found the pg_receivellog code very useful as an example and for 
debugging/development purposes.
It isn't something that I see useful for the average user and I think 
the use-cases it meets are closer to other things we usually put in /contrib


Steve





Re: [HACKERS] dynamic shared memory

2013-09-24 Thread Robert Haas
On Fri, Sep 20, 2013 at 7:44 AM, Andres Freund and...@2ndquadrant.com wrote:
  Hm, I guess you dont't want to add it  to global/ or so because of the
  mmap implementation where you presumably scan the directory?

 Yes, and also because I thought this way would make it easier to teach
 things like pg_basebackup (or anybody's home-brew scripts) to just
 skip that directory completely.  Actually, I was wondering if we ought
 to have a directory under pgdata whose explicit charter it was to
 contain files that shouldn't be copied as part of a base backup.
 pg_do_not_back_this_up.

 Wondered exactly about that as soon as you've mentioned
 pg_basebackup. pg_local/?

That seems reasonable.  It's not totally transparent what that's
supposed to mean, but it's fairly mnemonic once you know.  Other
suggestions welcome, if anyone has ideas.

Are there any other likely candidates for inclusion in that directory
other than this stuff?

  + /* Create or truncate the file. */
  + statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, O_RDWR|O_CREAT|O_TRUNC, 
  0600);
 
  Doesn't this need a | PG_BINARY?

 It's a text file.  Do we need PG_BINARY anyway?

 I'd say yes. Non binary mode stuff on windows does stuff like
  transforming LF <=> CRLF on reading/writing, which makes sizes not match
 up and similar ugliness.
 Imo there's little reason to use non-binary mode for anything written
 for postgres' own consumption.

Well, I'm happy to do whatever the consensus is.  AFAICT you and Noah
are both for it and Amit's position is that it doesn't matter either
way, so I'll go ahead and change that unless further discussion sheds
a different light on things.

  Why are you using open() and not
  BasicOpenFile or even OpenTransientFile?

 Because those don't work in the postmaster.

 Oh, that's news to me. Seems strange, especially for BasicOpenFile.

Per its header comment, InitFileAccess is not called in the
postmaster, so there's no VFD cache.  Thus, any attempt by
BasicOpenFile to call ReleaseLruFile would be pointless at best.

 dsm_handle is an alias for uint32.  Is that always exactly an unsigned
 int or can it sometimes be an unsigned long?  I thought the latter, so
 couldn't figure out how to write this portably without casting to a
 type that explicitly matched the format string.

 That should always be an unsigned int on platforms we support. Note that
 we've printed TransactionIds that way (i.e. %u) for a long time and they
 are a uint32 as well.

Fixed.

  Not sure whether it's sensible to only LOG in these cases. After all
  there's something unexpected happening. The robustness argument doesn't
  count since we're already shutting down.

 I see no point in throwing an error.   The fact that we're having
 trouble cleaning up one dynamic shared memory segment doesn't mean we
 shouldn't try to clean up others, or that any remaining postmaster
 shutdown hooks shouldn't be executed.

 Well, it means we'll do a regular shutdown instead of PANICing
 and *not* writing a checkpoint.
 If something has corrupted our state to the point we cannot unregister
 shared memory we registered, something has gone terribly wrong. Quite
 possibly we've scribbled over our control structures or such. In that
 case it's not proper to do a normal shutdown, we're quite possibly
 writing bad data.

I have to admit I didn't consider the possibility of an
otherwise-clean shutdown that hit only this problem.  I'm not sure how
seriously to take that case.  I guess we could emit warnings for
individual failures and then throw an error at the end if there were
> 0, but that seems a little ugly.  Or we could go whole hog and treat
any failure as a critical error.  Anyone else have an opinion on what
to do here?

  Imo that's a PANIC or at the very least a FATAL.

 Sure, that's a tempting option, but it doesn't seem to serve any very
 necessary point.  There's no data corruption problem if we proceed
 here.  Most likely either (1) there's a bug in the code, which
 panicking won't fix or (2) the DBA hand-edited the state file, in
 which case maybe he shouldn't have done that, but if he thinks the
 best way to recover from that is a cluster-wide restart, he can do
 that himself.

 There's no data corruption problem if we proceed - but there likely
 has been one leading to the current state.

I doubt it.  It's more likely that the file permissions got changed or
something.

  Do we rely on being run in an environment with proper setup for lwlock
  cleanup? I can imagine shared libraries doing this pretty early on...

 Yes, we rely on that.  I don't really see that as a problem.  You'd
 better connect to the main shared memory segment before starting to
 create your own.

 I am not talking about lwlocks itself being setup but an environment
 that has resource owners defined and catches errors. I am specifically
 asking because you're a) ereport()ing without releasing an LWLock b)
 unconditionally relying on the fact that there's a current resource
 owner.
 In 

Re: [HACKERS] SSL renegotiation

2013-09-24 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Sep 23, 2013 at 4:51 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Here's an updated version; this mainly simplifies code, per comments
  from Andres (things were a bit too baroque in places due to the way the
  code had evolved, and I hadn't gone over it to simplify it).
 
  The only behavior change is that the renegotiation is requested 1kB
  before the limit is hit: the raise to 1% of the configured limit was
  removed.
 
 What basis do we have for thinking that 1kB is definitely enough to
 avoid spurious disconnects?

I noticed that the count variable (which is what we use to determine
when to start the renegotiation and eventually kill the connection) is
only incremented when there's successful SSL transmission: it doesn't
count low-level network transmission.  If OpenSSL returns a WANT_READ or
WANT_WRITE error code, that variable is not incremented.  The number of
bytes returned does not include network data transmitted only to satisfy
the renegotiation.

Sadly, with the OpenSSL codebase, there isn't much documented field
experience to go by.  Even something battle-tested such as Apache's
mod_ssl gets this wrong; but apparently they don't care because their
sessions are normally so short-lived that they don't get these problems.

Also, I spent several days trying to understand the OpenSSL codebase to
figure out how this works, and I think there might be bugs in there too,
at least with nonblocking sockets.  I wasn't able to reproduce an actual
failure, though.  Funnily enough, their own test utilities do not stress
this area too much (at least the ones they include in their release
tarballs).

 (I have a bad feeling that you're going to say something along the
 lines of "well, we tried it a bunch of times, and")

Well, I did try a few times and saw no failure :-)

I have heard about processes in production environments that are
restarted periodically to avoid SSL failures which they blame on
renegotiation.  Some other guys have ssl_renegotiation_limit=0 because
they know it causes network problems.  I suggest we need to get this
patch out there, so that they can test it; and if 1kB turns out not to
be sufficient, we will have field experience including appropriate error
messages on what is actually going on.  (Right now, the error messages
we get are complaining about completely the wrong thing.)

I mean, if that 1kB limit is the only quarrel you have with this patch,
I'm happy.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Completing PL support for Event Triggers

2013-09-24 Thread Alvaro Herrera
Peter Eisentraut wrote:

 In the source code, I'm dubious about the use of is_dml_trigger.  I can
 see where you are coming from, but in most of the code, a trigger is a
 trigger and an event trigger is an event trigger.  Let's not introduce
 more terminology.
 
 Other opinions on that?

I'm with you.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] record identical operator

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost sfr...@snowman.net wrote:
 It wouldn't address my concerns anyway, which are around these binary
 operators and the update-in-place approach being risky and setting us up
 for problems down the road.

I think that if you want to hold up a bug fix to a feature that's
already committed, you need to be more specific than to say that there
might be problems down the road.  I can't see that you've identified
any consequence of this change that is clearly bad.  I understand that
you don't like the idea of making binary-comparison operators
user-visible due to the risk of user-confusion, but we have lots of
confusing features already and we handle that by documenting them.
This one doesn't seem any worse than average; in fact, to me it seems
rather minor.

At any rate, I think we're getting distracted from the real point
here.  Here's what this patch is attempting to fix:

rhaas=# create table sales (txn_date date, amount numeric);
CREATE TABLE
rhaas=# insert into sales values ('2013-09-01', '100.00'),
('2013-09-01', '150.00'), ('2013-09-02', '75.0');
INSERT 0 3
rhaas=# create materialized view sales_summary as select txn_date,
sum(amount) as amount from sales group by 1;
SELECT 2
rhaas=# create unique index on sales_summary (txn_date);
CREATE INDEX
rhaas=# select * from sales_summary;
  txn_date  | amount
------------+--------
 2013-09-01 | 250.00
 2013-09-02 |   75.0
(2 rows)

rhaas=# update sales set amount = round(amount, 2);
UPDATE 3
rhaas=# refresh materialized view concurrently sales_summary;
REFRESH MATERIALIZED VIEW
rhaas=# select * from sales_summary;
  txn_date  | amount
------------+--------
 2013-09-01 | 250.00
 2013-09-02 |   75.0
(2 rows)

rhaas=# refresh materialized view sales_summary;
REFRESH MATERIALIZED VIEW
rhaas=# select * from sales_summary;
  txn_date  | amount
------------+--------
 2013-09-01 | 250.00
 2013-09-02 |  75.00
(2 rows)

Note that, after we change the data in the underlying table, the
output of the query changes.  But REFRESH MATERIALIZED VIEW
CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
without CONCURRENTLY does fix it.  That's a bug, because if there are
two ways of refreshing a materialized view they should both produce
the same answer.  Shouldn't they?  The fact that users can write
queries that don't always give the same answer is not a reason why
it's OK for REFRESH CONCURRENTLY to misbehave on queries that do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SSL renegotiation

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 12:30 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I mean, if that 1kB limit is the only quarrel you have with this patch,
 I'm happy.

You should probably be happy, then.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread Alvaro Herrera
David Rowley wrote:

 I do see a 15-18% slow down with the patched version, so perhaps I'll need
 to look to see if I can speed it up a bit, although I do feel this
 benchmark is not quite a normal workload.

Ouch.  That's certainly way too much.  Is the compiler inlining
process_log_prefix_padding()?  If not, does it do it if you add inline
to it?  That might speed up things a bit.  If that's not enough, maybe
you need some way to return to the original coding for the case where no
padding is set in front of each option.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] bitmap indexes

2013-09-24 Thread Jeff Janes
On Sat, Sep 14, 2013 at 11:14 AM, Abhijit Menon-Sen a...@2ndquadrant.comwrote:

 Hi.

 This is a cleaned-up and rebased version of the bitmap index patch from
 Gavin Sherry, later revised by Gianni Ciolli and Gabriele Bartolini, and
 others including Daniel Bausch.

 I've been working on this patch for a while, and have made some progress
 towards (a) general fixing, and (b) a working VACUUM implementation (the
 major remaining piece). Unfortunately, I've been busy moving house, and
 the latter is not complete (and not in this patch).

 I will continue working on the code, and I'll post updates. I expect to
 have more to show in just a few days.

 Nevertheless, I'm posting it for review now as I keep working. Given the
 size and age of the patch, I would appreciate any comments, no matter
 how nitpicky.


Hi Abhijit,

I get wrong answers from this index sometimes.  It seems to occur when the
position of the column within the index is not the same as its position
within the table.  So I think that what is happening is somewhere the
offset into the list of table columns is misused to offset into the list of
index columns.

I didn't see any XXX notes that indicate this is a known problem.

create table foo as select
   floor(random()*10) as a,
   floor(random()*10) as b,
   floor(random()*10) as c,
  d
from generate_series(1,1000) d;

vacuum ANALYZE;
create index on foo using bitmap (a);
create index on foo using bitmap (b);

select count(*) from foo where a=4;
1000173
select count(*) from foo where a+0=4;
1000173

select count(*) from foo where b=4;
0
select count(*) from foo where b+0=4;
999750

Cheers,

Jeff


Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost sfr...@snowman.net wrote:
  It wouldn't address my concerns anyway, which are around these binary
  operators and the update-in-place approach being risky and setting us up
  for problems down the road.
 
 I think that if you want to hold up a bug fix to a feature that's
 already committed, you need to be more specific than to say that there
 might be problems down the road.  

Committed isn't released and I simply can't agree that introducing new
operators is a 'bug fix'.  Had it gone out as-is, I can't see us
back-patching a bunch of new operators to fix it.

 Note that, after we change the data in the underlying table, the
 output of the query changes.  But REFRESH MATERIALIZED VIEW
 CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
 without CONCURRENTLY does fix it.  That's a bug, because if there are
 two ways of refreshing a materialized view they should both produce
 the same answer.  Shouldn't they?  

The same queries run over time without changes to the underlying data
really should return the same data, shouldn't they?  Is it a bug that
they don't?

In general, I agree that they should produce the same results, as should
incrementally maintained views when they happen.  I'm not convinced that
choosing whatever the 'new' value is to represent the value in the
matview for the equal-but-not-binary-identical will always be the
correct answer though.

 The fact that users can write
 queries that don't always give the same answer is not a reason why
 it's OK for REFRESH CONCURRENTLY to misbehave on queries that do.

This really is, imv, agreeing to hold a higher standard for matviews
than we do for what matviews are *based* on- which is queries.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-24 Thread Josh Berkus
All,

I've sent kernel.org a message that we're keen on seeing these changes
get committed.

BTW, in the future if anyone sees kernel.org contemplating a patch which
helps or hurts Postgres, don't hesitate to speak up to them.  They don't
get nearly enough feedback from DB developers.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread Andres Freund
On 2013-09-24 13:51:04 -0300, Alvaro Herrera wrote:
 David Rowley wrote:
 
  I do see a 15-18% slow down with the patched version, so perhaps I'll need
  to look to see if I can speed it up a bit, although I do feel this
  benchmark is not quite a normal workload.
 
 Ouch.  That's certainly way too much.  Is the compiler inlining
 process_log_prefix_padding()?  If not, does it do it if you add inline
 to it?  That might speed up things a bit.  If that's not enough, maybe
 you need some way to return to the original coding for the case where no
 padding is set in front of each option.

From a very short look without actually running it I'd guess the issue
is all the $* things you're now passing to do appendStringInfo (which
passes them off to vsnprintf).
How does it look without that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] record identical operator

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 1:04 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Sep 24, 2013 at 12:00 PM, Stephen Frost sfr...@snowman.net wrote:
  It wouldn't address my concerns anyway, which are around these binary
  operators and the update-in-place approach being risky and setting us up
  for problems down the road.

 I think that if you want to hold up a bug fix to a feature that's
 already committed, you need to be more specific than to say that there
 might be problems down the road.

 Committed isn't released and I simply can't agree that introducing new
 operators is a 'bug fix'.  Had it gone out as-is, I can't see us
 back-patching a bunch of new operators to fix it.

You're perfectly welcome to argue that we should rip REFRESH
MATERIALIZED VIEW CONCURRENTLY back out, or change the implementation
of it.  However, that's not the topic of this thread.  Had it gone out
as is, we would have either had to come up with some more complex fix
that could make things right without catalog changes, or we would have
had to document that it doesn't work correctly.  Fortunately we are
not faced with that conundrum.

 Note that, after we change the data in the underlying table, the
 output of the query changes.  But REFRESH MATERIALIZED VIEW
 CONCURRENTLY doesn't fix it.  However, REFRESH MATERIALIZED VIEW
 without CONCURRENTLY does fix it.  That's a bug, because if there are
 two ways of refreshing a materialized view they should both produce
 the same answer.  Shouldn't they?

 The same queries run over time without changes to the underlying data
 really should return the same data, shoudln't they?

Not necessarily.  There are and always have been plenty of cases where
that isn't true.

 Is it a bug that they don't?

No.

 In general, I agree that they should produce the same results, as should
 incrementally maintained views when they happen.  I'm not convinced that
 choosing whatever the 'new' value is to represent the value in the
 matview for the equal-but-not-binary-identical will always be the
 correct answer though.

Well, you have yet to provide an example of where it isn't the right
behavior.  I initially thought that perhaps we needed a type-specific
concept of exact equality, so that two things would be exactly equal
if they would both be dumped the same way by pg_dump, even if the
internal representations were different.

But on further thought I'm no longer convinced that's a good idea.
For example, consider the compact-numeric format that I introduced a
few releases back.  The changes are backward compatible, so you can
run pg_upgrade on a cluster containing the old format and everything
works just fine.  But your table will be larger than you really want
it to be until you rewrite it.  Any materialized view that was built
by selecting those numeric values will *also* be larger than you want
it to be until you rewrite it.  Fortunately, you can shrink the table
by using a rewriting variant of ALTER TABLE.  After you do that, you
will perhaps also want to shrink the materialized view.  And in fact,
REFRESH will do that for you, but currently, REFRESH CONCURRENTLY
won't.  It seems to me that it would be nicer if it did.

Now I admit that's an arguable point.  We could certainly define an
intermediate notion of equality that is more equal than whatever =
does, but not as equal as exact binary equality.  However, such a new
notion would have no precedence in our existing code, whereas the use
of binary equality does.  Going to a lot of work to build up this
intermediate notion of equality does not seem like a good idea to me;
I think a general rule of good programming is not to invent more ways
to do things than are clearly necessary, and requiring every core and
extension type to define an exact-equality operator just to support
this feature seems excessive in the extreme.

Moreover, I think what to include in that intermediate notion of
equality would be kind of hard to decide, because there are a lot of
subtly different questions that one can ask, and we'd inevitably have
to answer the question how equal does it have to be to be equal
enough?.   Numerics and arrays have multiple ways of referencing what
is intended to be the exact same value, but those can be disambiguated
by passing them to a function that cares, like pg_column_size().
Then, on a user-visible level, numerics allow variation in the number
of trailing zeroes after the decimal point.  Floats have extra digits
that aren't shown except with extra_float_digits, so that the value
can be less than totally equal even if the text representation is the
same.  citext intentionally ignores case.  In every one of those
cases, it's possible that the user of materialized views might say
you know, if it's equal enough that the = operator says it's the
same, then I really don't mind if the materialized view maintenance
gets skipped.  But it's also possible that they might say, you know,
those 

Re: [HACKERS] dynamic shared memory

2013-09-24 Thread Andres Freund
On 2013-09-24 12:19:51 -0400, Robert Haas wrote:
 On Fri, Sep 20, 2013 at 7:44 AM, Andres Freund and...@2ndquadrant.com wrote:
   Hm, I guess you dont't want to add it  to global/ or so because of the
   mmap implementation where you presumably scan the directory?
 
  Yes, and also because I thought this way would make it easier to teach
  things like pg_basebackup (or anybody's home-brew scripts) to just
  skip that directory completely.  Actually, I was wondering if we ought
  to have a directory under pgdata whose explicit charter it was to
  contain files that shouldn't be copied as part of a base backup.
  pg_do_not_back_this_up.
 
  Wondered exactly about that as soon as you've mentioned
  pg_basebackup. pg_local/?
 
 That seems reasonable.  It's not totally transparent what that's
 supposed to mean, but it's fairly mnemonic once you know.  Other
 suggestions welcome, if anyone has ideas.

pg_node_local/ was the only reasonable thing I could think of otherwise,
and I disliked it because it seems we shouldn't introduce node as a
term just for this.

 Are there any other likely candidates for inclusion in that directory
 other than this stuff?

You could argue that pg_stat_tmp/ is one.

   Why are you using open() and not
   BasicOpenFile or even OpenTransientFile?
 
  Because those don't work in the postmaster.
 
  Oh, that's news to me. Seems strange, especially for BasicOpenFile.

 Per its header comment, InitFileAccess is not called in the
 postmaster, so there's no VFD cache.  Thus, any attempt by
 BasicOpenFile to call ReleaseLruFile would be pointless at best.

Well, but it makes code running in both backends and postmaster easier
to write. Good enough for me anyway.

   Imo that's a PANIC or at the very least a FATAL.
 
  Sure, that's a tempting option, but it doesn't seem to serve any very
  necessary point.  There's no data corruption problem if we proceed
  here.  Most likely either (1) there's a bug in the code, which
  panicking won't fix or (2) the DBA hand-edited the state file, in
  which case maybe he shouldn't have done that, but if he thinks the
  best way to recover from that is a cluster-wide restart, he can do
  that himself.
 
  There's no data corruption problem if we proceed - but there likely
  has been one leading to the current state.
 
 I doubt it.  It's more likely that the file permissions got changed or
 something.

We panic in that case during a shutdown, don't we? ... Yep:
PANIC:  could not open control file "global/pg_control": Permission denied

  I am not talking about lwlocks itself being setup but an environment
  that has resource owners defined and catches errors. I am specifically
  asking because you're a) ereport()ing without releasing an LWLock b)
  unconditionally relying on the fact that there's a current resource
  owner.
  In shared_preload_libraries neither is the case afair?

 I don't really feel like solving all of those problems and, TBH, I
 don't see why it's particularly important.  If somebody wants a
 loadable module that can be loaded either from
 shared_preload_libraries or on the fly, and they use dynamic shared
 memory in the latter case, then they can use it in the former case as
 well.  If they've already got logic to create the DSM when it's first
 needed, it doesn't cost extra to do it that way in both cases.

Fair enough.

  They'll continue to see the portion they have mapped, but must do
  dsm_remap() if they want to see the whole thing.
 
  But resizing can shrink, can it not? And we do an ftruncate() in at
  least the posix shmem case. Which means the other backend will get a
  SIGSEGV accessing that memory IIRC.

 Yep.  Shrinking the shared memory segment will require special
 caution.  Caveat emptor.

Then a comment to that effect would be nice.

  We're not talking about a missed munmap() but about one that failed. If
  we unpin the leaked pins and notice that we haven't actually pinned it
  anymore we do error (well, Assert) out. Same for TupleDescs.
 
  If there were valid scenarios in which you could get into that
  situation, maybe. But which would that be? ISTM we can only get there if
  our internal state is messed up.

 I don't know.  I think that's part of why it's hard to decide what we
 want to happen.  But personally I think it's paranoid to say, well,
 something happened that we weren't expecting, so that must mean
 something totally horrible has happened and we'd better die in a fire.

Well, by that argument we wouldn't need to PANIC on a whole host of
issues. Like segfaults.

Anyway, I guess we need other opinions here.

   Why isn't the port number part of the posix shmem identifiers? Sure, we
   retry, but using a logic similar to sysv_shmem.c seems like a good idea.
 
  According to the man page for shm_open on Solaris, For maximum
  portability, name should include no more than 14 characters, but this
  limit is not enforced.
 
  What about /pgsql.%u or something similar? That should still fit.
 
 Well, if you 

Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread Andres Freund
On 2013-09-24 19:56:32 +0200, Andres Freund wrote:
 On 2013-09-24 13:51:04 -0300, Alvaro Herrera wrote:
  David Rowley wrote:
  
   I do see a 15-18% slow down with the patched version, so perhaps I'll need
   to look to see if I can speed it up a bit, although I do feel this
   benchmark is not quite a normal workload.
  
  Ouch.  That's certainly way too much.  Is the compiler inlining
  process_log_prefix_padding()?  If not, does it do it if you add inline
  to it?  That might speed up things a bit.  If that's not enough, maybe
  you need some way to return to the original coding for the case where no
  padding is set in front of each option.
 
 From a very short look without actually running it I'd guess the issue
 is all the $* things you're now passing to do appendStringInfo (which
 passes them off to vsnprintf).
 How does it look without that?

That's maybe misunderstandable; what I mean is to have an if (padding >
0) around the changed appendStringInfo invocations and use the old
ones otherwise.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgbench progress report improvements - split 2

2013-09-24 Thread Fabien COELHO


Hello Noah,


meet all those goals simultaneously with simpler code, can we not?

int64 wait = (int64) (throttle_delay *
    Min(7.0, -log(1 - pg_erand48(thread->random_state))));


If you truncate the multiplier roughly, as is done here with Min, you 
necessarily create a bias; my random guess would be a 0.5% 
underestimation, but maybe it is more... Thus you would have to compute and 
apply the correcting factor as well. Its computation is a little bit less easy 
than with the quantized formula, where I just used a simple SUM, and you have 
to really do the maths here. So I would keep the simple solution, but it is 
fine with me if you do the maths!
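
A sketch of the maths, under the assumption that -log(1 - u) is meant to be a
unit-mean exponential draw: the mean of min(X, 7) is 1 - exp(-7), so the
correcting factor would be 1 / (1 - exp(-7)). A quick check of the value:

  SELECT 1 - exp(-7.0) AS truncated_mean;   -- roughly 0.9991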


--
Fabien.




Re: [HACKERS] record identical operator

2013-09-24 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 Now I admit that's an arguable point.  We could certainly define an
 intermediate notion of equality that is more equal than whatever =
 does, but not as equal as exact binary equality.

I suggested it up-thread and don't recall seeing a response, so here it
is again - passing the data through the binary-out function for the type 
and comparing *that* would allow us to change the internal binary
representation of data types and would be something which we could at
least explain to users as to why X isn't the same as Y according to this
binary operator.
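
As a rough illustration at the SQL level (using numeric's send function
directly, just for demonstration):

SELECT '1.5'::numeric = '1.50'::numeric          AS value_equal,  -- true
       numeric_send('1.5'::numeric)
         = numeric_send('1.50'::numeric)         AS send_equal;   -- false

The type says the values are equal, but the wire-format output differs, and
that difference is at least something concrete we can point users at.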

 I think the conservative (and therefore correct) approach is to decide
 that we're always going to update if we detect any difference at all.

And there may be users who are surprised that a refresh changed parts of
the table that have nothing to do with what was changed in the
underlying relation, because a different plan was used and the result
ended up being binary-different.  It's easy to explain to users why that
would be when we're doing a wholesale replace but I don't think it'll be
nearly as clear why that happened when we're not replacing the whole
table and why REFRESH can basically end up changing anything (but
doesn't consistently).  If we're paying attention to the records changed
and only updating the matview's records when they're involved, that
becomes pretty clear.  What's happening here feels very much like
unintended consequences.

 It is obviously true, and unavoidable, that if the query can produce
 more than one result set depending on the query plan or other factors,
 then the materialized view contents can match only one of those
 possible result sets.  But you are arguing that it should be allowed
 to produce some OTHER result set, that a direct execution of the query
 could *never* have produced, and I can't see how that can be right.

I agree that the matview shouldn't produce things which *can't* exist
through an execution of the query.  I don't intend to argue against that
but I realize that's a fallout of not accepting the patch to implement
the binary comparison operators.  My complaint is more generally that if
this approach needs such then there's going to be problems down the
road.  No, I can't predict exactly what they are and perhaps I'm all wet
here, but this kind of binary-equality operation is something I've
tried to avoid since discovering what happens when you try to compare
a couple of floating point numbers.

Thanks,

Stephen




Re: [HACKERS] pgbench progress report improvements

2013-09-24 Thread Fabien COELHO


Dear Robert,


(1) ...
I really don't think it's a good idea to change the default behavior. 
We've had many discussions about how the overhead of gettimeofday() can 
sometimes be surprisingly large; I don't think we should assume it's 
guaranteed to be small in this case.


Also, changing established defaults will annoy users who like the 
present defaults; I don't see any reason to assume that your preferences 
will be universal.  In doubtful cases we should favor leaving the 
behavior the way it's been in previous releases, because 
backward-compatibility is very desirable.


I argued precisely in another mail, with the worst figures found on the net, 
about the relative overhead of gettimeofday as used by pgbench, which is 
also handling network traffic and waiting for the database to perform 
actual transactions. I do not think that I'm assuming, and I'm trying to 
argue with objective data.


Anyway, this particular behavior is preserved by the current version of 
the patch, so no worry. The additional gettimeofday is *not* performed under 
standard silent benchmarking, and the standard deviation of the latency 
is not measured there, because it cannot be. It is, however, measured under 
--rate and --progress. It is necessary for --rate because otherwise the 
computed latency is false. It is done for --progress because if you are 
interested in what is going on, I assume that it makes sense to provide 
this data.
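
For what it's worth, the per-interval average and standard deviation only need
a counter, a sum and a sum of squares accumulated from latencies that are
measured anyway; a minimal sketch of that accumulation (names are illustrative,
not the patch's):

    #include <math.h>
    #include <stdint.h>

    typedef struct LatencyStats
    {
        int64_t count;
        double  sum;    /* sum of latencies, in ms */
        double  sum2;   /* sum of squared latencies */
    } LatencyStats;

    static void
    latency_add(LatencyStats *s, double lat_ms)
    {
        s->count++;
        s->sum += lat_ms;
        s->sum2 += lat_ms * lat_ms;
    }

    static double
    latency_stddev(const LatencyStats *s)
    {
        /* population stddev: sqrt(E[x^2] - E[x]^2) */
        double mean = s->sum / s->count;

        return sqrt(s->sum2 / s->count - mean * mean);
    }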


I must admit that I think, un-universally, that people should care to know 
that their transaction latency can vary by several orders of magnitude, 
but this opinion is clearly not shared.


I tried to preserve the row-counting behavior because I thought that 
someone would object otherwise, but I would really be in favor of 
dropping the row-counting report behavior altogether and only keeping the 
time-triggered report.



-1 for changing this again.
Frankly, I might have come down in a different place if I had understood 
exactly how this was going to end up being committed; it ended up being 
quite different from what I was expecting.  But I really think that 
relitigating this just so we can break backward compatibility again one 
release later is not a good use of anyone's time, or a good idea in 
general.


The current status on my small (SSD) laptop is that pgbench -i throws 
about 10 lines of 100,000-row progress reports per second. I must be a 
slow reader because I can't read that fast. So either other users can read 
much faster than me, or they have slower computers:-)


ISTM that it is no big deal to change this kind of thing in a minor 
contrib tool of postgresql if it is reasonably better and useful, and I 
would be surprised to see protests about changing an unreadable flow into 
a readable one.


Anyway, let us keep this default behavior, as I hope there are people 
who like it. Well, if anyone could tell me that he/she prefers having 
10 lines a second thrown on the screen to a regular progress report 
every few seconds, I would feel less stupid about reinstating this behavior 
and re-submitting a new version of the patch.


--
Fabien.




Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix

2013-09-24 Thread David Rowley
On Wed, Sep 25, 2013 at 6:25 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-09-24 19:56:32 +0200, Andres Freund wrote:
  On 2013-09-24 13:51:04 -0300, Alvaro Herrera wrote:
   David Rowley escribió:
  
I do see a 15-18% slow down with the patched version, so perhaps
 I'll need
to look to see if I can speed it up a bit, although I do feel this
benchmark is not quite a normal workload.
  
   Ouch.  That's certainly way too much.  Is the compiler inlining
   process_log_prefix_padding()?  If not, does it do it if you add
 inline
   to it?  That might speed up things a bit.  If that's not enough, maybe
   you need some way to return to the original coding for the case where
 no
   padding is set in front of each option.
 
  From a very short look without actually running it I'd guess the issue
  is all the %* things you're now passing to do appendStringInfo (which
  passes them off to vsnprintf).
  How does it look without that?

 That's maybe misunderstandable; what I mean is to have an if (padding >
 0) around the changed appendStringInfo invocations and use the old
 ones otherwise.


Yeah, I had the same idea to try that next. I suspect that's where the
slowdown is, rather than the processing of the padding. I'm thinking these
small tweaks are going to make the code a bit ugly, but I agree that the
15-18% slowdown is a no-go. The only other thing, apart from checking if
padding > 0, is to check if the char after the % is > '9', in which case it
can't be formatting, as we're only allowing '-' and '0' to '9'. I think
that's a bit hackish, but perhaps it is acceptable if it helps narrow the
performance gap.
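
For reference, the shape of the check being discussed is roughly the following
(a sketch only; the real function and variable names in the patch may differ):

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /*
     * Only take the printf-style (vsnprintf) path when a padding value was
     * actually parsed; otherwise keep the old cheap string append.
     */
    static void
    append_with_optional_padding(StringInfo buf, int padding, const char *value)
    {
        if (padding != 0)
            appendStringInfo(buf, "%*s", padding, value);   /* vsnprintf path */
        else
            appendStringInfoString(buf, value);             /* old behavior */
    }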

More benchmarks to follow soon.

David



 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 9.3 Json Array's

2013-09-24 Thread Adam Jelinek
I agree with the best-effort type of conversion, and with only being able to
handle JSON arrays that conform to an SQL array.  With that said, I would
love to collaborate with you on this, but there is one thing holding me
back. The current company I work for (an insurance company) says it is a
conflict of interest so I have to be careful.  I can try to help out in
other ways if possible, and I will double check with our HR.


On Tue, Sep 24, 2013 at 8:12 AM, Chris Travers ch...@2ndquadrant.com wrote:



  On 24 September 2013 at 13:46 Andrew Dunstan and...@dunslane.net wrote:

 
 
  Feel free to ask questions.
 
  The heart of the API is the event handlers defined in this stuct in
  include/utils/jsonapi.h:
 
  typedef struct JsonSemAction
  {
   void   *semstate;
   json_struct_action object_start;
   json_struct_action object_end;
   json_struct_action array_start;
   json_struct_action array_end;
   json_ofield_action object_field_start;
   json_ofield_action object_field_end;
   json_aelem_action array_element_start;
   json_aelem_action array_element_end;
   json_scalar_action scalar;
  } JsonSemAction;
 
 
  Basically there is a handler for the start and end of each non-scalar
  structural element in JSON, plus a handler for scalars.
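
(As an illustration of wiring these handlers up — a minimal sketch that counts
array elements, assuming the 9.3 jsonapi.h interface quoted above together with
its parser entry points makeJsonLexContext()/pg_parse_json(); the counting
handler itself is hypothetical:)

    #include "postgres.h"
    #include "utils/jsonapi.h"

    typedef struct CountState
    {
        int     nelems;
    } CountState;

    /* array element handler: fires once per array element encountered */
    static void
    count_array_element(void *state, bool isnull)
    {
        ((CountState *) state)->nelems++;
    }

    static int
    count_json_array_elements(text *json)
    {
        CountState      state = {0};
        JsonLexContext *lex = makeJsonLexContext(json, false);
        JsonSemAction   sem = {0};

        sem.semstate = &state;
        sem.array_element_start = count_array_element;

        pg_parse_json(lex, &sem);
        return state.nelems;
    }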
 
  There are several problems that will be posed by processing nested
  arrays and objects, including:
 
* in effect you would need to construct a stack of state that could be
  pushed and popped

  True.

* JSON arrays aren't a very good match for SQL arrays - they are
  unidimensional and heterogeneous.

  This is true, but I think one would have to start with an assumption that
 the data is valid for an SQL type and then check again once one gets it
 done.  JSON is a pretty flexible format, which makes it a poor match in
 many cases for SQL types generally.  But I think the example so far (with
 json_populate_recordset) is a good one, namely a best-effort conversion.
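
(For reference, the existing 9.3 best-effort conversion mentioned here behaves
roughly as follows: keys missing from an object become NULL and extra keys are
simply ignored. The type name is just an example:)

    CREATE TYPE jpop AS (a int, b text);

    SELECT * FROM json_populate_recordset(null::jpop,
             '[{"a": 1, "b": "x"}, {"a": 2, "c": "ignored"}]');
    --  a | b
    -- ---+---
    --  1 | x
    --  2 |
    -- (2 rows)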

 
 
  I'm not saying this can't be done - it will just take a bit of effort.

  Yeah, looking through the code, I think it will be more work than I
 originally thought but that just means it will take longer.
 
  cheers
 
  andrew
 
 
 
  Best Wishes,
 Chris Travers
 http://www.2ndquadrant.com
 PostgreSQL Services, Training, and Support



Re: [HACKERS] 9.3 Json Array's

2013-09-24 Thread Merlin Moncure
On Tue, Sep 24, 2013 at 3:14 PM, Adam Jelinek ajeli...@gmail.com wrote:
 I agree with the best effort type of conversion, and only being able to
 handle JSON arrays that conform to an SQL array.  With that said I would
 love to collaborate with you on this, but there is one thing holding me
 back. The current company I work for (an insurance company) says it is a
 conflict of interest so I have to be careful.  I can try to help out in
 other ways if possible, and I will double check with our HR.

pro tip: don't ask until you've already done the work.

merlin




Re: [HACKERS] pgbench progress report improvements - split 3

2013-09-24 Thread Josh Berkus
On 09/22/2013 10:44 PM, Fabien COELHO wrote:
 Due to the possibly repetitive table structure of the data, maybe CSV
 would make sense as well. It is less extensible, but it is directly
 usable by sqlite or pg.

I'd be OK with CSV.  At least I wouldn't be regexing the output.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pgbench progress report improvements - split 3 v2

2013-09-24 Thread Fabien COELHO


Split 3 of the initial submission, which actually deals with the data measured and 
reported on stderr under various options.


This version currently takes into account many comments by Noah Misch. In 
particular, the default no-report behavior under benchmarking is not 
changed, although I really think that it should be. Also, the standard deviation 
is only shown when available, which is not the case under all settings, because of 
concerns expressed about the cost of additional calls to gettimeofday. ISTM 
that these concerns are misplaced in this particular case.



Yet another version to fulfill requirements by Robert Haas not to change 
current default behaviors. The initialization progress is reported every 
100-k rows, and there is no progress report for benchmarking.




Improve pgbench measurements & progress report

These changes are coupled because measures are changed, and their
reporting as well. Submitting separate patches for these different
features would result in conflicting or dependent patches, so I
wish to avoid that if possible.

 - Use same progress and quiet options both under init & bench.

   However, the default reporting is NOT changed, as required by
   Robert Haas. It is currently every 100k rows when initializing,
   and nothing when benchmarking.

   I would suggest changing this default to a few seconds for both.
   The 100-k row report is very verbose and unreadable on good hardware.
   For benchmarking, the rationale is that it is error-prone, as a run
   must last a long time to be significant because of warmup effects
   (especially on HDD, less so on SSD). Seeing how the current performance
   evolves would help pgbench users realise that. See a sample output down below.

 - Measure transaction latency instead of computing it,
   for --rate and --progress.

   The previously computed figure does not make sense under throttling:
   as sleep throttling time was included in the figures, the displayed
   results were plain false.

   The latency and its standard deviation are printed out under progress
   and in the final report when available.

   It could be made always available, but that would require accepting
   additional gettimeofday calls. I do not think that there is a
   performance issue here, but there is significant opposition to the idea.

 - Take thread start time at the beginning of the thread (!)

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326. This also helps with throttling,
   as the start time is used to adjust the rate: if it is not the actual
   start time, there is a bit of a rush at the beginning in order to
   catch up. Taking the start time when the thread actually starts is
   the sane thing to do to avoid surprising, possibly strange measurements
   on short runs.

Sample output under initialization with --progress=1

  creating tables...
  1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
  2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
  2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
  3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
  vacuum...
  set primary keys...
  done.

Sample output under benchmarking with --progress=1

  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...

The warmup is quite fast because the DB is on a SSD. In the beginning
the standard deviation is well over the average transaction time, but
when the steady state is reached (later) it is much below.

--
Fabien.
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..934244b 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -167,11 +167,13 @@ char	   *index_tablespace = NULL;
 #define SCALE_32BIT_THRESHOLD 2
 
 bool		use_log;			/* log transaction latencies to a file */
-bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
-int			progress = 0;   /* thread progress report every this seconds */
+bool		use_quiet = false;	/* more or less quiet logging onto stderr */
+int			progress = -1;  /* report every this seconds.
+   0 is no report, -1 is not set yet */
 int progress_nclients = 0; /* number of clients for progress report */
+int progress_nthreads = 0; /* number of threads for progress report */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			

Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-09-24 Thread Bruce Momjian
On Thu, Sep 12, 2013 at 09:38:43AM +0530, Amit Kapila wrote:
  I have created the attached patch which issues an error when SET
  TRANSACTION and SET LOCAL are used outside of transactions:
 
  test=> set transaction isolation level serializable;
  ERROR:  SET TRANSACTION can only be used in transaction blocks
  test=> reset transaction isolation level;
  ERROR:  RESET TRANSACTION can only be used in transaction blocks
 
  test=> set local effective_cache_size = '3MB';
  ERROR:  SET LOCAL can only be used in transaction blocks
  test=> set local effective_cache_size = default;
  ERROR:  SET LOCAL can only be used in transaction blocks
 
 Shouldn't we do it for Set Constraints as well?

Oh, very good point.  I missed that one.  Updated patch attached.
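
With that addition, SET CONSTRAINTS outside a transaction block is rejected the
same way (illustrative session, following the examples above):

    test=> set constraints all deferred;
    ERROR:  SET CONSTRAINTS can only be used in transaction blocks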

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
new file mode 100644
index b1023c4..3ffdfe0
*** a/src/backend/tcop/utility.c
--- b/src/backend/tcop/utility.c
*** standard_ProcessUtility(Node *parsetree,
*** 688,694 
  			break;
  
  		case T_VariableSetStmt:
! 			ExecSetVariableStmt((VariableSetStmt *) parsetree);
  			break;
  
  		case T_VariableShowStmt:
--- 688,694 
  			break;
  
  		case T_VariableSetStmt:
! 			ExecSetVariableStmt((VariableSetStmt *) parsetree, isTopLevel);
  			break;
  
  		case T_VariableShowStmt:
*** standard_ProcessUtility(Node *parsetree,
*** 754,759 
--- 754,760 
  			break;
  
  		case T_ConstraintsSetStmt:
+ 			RequireTransactionChain(isTopLevel, "SET CONSTRAINTS");
  			AfterTriggerSetState((ConstraintsSetStmt *) parsetree);
  			break;
  
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 3107f9c..ff39920
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*** flatten_set_variable_args(const char *na
*** 6252,6258 
   * SET command
   */
  void
! ExecSetVariableStmt(VariableSetStmt *stmt)
  {
  	GucAction	action = stmt->is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET;
  
--- 6252,6258 
   * SET command
   */
  void
! ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
  {
  	GucAction	action = stmt->is_local ? GUC_ACTION_LOCAL : GUC_ACTION_SET;
  
*** ExecSetVariableStmt(VariableSetStmt *stm
*** 6260,6265 
--- 6260,6267 
  	{
  		case VAR_SET_VALUE:
  		case VAR_SET_CURRENT:
+ 			if (stmt->is_local)
+ 				RequireTransactionChain(isTopLevel, "SET LOCAL");
  			(void) set_config_option(stmt->name,
  	 ExtractSetVariableArgs(stmt),
  	 (superuser() ? PGC_SUSET : PGC_USERSET),
*** ExecSetVariableStmt(VariableSetStmt *stm
*** 6269,6275 
  	 0);
  			break;
  		case VAR_SET_MULTI:
- 
  			/*
  			 * Special-case SQL syntaxes.  The TRANSACTION and SESSION
  			 * CHARACTERISTICS cases effectively set more than one variable
--- 6271,6276 
*** ExecSetVariableStmt(VariableSetStmt *stm
*** 6281,6286 
--- 6282,6289 
  			{
  ListCell   *head;
  
+ 				RequireTransactionChain(isTopLevel, "SET TRANSACTION");
+ 
  				foreach(head, stmt->args)
  {
  	DefElem*item = (DefElem *) lfirst(head);
*** ExecSetVariableStmt(VariableSetStmt *stm
*** 6325,6330 
--- 6328,6335 
  			{
  				A_Const	   *con = (A_Const *) linitial(stmt->args);
  
+ 				RequireTransactionChain(isTopLevel, "SET TRANSACTION");
+ 
  				if (stmt->is_local)
  	ereport(ERROR,
  			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
*** ExecSetVariableStmt(VariableSetStmt *stm
*** 6338,6344 
--- 6343,6355 
  	 stmt->name);
  			break;
  		case VAR_SET_DEFAULT:
+ 			if (stmt->is_local)
+ 				RequireTransactionChain(isTopLevel, "SET LOCAL");
+ 			/* fall through */
  		case VAR_RESET:
+ 			if (strcmp(stmt->name, "transaction_isolation") == 0)
+ 				RequireTransactionChain(isTopLevel, "RESET TRANSACTION");
+ 
  			(void) set_config_option(stmt->name,
  	 NULL,
  	 (superuser() ? PGC_SUSET : PGC_USERSET),
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
new file mode 100644
index 99211c1..89ee40c
*** a/src/include/utils/guc.h
--- b/src/include/utils/guc.h
*** extern void SetPGVariable(const char *na
*** 334,340 
  extern void GetPGVariable(const char *name, DestReceiver *dest);
  extern TupleDesc GetPGVariableResultDesc(const char *name);
  
! extern void ExecSetVariableStmt(VariableSetStmt *stmt);
  extern char *ExtractSetVariableArgs(VariableSetStmt *stmt);
  
  extern void ProcessGUCArray(ArrayType *array,
--- 334,340 
  extern void GetPGVariable(const char *name, DestReceiver *dest);
  extern TupleDesc GetPGVariableResultDesc(const char *name);
  
! extern void ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel);
  extern 

Re: [HACKERS] unaccent module - two params function should be immutable

2013-09-24 Thread Bruce Momjian
On Tue, Sep 17, 2013 at 10:15:47AM -0400, Robert Haas wrote:
 On Sat, Sep 14, 2013 at 9:42 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
  I have developed the attached patch based on your suggestion.  I did not
  see anything in the code that would make it STABLE, except a lookup of a
  dictionary library:
 
  dictOid = get_ts_dict_oid(stringToQualifiedNameList("unaccent"),
  false);
 
  yes, it's a risk, but it is similar with timezones and other external data.
 
 That's a catalog lookup - not a reliance on external data.

Sorry, I was wrong.  Only unaccent_dict() calls get_ts_dict_oid(), and
we aren't changing the signature of that function.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Minmax indexes

2013-09-24 Thread Alvaro Herrera
Jaime Casanova wrote:

 Found another problem with these steps:
 
 create table t1 (i int);
 create index idx_t1_i on t1 using minmax(i);
 insert into t1 select generate_series(1, 200);
 ERROR:  could not read block 1 in file base/12645/16397_vm: read
 only 0 of 8192 bytes

Thanks.  This was a trivial off-by-one bug; fixed in the attached patch.
While studying it, I noticed that I was also failing to notice extension
of the fork by another process.  I have tried to fix that also in the
current patch, but I'm afraid that a fully robust solution for this will
involve having a cached fork size in the index's relcache entry -- just
like we have smgr_vm_nblocks.  In fact, since the revmap fork is
currently reusing the VM forknum, I might even be able to use the same
variable to keep track of the fork size.  But I don't really like this
bit of reusing the VM forknum for revmap, so I've refrained from
extending that assumption into further code for the time being.

There was also a bug that we would try to initialize a revmap page twice
during recovery, if two backends thought they needed to extend it; that
would cause the data written by the first extender to be lost.

This patch applies on top of the two previous incremental patches.  I
will send a full patch later, including all those fixes and the fix for
the opr_sanity regression test.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/minmax/mmrevmap.c
--- b/src/backend/access/minmax/mmrevmap.c
***
*** 30,36 
  #define HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk) \
  	((heapBlk / pagesPerRange) % IDXITEMS_PER_PAGE)
  
! static void mmRevmapExtend(mmRevmapAccess *rmAccess, BlockNumber blkno);
  
  /* typedef appears in minmax_revmap.h */
  struct mmRevmapAccess
--- 30,36 
  #define HEAPBLK_TO_REVMAP_INDEX(pagesPerRange, heapBlk) \
  	((heapBlk / pagesPerRange) % IDXITEMS_PER_PAGE)
  
! static bool mmRevmapExtend(mmRevmapAccess *rmAccess, BlockNumber blkno);
  
  /* typedef appears in minmax_revmap.h */
  struct mmRevmapAccess
***
*** 52,62  mmRevmapAccessInit(Relation idxrel, BlockNumber pagesPerRange)
  {
  	mmRevmapAccess *rmAccess = palloc(sizeof(mmRevmapAccess));
  
  	rmAccess->idxrel = idxrel;
  	rmAccess->pagesPerRange = pagesPerRange;
  	rmAccess->currBuf = InvalidBuffer;
  	rmAccess->physPagesInRevmap =
! 		RelationGetNumberOfBlocksInFork(idxrel, MM_REVMAP_FORKNUM);
  
  	return rmAccess;
  }
--- 52,64 
  {
  	mmRevmapAccess *rmAccess = palloc(sizeof(mmRevmapAccess));
  
+ 	RelationOpenSmgr(idxrel);
+ 
  	rmAccess->idxrel = idxrel;
  	rmAccess->pagesPerRange = pagesPerRange;
  	rmAccess->currBuf = InvalidBuffer;
  	rmAccess->physPagesInRevmap =
! 		smgrnblocks(idxrel->rd_smgr, MM_REVMAP_FORKNUM);
  
  	return rmAccess;
  }
***
*** 111,121  mmSetHeapBlockItemptr(mmRevmapAccess *rmAccess, BlockNumber heapBlk,
  	/*
  	 * If the revmap is out of space, extend it first.
  	 */
! 	if (mapBlk > rmAccess->physPagesInRevmap - 1)
! 	{
! 		mmRevmapExtend(rmAccess, mapBlk);
! 		extend = true;
! 	}
  
  	/*
  	 * Obtain the buffer from which we need to read.  If we already have the
--- 113,120 
  	/*
  	 * If the revmap is out of space, extend it first.
  	 */
! 	if (mapBlk >= rmAccess->physPagesInRevmap)
! 		extend = mmRevmapExtend(rmAccess, mapBlk);
  
  	/*
  	 * Obtain the buffer from which we need to read.  If we already have the
***
*** 202,211  mmGetHeapBlockItemptr(mmRevmapAccess *rmAccess, BlockNumber heapBlk,
  
  	mapBlk = HEAPBLK_TO_REVMAP_BLK(rmAccess->pagesPerRange, heapBlk);
  
! 	if (mapBlk > rmAccess->physPagesInRevmap)
  	{
! 		ItemPointerSetInvalid(out);
! 		return;
  	}
  
  	if (rmAccess->currBuf == InvalidBuffer ||
--- 201,229 
  
  	mapBlk = HEAPBLK_TO_REVMAP_BLK(rmAccess->pagesPerRange, heapBlk);
  
! 	/*
! 	 * If we are asked for a block of the map which is beyond what we know
! 	 * about it, try to see if our fork has grown since we last checked its
! 	 * size; a concurrent inserter could have extended it.
! 	 */
! 	if (mapBlk >= rmAccess->physPagesInRevmap)
  	{
! 		RelationOpenSmgr(rmAccess->idxrel);
! 		LockRelationForExtension(rmAccess->idxrel, ShareLock);
! 		rmAccess->physPagesInRevmap =
! 			smgrnblocks(rmAccess->idxrel->rd_smgr, MM_REVMAP_FORKNUM);
! 
! 		if (mapBlk >= rmAccess->physPagesInRevmap)
! 		{
! 			/* definitely not in range */
! 
! 			UnlockRelationForExtension(rmAccess->idxrel, ShareLock);
! 			ItemPointerSetInvalid(out);
! 			return;
! 		}
! 
! 		/* the block exists now, proceed */
! 		UnlockRelationForExtension(rmAccess->idxrel, ShareLock);
  	}
  
  	if (rmAccess->currBuf == InvalidBuffer ||
***
*** 273,286  mmRevmapCreate(Relation idxrel)
  }
  
  /*
!  * Extend the reverse range map to cover the given block number.
   *
   * NB -- caller is responsible for ensuring this action is properly WAL-logged.
   */
! static void
  

Re: [HACKERS] record identical operator

2013-09-24 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 I think the conservative (and therefore correct) approach is to
 decide that we're always going to update if we detect any
 difference at all.

 And there may be users who are surprised that a refresh changed
 parts of the table that have nothing to do with what was changed
 in the underlying relation, because a different plan was used and
 the result ended up being binary-different.

Binary differences for equal values could include box (or other
geometry) objects moved to completely new locations and/or not
quite the same size.

Here is v2 of the patch which changes from the universally disliked
operator names v1 used.  It also fixes bugs in the row comparisons
for pass-by-reference types, fixes a couple nearby comments, and
adds regression tests for a matview containing a box column.

The box type is interesting in that its = operator only worries
about the area of the box, and considers two boxes with no more
than EPSILON difference to be equal.  This means that boxes at
totally different locations can be equal, and that if A = B and B =
C it is not necessarily true that A = C.  This doesn't cause too
much trouble in general because boxes don't have btree opclasses.
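
A quick illustration of that quirk (both comparisons below are true, since
box = only compares areas, to within EPSILON):

    SELECT box '((0,0),(1,1))' = box '((100,100),(101,101))' AS equal_despite_location,
           box '((0,0),(2,1))' = box '((0,0),(1,2))'         AS equal_despite_shape;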

Since there are types, including core types, without a default
btree opclass, materialized views containing them cannot use RMVC
as currently committed.  This patch would fix that, or we could rip
out the current implementation and go to the delete everything and
insert the entire new matview contents approach.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
*** a/contrib/citext/expected/citext.out
--- b/contrib/citext/expected/citext.out
***
*** 2276,2278  SELECT like_escape( name::text, ''::citext ) = like_escape( name::text, '' ) AS
--- 2276,2319 
   t
  (5 rows)
  
+ -- Ensure correct behavior for citext with materialized views.
+ CREATE TABLE citext_table (
+   id serial primary key,
+   name citext
+ );
+ INSERT INTO citext_table (name)
+   VALUES ('one'), ('two'), ('three'), (NULL), (NULL);
+ CREATE MATERIALIZED VIEW citext_matview AS
+   SELECT * FROM citext_table;
+ CREATE UNIQUE INDEX citext_matview_id
+   ON citext_matview (id);
+ SELECT *
+   FROM citext_matview m
+   FULL JOIN citext_table t ON (t.id = m.id AND t *= m)
+   WHERE t.id IS NULL OR m.id IS NULL;
+  id | name | id | name 
+ ----+------+----+------
+ (0 rows)
+ 
+ UPDATE citext_table SET name = 'Two' WHERE name = 'TWO';
+ SELECT *
+   FROM citext_matview m
+   FULL JOIN citext_table t ON (t.id = m.id AND t *= m)
+   WHERE t.id IS NULL OR m.id IS NULL;
+  id | name | id | name 
+ ----+------+----+------
+     |      |  2 | Two
+   2 | two  |    | 
+ (2 rows)
+ 
+ REFRESH MATERIALIZED VIEW CONCURRENTLY citext_matview;
+ SELECT * FROM citext_matview ORDER BY id;
+  id | name  
+ ----+-------
+   1 | one
+   2 | Two
+   3 | three
+   4 | 
+   5 | 
+ (5 rows)
+ 
*** a/contrib/citext/expected/citext_1.out
--- b/contrib/citext/expected/citext_1.out
***
*** 2276,2278  SELECT like_escape( name::text, ''::citext ) = like_escape( name::text, '' ) AS
--- 2276,2319 
   t
  (5 rows)
  
+ -- Ensure correct behavior for citext with materialized views.
+ CREATE TABLE citext_table (
+   id serial primary key,
+   name citext
+ );
+ INSERT INTO citext_table (name)
+   VALUES ('one'), ('two'), ('three'), (NULL), (NULL);
+ CREATE MATERIALIZED VIEW citext_matview AS
+   SELECT * FROM citext_table;
+ CREATE UNIQUE INDEX citext_matview_id
+   ON citext_matview (id);
+ SELECT *
+   FROM citext_matview m
+   FULL JOIN citext_table t ON (t.id = m.id AND t *= m)
+   WHERE t.id IS NULL OR m.id IS NULL;
+  id | name | id | name 
+ ----+------+----+------
+ (0 rows)
+ 
+ UPDATE citext_table SET name = 'Two' WHERE name = 'TWO';
+ SELECT *
+   FROM citext_matview m
+   FULL JOIN citext_table t ON (t.id = m.id AND t *= m)
+   WHERE t.id IS NULL OR m.id IS NULL;
+  id | name | id | name 
+ ----+------+----+------
+     |      |  2 | Two
+   2 | two  |    | 
+ (2 rows)
+ 
+ REFRESH MATERIALIZED VIEW CONCURRENTLY citext_matview;
+ SELECT * FROM citext_matview ORDER BY id;
+  id | name  
+ ----+-------
+   1 | one
+   2 | Two
+   3 | three
+   4 | 
+   5 | 
+ (5 rows)
+ 
*** a/contrib/citext/sql/citext.sql
--- b/contrib/citext/sql/citext.sql
***
*** 711,713  SELECT COUNT(*) = 19::bigint AS t FROM try;
--- 711,736 
  
  SELECT like_escape( name, '' ) = like_escape( name::text, '' ) AS t FROM srt;
  SELECT like_escape( name::text, ''::citext ) = like_escape( name::text, '' ) AS t FROM srt;
+ 
+ -- Ensure correct behavior for citext with materialized views.
+ CREATE TABLE citext_table (
+   id serial primary key,
+   name citext
+ );
+ INSERT INTO citext_table (name)
+   VALUES ('one'), ('two'), ('three'), (NULL), (NULL);
+ CREATE MATERIALIZED VIEW citext_matview AS
+   SELECT * FROM citext_table;
+ CREATE UNIQUE INDEX citext_matview_id
+   ON citext_matview (id);
+ 

Re: [HACKERS] Some interesting news about Linux 3.12 OOM

2013-09-24 Thread Daniel Farina
On Sep 24, 2013 10:12 AM, Josh Berkus j...@agliodbs.com wrote:

 All,

 I've sent kernel.org a message that we're keen on seeing these changes
 become committed.

I thought it was merged already in 3.12. There are a few related
patches, but here's one:

commit 519e52473ebe9db5cdef44670d5a97f1fd53d721
Author: Johannes Weiner han...@cmpxchg.org
Date:   Thu Sep 12 15:13:42 2013 -0700

mm: memcg: enable memcg OOM killer only for user faults

System calls and kernel faults (uaccess, gup) can handle an out of memory
situation gracefully and just return -ENOMEM.

Enable the memcg OOM killer only for user faults, where it's really the
only option available.

Signed-off-by: Johannes Weiner han...@cmpxchg.org
Acked-by: Michal Hocko mho...@suse.cz
Cc: David Rientjes rient...@google.com
Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
Cc: azurIt azu...@pobox.sk
Cc: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Linus Torvalds torva...@linux-foundation.org

$ git tag --contains 519e52473ebe9db5cdef44670d5a97f1fd53d721
v3.12-rc1
v3.12-rc2

Searching for recent work by Johannes Weiner shows the pertinent stuff
more exhaustively.

 BTW, in the future if anyone sees kernel.org contemplating a patch which
 helps or hurts Postgres, don't hesitate to speak up to them.  They don't
 get nearly enough feedback from DB developers.

I don't hesitate, most of the time I simply don't know.




Re: [HACKERS] [PATCH] bitmap indexes

2013-09-24 Thread Abhijit Menon-Sen
At 2013-09-24 09:51:00 -0700, jeff.ja...@gmail.com wrote:

 I get wrong answers from this index sometimes.

Thanks for the report and the test case. I'll investigate.

-- Abhijit




Re: [HACKERS] record identical operator

2013-09-24 Thread Merlin Moncure
On Tue, Sep 24, 2013 at 2:22 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Now I admit that's an arguable point.  We could certainly define an
 intermediate notion of equality that is more equal than whatever =
 does, but not as equal as exact binary equality.

 I suggested it up-thread and don't recall seeing a response, so here it
 is again- passing the data through the binary-out function for the type
 and comparing *that* would allow us to change the internal binary
 representation of data types and would be something which we could at
 least explain to users as to why X isn't the same as Y according to this
 binary operator.

 I think the conservative (and therefore correct) approach is to decide
 that we're always going to update if we detect any difference at all.

 And there may be users who are surprised that a refresh changed parts of
 the table that have nothing to do with what was changed in the
 underlying relation, because a different plan was used and the result
 ended up being binary-different.  It's easy to explain to users why that
 would be when we're doing a wholesale replace but I don't think it'll be
 nearly as clear why that happened when we're not replacing the whole
 table and why REFRESH can basically end up changing anything (but
 doesn't consistently).  If we're paying attention to the records changed
 and only updating the matview's records when they're involved, that
 becomes pretty clear.  What's happening here feels very much like
 unintended consequences.

FWIW you make some interesting points (I did a triple take on the
plan-dependent changes) but I'm 100% ok with the proposed behavior.
Matviews satisfy 'truth' as *defined by the underlying query only*.
This is key: there may be N candidate 'truths' for that query: it's
not IMNSHO reasonable to expect the post-refresh truth to be
approximately based on the pre-refresh truth.  Even if the
implementation happened to do what you're asking for, it would only be
demonstrating undefined but superficially useful behavior...a good
analogy would be the old scan behavior where an unordered scan would
come up in 'last update order'.  That (again, superficially useful)
behavior was undefined and we reserved the right to change it.  And we
did.  Unnecessarily defined behaviors defeat future performance
optimizations.

So Kevin's patch AIUI defines a hitherto non-user accessible (except
in the very special case of row-wise comparison) mechanic to try and
cut down the number of rows that *must* be refreshed.  It may or may
not do a good job at that on a situational basis -- if it was always
better we'd probably be overriding the default behavior.  I don't
think it's astonishing at all for a matview to pseudo-randomly adjust
case over a citext column; that's just part of the deal with
equality-ambiguous types.  As long as the matview doesn't expose a dataset that
was impossible to have been generated by the underlying query, I'm
good.

merlin




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-24 Thread Peter Geoghegan
On Tue, Sep 24, 2013 at 7:35 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't really disagree with any of that.  TBH, I think the question
 of how long value locks (as you call them) are held is going to boil
 down to a question of how they end up being implemented.

Well, I think we can rule out value locks that are held for the
duration of a transaction right away. That's just not going to fly.

 As I mentioned to you at PG Open (going through the details here for those
 following along at home), we could optimistically insert the new heap
 tuple, then go add index entries for it, and if we find a conflict,
 then instead of erroring out, we mark the tuple we were inserting dead
 and go try update the conflicting tuple instead.  In that
 implementation, if we find that we have to wait for some other
 transaction along the way, it's probably not worth reversing out the
 index entries already inserted, because getting them into the index in
 the first place was a WAL-logged operation, and therefore relatively
 expensive, and IMHO it's most likely better to just hope things work
 out than to risk having to redo all of that.

I'm afraid that there are things that concern me about this design. It
does have one big advantage over promise-tuples, which is that the
possibility of index-only bloat, and even the possible need to freeze
indexes separately from their heap relation is averted (or are you
going to have recovery do promise clean-up instead? Does recovery look
for an eventual successful insertion relating to the promise? How far
does it look?). However, while I'm just as concerned as you that
backing out is too expensive, I'm equally concerned that there is no
reasonable alternative to backing out, which is why cheap, quick
in-memory value locks are so compelling to me. See my remarks below.

 On the other hand, if the locks are strictly in-memory, then the cost
 of releasing them all before we go to wait, and of reacquiring them
 after we finish waiting, is pretty low.  There might be some
 modularity issues to work through there, but they might not turn out
 to be very painful, and the advantages you mention are certainly worth
 accruing if it turns out to be fairly straightforward.

It's certainly a difficult situation to judge.

 Personally, I think that trying to keep it all in-memory is going to
 be hard.  The problem is that we can't de-optimize regular inserts or
 updates to any significant degree to cater to this feature - because
 as valuable as this feature is, the number of times it gets used is
 still going to be a whole lot smaller than the number of times it
 doesn't get used.

Right - I don't think that anyone would argue that any other standard
should be applied. Fortunately, I'm reasonably confident that it can
work. The last part of index tuple insertion, where we acquire an
exclusive lock on a buffer, needs to look out for a page header bit
(on pages considered for insertion of its value). The cost of that to
anyone not using this feature is likely to be infinitesimally small.
We can leave clean-up of that bit to the next inserter, who needs the
exclusive lock anyway and doesn't find a corresponding SLRU entry. But
really, that's a discussion for another day. I think we'd want to
track value locks per pinned-by-upserter buffer, to localize any
downsides on concurrency. If we forbid page-splits in respect of a
value-locked page, we can still have a stable value (buffer number) to
use within a shared memory hash table, or something along those lines.
We're still going to want to minimize the duration of locking under
this scheme, by doing TOASTing before locking values and so on, which
is quite possible.

If we're really lucky, maybe the value locking stuff can be
generalized or re-used as part of a btree index insertion buffer
feature.

 Also, I tend to think that we might want to define
 the operation as a REPLACE-type operation with respect to a certain
 set of key columns; and so we'll do the insert-or-update behavior with
 respect only to the index on those columns and let the chips fall
 where they may with respect to any others.  In that case this all
 becomes much less urgent.

Well, MySQL's REPLACE does zero or more DELETEs followed by an INSERT,
not try an INSERT, then maybe mark the heap tuple if there's a unique
index dup and then go UPDATE the conflicting tuple. I mention this
only because the term REPLACE has a certain baggage, and I feel it's
important to be careful about such things.

The only way that's going to work is if you say "use this unique
index", which will look pretty gross in DML. That might actually be
okay with me if we had somewhere to go from there in a future release,
but I doubt that's the case. Another issue is that I'm not sure that
this helps Andres much (or rather, clients of the logical changeset
generation infrastructure that need to do conflict resolution), and
that matters a lot to me here.

 Suppose we define the operation as REPLACE rather 

Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-24 Thread Peter Eisentraut
On Tue, 2013-09-24 at 11:58 +0200, Bernd Helmle wrote:
 Hmm, not sure I understand this argument either: this patch doesn't
 allow disabling a primary key. It only supports FKs and CHECK
 constraints explicitly. 

Well, as soon as the patch for cataloging not-null constraints as check
constraints is available, it will be possible to create views that
depend functionally on check constraints.  Then you'll have the same
problem there.

It's also not clear why this patch only supports foreign keys and check
constraints.  Maybe that's what was convenient to implement, but it's
not a principled solution to the general issue that constraints can be
involved in dependencies.





Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-24 Thread Peter Eisentraut
On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:
 4. I guess some users really want to continue to use ShiftJIS or EUC_JP for
 database encoding, and use NCHAR for a limited set of columns to store
 international text in Unicode:
 - to avoid code conversion between the server and the client for performance
 - because ShiftJIS and EUC_JP require less amount of storage (2 bytes for
 most Kanji) than UTF-8 (3 bytes)
 This use case is described in chapter 6 of Oracle Database Globalization
 Support Guide.

But your proposal wouldn't address the first point, because data would
have to go client -> server -> NCHAR.

The second point is valid, but it's going to be an awful amount of work
for that limited result.
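
(To quantify the storage point, in a UTF8-encoded database:)

    -- octet_length() counts bytes in the database encoding
    SELECT octet_length('漢');   -- 3 under UTF8; the same character is 2 bytes in EUC_JP or ShiftJIS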





Re: [HACKERS] dynamic shared memory

2013-09-24 Thread Amit Kapila
On Tue, Sep 24, 2013 at 9:49 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Sep 20, 2013 at 7:44 AM, Andres Freund and...@2ndquadrant.com wrote:
  Hm, I guess you don't want to add it to global/ or so because of the
  mmap implementation, where you presumably scan the directory?

 Yes, and also because I thought this way would make it easier to teach
 things like pg_basebackup (or anybody's home-brew scripts) to just
 skip that directory completely.  Actually, I was wondering if we ought
 to have a directory under pgdata whose explicit charter it was to
 contain files that shouldn't be copied as part of a base backup.
 pg_do_not_back_this_up.

 Wondered exactly about that as soon as you've mentioned
 pg_basebackup. pg_local/?

 That seems reasonable.  It's not totally transparent what that's
 supposed to mean, but it's fairly mnemonic once you know.  Other
 suggestions welcome, if anyone has ideas.

 Are there any other likely candidates for inclusion in that directory
 other than this stuff?
   pgsql_tmp. Refer to sendDir() in basebackup.c; there we avoid sending
such files during a base backup.
   Some future features, like ALTER SYSTEM, could also use it for temp files.

  + /* Create or truncate the file. */
  + statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, O_RDWR|O_CREAT|O_TRUNC, 
  0600);
 
  Doesn't this need a | PG_BINARY?

 It's a text file.  Do we need PG_BINARY anyway?

 I'd say yes. Non binary mode stuff on windows does stuff like
 transforming LF <=> CRLF on reading/writing, which makes sizes not match
 up and similar ugliness.
 Imo there's little reason to use non-binary mode for anything written
 for postgres' own consumption.

 Well, I'm happy to do whatever the consensus is.  AFAICT you and Noah
 are both for it and Amit's position is that it doesn't matter either
 way

I am sorry if my mails didn't make it clear that I am in favour of keeping
the code as it is unless there is really a case which requires a change.
Basically, as per my understanding, I have presented some facts in the
above mails which indicate that there is no need for PG_BINARY in this
case.
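
(For context, the change the others are advocating amounts to adding PG_BINARY
to the open() flags quoted above — a sketch, not the actual patch:)

    /* PG_BINARY is a no-op everywhere except Windows, where it suppresses
     * the text-mode LF <=> CRLF translation described above. */
    statefd = open(PG_DYNSHMEM_NEW_STATE_FILE,
                   O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600);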

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

