Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Michael Paquier
On Tue, Apr 22, 2014 at 9:58 AM, Tatsuo Ishii is...@postgresql.org wrote:

  * Tatsuo Ishii (is...@postgresql.org) wrote:
  I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
  as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
  pgbench is used (read only query), scale factor is 1,000 (DB size
  15GB).
 
  Can you isolate the sysv-vs-mmap patch and see what happens with just
  that change..?

 Unfortunately I don't have access to the machine at the moment.

  Included is the graph (from PostgreSQL Enterprise Consortium's 2014
  report page 13: https://www.pgecons.org/downloads/43). I see up to 14%
  degradation (at 128 concurrent users) compared with 9.2.
 
  That URL returns 'Forbidden'...

 Sorry for this. I sent a problem report to the person in charge.  In
 the meantime, please go to:
 https://www.pgecons.org/download/works_2013/ and click the link
 "2013年度WG1活動報告" (FY2013 WG1 Activity Report; sorry it's not in
 English). You should be able to download the report (PDF).

 Also, the report is written in Japanese. I hope you can read at least
 the graph on page 13 and the table on page 14.

Is pgecons planning to do a translation of that at some point? It looks
like good material, and the audience able to understand it is rather
limited now :)
-- 
Michael


Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Mark Kirkwood

On 22/04/14 09:25, Andres Freund wrote:

On 2014-04-21 17:21:20 -0400, Bruce Momjian wrote:

On Mon, Apr 21, 2014 at 02:08:51PM -0700, Joshua Drake wrote:

If the community had more *BSD presence I think it would be great
but it isn't all that viable at this point. I do know however that
no-one in this community would turn down a team of FreeBSD advocates
helping us make PostgreSQL awesome for FreeBSD.


I don't think we would even implement a run-time control for Linux or
Windows for this, so I don't even think it is a FreeBSD issue.


I think some of the arguments in this thread are pretty damn absurd. We
have just introduced dynamic_shared_memory_type.



+1

I was just thinking the same thing...



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
 
 On 04/21/2014 02:54 PM, Andres Freund wrote:
 Hi,
 
 I spent the last two hours poking around in the environment Andrew
 provided; I was able to reproduce the issue, find an assert that
 reproduces it much faster, and find a possible root cause.
 
 
 What's the assert that makes it happen faster? That might help a lot in
 constructing a self-contained test.

Assertion and *preliminary*, *hacky* fix attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 40c09d8a85db137caf42c2dfe36adda63a381c75 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 22 Apr 2014 08:42:11 +0200
Subject: [PATCH 1/2] Error out when creating a multixact with more than one
 updater.

---
 src/backend/access/heap/heapam.c   |  3 ---
 src/backend/access/transam/multixact.c | 15 +++
 src/include/access/multixact.h |  3 +++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2605f20..bd26b80 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -190,9 +190,6 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
 /* Get the LockTupleMode for a given MultiXactStatus */
 #define TUPLOCK_from_mxstatus(status) \
 			(MultiXactStatusLock[(status)])
-/* Get the is_update bit for a given MultiXactStatus */
-#define ISUPDATE_from_mxstatus(status) \
-			((status) > MultiXactStatusForUpdate)
 
 /* 
  *		 heap support routines
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index d4ad678..4ff1e58 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -713,6 +713,21 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
 		return multi;
 	}
 
+	{
+		int			i;
+		TransactionId update_xact = InvalidTransactionId;
+
+		for (i = 0; i < nmembers; i++)
+		{
+			/* Ignore lockers */
+			if (!ISUPDATE_from_mxstatus(members[i].status))
+				continue;
+			if (update_xact != InvalidTransactionId)
+				elog(ERROR, "new multixact has more than one updating member");
+			update_xact = members[i].xid;
+		}
+	}
+
 	/*
 	 * Assign the MXID and offsets range to use, and make sure there is space
 	 * in the OFFSETs and MEMBERs files.  NB: this routine does
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 9729f27..95ffe0e 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -48,6 +48,9 @@ typedef enum
 
 #define MaxMultiXactStatus MultiXactStatusUpdate
 
+/* Get the is_update bit for a given MultiXactStatus */
+#define ISUPDATE_from_mxstatus(status) \
+			((status) > MultiXactStatusForUpdate)
 
 typedef struct MultiXactMember
 {
-- 
1.8.5.rc2.dirty

From ebd4f834b7cdccb9a9fe336dd981811e26bfc271 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 22 Apr 2014 08:42:44 +0200
Subject: [PATCH 2/2] preliminary fix for corruption issue.

---
 src/backend/access/heap/heapam.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bd26b80..9c075c4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2722,7 +2722,7 @@ l1:
 			 * update this tuple before we get to this point.  Check for xmax
 			 * change, and start over if so.
 			 */
-			if (!(tp.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (tp.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
 	 xwait))
 goto l1;
@@ -2748,7 +2748,7 @@ l1:
 			 * other xact could update this tuple before we get to this point.
 			 * Check for xmax change, and start over if so.
 			 */
-			if ((tp.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (tp.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
 	 xwait))
 goto l1;
@@ -3275,7 +3275,7 @@ l2:
 			 * update this tuple before we get to this point.  Check for xmax
 			 * change, and start over if so.
 			 */
-			if (!(oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+			if (oldtup.t_data->t_infomask != infomask ||
 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
 	 xwait))
 goto l2;
@@ -3329,7 +3329,7 @@ l2:
  * recheck the locker; if someone else changed the tuple while
  * we weren't looking, start over.
  */
-if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) ||
+if (oldtup.t_data->t_infomask != infomask ||
 	!TransactionIdEquals(
 	HeapTupleHeaderGetRawXmax(oldtup.t_data),
 		 xwait))
@@ -3350,7 +3350,7 @@ l2:
  * some other xact could update this tuple before we get to
 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Peter Geoghegan
Here is a benchmark that is similar to my earlier one, but with a rate
limit of 125 tps, to help us better characterize how the prototype
patch helps performance:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay-limit/

Again, these are 15 minute runs with unlogged tables at multiple
client counts, and scale 5,000.

Every test run should have managed to hit that limit, but in the case
of master 1 test did not. I have included vmstat, iostat and meminfo
OS instrumentation this time around, which is really interesting for
this particular limit based benchmark. The prototype patch tested here
is a slight refinement on my earlier prototype. Apart from reducing
the number of gettimeofday() calls, and doing them out of the critical
path, I increased the initial usage_count value to 6. I also set
BM_MAX_USAGE_COUNT to 30. I guess I couldn't resist the temptation to
tweak things, which I actually did very little of prior to publishing
my initial results. This helps, but there isn't a huge additional
benefit.

The benchmark results show that master cannot even meet the 125 tps
limit on 1 test out of 9. More interestingly, the background writer
consistently cleans about 10,000 buffers per test run when testing the
patch. At the same time, buffers are allocated at a very consistent
rate of around 262,000 for the patched test runs. Leaving aside the
first test run, with the patch there is only a tiny variance in the
number cleaned between each test, a variance of just a few hundred
buffers. In contrast, master has enormous variance. During just over
half of the tests, the background writer does not clean even a single
buffer. Then, on 2 tests out of 9, it cleans an enormous ~350,000
buffers. The second time this happens leads to master failing to even
meet the 125 tps limit (albeit with only one client).

If you drill down to individual test runs, a similar pattern is
evident. You'll now find operating system information (meminfo dirty
memory) graphed here. The majority of the time, master does not hold
more than 1,000 kB of dirty memory at a time. Once or twice it's 0 kB
for multiple minutes. However, during the test runs where we also see
huge spikes in background-writer-cleaned pages, we also see huge
spikes in the total amount of dirty memory (and correlated huge spikes
in latency). It can get to highs of ~700,000 kB at one point. In
contrast, the patched tests show very consistent amounts of dirty
memory. Per test, it almost always tops out at 4,000 kB - 6,000 kB
(there is a single 12,000 kB spike, though). There is consistently a
distinct zig-zag pattern to the dirty memory graph with the patched
tests, regardless of client count or where checkpoints occur. Master
shows mountains and valleys for those two particularly problematic
tests, correlating with a panicked background writer's aggressive
feedback loop. Master also shows less rhythmic zig-zag patterns that
only peak at about 600 kB - 1,000 kB for the entire duration of many
individual test runs.

Perhaps most notably, average and worst case latency is far improved
with the patch. On average it's less than half of master with 1
client, and less than a quarter of master with 32 clients.

I think that the rate limiting feature of pgbench is really useful for
characterizing how work like this improves performance. I see a far
smoother and more consistent pattern of I/O that superficially looks
like Postgres is cooperating with the operating system much more than
it does in the baseline. It sort of makes sense that the operating
system cache doesn't care about frequency while Postgres does. If the
OS cache did weigh frequency, it would surely not alter the outcome
very much, since OS cached data has presumably not been accessed very
frequently recently. I suspect that I've cut down on double buffering
by quite a bit. I would like to come up with a simple way of measuring
that, using something like pgfincore, but the available interfaces
don't seem well-suited to quantifying how much of a problem this is
and remains. I guess calling pgfincore on the index segment files might
be interesting, since shared_buffers mostly holds index pages. This
has been verified using pg_buffercache.

It would be great to test this out with something involving
non-uniform distributions, like Gaussian and Zipfian distributions.
The LRU-K paper tests Zipfian too. The uniform distribution pgbench
uses here, while interesting, doesn't tell the full story at all and
is less representative of reality (TPC-C is formally required to have
a non-uniform distribution [1] for some things, for example). A
Gaussian distribution might show essentially the same failure to
properly credit pages with frequency of access in one additional
dimension, so to speak. Someone should probably look at the TPC-C-like
DBT-2, since in the past that was considered to be a problematic
workload for PostgreSQL [2] due to the heavy I/O. Zipfian is a lot
less sympathetic than uniform if the 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Albe Laurenz
Jason Petersen wrote:
 Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday 
 seems silly to me: it was
 something quick for the prototype.
 
 The problem with the clocksweeps is they don’t actually track the progression 
 of “time” within the
 PostgreSQL system.

Would it make sense to just cache the result of the latest gettimeofday() call
and use that as an approximation for wall time?
The busier the system is, the more accurate that should be.

Yours,
Laurenz Albe



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Atri Sharma
On Tue, Apr 22, 2014 at 12:59 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 Jason Petersen wrote:
  Yes, we obviously want a virtual clock. Focusing on the use of
 gettimeofday seems silly to me: it was
  something quick for the prototype.
 
  The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the
  PostgreSQL system.

 Would it make sense to just cache the result of the latest gettimeofday()
 call
 and use that as an approximation for wall time?
 The busier the system is, the more accurate that should be.


That sounds... risky. How will the invalidation/update of the cache work?

How will we track the time window in which the cached value is still valid
and applicable?

My first thoughts only. I may be missing the point though.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Hannu Krosing
On 04/17/2014 10:39 PM, Andres Freund wrote:
 On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote:
 Just over 99.6% of pages (leaving aside the meta page) in the big 10
 GB pgbench_accounts_pkey index are leaf pages.

What is the depth of the b-tree at this percentage?

Cheers
Hannu




Re: [HACKERS] A question about code in DefineRelation()

2014-04-22 Thread Etsuro Fujita
(2014/04/04 13:35), Etsuro Fujita wrote:
 If I understand correctly, foreign tables cannot have an OID column, but
 the following code in DefineRelation() assumes that foreign tables *can*
 have that column:

On second thought, I noticed that this makes CREATE FOREIGN TABLE
wrongly include an OID column in newly created foreign tables when the
default_with_oids parameter is set to on.  Please find attached a patch.

Thanks,

Best regards,
Etsuro Fujita
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*************** DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
*** 565,572 ****
  	descriptor = BuildDescForRelation(schema);
  
  	localHasOids = interpretOidsOption(stmt->options,
! 									   (relkind == RELKIND_RELATION ||
! 										relkind == RELKIND_FOREIGN_TABLE));
  	descriptor->tdhasoid = (localHasOids || parentOidCount > 0);
  
  	/*
--- 565,571 ----
  	descriptor = BuildDescForRelation(schema);
  
  	localHasOids = interpretOidsOption(stmt->options,
! 									   relkind == RELKIND_RELATION);
  	descriptor->tdhasoid = (localHasOids || parentOidCount > 0);
  
  	/*



Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Jov
What about runtime code generation using LLVM?
http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Jov
blog: http://amutu.com/blog


2014-04-22 6:41 GMT+08:00 Simon Riggs si...@2ndquadrant.com:

 I've discussed 2ndQuadrant's involvement in the AXLE project a few
 times publicly, but never on this mailing list. The project relates to
 innovation and improvement in Business Intelligence for systems based
 upon PostgreSQL in the range of 10-100TB.

 Our work will span the 9.5 and 9.6 cycles. We're looking to make
 measurable improvements in a number of cases; one of those is TPC-H,
 since it's a publicly accessible benchmark, another is a more private
 benchmark on healthcare data. In brief, this means speeding up the
 performance of large queries, data loading and looking at very large
 systems issues.

 Some areas of R&D are definitely on the roadmap, others are more
 flexible. Some of this is in progress, other stuff is not even at the
 design stage - yet, just a few paragraphs along the lines of "we will
 look at these topics". If we have room, it's possible we may
 accommodate other topics; this is not carte blanche, but the reason
 for posting here is so people know we will take input, following the
 normal community process. Detailed in-person discussions at PGCon are
 expected and the Wiki pages will be updated for each aspect.

 BI-related Indexing
 * MinMax indexes
 * Bitmap indexes

 Large Systems
 * Freeze avoidance
 * Storage management issues for very large systems

 Storage Efficiency
 * Compression
 * Column Orientation

 Optimisation
 * Bulk loading speed improvements
 * Bulk FK evaluation
 * Executor tuning for very large queries

 Query tuning
 * Approximate queries, sampling
 * Materialized Views

 ...and possibly some other aspects.

 2ndQuadrant is also assisting other researchers on GPU and FPGA
 topics, which may also yield work of interest to the PostgreSQL project.

 Couple of points: The project is time limited, so if work gets pushed
 back beyond that then we'll lose the opportunity to contribute. Please
 support our work with timely objections, assistance in defining the
 path forwards and limiting the scope to something that avoids wasting
 this opportunity. Further funding is possible if we don't squander
 this. We are being funded to make best efforts to contribute to open
 source PostgreSQL, not pay-for-commit.

 AXLE is funded by the EU under FP7 Grant Agreement 318633.  Further
 details are available here http://www.axleproject.eu/

 (There are also other 2ndQuadrant development projects in progress,
 this is just one of the larger ones).

 Best Regards

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Store data in pg_toast for custom type fails (bug?)

2014-04-22 Thread Honza

On 03/28/2014 07:02 PM, Tom Lane wrote:
 I wrote:
 Honza hon...@gmail.com writes:
 after a months I've found a time to make test-case for this bug, probably:
 
 Confirmed that this reproduces a problem on HEAD.  Will look into it,
 thanks!
 
 I believe I understand what's going on here, and it's not quite as
 exciting as it first appears.  The issue is that we are failing to
 honor the "toasting goes only one level deep" rule in the specific
 case of arrays of composite type.  So while it's definitely a nasty
 bug, it affects only narrow use cases, and doesn't call into question
 our whole vacuuming strategy or anything like that.

I would like to ask if there is anything new in this bug?

I've made a simple script which checks whether the daily backups are
complete. For a week I've been deleting a few records every day, hoping
the backup would then succeed. The problem is that it's not possible to
read all the data from the table while making a backup with pg_dump
either (not only when selecting data from the table). The only way I've
found to temporarily work around it and get full backups is to delete
the corrupted records.

Thanks for any news,

Jan




Re: [HACKERS] RFC: Async query processing

2014-04-22 Thread Florian Weimer

On 01/05/2014 01:56 PM, Craig Ringer wrote:


JDBC also has a statement batching interface. Right now PgJDBC just
unwraps the batch and runs each query individually. Any async-support
improvements server-side should probably consider the need of executing
a batch. The batch might be one PreparedStatement with many different
parameters, or it might be a series of unrelated statements. A way for
PgJDBC to run the batch without syncing with the server after each query
would be really helpful.


The server already supports this because it supports pipelining.

Anyway, I have now switched my application to COPY FROM STDIN where 
possible, and the increase in throughput is phenomenal—even though it is 
running completely locally.  I could squeeze out more round-trips if I 
had an asynchronous query facility in libpq.


Feedback in this thread was, we want something like this in libpq, but 
not the thing you proposed.  But there have been no concrete 
counter-proposals, and some of the responses did not take into account 
the inherent complexities of round-trip avoidance.  So I'm not sure how 
to move this topic forward.


--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Simon Riggs
On 22 April 2014 00:24, Josh Berkus j...@agliodbs.com wrote:
 On 04/21/2014 03:41 PM, Simon Riggs wrote:
 Storage Efficiency
 * Compression
 * Column Orientation

 You might look at turning this:

 http://citusdata.github.io/cstore_fdw/

 ... into a more integrated part of Postgres.

Of course I'm aware of that work - credit to them. Certainly, many
people feel that it is now time to do as you suggest and include
column store features within PostgreSQL.

As to turning it into a more integrated part of Postgres, we have a
few problems there

1. cstore_fdw code has an incompatible licence

2. I don't think FDWs are the right place for complex new
architectures such as column store, massively parallel processing or
sharding. The fact that it is probably the best place to implement it
in user space doesn't mean it transfers well into core code. That's a
shame and I don't know what to do about it, because it would be nice
to simply ask for change of licence and then integrate it, but it
seems more work than that (to me).

cstore_fdw uses ORC, which interestingly stores lightweight index
values that look exactly like MinMax indexes, so at least PostgreSQL
should be getting that soon.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Simon Riggs
On 22 April 2014 10:42, Jov am...@amutu.com wrote:

 what about runtime code generation using LLVM?
 http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala-runtime-code-generation/
 http://llvm.org/devmtg/2013-11/slides/Wanderman-Milne-Cloudera.pdf

Those techniques have been in use for at least 20 years on various platforms.

The main issue PostgreSQL faces is supporting many platforms and
compilers while at the same time supporting extensible data types.

I believe there is some research work into run-time compilation in
progress, but that seems unlikely to make it into Postgres core.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Hannu Krosing
On 04/22/2014 01:24 AM, Josh Berkus wrote:
 On 04/21/2014 03:41 PM, Simon Riggs wrote:
 Storage Efficiency
 * Compression
 * Column Orientation
 You might look at turning this:

 http://citusdata.github.io/cstore_fdw/

 ... into a more integrated part of Postgres.
What would be of more general usefulness is probably better planning
and better performance of the FDW interface.

So instead of integrating one specific FDW, it would make sense to
improve PostgreSQL so that it can use (properly written) FDWs at native
speed.

Regards

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread MauMau

From: Simon Riggs si...@2ndquadrant.com

Some areas of R&D are definitely on the roadmap, others are more
flexible. Some of this is in progress, other stuff is not even at the
design stage - yet, just a few paragraphs along the lines of "we will
look at these topics". If we have room, it's possible we may
accommodate other topics; this is not carte blanche, but the reason
for posting here is so people know we will take input, following the
normal community process. Detailed in-person discussions at PGCon are
expected and the Wiki pages will be updated for each aspect.

BI-related Indexing
* MinMax indexes
* Bitmap indexes

Large Systems
* Freeze avoidance
* Storage management issues for very large systems

Storage Efficiency
* Compression
* Column Orientation

Optimisation
* Bulk loading speed improvements
* Bulk FK evaluation
* Executor tuning for very large queries

Query tuning
* Approximate queries, sampling
* Materialized Views


Great!  I'm looking forward to seeing PostgreSQL evolve as an analytics 
database for data warehousing.  Is there any reason why in-memory database 
and MPP are not included?


Are you planning to include the above features in 9.5 and 9.6?  Are you 
recommending other developers not implement these features to avoid 
duplication of work with AXLE?


Regards
MauMau





Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Andres Freund
Hi,

Attached you can find a short (compile-tested only) patch implementing
a 'shared_memory_type' GUC, akin to 'dynamic_shared_memory_type'. Will
only apply to 9.4, not 9.3, but it should be easy to convert for it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e090326d8b5933a162e8e503ccec714bde3a49b7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 22 Apr 2014 14:17:34 +0200
Subject: [PATCH] Add shared_memory_type GUC.

---
 src/backend/port/sysv_shmem.c | 18 --
 src/backend/storage/ipc/ipci.c|  2 ++
 src/backend/utils/misc/guc.c  | 23 +++
 src/backend/utils/misc/postgresql.conf.sample |  7 ++-
 src/include/storage/pg_shmem.h| 21 +++--
 5 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 5e3850b..f6343d3 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -465,14 +465,20 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port,
 	 * developer use, this shouldn't be a big problem.
 	 */
 #ifndef EXEC_BACKEND
-	AnonymousShmem = CreateAnonymousSegment(size);
-	AnonymousShmemSize = size;
+	if (shared_memory_type == SHMEM_TYPE_MMAP)
+	{
+		AnonymousShmem = CreateAnonymousSegment(size);
+		AnonymousShmemSize = size;
 
-	/* Now we need only allocate a minimal-sized SysV shmem block. */
-	sysvsize = sizeof(PGShmemHeader);
-#else
-	sysvsize = size;
+		/* Now we need only allocate a minimal-sized SysV shmem block. */
+		sysvsize = sizeof(PGShmemHeader);
+	}
+	else
 #endif
+	{
+		Assert(shared_memory_type == SHMEM_TYPE_SYSV);
+		sysvsize = size;
+	}
 
 	/* Make sure PGSharedMemoryAttach doesn't fail without need */
 	UsedShmemSegAddr = NULL;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 4290d2d..c58f171 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -42,6 +42,8 @@
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
 
+/* GUCs */
+int			shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
 
 shmem_startup_hook_type shmem_startup_hook = NULL;
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea54d16..f6d9dc4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -403,6 +403,19 @@ static const struct config_enum_entry huge_pages_options[] = {
 	{NULL, 0, false}
 };
 
+static struct config_enum_entry shared_memory_options[] = {
+#ifndef WIN32
+	{ "sysv", SHMEM_TYPE_SYSV, false},
+#endif
+#ifndef EXEC_BACKEND
+	{ "mmap", SHMEM_TYPE_MMAP, false},
+#endif
+#ifdef WIN32
+	{ "windows", SHMEM_TYPE_WINDOWS, false},
+#endif
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -3453,6 +3466,16 @@ static struct config_enum ConfigureNamesEnum[] =
 	},
 
 	{
+		{"shared_memory_type", PGC_POSTMASTER, RESOURCES_MEM,
+			gettext_noop("Selects the shared memory implementation used."),
+			NULL
+		},
+		shared_memory_type,
+		DEFAULT_SHARED_MEMORY_TYPE, shared_memory_options,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
 			gettext_noop("Selects the method used for forcing WAL updates to disk."),
 			NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 70e5a51..1c5f02a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -127,7 +127,12 @@
 #maintenance_work_mem = 64MB		# min 1MB
 #autovacuum_work_mem = -1		# min 1MB, or -1 to use maintenance_work_mem
 #max_stack_depth = 2MB			# min 100kB
-#dynamic_shared_memory_type = posix # the default is the first option
+#shared_memory_type = mmap		# the default is the first option
+	# supported by the operating system:
+	#   mmap
+	#   sysv
+	#   windows
+#dynamic_shared_memory_type = posix	# the default is the first option
 	# supported by the operating system:
 	#   posix
 	#   sysv
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index ab28ebe..53ea43a 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -41,8 +41,13 @@ typedef struct PGShmemHeader	/* standard header for all Postgres shmem */
 #endif
 } PGShmemHeader;
 
-/* GUC variable */
-extern int huge_pages;
+/* Possible values for shared_memory_type */
+typedef enum
+{
+	SHMEM_TYPE_WINDOWS,
+	SHMEM_TYPE_SYSV,
+	SHMEM_TYPE_MMAP
+} PGShmemType;
 
 /* Possible values for huge_pages */
 typedef enum
@@ -52,6 +57,10 @@ typedef enum
 	HUGE_PAGES_TRY
 } HugePagesType;
 
+/* GUC variables */
+extern int shared_memory_type;
+extern int huge_pages;
+
 #ifndef WIN32
 extern unsigned long UsedShmemSegID;
 #else
@@ -59,6 +68,14 @@ extern HANDLE UsedShmemSegID;
 #endif
 extern void 

Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Magnus Hagander
On Tue, Apr 22, 2014 at 8:26 AM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 22/04/14 09:25, Andres Freund wrote:

 On 2014-04-21 17:21:20 -0400, Bruce Momjian wrote:

 On Mon, Apr 21, 2014 at 02:08:51PM -0700, Joshua Drake wrote:

 If the community had more *BSD presence I think it would be great
 but it isn't all that viable at this point. I do know however that
 no-one in this community would turn down a team of FreeBSD advocates
 helping us make PostgreSQL awesome for PostgreSQL.


 I don't think we would even implement a run-time control for Linux or
 Windows for this, so I don't even think it is a FreeBSD issue.


 I think some of the arguments in this thread are pretty damn absurd. We
 have just introduced dynamic_shared_memory_type.


 +1

 I was just thinking the same thing...


I didn't realize we had a guc for dynamic shared memory, must've missed
that in the discussion about that one. I agree that if we have that, it
makes perfect sense to have the same setting available for the main shared
memory segment.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Hannu Krosing
On 04/22/2014 02:04 PM, Simon Riggs wrote:
 On 22 April 2014 00:24, Josh Berkus j...@agliodbs.com wrote:
 On 04/21/2014 03:41 PM, Simon Riggs wrote:
 Storage Efficiency
 * Compression
 * Column Orientation
 You might look at turning this:

 http://citusdata.github.io/cstore_fdw/

 ... into a more integrated part of Postgres.
 Of course I'm aware of that work - credit to them. Certainly, many
 people feel that it is now time to do as you suggest and include
 column store features within PostgreSQL.

 As to turning it into a more integrated part of Postgres, we have a
 few problems there

 1. cstore_fdw code has an incompatible licence

 2. I don't think FDWs are the right place for complex new
 architectures such as column store, massively parallel processing or
 sharding. 
I agree that FDW is not an end-all solution for all these, but it is a
reasonable starting point and it just might be that the extra things
needed could be added to our FDW API instead of sewing it directly
into backend guts.


I recently tried to implement sharding at FDW level and the main
problem I ran into was a missing join type for efficiently using it
for certain queries.

The specific use case was queries of form

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N;

PostgreSQL offered only two options:

1) full scan on remote table

2) single id=$ selects

neither of which is what is actually needed, as the first performs badly
if there are more than a few rows in the remote table and the second
performs badly if l.n = N returns more than a few rows.

when I manually rewrote the query to

select l.*, r.*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N
where r.id = ANY(ARRAY(select key1 from localtable where n = N));

it ran really well.

Unfortunately this is not something that PostgreSQL considers by itself
while optimising.

BTW, this kind of optimisation should also be a win for really large IN
queries if we could have an indexed IN which would not start each lookup
from the index root, but rather would sort the IN contents and do an
index merge via skipping from the current position.


Cheers
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 I didn't realize we had a guc for dynamic shared memory, must've missed
 that in the discussion about that one. I agree that if we have that, it
 makes perfect sense to have the same setting available for the main shared
 memory segment.

I recall the back-and-forth about the issues with dynamic shared memory
but I had hoped they would result in a solution that didn't involve
having to support both..

Thanks,

Stephen




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Andrew Dunstan


On 04/22/2014 08:15 AM, MauMau wrote:



Are you planning to include the above features in 9.5 and 9.6? Are you 
recommending other developers not implement these features to avoid 
duplication of work with AXLE?






Without pointing any fingers, I should note that I have learned the hard 
way to take such recommendations with a grain of salt. More than once I 
have been stopped from working on something because someone else said 
they were, only for nothing to appear, and in the interests of full 
disclosure I can think of two significant instances when I have been 
similarly guilty, although the most serious of those has since been 
rectified by someone else.


cheers

andrew





Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Andrew Dunstan


On 04/22/2014 08:04 AM, Simon Riggs wrote:

On 22 April 2014 00:24, Josh Berkus j...@agliodbs.com wrote:

On 04/21/2014 03:41 PM, Simon Riggs wrote:

Storage Efficiency
* Compression
* Column Orientation

You might look at turning this:

http://citusdata.github.io/cstore_fdw/

... into a more integrated part of Postgres.

Of course I'm aware of that work - credit to them. Certainly, many
people feel that it is now time to do as you suggest and include
column store features within PostgreSQL.

As to turning it into a more integrated part of Postgres, we have a
few problems there

1. cstore_fdw code has an incompatible licence

2. I don't think FDWs are the right place for complex new
architectures such as column store, massively parallel processing or
sharding. The fact that it is probably the best place to implement it
in user space doesn't mean it transfers well into core code. That's a
shame and I don't know what to do about it, because it would be nice
to simply ask for change of licence and then integrate it, but it
seems more work than that (to me).





I agree, and indeed that was something like my first reaction to hearing 
about this development - FDW seems like a very odd way to handle this. 
But the notion of builtin columnar storage suggests to me that we really 
need first to tackle how various storage engines might be incorporated 
into Postgres. I know this has been a bugbear for many years, but maybe 
now with serious proposals for alternative storage engines on the 
horizon we can no longer afford to put off the evil day when we grapple 
with it.


cheers

andrew




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Simon Riggs
On 22 April 2014 13:15, MauMau maumau...@gmail.com wrote:

 Great!  I'm looking forward to seeing PostgreSQL evolve as an analytics
 database for data warehousing.  Is there any reason why in-memory database
 and MPP is not included?

Those ideas are valid; the features are bounded by resource
constraints of time and money, as well as by the technical skills and
capacities of my fellow developers. My analysis has been that
implementing parallelism has a lower benefit/cost ratio than other
features, as well as requiring more expensive servers (for MPP). I
expect MPP to be an eventual end goal of the BDR project.

 Are you planning to include the above features in 9.5 and 9.6?

Yes

 Are you
 recommending other developers not implement these features to avoid
 duplication of work with AXLE?

This was more to draw attention to the work so that all interested
parties can participate in producing something useful.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] shm_mq inconsistent behavior of SHM_MQ_DETACHED

2014-04-22 Thread Petr Jelinek

Hi,

I was playing with shm_mq and found a little odd behavior with detaching 
after sending messages.


Following sequence behaves as expected (receiver gets 2 messages):
P1 - set_sender
P1 - attach
P2 - set_receiver
P2 - attach
P1 - send
P2 - receive
P1 - send
P1 - detach
P2 - receive
P2 - detach

But if I do first receive after detach like in this sequence:
P1 - set_sender
P1 - attach
P2 - set_receiver
P2 - attach
P1 - send
P1 - send
P1 - detach
P2 - receive

I get SHM_MQ_DETACHED on the receiver even though there are messages in 
the ring buffer.


The reason for this behavior is that mqh_counterparty_attached is only 
set by shm_mq_receive. This does not seem to be consistent - I would 
either expect to get SHM_MQ_DETACHED always when other party has 
detached or always get all remaining messages that are in queue (and I 
would strongly prefer the latter).


Maybe the shm_mq_get_bytes_written should be used to determine if there 
is something left for us to read in the receiver if we hit the 
!mqh_counterparty_attached code path with detached sender?
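As a toy model of the semantics preferred above - a sketch only, not the actual shm_mq implementation - the receiver reports SHM_MQ_DETACHED only once the ring buffer is drained, so messages sent before the detach are never lost:

```python
from collections import deque

SUCCESS = "SHM_MQ_SUCCESS"
DETACHED = "SHM_MQ_DETACHED"
WOULD_BLOCK = "SHM_MQ_WOULD_BLOCK"

class ToyQueue:
    """Single-sender/single-receiver queue modelling the desired detach rule."""
    def __init__(self):
        self.buf = deque()
        self.sender_detached = False

    def send(self, msg):
        assert not self.sender_detached, "cannot send after detach"
        self.buf.append(msg)

    def detach_sender(self):
        self.sender_detached = True

    def receive(self):
        # Preferred semantics: report DETACHED only once the buffer
        # is empty, so queued messages are always delivered first.
        if self.buf:
            return SUCCESS, self.buf.popleft()
        if self.sender_detached:
            return DETACHED, None
        return WOULD_BLOCK, None

q = ToyQueue()
q.send("m1"); q.send("m2"); q.detach_sender()
print(q.receive())  # ('SHM_MQ_SUCCESS', 'm1')
print(q.receive())  # ('SHM_MQ_SUCCESS', 'm2')
print(q.receive())  # ('SHM_MQ_DETACHED', None)
```

The problematic sequence in the report corresponds to the branch where the receiver consults the detach flag before checking whether bytes remain in the buffer.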



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Store data in pg_toast for custom type fails (bug?)

2014-04-22 Thread Tom Lane
Honza hon...@gmail.com writes:
 On 03/28/2014 07:02 PM, Tom Lane wrote:
 I believe I understand what's going on here, and it's not quite as
 exciting as it first appears.  The issue is that we are failing to
 honor the "toasting goes only one level deep" rule in the specific
 case of arrays of composite type.  So while it's definitely a nasty
 bug, it affects only narrow use cases, and doesn't call into question
 our whole vacuuming strategy or anything like that.

 I would like to ask if there is anything new in this bug?

Still working on it --- haven't you been following the other thread?
http://www.postgresql.org/message-id/flat/29007.1396038...@sss.pgh.pa.us

We should have a fix in time for the next set of minor releases, whenever
that is (probably a month or two).  If you're feeling desperate for a
temporary solution, you could make use of one of the patches I already
posted, even though I now don't like those as a long-term answer.

regards, tom lane




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
 I agree, and indeed that was something like my first reaction to
 hearing about this development - FDW seems like a very odd way to
 handle this. But the notion of builtin columnar storage suggests to
 me that we really need first to tackle how various storage engines
 might be incorporated into Postgres. I know this has been a bugbear
 for many years, but maybe now with serious proposals for alternative
 storage engines on the horizon we can no longer afford to put off
 the evil day when we grapple with it.

Agreed, and it goes beyond just columnar stores- I could see IOTs being
implemented using this notion of a different 'storage engine', but
calling it a 'storage engine' makes it sound like we want to change how
we access files and I don't think we really want to change that but
rather come up with a way to have an alternative heap..  Columnar or
IOTs would still be page-based and go through shared buffers, etc, I'd
think..

Thanks,

Stephen




Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-22 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Apr 21, 2014 at 10:57:34AM -0400, Tom Lane wrote:
 I'm actually planning to set this patch on the shelf for a bit and go
 investigate the other alternative, ie, not generating composite Datums
 containing toast pointers in the first place.

 I maintain that the potential slowdown is too great to consider adopting that
 for the sake of a cleaner patch.  Your last message examined a 67% performance
 regression.  The strategy you're outlining now can slow a query by 1,000,000%.

[ shrug... ]  It could also speed up a query by similar factors.  I see
no good reason to suppose that it would be a net loss overall.  I agree
that it might change performance characteristics in a way that we'd
ideally not do in the back branches.  But the fact remains that we've
got a bad bug to fix, and absent a reasonably trustworthy functional fix,
arguing about performance characteristics is a waste of breath.  I can
make it arbitrarily fast if it's not required to give the right answer.

regards, tom lane




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Andrew Dunstan


On 04/22/2014 01:36 AM, Joshua D. Drake wrote:


On 04/21/2014 06:19 PM, Andrew Dunstan wrote:



If we never start we'll never get there.

I can think of several organizations that might be approached to donate
hardware.


Like .Org?

We have a hardware farm, a rack full of hardware and spindles. It 
isn't the most current but it is there.






I'm going away tomorrow for a few days R&R. When I'm back next week I 
will set up a demo client running this module. If you can have a machine 
prepped for this purpose by then so much the better, otherwise I will 
have to drag out a box I recently rescued and have been waiting for 
something to use it with. It's more important that it's stable (i.e. 
nothing else running on it) than that it's very powerful. It could be 
running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


cheers

andrew






Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Palle Girgensohn

22 apr 2014 kl. 17:26 skrev Andrew Dunstan and...@dunslane.net:

 
 On 04/22/2014 01:36 AM, Joshua D. Drake wrote:
 
 On 04/21/2014 06:19 PM, Andrew Dunstan wrote:
 
 
 If we never start we'll never get there.
 
 I can think of several organizations that might be approached to donate
 hardware.
 
 Like .Org?
 
 We have a hardware farm, a rack full of hardware and spindles. It isn't the 
 most current but it is there.
 
 
 
 
 I'm going away tomorrow for a few days R&R. When I'm back next week I will 
 set up a demo client running this module. If you can have a machine prepped 
 for this purpose by then so much the better, otherwise I will have to drag 
 out a box I recently rescued and have been waiting for something to use it 
 with. It's more important that it's stable (i.e. nothing else running on it) 
 than that it's very powerful. It could be running Ubuntu or some Redhattish 
 variant or, yes, even FreeBSD.

If you need help with the FreeBSD setup, I'm at your service.

Palle




[HACKERS] Implied BETWEEN from join quals

2014-04-22 Thread Simon Riggs
I was recently nudged to describe an optimisation of merge
joins/sorts. Rather than describe that, I've looked at the general
case:

There are some additional implications we may make when joining
tables... a particularly interesting one is that

SELECT *
FROM Fact JOIN Dim on Fact.col = Dim.col

can be rewritten as

SELECT *
FROM Fact JOIN Dim on Fact.col = Dim.col
WHERE
(
Fact.col BETWEEN
 (SELECT min(col) FROM Dim)
AND
 (SELECT max(col) FROM Dim)
)
AND
(
Dim.col BETWEEN
 (SELECT min(col) FROM Fact)
AND
 (SELECT max(col) FROM Fact)
)

which also works similarly for anti/semi-joins.

If we have indexes on Fact.col and Dim.col then these additional quals
can be derived cheaply at run-time and could have an important effect on
optimisation.

1) With various kinds of index, we would be able to use these implied
quals to further restrict the scan. Perhaps that doesn't sound very
interesting, but it is very important when solving an outside-in
join on a star schema, such as...

SELECT count(*)
FROM Fact JOIN Dim on Fact.col = Dim.col
WHERE Dim.other = 1

since there is no qual that can be applied directly to the Fact table,
causing us to scan the entire table.

We can rewrite this query as

EXPLAIN (ANALYZE on, TIMING off, COSTS on)
SELECT count(*)
FROM Fact JOIN Dim on Fact.col = Dim.col
WHERE Dim.other = 1
AND
(
Fact.col BETWEEN
 (SELECT min(col) FROM Dim WHERE Dim.other = 1)
AND
 (SELECT max(col) FROM Dim WHERE Dim.other = 1)
)

Note that the implied qual on the Dim table has been dropped as
uninteresting. This is because we can calculate the cost and potential
benefit of applying the rewrite, allowing us to discard one or both
implied clauses.

Note also that this has nothing to do with join order. This is solely
about making inferences using the join quals between any two tables.

2) We calculate the join selectivity by comparing the MFVs of the join
columns on the tables being joined. ISTM that we could use the min()
and max() values to refine the selectivity estimate, which is often
wrong at present.

- - -

The current planner doesn't add these predicates automatically, but if
it did, it would come up with the following slightly sub-optimal
plan...

EXPLAIN (ANALYZE on, TIMING off, COSTS on)
SELECT count(*)
FROM Fact JOIN Dim on Fact.col = Dim.col
WHERE Dim.other = 1
AND
(
Fact.col BETWEEN
 (SELECT min(col) FROM Dim WHERE Dim.other = 1)
AND
 (SELECT max(col) FROM Dim WHERE Dim.other = 1)
)

 Aggregate  (cost=31.79..31.80 rows=1 width=0) (actual rows=1 loops=1)
   InitPlan 1 (returns $0)
 -  Aggregate  (cost=1.03..1.04 rows=1 width=4) (actual rows=1 loops=1)
   -  Seq Scan on dim dim_1  (cost=0.00..1.02 rows=1 width=4)
(actual rows=1 loops=1)
 Filter: (other = 1)
 Rows Removed by Filter: 1
   InitPlan 2 (returns $1)
 -  Aggregate  (cost=1.03..1.04 rows=1 width=4) (actual rows=1 loops=1)
   -  Seq Scan on dim dim_2  (cost=0.00..1.02 rows=1 width=4)
(actual rows=1 loops=1)
 Filter: (other = 1)
 Rows Removed by Filter: 1
   -  Merge Join  (cost=1.33..29.09 rows=250 width=0) (actual
rows=10 loops=1)
 Merge Cond: (dim.col = fact.col)
 -  Sort  (cost=1.03..1.04 rows=1 width=4) (actual rows=1 loops=1)
   Sort Key: dim.col
   Sort Method: quicksort  Memory: 25kB
   -  Seq Scan on dim  (cost=0.00..1.02 rows=1 width=4)
(actual rows=1 loops=1)
 Filter: (other = 1)
 Rows Removed by Filter: 1
 -  Index Only Scan using fact_col_idx on fact
(cost=0.29..24.29 rows=500 width=4) (actual rows=10 loops=1)
   Index Cond: ((col >= $0) AND (col <= $1))
   Heap Fetches: 10

which is sub-optimal only because of the mis-estimation of the effect
of the min() and max(), so we will still benefit from resolving the
parameters to a constant before proceeding with the main query. A
better plan would be

EXPLAIN (ANALYZE on, TIMING off, COSTS on)
SELECT count(*)
FROM Fact JOIN Dim on Fact.col = Dim.col
WHERE Dim.other = 1
AND Fact.col BETWEEN 1 AND 1

 Aggregate  (cost=2944.04..2944.05 rows=1 width=0) (actual rows=1 loops=1)
   -  Hash Join  (cost=1.04..2819.04 rows=5 width=0) (actual
rows=10 loops=1)
 Hash Cond: (fact.col = dim.col)
 -  Seq Scan on fact  (cost=0.00..1943.00 rows=10
width=4) (actual rows=10 loops=1)
   Filter: ((col >= 1) AND (col <= 1))
 -  Hash  (cost=1.02..1.02 rows=1 width=4) (actual rows=1 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 1kB
   -  Seq Scan on dim  (cost=0.00..1.02 rows=1 width=4)
(actual rows=1 loops=1)
 Filter: (other = 1)
 Rows Removed by Filter: 1

but we can probably live with that, as we do with other dynamic index plans.
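The gain from the derived BETWEEN qual can be illustrated with plain data. The sketch below (illustrative Python modelling the rewrite's logic, not planner code) computes min/max of the restricted Dim keys first, then uses that range to skip most of the Fact scan before probing the join:

```python
def star_join_count(fact_cols, dim_rows, other_val):
    """Count Fact rows joining restricted Dim rows, applying the
    implied qual Fact.col BETWEEN min(keys) AND max(keys) first."""
    # dim_rows: list of (col, other) pairs
    keys = {col for col, other in dim_rows if other == other_val}
    if not keys:
        return 0, 0
    lo, hi = min(keys), max(keys)
    matched = scanned = 0
    for col in fact_cols:
        # implied qual: Fact.col BETWEEN lo AND hi
        if lo <= col <= hi:
            scanned += 1          # rows surviving the range qual
            if col in keys:
                matched += 1      # rows actually joining
    return matched, scanned

fact = list(range(100000))
dim = [(1, 1), (99999, 2)]        # only key 1 has other = 1
print(star_join_count(fact, dim, 1))  # (1, 1)
```

With the range qual in place only one Fact row survives to the join probe, instead of all 100,000 being scanned against the join condition - the same effect the BETWEEN rewrite has on the real Fact scan.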

So when can we use this?

The additional cost of adding this to the query is
* additional qual:  2 * cpu_operator_cost * rows
* getting values: 2 * 

Re: [HACKERS] Implied BETWEEN from join quals

2014-04-22 Thread Stephen Frost
Simon,

This all looks good, and at the risk of being slightly off-topic for
this thread, I just wanted to mention..

* Simon Riggs (si...@2ndquadrant.com) wrote:
 Current proposal ends there, but there is a further optimization that
 allows us to remove the join altogether if
 * There is a FK between Fact and Dim

It'd be great if we could start by looking for the above requirement and
then doing join-removal when it exists and no columns from Dim are
referenced (outside of the FK reference, which must result in 'true' or
we've screwed something up).

I had been looking at this about a month ago and just ran out of time to
play with it, but it doesn't seem like it'd be terribly difficult to
do..

Thanks,

Stephen




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Josh Berkus
On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
 I agree, and indeed that was something like my first reaction to hearing
 about this development - FDW seems like a very odd way to handle this.
 But the notion of builtin columnar storage suggests to me that we really
 need first to tackle how various storage engines might be incorporated
 into Postgres. I know this has been a bugbear for many years, but maybe
 now with serious proposals for alternative storage engines on the
 horizon we can no longer afford to put off the evil day when we grapple
 with it.

Yes.  *IF* PostgreSQL already supported alternate storage, then the
Citus folks might have released their CStore as a storage plugin instead
of an FDW.  However, if they'd waited for pluggable storage, they'd
still be waiting.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Implied BETWEEN from join quals

2014-04-22 Thread Simon Riggs
On 22 April 2014 17:00, Stephen Frost sfr...@snowman.net wrote:
 Simon,

 This all looks good, and at the risk of being slightly off-topic for
 this thread, I just wanted to mention..

 * Simon Riggs (si...@2ndquadrant.com) wrote:
 Current proposal ends there, but there is a further optimization that
 allows us to remove the join altogether if
 * There is a FK between Fact and Dim

 It'd be great if we could start by looking for the above requirement and
 then doing join-removal when it exists and no columns from Dim are
 referenced (outside of the FK reference, which must result in 'true' or
 we've screwed something up).

 I had been looking at this about a month ago and just ran out of time to
 play with it, but it doesn't seem like it'd be terribly difficult to
 do..

Yeh, it was on my list. I had a few optimizer changes I've been
looking to make for a while.

Hopefully 9.5 is the year I get the chance at a running start on that
since they're never as easy as you'd hope.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] RFC: Async query processing

2014-04-22 Thread Claudio Freire
On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer fwei...@redhat.com wrote:
 Feedback in this thread was, we want something like this in libpq, but not
 the thing you proposed.  But there have been no concrete counter-proposals,
 and some of the responses did not take into account the inherent
 complexities of round-trip avoidance.  So I'm not sure how to move this
 topic forward.

What exactly do you mean by not taking into account?

I do not believe you responded to my proposed interface. I'm not
requesting a change in roundtrip avoidance measures, just the
interface.




Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Joshua D. Drake


On 04/22/2014 08:26 AM, Andrew Dunstan wrote:


I'm going away tomorrow for a few days RR. when I'm back next week I
will set up a demo client running this module. If you can have a machine
prepped for this purpose by then so much the better, otherwise I will
have to drag out a box I recently rescued and have been waiting for
something to use it with. It's more important that it's stable (i.e.
nothing else running on it) than that it's very powerful. It could be
running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


This is best handled by Mark. Mark can you help Andrew with this? I 
assume we would use the DL385 with the MS70?


JD




cheers

andrew





--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.




Re: [HACKERS] AXLE Plans for 9.5 and 9.6

2014-04-22 Thread Pavel Stehule
2014-04-22 19:02 GMT+02:00 Josh Berkus j...@agliodbs.com:

 On 04/22/2014 06:39 AM, Andrew Dunstan wrote:
  I agree, and indeed that was something like my first reaction to hearing
  about this development - FDW seems like a very odd way to handle this.
  But the notion of builtin columnar storage suggests to me that we really
  need first to tackle how various storage engines might be incorporated
  into Postgres. I know this has been a bugbear for many years, but maybe
  now with serious proposals for alternative storage engines on the
  horizon we can no longer afford to put off the evil day when we grapple
  with it.

 Yes.  *IF* PostgreSQL already supported alternate storage, then the
 Citus folks might have released their CStore as a storage plugin instead
 of an FDW.  However, if they'd waited for pluggable storage, they'd
 still be waiting.


I am sceptical - from what I know about OLAP column-store databases, they
need a substantially different planner, so just an engine or storage layer
is not enough. VectorWise has been trying to merge Ingres with the Monet
engine for more than four years - and it still has some issues.

Our extensibility is probably the major barrier to fast OLAP - I see the
most realistic way forward as supporting better partitioning and moving in
the direction of higher parallelism and distribution - and maybe map/reduce
support.

In GoodData we use Postgres successfully for BI projects up to 20GB with
fast response times - and the most painful points are the missing MERGE,
the missing fault-tolerant COPY, the I/O-expensive updates of large tables
with many indexes, and the missing simple massive partitioning. On the
other hand, Postgres works perfectly on thousands of databases with
thousands of tables without errors, with terribly simple deployment in a
cloud environment.

Regards

Pavel



 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com





Re: [HACKERS] Perfomance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Merlin Moncure
On Mon, Apr 21, 2014 at 8:06 PM, Andrew Dunstan and...@dunslane.net wrote:

 On 04/21/2014 08:49 PM, Stephen Frost wrote:

 * Tatsuo Ishii (is...@postgresql.org) wrote:

 I observe performance degradation with PostgreSQL 9.3 vs 9.2 on Linux
 as well.  The hardware is HP DL980G7, 80 cores, 2TB mem, RHEL 6,
 pgbench is used (read only query), scale factor is 1,000 (DB size
 15GB).

 Can you isolate the sysv-vs-mmap patch and see what happens with just
 that change..?



 This is exactly why we need a benchfarm.

 I actually have a client working based on Greg Smith's pgbench tools.

 What we would need is a way to graph the results - that's something beyond
 my very rudimentary expertise in web programming. If anyone feels like
 collaborating I'd be glad to hear from them. (The web site is programmed in
 perl + TemplateToolkit, but even that's not immutable. I'm open to using,
 say, node.js plus one of its templating engines.)

Hm, you got me interested.  Is the data you want to visualize stored
in a database?  Got some example data? (this is pretty off topic, feel
free to contact off-list or on a new thread etc).

merlin




Re: [HACKERS] RFC: Async query processing

2014-04-22 Thread Florian Weimer

On 01/03/2014 06:06 PM, Claudio Freire wrote:


Per-query expectations could be such a thing. And it can even work with PQexec:

PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
--read--
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
... 9 times...
PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
do {
// do something useful
} while (PQflush());

Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
send if the buffer is full). After any ASYNC call, PQflush would be
necessary (to flush the send queue and to consume the expected
responses), but I can imagine any synchronous call (PQexec,
PQsendQuery or whatever) could detect a non-empty buffer and just
blockingly flush right there.


How would you consume results once they arrive?  I think this only 
covers the no-result case, and it has since come to my attention that 
the Node.js folks are looking for general async response processing.


(In order to implement your proposal, we'd have to add quite a few 
additional APIs to include the flag argument, but that's a minor detail.)


--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] RFC: Async query processing

2014-04-22 Thread Florian Weimer

On 04/22/2014 07:03 PM, Claudio Freire wrote:

On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer fwei...@redhat.com wrote:

Feedback in this thread was, we want something like this in libpq, but not
the thing you proposed.  But there have been no concrete counter-proposals,
and some of the responses did not take into account the inherent
complexities of round-trip avoidance.  So I'm not sure how to move this
topic forward.


What exactly do you mean by not taking into account?


A couple of them were along the lines of "let's just send batches of 
queries and sync between batches". This does not seem very helpful to 
me because sizing the batches is difficult, and the sizes could be quite 
large.



I do not believe you responded to my proposed interface. I'm not
requesting a change in roundtrip avoidance measures, just the
interface.


I've sent a reply now, I think I missed this part.  Sorry.

--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] RFC: Async query processing

2014-04-22 Thread Claudio Freire
On Tue, Apr 22, 2014 at 3:45 PM, Florian Weimer fwei...@redhat.com wrote:
 On 01/03/2014 06:06 PM, Claudio Freire wrote:

 Per-query expectations could be such a thing. And it can even work with
 PQexec:

 PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
 --read--
 PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
 --read--
 PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 ... 9 times...
 PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
 PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
 do {
 // do something useful
 } while (PQflush());

 Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
 and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
 send if the buffer is full). After any ASYNC call, PQflush would be
 necessary (to flush the send queue and to consume the expected
 responses), but I can imagine any synchronous call (PQexec,
 PQsendQuery or whatever) could detect a non-empty buffer and just
 blockingly flush right there.


 How would you consume results once they arrive?  I think this only covers
 the no-result case,

You could do PQEXPECT_ROWCOUNT for storing rowcounts (and supply a
pointer to a result buffer), or PQ_BUFFER_RESULTS and do the same. The
user would have to know beforehand the size of the result set (or an
upper bound of it), and readiness notification would also need to be
solved. There could also be PQEXPECT_DISCARD_RESULTS.

Alternatively, you could use a callback, node-style, and it would
solve everything (including readiness and error notification), but I'm
not sure about the portability of such a thing. Callbacks certainly
would be tricky when ABI compatibility has to be maintained. It would
however be a much better interface.

The pattern here is that the API needs to perform all the magic and
complex buffering and flushing; it should not be on the application
side.

 and it has since come to my attention that the Node.js
 folks are looking for general async response processing.

Node support would take a little more work. Specifically, for node to
work with this API, the non-blocking case has to be handled properly,
allowing node to wait on the FDs instead of requiring it to flush and
block on the event loop thread.

That means a buffer at least as large as the query parameters, which
should be no problem (but might be tricky to implement), and handling
the equivalent of EWOULDBLOCK at the PQexec(.., PQASYNC) calls.

In any case, on any specific connection, query processing is linear.
So you really need a result callback queue (however you implement
it, be it the application or the API). What I propose is moving as much as
possible to the API, since it will be common to all users of the async
functionality, and it will make it possible to fix bugs in that code
centrally too.

My earlier examples were all about discarding results, because that's
what enables the most throughput, and it covers lots of cases. But, as
has been mentioned in previous posts, rowcounts at the very least have
to be handled as well, so there's that. I guess we can throw in
generic result callbacks (and errbacks) if the ABI allows it, and it
will be a net win in clarity and simplicity.


On Tue, Apr 22, 2014 at 3:49 PM, Florian Weimer fwei...@redhat.com wrote:
 On 04/22/2014 07:03 PM, Claudio Freire wrote:

 On Tue, Apr 22, 2014 at 8:19 AM, Florian Weimer fwei...@redhat.com
 wrote:

 Feedback in this thread was, we want something like this in libpq, but
 not
 the thing you proposed.  But there have been no concrete
 counter-proposals,
 and some of the responses did not take into account the inherent
 complexities of round-trip avoidance.  So I'm not sure how to move this
 topic forward.


  What exactly do you mean by "not taking into account"?


 A couple of them were along the lines of "let's just send batches of queries
 and sync between batches".  This does not seem very helpful to me because
 sizing the batches is difficult, and the sizes could be quite large.

Not at all.

The proposal certainly has that in its examples, but the API can be
used with no explicit synchronization. That's what I was going for
when I suggested that other API calls could flush implicitly when
needed. If you never call flush, flush happens on its own when
necessary. If you never call synchronous API, you never wait for
replies. Though when you need the results (as in right now), you need
to flush explicitly, there's no way around that.

Also, feel free to consider all this mere opinion. I'm certainly not
the one implementing it ;-)



Re: [HACKERS] [PATCH] `pg_dump -Fd` doesn't check write return status...

2014-04-22 Thread Bruce Momjian
On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote:
 The attached patch fixes the case when `pg_dump -Fd …` is called
 on a partition where write(2) fails for some reason or another. In
 this case, backup jobs were returning with a successful exit code even
 though most of the files in the dump directory were all zero length.

 I haven’t tested this patch’s failure conditions but the fix seems
 simple enough: cfwrite() needs to have its return status checked
 everywhere and exit_horribly() upon any failure. In this case,
 callers of _WriteData() were not checking the return status and were
 discarding the negative return status (e.g. ENOSPC).

 I made a cursory pass over the code and found one other instance where
 write status wasn’t being checked and also included that.

As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.

I have developed a comprehensive patch that addresses all the issues I
could find.  The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex.  I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.  

Patch attached.  I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed.  I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
new file mode 100644
index 10bc3f0..bad21b5
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
*** static void InitCompressorZlib(Compresso
*** 86,99 
  static void DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs,
  	  bool flush);
  static void ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF);
! static size_t WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs,
  	   const char *data, size_t dLen);
  static void EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs);
  #endif
  
  /* Routines that support uncompressed data I/O */
  static void ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF);
! static size_t WriteDataToArchiveNone(ArchiveHandle *AH, CompressorState *cs,
  	   const char *data, size_t dLen);
  
  /*
--- 86,99 
  static void DeflateCompressorZlib(ArchiveHandle *AH, CompressorState *cs,
  	  bool flush);
  static void ReadDataFromArchiveZlib(ArchiveHandle *AH, ReadFunc readF);
! static void WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs,
  	   const char *data, size_t dLen);
  static void EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs);
  #endif
  
  /* Routines that support uncompressed data I/O */
  static void ReadDataFromArchiveNone(ArchiveHandle *AH, ReadFunc readF);
! static void WriteDataToArchiveNone(ArchiveHandle *AH, CompressorState *cs,
  	   const char *data, size_t dLen);
  
  /*
*** ReadDataFromArchive(ArchiveHandle *AH, i
*** 179,185 
  /*
   * Compress and write data to the output stream (via writeF).
   */
! size_t
  WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
     const void *data, size_t dLen)
  {
--- 179,185 
  /*
   * Compress and write data to the output stream (via writeF).
   */
! void
  WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
     const void *data, size_t dLen)
  {
*** WriteDataToArchive(ArchiveHandle *AH, Co
*** 190,203 
  	{
  		case COMPR_ALG_LIBZ:
  #ifdef HAVE_LIBZ
! 			return WriteDataToArchiveZlib(AH, cs, data, dLen);
  #else
  			exit_horribly(modulename, "not built with zlib support\n");
  #endif
  		case COMPR_ALG_NONE:
! 			return WriteDataToArchiveNone(AH, cs, data, dLen);
  	}
! 	return 0;	/* keep compiler quiet */
  }
  
  /*
--- 190,205 
  	{
  		case COMPR_ALG_LIBZ:
  #ifdef HAVE_LIBZ
! 			WriteDataToArchiveZlib(AH, cs, data, dLen);
  #else
  			exit_horribly(modulename, "not built with zlib support\n");
  #endif
+ 			break;
  		case COMPR_ALG_NONE:
! 			WriteDataToArchiveNone(AH, cs, data, dLen);
! 			break;
  	}
! 	return;
  }
  
  /*
*** DeflateCompressorZlib(ArchiveHandle *AH,
*** 298,307 
   */
  size_t		len = cs->zlibOutSize - zp->avail_out;
  
! if (cs->writeF(AH, out, len) != len)
! 	exit_horribly(modulename,
!   "could not write to output file: %s\n",
!   strerror(errno));
  			}
  			zp->next_out = (void *) out;
  			zp->avail_out = cs->zlibOutSize;
--- 300,306 
   */
  size_t		len = cs->zlibOutSize - zp->avail_out;
  
! cs->writeF(AH, out, len);
  			}
  			zp->next_out = (void *) out;
  			zp->avail_out = cs->zlibOutSize;

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Jim Nasby

On 4/21/14, 6:07 PM, David G Johnston wrote:

Jim Nasby-2 wrote

I feel that if there is no memory pressure, frankly it doesn't matter much
about what gets out and what not. The case I am specifically targeting is
when the clocksweep gets to move about a lot i.e. high memory pressure
workloads. Of course,  I may be totally wrong here.


Well, there's either memory pressure or there isn't. If there isn't then
it's all moot *because we're not evicting anything*.

The trade-off I'm seeing here is between measuring when there is no memory
pressure - and thus eating at performance while not actually evicting
buffers - and not measuring but then encountering memory pressure and not
having a clue as to what should be evicted.


Right. OSes handle this by keeping a certain ratio of active vs inactive pages, 
regardless of pressure for free pages. That way when you need more pages in the 
free list you can pull them from the inactive list knowing that you're making a 
good decision.

One of the really nice things about this approach is that if memory pressure is 
low enough that you don't need more pages on the inactive list you don't even 
need to run that clock.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-04-22 Thread Bruce Momjian
On Wed, Mar  5, 2014 at 09:49:30PM +0900, Michael Paquier wrote:
 On Wed, Mar 5, 2014 at 7:44 AM, Andrew Dunstan and...@dunslane.net wrote:
  I have picked this up and committed the patch. Thanks to all.
 Sorry for coming after the battle, but while looking at what has been
 committed I noticed that copy2.sql is actually doing twice in a row
 the same test:
 COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
 FORCE_NOT_NULL(b), FORCE_NULL(c));
 1,,
 \.
 -- should succeed with no effect (b remains an empty string, c remains 
 NULL)
 COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv,
 FORCE_NOT_NULL(b), FORCE_NULL(c));
 2,,
 \.
 
 See? For both tests the quotes are placed on the same column, the 3rd.
 I think that one of them should be like that, with the quotes on the
 2nd column = 2,,
 The attached patch corrects that... and a misplaced comment.
 Regards,

Thanks.  Patch applied.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-04-22 Thread Bruce Momjian
On Fri, Mar  7, 2014 at 05:08:54PM +0900, Michael Paquier wrote:
 On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andrew Dunstan and...@dunslane.net writes:
  On 03/05/2014 09:11 AM, Michael Paquier wrote:
  After testing this feature, I noticed that FORCE_NULL and
  FORCE_NOT_NULL can both be specified with COPY on the same column.
 
  Strictly they are not actually contradictory, since FORCE NULL relates
  to quoted null strings and FORCE NOT NULL relates to unquoted null
  strings. Arguably the docs are slightly loose on this point. Still,
  applying both FORCE NULL and FORCE NOT NULL to the same column would be
  rather perverse, since it would result in a quoted null string becoming
  null and an unquoted null string becoming not null.
 
  Given the remarkable lack of standardization of CSV output, who's
  to say that there might not be data sources out there for which this
  is the desired behavior?  It's weird, I agree, but I think throwing
  an error for the combination is not going to be helpful.  It's not
  like somebody might accidentally write both on the same column.
 
  +1 for clarifying the docs, though, more or less in the words you
  used above.
 Following that, I have hacked the patch attached to update the docs
 with an additional regression test (actually replaces a test that was
 the same as the one before in copy2).
 
 I am attaching as well a second patch for file_fdw, to allow the use
 of force_null and force_not_null on the same column, to be consistent
 with COPY.
 Regards,

Correction, this is the patch applied, not the earlier version.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Peter Geoghegan
On Tue, Apr 22, 2014 at 2:03 AM, Hannu Krosing ha...@krosing.net wrote:
 What is the depth of the b-tree at this percentage?

Well, this percentage of B-Tree pages that are leaf pages doesn't have
much to do with the depth. The percentage seems very consistent for
each B-Tree, irrespective of the total size of the B-Tree.

-- 
Peter Geoghegan




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote:
  
  On 04/21/2014 02:54 PM, Andres Freund wrote:
  Hi,
  
  I spent the last two hours poking arounds in the environment Andrew
  provided and I was able to reproduce the issue, find a assert to
  reproduce it much faster and find a possible root cause.
  
  
  What's the assert that makes it happen faster? That might help a lot in
  constructing a self-contained test.
 
 Assertion and *preliminary*, *hacky* fix attached.

Thanks for the analysis and patches.  I've been playing with this on my
own a bit, and one thing that I just noticed is that at least for
heap_update I cannot reproduce a problem when the xmax is originally a
multixact, so AFAICT the number of places that need patched aren't as
many.

Some testing later, I think the issue only occurs if we determine that
we don't need to wait for the xid/multi to complete, because otherwise
the wait itself saves us.  (It's easy to cause the problem by adding a
breakpoint in heapam.c:3325, i.e. just before re-acquiring the buffer
lock, and then having transaction A lock for key share, then transaction
B update the tuple which stops at the breakpoint, then transaction A
also update the tuple, and finally release transaction B).

For now I offer a cleaned up version of your patch to add the assertion
that multis don't contain multiple updates.  I considered the idea of
making this #ifdef USE_ASSERT_CHECKING, because it has to walk the
complete array of members; and then have full elogs in MultiXactIdExpand
and MultiXactIdCreate, which are lighter because they can check more
easily.  But on second thoughts I refrained from doing that, because
surely the arrays are not as large anyway, are they.

I think I should push this patch first, so that Andrew and Josh can try
their respective test cases which should start throwing errors, then
push the actual fixes.  Does that sound okay?

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 190,198  static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  /* Get the LockTupleMode for a given MultiXactStatus */
  #define TUPLOCK_from_mxstatus(status) \
  			(MultiXactStatusLock[(status)])
- /* Get the is_update bit for a given MultiXactStatus */
- #define ISUPDATE_from_mxstatus(status) \
- 			((status) & MultiXactStatusForUpdate)
  
  /* 
   *		 heap support routines
--- 190,195 
*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
***
*** 457,463  MultiXactIdExpand(MultiXactId multi, TransactionId xid, MultiXactStatus status)
  	for (i = 0, j = 0; i < nmembers; i++)
  	{
  		if (TransactionIdIsInProgress(members[i].xid) ||
! 			((members[i].status & MultiXactStatusForUpdate) &&
  			 TransactionIdDidCommit(members[i].xid)))
  		{
  			newMembers[j].xid = members[i].xid;
--- 457,463 
  	for (i = 0, j = 0; i < nmembers; i++)
  	{
  		if (TransactionIdIsInProgress(members[i].xid) ||
! 			(ISUPDATE_from_mxstatus(members[i].status) &&
  			 TransactionIdDidCommit(members[i].xid)))
  		{
  			newMembers[j].xid = members[i].xid;
***
*** 713,718  MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
--- 713,734 
  		return multi;
  	}
  
+ 	/* Verify that there is a single update Xid among the given members. */
+ 	{
+ 		int			i;
+ 		bool		has_update = false;
+ 
+ 		for (i = 0; i < nmembers; i++)
+ 		{
+ 			if (ISUPDATE_from_mxstatus(members[i].status))
+ 			{
+ if (has_update)
+ 	elog(ERROR, "new multixact has more than one updating member");
+ has_update = true;
+ 			}
+ 		}
+ 	}
+ 
  	/*
  	 * Assign the MXID and offsets range to use, and make sure there is space
  	 * in the OFFSETs and MEMBERs files.  NB: this routine does
*** a/src/include/access/multixact.h
--- b/src/include/access/multixact.h
***
*** 48,53  typedef enum
--- 48,57 
  
  #define MaxMultiXactStatus MultiXactStatusUpdate
  
+ /* does a status value correspond to a tuple update? */
+ #define ISUPDATE_from_mxstatus(status) \
+ 			((status) & MultiXactStatusForUpdate)
+ 
  
  typedef struct MultiXactMember
  {



Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
 I think I should push this patch first, so that Andrew and Josh can try
 their respective test cases which should start throwing errors, then
 push the actual fixes.  Does that sound okay?

Note that I have a limited ability to actually test my failing test case
-- I have to fire up the user's full application test suite to produce
it, and that needs to be scheduled.  However, Andrew is able to rerun
his test case repeatedly.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Regression test errors

2014-04-22 Thread Bruce Momjian
On Sun, Mar  9, 2014 at 09:23:33AM -0300, Martín Marqués wrote:
 OK, noticed how horrible this patch was (thanks for the heads up from
 Jaime Casanova). This happens when trying to fetch changes one made on
 a test copy after a day of lots of work back to a git repository: you
 just make very silly mistakes.
 
 Well, now I got the changes right (tested the patch, because silly
 changes should be tested as well ;)).

Patch applied to head.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andrew Dunstan


On 04/22/2014 05:20 PM, Josh Berkus wrote:

On 04/22/2014 02:01 PM, Alvaro Herrera wrote:

I think I should push this patch first, so that Andrew and Josh can try
their respective test cases which should start throwing errors, then
push the actual fixes.  Does that sound okay?

Note that I have a limited ability to actually test my failing test case
-- I have to fire up the user's full application test suite to produce
it, and that needs to be scheduled.  However, Andrew is able to rerun
his test case repeatedly.



Alvaro has access (as does Andres) to my setup, and instructions on how 
to run the test. I am going to be offline most of the time from tonight 
until next Tuesday.


cheers

andrew






Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
 Some testing later, I think the issue only occurs if we determine that
 we don't need to wait for the xid/multi to complete, because otherwise
 the wait itself saves us.  (It's easy to cause the problem by adding a
 breakpoint in heapam.c:3325, i.e. just before re-acquiring the buffer
 lock, and then having transaction A lock for key share, then transaction
 B update the tuple which stops at the breakpoint, then transaction A
 also update the tuple, and finally release transaction B).

So, trying to make my synthetic test work:

In order to encounter this issue, I'd need to have two concurrent
processes update the child records of the same parent record?  That is:

A --- B1
  \--- B2

... and the issue should only happen if I update both B1 and B2
concurrently in separate sessions?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 17:36:42 -0400, Andrew Dunstan wrote:
 
 On 04/22/2014 05:20 PM, Josh Berkus wrote:
 On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
 I think I should push this patch first, so that Andrew and Josh can try
 their respective test cases which should start throwing errors, then
 push the actual fixes.  Does that sound okay?
 Note that I have a limited ability to actually test my failing test case
 -- I have to fire up the user's full application test suite to produce
 it, and that needs to be scheduled.  However, Andrew is able to rerun
 his test case repeatedly.
 
 
 Alvaro has access (as does Andres) to my setup, and instructions on how to
 run the test. I am going to be offline most of the time from tonight until
 next Tuesday.

My preliminary patch seems to have fixed the problem in your setup...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:40:46 -0700, Josh Berkus wrote:
 On 04/22/2014 02:01 PM, Alvaro Herrera wrote:
  Some testing later, I think the issue only occurs if we determine that
  we don't need to wait for the xid/multi to complete, because otherwise
  the wait itself saves us.  (It's easy to cause the problem by adding a
  breakpoint in heapam.c:3325, i.e. just before re-acquiring the buffer
  lock, and then having transaction A lock for key share, then transaction
  B update the tuple which stops at the breakpoint, then transaction A
  also update the tuple, and finally release transaction B).
 
 So, trying to make my synthetic test work:
 
 In order to encounter this issue, I'd need to have two concurrent
 processes update the child records of the same parent record?  That is:
 
 A --- B1
   \--- B2
 
 ... and the issue should only happen if I update both B1 and B2
 concurrently in separate sessions?

I don't think that'll trigger it. You need rows that are first key share
locked and then updated by the locking transaction. Under
concurrency. And the time window really is rather small.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote:
 Thanks for the analysis and patches.  I've been playing with this on my
 own a bit, and one thing that I just noticed is that at least for
 heap_update I cannot reproduce a problem when the xmax is originally a
 multixact, so AFAICT the number of places that need patched aren't as
 many.

I am quite uncomfortable with that assumption. I don't immediately see a
problem for some of the cases, but leaving them in a weaker state than
9.2 makes me uncomfortable.

 For now I offer a cleaned up version of your patch to add the assertion
 that multis don't contain multiple updates.  I considered the idea of
 making this #ifdef USE_ASSERT_CHECKING, because it has to walk the
 complete array of members; and then have full elogs in MultiXactIdExpand
 and MultiXactIdCreate, which are lighter because they can check more
 easily.  But on second thoughts I refrained from doing that, because
 surely the arrays are not as large anyway, are they.

Yea, I think it's fine to do it where it's in the proposed patch.

 I think I should push this patch first, so that Andrew and Josh can try
 their respective test cases which should start throwing errors, then
 push the actual fixes.  Does that sound okay?

+1

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus

 In order to encounter this issue, I'd need to have two concurrent
 processes update the child records of the same parent record?  That is:

 A --- B1
   \--- B2

 ... and the issue should only happen if I update both B1 and B2
 concurrently in separate sessions?
 
 I don't think that'll trigger it. You need rows that are first key share
 locked and then updated by the locking transaction. Under
 concurrency. And the timewindow really is rather small..

Well, currently I have a test which locks A and B1, then updates B1
(twice, actually), and then updates A.  However, since there's a lock on
A, there's no concurrent updating of B1 and B2. This is based on the
behavior of the queue where I originally saw the problem, but it doesn't
reproduce the bug.

I'm thinking I need to just lock B1, update B1, then A, while allowing a
concurrent session to update B2 and A.  No?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:49:00 -0700, Josh Berkus wrote:
 
  In order to encounter this issue, I'd need to have two concurrent
  processes update the child records of the same parent record?  That is:
 
  A --- B1
\--- B2
 
  ... and the issue should only happen if I update both B1 and B2
  concurrently in separate sessions?
  
  I don't think that'll trigger it. You need rows that are first key share
  locked and then updated by the locking transaction. Under
  concurrency. And the timewindow really is rather small..
 
 Well, currently I have a test which locks A and B1, then updates B1
 (twice, actually), and then updates A.  However, since there's a lock on
 A, there's no concurrent updating of B1 and B2. This is based on the
 behavior of the queue where I originally saw the problem, but it doesn't
 reproduce the bug.
 
 I'm thinking I need to just lock B1, update B1, then A, while allowing a
 concurrent session to update B2 and and A.  No?

I don't think this gets any easier to reproduce by introducing more than
one relation. Have one session acquire key share locks and then update,
and another one just doing updates.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-04-22 Thread Michael Paquier
On Wed, Apr 23, 2014 at 5:07 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Mar  7, 2014 at 05:08:54PM +0900, Michael Paquier wrote:
  On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:
   Andrew Dunstan and...@dunslane.net writes:
   On 03/05/2014 09:11 AM, Michael Paquier wrote:
   After testing this feature, I noticed that FORCE_NULL and
   FORCE_NOT_NULL can both be specified with COPY on the same column.
  
   Strictly they are not actually contradictory, since FORCE NULL relates
   to quoted null strings and FORCE NOT NULL relates to unquoted null
   strings. Arguably the docs are slightly loose on this point. Still,
   applying both FORCE NULL and FORCE NOT NULL to the same column would
 be
   rather perverse, since it would result in a quoted null string
 becoming
   null and an unquoted null string becoming not null.
  
   Given the remarkable lack of standardization of CSV output, who's
   to say that there might not be data sources out there for which this
   is the desired behavior?  It's weird, I agree, but I think throwing
   an error for the combination is not going to be helpful.  It's not
   like somebody might accidentally write both on the same column.
  
   +1 for clarifying the docs, though, more or less in the words you
   used above.
  Following that, I have hacked the patch attached to update the docs
  with an additional regression test (actually replaces a test that was
  the same as the one before in copy2).
 
  I am attaching as well a second patch for file_fdw, to allow the use
  of force_null and force_not_null on the same column, to be consistent
  with COPY.
  Regards,

 Correction, this is the patch applied, not the earlier version.

Thanks for taking the time to look at that.
-- 
Michael


Re: [HACKERS] Patch for CREATE RULE sgml -- Was in: [DOCS]

2014-04-22 Thread Bruce Momjian
On Mon, Mar 24, 2014 at 09:51:07PM +0900, Fujii Masao wrote:
 On Sat, Mar 22, 2014 at 12:56 AM, Emanuel Calvo
 emanuel.ca...@2ndquadrant.com wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA512
 
 
 
  Hi guys,
 
  I realized that the synopsis output of CREATE RULE does not have a detailed
  listing for the event parameter.
 
  But the question here is that I'm not sure which format to follow:
 
  { INSERT | UPDATE | DELETE | SELECT}
 
  or
 
  INSERT
  UPDATE
  DELETE
  SELECT
  - --
 
 
  I attach a patch for each one.
 
 Though I'm not sure of the right policy for the format of a synopsis, ISTM
 that the following format is suitable in this case, i.e., when the value list
 is very simple.
 Patch attached.
 
 SELECT | INSERT | UPDATE | DELETE

Patch applied.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Alfred Perlstein


On 4/22/14, 8:26 AM, Andrew Dunstan wrote:


On 04/22/2014 01:36 AM, Joshua D. Drake wrote:


On 04/21/2014 06:19 PM, Andrew Dunstan wrote:



If we never start we'll never get there.

I can think of several organizations that might be approached to donate
hardware.


Like .Org?

We have a hardware farm, a rack full of hardware and spindles. It 
isn't the most current but it is there.






I'm going away tomorrow for a few days R&R. When I'm back next week I 
will set up a demo client running this module. If you can have a 
machine prepped for this purpose by then so much the better, otherwise 
I will have to drag out a box I recently rescued and have been waiting 
for something to use it with. It's more important that it's stable 
(i.e. nothing else running on it) than that it's very powerful. It 
could be running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


cheers

andrew




Hey folks, I just spoke with our director of netops, Tom Sparks, here at 
Norse, and we have a vested interest in PostgreSQL.  We can throw 
together a cluster of 4 machines with specs approximately in the range 
of dual quad-core Westmere with ~64GB of RAM, running FreeBSD 10 or 11.  
We can also do an Ubuntu install, or another Linux distro.  Please 
let me know if this would be something that the project could 
make use of.


We also have colo space and power, etc.  So this would be the whole 
deal.  The cluster would be up for as long as needed.


Are the machine specs sufficient?  Any other things we should look for?

CC'd Tom on this email.

-Alfred




Re: [HACKERS] Missing pfree in logical_heap_rewrite_flush_mappings()

2014-04-22 Thread Bruce Momjian
On Wed, Mar 26, 2014 at 06:29:38PM +0200, Ants Aasma wrote:
 It seems to me that when flushing logical mappings to disk, each
 mapping file leaks the buffer used to pass the mappings to XLogInsert.
 Also, it seems consistent to allocate that buffer in the RewriteState
 memory context. Patch attached.

Patch applied.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [patch] Adding EXTRA_REGRESS_OPTS to all pg_regress invocations

2014-04-22 Thread Bruce Momjian
On Thu, Mar 27, 2014 at 06:03:24PM +0100, Christoph Berg wrote:
 Re: Bruce Momjian 2013-12-04 20131204151533.gb17...@momjian.us
  On Mon, May  6, 2013 at 11:51:47PM -0700, Christoph Berg wrote:
   make check supports EXTRA_REGRESS_OPTS to pass extra options to
   pg_regress, but all the other places where pg_regress is used do not
   allow this. The attached patch adds EXTRA_REGRESS_OPTS to
   Makefile.global.in (for contrib modules) and two more special
   Makefiles (isolation and pg_upgrade).
   
   The use case here is that Debian needs to be able to redirect the unix
   socket directory used to /tmp, because /var/run/postgresql isn't
   writable for the buildd user. The matching part for this inside
   pg_regress is still in discussion here, but the addition of
   EXTRA_REGRESS_OPTS is an independent step that is also useful for
   others, so I'd like to propose it for inclusion.
  
  Thanks, patch applied.  This will appear in PG 9.4.  I suppose we could
  backpatch this but I would need community feedback on that.
 
 Thanks for pushing this. In the meantime, a new bit has appeared:
 The new contrib/test_decoding checks make use of the
 pg_isolation_regress_check macros (which the isolation test itself
 doesn't). These macros also need EXTRA_REGRESS_OPTS, on top of
 86ef4796f5120c55d1a48cfab52e51df8ed271b5:

Applied.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2014-04-22 Thread Bruce Momjian
On Mon, Mar 31, 2014 at 09:36:03AM -0400, Stephen Frost wrote:
 Andres,
 
 * Andres Freund (and...@2ndquadrant.com) wrote:
  Without having looked at the code, IIRC this looks like some place
  misses passing allow_old=true where it's actually required. Any chance
  you can get a backtrace for the error message? I know you said somewhere
  below that you'd worked around the problem, but maybe you have a copy of
  the database somewhere?

   Looks like your idea that it has to do w/ freezing is accurate...

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-22 Thread Bruce Momjian
On Wed, Apr  9, 2014 at 02:22:54PM -0400, Greg Stark wrote:
 On Wed, Apr 9, 2014 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Maybe we should make *neither* of these the default opclass, and give
  *neither* the name json_ops.
 
  There's definitely something to be said for that.  Default opclasses are
  sensible when there's basically only one behavior that's interesting for
  most people.  We can already see that that's not going to be the case
  for jsonb indexes, at least not with the currently available alternatives.
 
  Not having a default would force users to make decisions explicitly.
  Is that what we want?
 
 I don't like the idea of having no default opclass. I think there's a
 huge usability gain in being able to just create an index on a
 column and have it do something reasonable for most use cases.
 
 I can get behind the idea of having separate index opclasses for paths
 and path-value pairs but I suspect the default should just be to index
 both in the same index. If we can have one default index opclass that
 supports containment and existence and then other opclasses that are
 smaller but only support a subset of the operators that would seem
 like the best compromise.
 
 I'm a bit confused by Heikki's list though. I would expect path and
 path-value pair to be the only useful ones. I'm not clear what an
 index on keys or key-value would be -- it would index just the
 top-level keys and values without recursing?
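For context on the shape of the choice being debated: with more than one opclass available, explicit selection at CREATE INDEX time looks like this (the opclass names are the ones 9.4 ultimately shipped; the table and column are hypothetical):

```sql
-- Default opclass: supports key existence (?, ?|, ?&) and containment (@>)
CREATE INDEX docs_gin_idx  ON docs USING gin (body);                 -- jsonb_ops
-- Smaller, faster alternative: supports containment (@>) only
CREATE INDEX docs_path_idx ON docs USING gin (body jsonb_path_ops);
```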

Where are we on the default JSONB opclass change?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [DOCS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?

2014-04-22 Thread Bruce Momjian
On Wed, Apr 16, 2014 at 12:35:26PM +0900, Amit Langote wrote:
 Hi,
 
 Attached adds CREATE MATERIALIZED VIEW AS to the list of statements
 that can be EXPLAINed.

Patch applied.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-22 Thread Peter Geoghegan
On Tue, Apr 22, 2014 at 3:32 PM, Bruce Momjian br...@momjian.us wrote:
 Where are we on the default JSONB opclass change?

FWIW, I still don't have any strong opinion here. I defer to others on
this question.


-- 
Peter Geoghegan




Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Mark Wong
On Tue, Apr 22, 2014 at 10:06 AM, Joshua D. Drake j...@commandprompt.com wrote:


 On 04/22/2014 08:26 AM, Andrew Dunstan wrote:

  I'm going away tomorrow for a few days R&R. When I'm back next week I
 will set up a demo client running this module. If you can have a machine
 prepped for this purpose by then so much the better, otherwise I will
 have to drag out a box I recently rescued and have been waiting for
 something to use it with. It's more important that it's stable (i.e.
 nothing else running on it) than that it's very powerful. It could be
 running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


 This is best handled by Mark. Mark can you help Andrew with this? I assume
 we would use the DL385 with the MS70?


Yeah, I can help.  But let me know if Alfred's offer is preferred.

Regards,
Mark


Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Andrew Dunstan


On 04/22/2014 06:43 PM, Mark Wong wrote:
On Tue, Apr 22, 2014 at 10:06 AM, Joshua D. Drake 
j...@commandprompt.com wrote:



On 04/22/2014 08:26 AM, Andrew Dunstan wrote:

I'm going away tomorrow for a few days R&R. When I'm back next week I
will set up a demo client running this module. If you can have a
machine prepped for this purpose by then so much the better, otherwise
I will have to drag out a box I recently rescued and have been waiting
for something to use it with. It's more important that it's stable
(i.e. nothing else running on it) than that it's very powerful. It
could be running Ubuntu or some Redhattish variant or, yes, even FreeBSD.


This is best handled by Mark. Mark can you help Andrew with this?
I assume we would use the DL385 with the MS70?


Yeah, I can help.  But let me know if Alfred's offer is preferred.



I don't think they are mutually exclusive, but I'd rather start off with 
one machine. I would find it easiest if it were on something like 
CentOS 6.5.


When we have that running and reporting like we want it we can add a 
FreeBSD server.


The idea is that these machines would be available for a long time, 
ideally quite a few years. We want to have them with a stable time 
series of performance data so that when something disturbs the 
performance it sticks out like a sore thumb.


cheers

andrew




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Josh Berkus wrote:
 
  In order to encounter this issue, I'd need to have two concurrent
  processes update the child records of the same parent record?  That is:
 
  A --- B1
\--- B2
 
  ... and the issue should only happen if I update both B1 and B2
  concurrently in separate sessions?
  
   I don't think that'll trigger it. You need rows that are first key-share
   locked and then updated by the locking transaction. Under
   concurrency. And the time window really is rather small...
 
 Well, currently I have a test which locks A and B1, then updates B1
 (twice, actually), and then updates A.  However, since there's a lock on
 A, there's no concurrent updating of B1 and B2. This is based on the
 behavior of the queue where I originally saw the problem, but it doesn't
 reproduce the bug.

If you want to make it easier to reproduce, you need to insert some
pg_usleep() calls in carefully selected spots.  As Andres says, the
window is small normally.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote:
  Thanks for the analysis and patches.  I've been playing with this on my
  own a bit, and one thing that I just noticed is that at least for
  heap_update I cannot reproduce a problem when the xmax is originally a
  multixact, so AFAICT the number of places that need to be patched isn't as
  large.
 
 I am quite uncomfortable with that assumption. I don't immediately see a
 problem for some of the cases, but leaving them in a weaker state than
 9.2 makes me uncomfortable.

That's true too.

I'm thinking about the comparison of the full infomask, as you propose,
instead of just the bits that we actually care about.  I think the only
thing that could cause a spurious failure (causing an extra execution of
the HeapTupleSatisfiesUpdate call and the stuff below) is somebody
setting HEAP_XMIN_COMMITTED concurrently; but that seems infrequent
enough that it should be pretty harmless.  However, should we worry about
possible future infomask bit changes that could negatively affect this
behavior?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 05:07 PM, Alvaro Herrera wrote:
 If you want to make it easier to reproduce, you need to insert some
 pg_usleep() calls in carefully selected spots.  As Andres says, the
 window is small normally.

Yeah, but the whole point of this is that having
pg_stat_statements/auto_explain loaded should make the window much
bigger.  In the cases where we're hitting it, we're hitting it with a
fair degree of predictability.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] What use case is make_tuple_indirect() supposed to illustrate?

2014-04-22 Thread Tom Lane
Pursuant to
http://www.postgresql.org/message-id/29007.1396038...@sss.pgh.pa.us
I've been working on a patch to prevent external toast pointers from
appearing in composite Datums.  I noticed that this patch completely
breaks the make_tuple_indirect() test case added by commit 36820250.
The regression test doesn't fail (meaning the test case fails to
actually prove that any indirection is happening), but it certainly
isn't doing what's intended, because the indirect pointers get
flattened out of the tuple returned by make_tuple_indirect() before
control ever leaves the function.

Even in the code as it stands in HEAD, such indirect pointers would
get flattened out of other container types such as arrays and ranges.

And for that matter, it's a bit silly to be testing make_tuple_indirect
in a BEFORE INSERT/UPDATE trigger, because even if the tuple gets out
of the trigger without being flattened, it will certainly get flattened
mere nanoseconds later before it gets written out to disk.  (If it did
not, the test case would fail altogether, since the indirect values
in memory only survive for the length of the current transaction.)

So I'm wondering exactly what use-case this test is supposed to represent.

Or is the whole thing just a toy anyway?  Because the more I look at that
patch, the less it looks like it could do anything useful, short of adding
a ton of infrastructure that's not there now.

regards, tom lane




Re: [HACKERS] [GENERAL] aggregate returning anyarray and 'cannot determine result data type'

2014-04-22 Thread Tom Lane
[ redirecting to -hackers ]

Tomas Vondra t...@fuzzy.cz writes:
 So my plan was to do something like this:

   sample_append(internal, anyelement, int) - internal
   sample_final(internal) - anyarray

   CREATE AGGREGATE sample_agg(anyelement, int) (
   SFUNC = sample_append,
   STYPE = internal,
   FINALFUNC = sample_final
   );

 However this leads to
   ERROR:  cannot determine result data type
   DETAIL:  A function returning a polymorphic type must have at least
one polymorphic argument

 because 'sample_final' produces anyarray but has no polymorphic
 argument.

Yeah, this is a problem with trying to use internal stype for polymorphic
aggregates.

The same problem came up in connection with the ordered set aggregates
that were added recently, and that patch implemented an interesting
workaround: the final function for an OSA gets additional dummy arguments
of the same type as the aggregate inputs.  They are always passed as NULLs
at runtime, and have no real value except if the aggregate is polymorphic
--- but when it is, they provide a way to resolve the result type of a
polymorphic final function, even if the state type is internal or
otherwise non-polymorphic.

I thought at the time that maybe we should offer this feature for regular
aggregates as well as ordered-set ones, but didn't do anything about
it because there hadn't been demand.  If we did have it, you could solve
this problem with

  sample_append(internal, anyelement, int) - internal
  sample_final(internal, anyelement, int) - anyarray

  CREATE AGGREGATE sample_agg(anyelement, int) (
  SFUNC = sample_append,
  STYPE = internal,
  FINALFUNC = sample_final
  );

where sample_final would have to be declared non-strict (since it'd always
be getting some NULL arguments), but that's a small price to pay.

I think it'd be a pretty small adjustment to the already-committed
code to allow this to happen.  Basically we'd just have to decouple
the extra-arguments-to-finalfn behavior from ordered-set aggregates.
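Sketched concretely, the decoupled form would let the example above be declared like this (FINALFUNC_EXTRA is the option name 9.4 ultimately adopted for this; the sample_* functions are the hypothetical ones from the quoted message):

```sql
CREATE AGGREGATE sample_agg(anyelement, int) (
    SFUNC     = sample_append,   -- sample_append(internal, anyelement, int) -> internal
    STYPE     = internal,
    FINALFUNC = sample_final,    -- sample_final(internal, anyelement, int) -> anyarray
    FINALFUNC_EXTRA              -- pass NULL-valued copies of the aggregate's inputs
                                 -- to the final function, resolving its polymorphic
                                 -- result type despite the internal state type
);
```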

One potential issue though is that if sample_final existed in both
signatures it wouldn't be very clear which one got selected for the
aggregate.  Perhaps the best fix would be to invent a different
CREATE AGGREGATE keyword for finalfns with extra arguments?  If
so, that's something we ought to do *now*, not in 9.5, because
it'll be too late to redefine how to create OSAs once 9.4 ships.

Thoughts?

regards, tom lane




Re: [HACKERS] Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-04-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Where are we on the default JSONB opclass change?

Not sure.  I'm for changing it, I think, but it wasn't at all clear
that we had consensus on that.  We did not have a proposed new name
for the opclass either ...

regards, tom lane




Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread Mark Kirkwood

On 23/04/14 00:19, Andres Freund wrote:

Hi,

Attached you can find a short (compile-tested only) patch implementing
a 'shared_memory_type' GUC, akin to 'dynamic_shared_memory_type'. It will
only apply to 9.4, not 9.3, but it should be easy to convert for it.



Have just tried this out (on Ubuntu 14.04 rather than FreeBSD, as that is 
what I happened to be running); it certainly works for me (big shared 
memory segment when I set it to 'sysv', only a tiny one when I use 'mmap').


The regression tests pass in both cases.
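For anyone else wanting to repeat the test, the relevant configuration is just this (GUC name and values as in the draft patch; 'sysv' restores the pre-9.3 behaviour that the FreeBSD tuning knobs apply to):

```ini
# postgresql.conf
shared_memory_type = sysv    # one large System V segment, as in 9.2 and earlier
#shared_memory_type = mmap   # the 9.3+ default: anonymous mmap plus a tiny sysv stub
```

The segment sizes reported above can be checked with `ipcs -m` after a restart.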

regards

Mark





[HACKERS] pg_upgrade and epoch

2014-04-22 Thread Sergey Burladyan
Hi All!

Currently pg_upgrade copies the XID into the new cluster, but not its epoch. Why?

Without the epoch from the old cluster, txid_current() in the upgraded database
returns a lower value than before the upgrade. This breaks, for example, PgQ,
and must be fixed by hand after the upgrade with pg_resetxlog.
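A sketch of the manual fix being described (illustrative; on a real system the sample line comes from `pg_controldata /path/to/old/cluster`, and the epoch is the first component of the "Latest checkpoint's NextXID" value):

```shell
# Parse the epoch out of a pg_controldata NextXID line (format: epoch/xid).
line="Latest checkpoint's NextXID:          3/12345"
epoch=$(printf '%s\n' "$line" | awk -F': *' '{split($2, a, "/"); print a[1]}')
echo "epoch=$epoch"

# Then, with the new cluster shut down, stamp the old epoch onto it:
# pg_resetxlog -e "$epoch" /path/to/new/cluster
```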

PS: see
http://lists.pgfoundry.org/pipermail/skytools-users/2014-April/001812.html

-- 
Sergey Burladyan


Re: [HACKERS] Missing pfree in logical_heap_rewrite_flush_mappings()

2014-04-22 Thread Bruce Momjian
On Tue, Apr 22, 2014 at 06:05:53PM -0400, Bruce Momjian wrote:
 On Wed, Mar 26, 2014 at 06:29:38PM +0200, Ants Aasma wrote:
  It seems to me that when flushing logical mappings to disk, each
  mapping file leaks the buffer used to pass the mappings to XLogInsert.
  Also, it seems consistent to allocate that buffer in the RewriteState
  memory context. Patch attached.
 
 Patch applied.

I had to revert this patch.  It causes a failure in the
/contrib/test_decoding regression test.


-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Missing pfree in logical_heap_rewrite_flush_mappings()

2014-04-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 I had to revert this patch.  It causes a failure in the
 /contrib/test_decoding regression test.

On closer inspection, it was simply pfree'ing the wrong pointer.

I fixed that and also undid the allocation in a different memory
context, which didn't seem to be a particularly good idea, unless
you've got a specific reason why CurrentMemoryContext would be the
wrong place for a transient allocation.

regards, tom lane




Re: [HACKERS] pg_upgrade and epoch

2014-04-22 Thread Sergey Konoplev
On Tue, Apr 22, 2014 at 6:33 PM, Sergey Burladyan eshkin...@gmail.com wrote:
 Currently pg_upgrade copies the XID into the new cluster, but not its epoch. Why?

 Without the epoch from the old cluster, txid_current() in the upgraded database
 returns a lower value than before the upgrade. This breaks, for example, PgQ,
 and must be fixed by hand after the upgrade with pg_resetxlog.

 PS: see
 http://lists.pgfoundry.org/pipermail/skytools-users/2014-April/001812.html

BTW, I didn't manage to make a test case yet. Recently, when I was
migrating several servers to skytools3 and upgrading from 9.0 to 9.2,
I noticed that the epoch was copied and the timeline id was 0 after the
upgrade, but skytools3 sometimes still didn't like it. Note "sometimes"
here: in some cases everything was okay, but in others it wasn't. I still
can't explain this, but incrementing the timeline id always helped.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] pg_upgrade and epoch

2014-04-22 Thread Sergey Burladyan
On Wed, Apr 23, 2014 at 6:38 AM, Sergey Konoplev gray...@gmail.com wrote:


 BTW, I didn't manage to make a test case yet. Recently, when I was
 migrating several servers to skytools3 and upgrading from 9.0 to 9.2,
 I noticed that epoch was copied, timeline id was 0 after upgrade, but

...

This is strange; if I'm not mistaken, the XID is copied by copy_clog_xlog_xid(void):
http://doxygen.postgresql.org/pg__upgrade_8c_source.html#l00398
and there is no epoch (-e XIDEPOCH) in the pg_resetxlog call args.

I test it like this:
~/inst/pg9.2/bin/initdb d9.2
/usr/lib/postgresql/9.3/bin/initdb d9.3
~/inst/pg9.2/bin/postgres -D d9.2 -p 5433
^C
~/inst/pg9.2/bin/pg_resetxlog -e 8 d9.2/
~/inst/pg9.2/bin/postgres -D d9.2 -p 5433
psql -At -h localhost -p 5433 -c 'select txid_current()' postgres
34359739064
^C
/usr/lib/postgresql/9.3/bin/pg_upgrade -d d9.2 -D d9.3 -b ~/inst/pg9.2/bin \
    -B /usr/lib/postgresql/9.3/bin -k
/usr/lib/postgresql/9.3/bin/postgres -D d9.3 -p 5433 -k `pwd`/s
psql -At -h localhost -p 5433 -c 'select txid_current()' postgres
756

34359739064 switched to 756 after upgrade

-- 
Sergey Burladyan


Re: [HACKERS] pg_upgrade and epoch

2014-04-22 Thread Sergey Konoplev
On Tue, Apr 22, 2014 at 8:08 PM, Sergey Burladyan eshkin...@gmail.com wrote:
 On Wed, Apr 23, 2014 at 6:38 AM, Sergey Konoplev gray...@gmail.com wrote:
 BTW, I didn't manage to make a test case yet. Recently, when I was
 migrating several servers to skytools3 and upgrading from 9.0 to 9.2,
 I noticed that epoch was copied, timeline id was 0 after upgrade, but
 ...

Sorry, just noticed a typo in my previous message: "timeline id" should have read "next xid".

 This is strange; if I'm not mistaken, the XID is copied by copy_clog_xlog_xid(void):
 http://doxygen.postgresql.org/pg__upgrade_8c_source.html#l00398
 and there is no epoch (-e XIDEPOCH) in the pg_resetxlog call args

I can only guess here.

+ Bruce Momjian

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD

2014-04-22 Thread YAMAMOTO Takashi
 - NetBSD: crashes under load; this could have been fixed but when I ran the
   benchmarks in 2012 none of the developers seemed to care.

do you mean this?
https://mail-index.netbsd.org/tech-kern/2012/08/29/msg013918.html

YAMAMOTO Takashi




Re: [HACKERS] How can we make beta testing better?

2014-04-22 Thread Jehan-Guillaume de Rorthais
On Thu, 17 Apr 2014 16:42:21 -0700
Josh Berkus j...@agliodbs.com wrote:

 On 04/15/2014 09:53 PM, Rod Taylor wrote:
  A documented beta test process/toolset which does the following would help:
  1) Enables full query logging
  2) Creates a replica of a production DB, record $TIME when it stops.
  3) Allow user to make changes (upgrade to 9.4, change hardware, change
  kernel settings, ...)
  4) Plays queries from the CSV logs starting from $TIME mimicking actual
  timing and transaction boundaries
  
  If Pg can make it easy to duplicate activities currently going on in
  production inside another environment, I would be pleased to fire a couple
  billion queries through it over the next few weeks.
  
  #4 should include reporting useful to the project, such as a sampling of
  queries which performed significantly worse and a few relative performance
  stats for overall execution time.
 
 So we have some software we've been procrastinating on OSS'ing, which does:
 
 1) Takes full query CSV logs from a running postgres instance
 2) Runs them against a target instance in parallel
 3) Records response times for all queries
 
 tsung and pgreplay also do this, but have some limitations which make
 them impractical for a general set of logs to replay.

I've been working on another tool able to replay scenarios recorded directly
from a network dump (see [pgshark]). It works, can be totally transparent from
the application's point of view, the tcpdump can run anywhere, and **ALL** the
real traffic can be replayed...but it needs some more work for reporting and
for handling parallel sessions. The drawback of using libpcap is that you can
lose packets while capturing, and even a very large capture buffer cannot keep
you safe over hours of high-speed traffic. So it might require multiple
captures, and adjusting the buffer size, to capture 100% of the traffic in the
required period.

I tried to quickly write a simple proxy using Perl POE to capture ALL the
traffic safely. My POC did nothing but forward packets, and IIRC a 30s
stress test with 10 or 20 sessions using pgbench showed a drop of ~60% in
performance. But it was a very quick, single-process/single-threaded POC.
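The kind of pass-through capture proxy being described can be sketched in a few lines — here in Python/asyncio rather than Perl POE, purely for illustration (a real tool would need to frame the wire protocol, scale to many sessions, and spool the log to disk):

```python
import asyncio

async def pipe(reader, writer, log):
    """Forward bytes one direction, recording them for later replay."""
    while True:
        data = await reader.read(4096)
        if not data:
            break
        log.append(data)
        writer.write(data)
        await writer.drain()
    writer.close()

def make_handler(backend_host, backend_port, log):
    """Return a connection handler proxying clients to the real backend."""
    async def handle(client_r, client_w):
        back_r, back_w = await asyncio.open_connection(backend_host, backend_port)
        # record only client->backend traffic; discard the return direction
        await asyncio.gather(pipe(client_r, back_w, log),
                             pipe(back_r, client_w, []))
    return handle
```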

Maybe another path would be to have PostgreSQL itself (which only has the
application level to deal with) generate such a traffic dump, in a format
we can feed to pgbench.

 What it would need is:
 
 A) scripting around coordinated backups
 B) Scripting for single-command runs, including changing pg.conf to
 record data.

Changing pg.conf is pretty easy with ALTER SYSTEM now. But I'm sure we all
have some scripts out there doing this (at least I do)

 C) tools to *analyze* the output data, including error messages.

That's what I lack in pgshark so far.

[pgshark] https://github.com/dalibo/pgshark

Cheers,
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com

