Re: [HACKERS] The question about the type numeric

2014-04-17 Thread sure.postgres
I am so sorry for that.
This is my fault.
My mail client was not receiving any e-mail, so I tried re-sending the e-mail.

My problem has been resolved.
Thank you for your reply.


2014-04-16



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website:www.highgo.com
Mobile:18766416137



From: David G Johnston david.g.johns...@gmail.com
Sent: 2014-04-16 10:23
Subject: Re: [HACKERS] The question about the type numeric
To: pgsql-hackers pgsql-hackers@postgresql.org
Cc:

sure.postgres wrote 
 Hi hackers, 
  
 I am learning about numeric.
 The comment for the NumericShort format is:
  * In the NumericShort format, the remaining 14 bits of the header word 
  * (n_short.n_header) are allocated as follows: 1 for sign (positive or 
  * negative), 6 for dynamic scale, and 7 for weight.  In practice, most 
  * commonly-encountered values can be represented this way. 
  
 So the Max of the NumericShort format should be up to 508 digits before 
 the decimal point. 
 So the sign of the number 
 12345678901234567890123456789012345678901234567890 
 12345678901234567890123456789012345678901234567890123456789012345678901234567890
  
 12345678901234567890123456789012345678901234567890123456789012345678901234567890
  
 12345678901234567890123456789012345678901234567  
 should be 0x807F. 
 The number has 257 digits before the decimal point. 
 But the sign is 0. 
 So is there anything wrong? 

[1000  00][0][00111 0][001] 

I appreciate that you got no responses on the original e-mail but if you are 
going to re-raise the question at least have the courtesy to respond to your 
original thread and not go and start a new one.  And maybe try rephrasing 
the question, since most likely your original question was not worded in such 
a way as to garner a response. 

I may have this totally wrong but I don't see why the sign of your number 
should be anything but zero since that is, I presume, the value of the 
specific bit for a positive number - which yours is. 
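
(For reference, a rough sketch of how those 14 header bits can be unpacked, following the layout in the quoted comment; the mask values mirror numeric.c but are reproduced here from memory, so treat them as illustrative only:)

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative masks for the NumericShort header word (n_short.n_header):
 * the top two bits are format flags, then 1 sign bit, 6 dscale bits and a
 * 7-bit weight (1 weight-sign bit + 6 magnitude bits).  Verify against the
 * real source before relying on the exact values.
 */
#define NUMERIC_SHORT_SIGN_MASK         0x2000
#define NUMERIC_SHORT_DSCALE_MASK       0x1F80
#define NUMERIC_SHORT_DSCALE_SHIFT      7
#define NUMERIC_SHORT_WEIGHT_SIGN_MASK  0x0040
#define NUMERIC_SHORT_WEIGHT_MASK       0x003F

int
main(void)
{
	uint16_t	header = 0x807F;	/* the header value discussed above */
	int			sign = (header & NUMERIC_SHORT_SIGN_MASK) ? 1 : 0;
	int			dscale = (header & NUMERIC_SHORT_DSCALE_MASK) >> NUMERIC_SHORT_DSCALE_SHIFT;
	int			weight = header & NUMERIC_SHORT_WEIGHT_MASK;

	if (header & NUMERIC_SHORT_WEIGHT_SIGN_MASK)
		weight |= ~NUMERIC_SHORT_WEIGHT_MASK;	/* sign-extend the 7-bit weight */

	/* For 0x807F this prints sign=0 (positive), dscale=0, weight=-1. */
	printf("sign=%d dscale=%d weight=%d\n", sign, dscale, weight);
	return 0;
}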

So, in short, nothing seems to be wrong.  If you think something is wrong 
you should probably state what that is explicitly and ask someone to explain 
what is happening. 

I would have said all this when I saw the first e-mail but I wasn't (and 
still am not) totally clear on what you are asking and was hoping someone 
more familiar could make better sense of it. 

David J. 





Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.

2014-04-17 Thread Oleg Bartunov
I find it a bit confusing when planning time is greater than total time, so
+1 for execution time.

On Thu, Apr 17, 2014 at 3:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Where are we on this?  I still see:

   test=> EXPLAIN ANALYZE SELECT 1;
                                        QUERY PLAN
   ------------------------------------------------------------------------------------
    Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
    Planning time: 0.009 ms
--> Total runtime: 0.009 ms
   (3 rows)

 There seemed to be a clear majority of votes in favor of changing it to
 say "Execution time".  Robert was arguing for no change, but I don't think
 that's tenable in view of the fact that the addition of the "Planning
 time" line is already a change, and one that makes the old wording
 confusing.

 I'll go change it.

 regards, tom lane






Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-17 Thread Etsuro Fujita

(2014/04/16 22:16), Hannu Krosing wrote:

On 04/16/2014 01:35 PM, Etsuro Fujita wrote:



Maybe I'm missing something, but I think that you can do what I think
you'd like to do by the following procedure:

No, what I'd like PostgreSQL to do is to

1. select the id+set from local table
2. select the rows from remote table with WHERE ID IN (set selected in
step 1)
3. then join the original set to selected set, with any suitable join
strategy

The things I do not want are

A. selecting all rows from remote table
 (this is what your examples below do)

or

B. selecting rows from remote table by single selects using ID = $
 (this is something that I managed to do by some tweaking of costs)

as A will always be slow if there are millions of rows in the remote table
and B is slow(ish) when the idset is over a few hundred ids

I hope this is a better explanation than the one I provided before.


Ah, I understand what you'd like to do.  Thank you for the explanation.


P.S. I am not sure if this is a limitation of postgres_fdw or postgres
itself


If I understand correctly, neither the current postgres_fdw planning 
function nor the current postgres planner itself support such a plan. 
For that I think we would probably need to implement a distributed query 
processing technique such as semijoin or bloomjoin in those modules.


Thanks,

P.S.

or, that as Tom mentioned, by disabling the use_remote_estimate function:


I misunderstood the meaning of what Tom pointed out.  Sorry for that.

Best regards,
Etsuro Fujita




[HACKERS] New windows compiler warning from 585bca39

2014-04-17 Thread David Rowley
I've attached a tiny patch that fixes a new compiler warning on the windows
build...

Perhaps the #ifndef could be placed in a nicer spot in the patch, but the
attached should at least describe where the problem lies...

(ClCompile target) -
  src\interfaces\libpq\fe-connect.c(3365): warning C4101: 'msgid' :
unreferenced local variable [D:\Postgres\b\libpq.vcxproj]

1 Warning(s)


fe-connect_win32_warning_fix.patch
Description: Binary data



Re: [HACKERS] slow startup due to LWLockAssign() spinlock

2014-04-17 Thread Andres Freund
On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote:
 On Tue, Feb  4, 2014 at 12:58:49AM +0100, Andres Freund wrote:
  On 2014-02-03 11:22:45 -0500, Tom Lane wrote:
   Andres Freund and...@2ndquadrant.com writes:
On larger, multi-socket, machines, startup takes a fair bit of time. As
I was profiling anyway I looked into it and noticed that just about all
of it is spent in LWLockAssign() called by InitBufferPool(). Starting
with shared_buffers=48GB on the server Nate Boley provided, takes about
12 seconds. Nearly all of it spent taking the ShmemLock spinlock.
Simply modifying LWLockAssign() to not take the spinlock when
!IsUnderPostmaster speeds it up to 2 seconds. While certainly not making
LWLockAssign() prettier it seems enough of a speedup to be worthwhile
nonetheless.
   
   Hm.  This patch only works if the postmaster itself never assigns any
   LWLocks except during startup.  That's *probably* all right, but it
   seems a bit scary.  Is there any cheap way to make the logic actually
   be what your comment claims, namely "Interlocking is not necessary during
   postmaster startup"?  I guess we could invent a ShmemInitInProgress global
   flag ...
  
  So, here's a patch implementing things with that flag. I kept your name,
  as it's more in line with ipci.c's naming, but it looks kinda odd
  besides proc_exit_inprogress.
 
 Uh, where are we on this?

I guess it's waiting for the next CF :(.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-04-17 Thread Andres Freund
On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote:
 On Thu, Feb  6, 2014 at 09:40:32AM +0100, Andres Freund wrote:
  On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
It may well be that your proposal is spot on.  But I'd like to see some
data-structure-by-data-structure measurements, rather than assuming 
that
alignment must be a good thing.
   
I am fine with just aligning BufferDescriptors properly. That has
clearly shown massive improvements.
   
   I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
   a lot to recommend it.
  
  Good.
  
  I wonder if we shouldn't move that bit of logic:
  if (size >= BUFSIZ)
  newStart = BUFFERALIGN(newStart);
  out of ShmemAlloc() and instead have a ShmemAllocAligned() and
  ShmemInitStructAligned() that does it. So we can sensibly control it
  per struct.
  
   But that doesn't mean it doesn't need testing.
  
  I feel the need here, to say that I never said it doesn't need testing
  and never thought it didn't...
 
 Where are we on this?

It needs somebody with time to evaluate possible performance regressions
- I personally won't have time to look into this in detail before pgcon.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.

2014-04-17 Thread Andreas Karlsson

On 04/17/2014 01:35 AM, Tom Lane wrote:

I'll go change it.


Thanks for fixing this. The new name "Execution time" is much clearer.

--
Andreas Karlsson




Re: [HACKERS] Patch: iff - if

2014-04-17 Thread Nicolas Barbier
2014-04-17 Michael Paquier michael.paqu...@gmail.com:

 Is there no equivalent in German? For example in French there is ssi.

gdw (genau dann, wenn; literally "exactly then, when")

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Greg Stark
On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor rod.tay...@gmail.com wrote:
 4) Plays queries from the CSV logs starting from $TIME mimicking actual
 timing and transaction boundaries

This ^^

But I recall a number of previous attempts, including plugins for
general load testing systems; what happened to them?

Honestly, if you really want to load test properly, though, what you
really want to do is deploy a copy of your entire application and feed
it requests simulating user traffic. That results in a more accurate
representation and gives you data that's easier to act on.


-- 
greg




Re: [HACKERS] Buildfarm master-next branch? (was: Dynamic Shared Memory stuff)

2014-04-17 Thread Robert Haas
On Wed, Apr 16, 2014 at 8:24 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 04/17/2014 12:08 AM, Robert Haas wrote:
 On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila amit.kapil...@gmail.com 
 wrote:
 On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com 
 wrote:
 On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com 
 wrote:
 For the create case, I'm wondering if we should put the block that
 tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
 is your opinion?

 Either way is okay, but I think the way you are suggesting is better as it
 will make code consistent with other place (PGSharedMemoryCreate()).

 OK, can you prepare a patch?

 Please find attached patch to address this issue.
 One minor point to note is that now we have to call GetLastError() twice,
 once inside error path and once to check EEXIST, but I think that is okay
 as existing code in PGSharedMemoryCreate() does it that way.

 OK.  I committed this blindly, but I don't have a Windows dev
 environment, so please keep an eye on the Windows buildfarm members
 and provide follow-on patches if any of them get unhappy about this.

 Given that we're doing this a fair bit, is it reasonable to define a
 master-next branch in git and have the buildfarm (or at least the
 Windows members) build that?

 Permit master-next to be rebased and reset.

 That way it's possible to fire stuff off and see what happens on the
 buildfarm without introducing broken commits unnecessarily.

 Thoughts?

In this particular case, I have a lot of confidence that Amit tested
this on his own machine before sending in the patch; and moreover, he
wrote that code in the first place.  So it's no worse than it would
have been if that change had been in the originally committed version,
which I didn't personally test on Windows, either, but which has
nevertheless mostly passed buildfarm testing.  Arguably, if I'm going
to be hacking on platform-dependent things very often, I should get my
own Windows build environment set up so that I can test it myself, but
it hasn't quite been worth it to me thus far, and Amit has proven to
be pretty reliable in terms of getting things right.

In terms of improving the buildfarm infrastructure, the thing I would
most like to have is more frequent runs.  It would be great if pushing
a commit to the master repository triggered an immediate build on
every buildfarm animal so that you could see all of the failures in a
short period of time.  But that would require more resources for the
buildfarm machines, which are provided on a strictly volunteer basis,
so it's hard to see how to arrange that.

But the ability to easily spin up temporary branches for testing would
also be great.  Unfortunately, I suspect that only a minority of the
buildfarm owners would choose to participate, which would make it less
useful, but if we could solve that problem I'd be all in favor of it.
I'm not volunteering to do the work, though.

Honestly, I don't think we have a huge problem here today.  Yeah, the
buildfarm turns pretty colors on a fairly regular basis, but those
issues are also generally fixed very quickly.  With the unfortunate
exception of the seemingly never-ending stream of multixact-related bugs,
a user who took a snapshot of our master branch at a randomly selected
point during the 9.4 development cycle would likely have gotten code
reliable enough to be run in production.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Need Multixact Freezing Docs

2014-04-17 Thread Robert Haas
On Wed, Apr 16, 2014 at 4:39 PM, Josh Berkus j...@agliodbs.com wrote:
 Hmm, are you sure it's INT_MAX and not 4244967297?  Heikki reported
 that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com
 The absolute value is not important; I think that's mostly harmless.  I
 don't think applying age() to a multixact value is meaningful, though;
 that's only good for Xids.

 Yeah, I'm sure:

 josh=# select relname, age(relminmxid) from pg_class;
        relname        |    age
  ---------------------+------------
   pg_statistic        | 2147483647
   pg_type             | 2147483647
   random              | 2147483647
   dblink_pkey_results | 2147483647
   pg_toast_17395      | 2147483647

 ...

 So if age() doesn't mean anything, then how are users to know when they
 need to freeze?

Or, in other words, this is another example of xid-freezing
infrastructure that needed to be copied for mxid-freezing and wasn't.
We need an analogue of age() for mxids.  Perhaps just mxid_age()?
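
(A rough sketch of what such a function might look like, modeled on xid_age() in src/backend/utils/adt/xid.c; illustrative only, not committed code, and the error handling is a guess:)

#include "postgres.h"

#include <limits.h>

#include "access/multixact.h"
#include "utils/builtins.h"

/*
 * Hypothetical mxid_age(): age of a MultiXactId relative to the next
 * multixact to be assigned, analogous to age() for transaction IDs.
 */
Datum
mxid_age(PG_FUNCTION_ARGS)
{
	TransactionId xid = PG_GETARG_TRANSACTIONID(0);
	MultiXactId next = ReadNextMultiXactId();

	/* Treat invalid values as infinitely old, as age() does for permanent XIDs */
	if (!MultiXactIdIsValid(xid))
		PG_RETURN_INT32(INT_MAX);

	PG_RETURN_INT32((int32) (next - xid));
}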

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Minor performance improvement in transition to external sort

2014-04-17 Thread Robert Haas
On Wed, Apr 16, 2014 at 7:38 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Apr 10, 2014 at 06:03:15PM +0100, Simon Riggs wrote:
 On 6 February 2014 18:21, Jeff Janes jeff.ja...@gmail.com wrote:
  On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris j...@wizmail.org wrote:
 
  The attached patch replaces the existing siftup method for heapify with
  a siftdown method. Tested with random integers it does 18% fewer
  compares and takes 10% less time for the heapify, over the work_mem
  range 1024 to 1048576.
 
 
  Thanks for working on this.

 +1

 Your patch isn't linked properly from the CF manager though.

 If you like patches like this then there's a long(er) list of
 optimizations already proposed previously around sorting. It would be
 good to have someone work through them for external sorts. I believe
 Noah is working on parallel internal sort (as an aside).

 There's also an optimization possible for merge joins where we use the
 output of the first sort as an additional filter on the second sort.
 That can help when we're going to join two disjoint tables.

 Where should this be recorded?  TODO?  Commitfest manager?

IIUC, the original patch was withdrawn; any remaining action items
should probably go to TODO.  I'm not sure which specific idea you're
referring to, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Greg Stark
On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas robertmh...@gmail.com wrote:
 This isn't a fundamental property of the usage-count idea; it's an
 artifact of the fact that usage count decreases are tied to eviction
 pressure rather than access pressure.  For example, suppose we made a
 rule that if the total usage counts of all buffers exceed 3 *
 NBuffers, then every time you bump the usage count of a buffer from N
 to N+1, you're required to advance the clock sweep far enough to
 decrease the reference count of a buffer by one.

This sounds like the right way to reason about it.

From what I remember in school, the idea with the clock sweep is to set
the usage flags to the maximum whenever the buffer is used and
decrement it (actually, iirc, typically shift it right) when the clock
sweep goes by. I.e., simulate an LRU where the buffer jumps to the head
of the list when it is accessed and moves gradually down the list as the
clock comes by.

What you're pointing out is that the clock might not come by very
often, resulting in everything being at the head of the list. In that case
I'm not clear it really matters what gets evicted though. And the cpu
effort of running the clock n times sounds bad but doing the work
earlier doesn't really change the amount of work being done, it just
amortizes it over more calls.

But if you want to do that, it seems to me the way to do it is: every
time a buffer is pinned, set it to the maximum and then run the clock
(max_value - previous_value) steps. So the total usage count of all buffers
remains constant. If that results in contention, one way to reduce it
is to do this probabilistically. Run the clock 1% of the time but run
it 100x as much as you would normally.

But I think you've misidentified the problem and what those other
algorithms are trying to solve. The problem is not that Postgres will
pick a bad buffer to evict. If all the buffers have been used since the
last time the clock came around then they're all hot anyways and it
doesn't really matter which one we evict. The problem is that we
expend an inordinate amount of work finding the few non-hot buffers.
When you have a really large amount of memory and 99.9% of it is hot
but 0.1% is whatever random non-hot page was needed last then there's
an obvious buffer to evict when you need a new one. But we spend a lot
of work decrementing every hot buffer's usage count 4 times only to
have them immediately incremented again just to find the 1 buffer
where the usage count was 4 or 3. The goal of these algorithms that
divide the buffers into groups is to avoid having to do so much work
to find the colder buffers. Once the hot buffers migrate to the hot
pool we only need to run the clock there when we find we have new hot
pages that we want to promote. All the thrashing in the cold pool can
be more efficient because there's many fewer pages to consider.
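
(To make that cost concrete, here is a minimal sketch of a clock-sweep victim search over a buffer array; this is simplified C, not PostgreSQL's actual StrategyGetBuffer(), and it ignores locking, pinning details and the freelist. When every usage count sits at the maximum, the hand has to decrement its way around the whole pool several times before anything becomes evictable:)

#define NBUFFERS	16384
#define MAX_USAGE	5

typedef struct
{
	int			usage_count;	/* bumped on access, capped at MAX_USAGE */
	int			pinned;			/* non-zero while the buffer is in active use */
} FakeBufferDesc;

static FakeBufferDesc buffers[NBUFFERS];
static int	clock_hand = 0;

/*
 * Pick a victim buffer: advance the hand, decrementing usage counts, until
 * a buffer with usage_count == 0 and no pins is found.  If every buffer is
 * hot (usage_count == MAX_USAGE), the hand visits each buffer MAX_USAGE
 * times before anything becomes evictable -- the O(NBuffers) cost discussed
 * in this thread.  (Real code would also bail out if all buffers are pinned.)
 */
static int
clock_sweep_victim(void)
{
	for (;;)
	{
		FakeBufferDesc *buf = &buffers[clock_hand];

		clock_hand = (clock_hand + 1) % NBUFFERS;

		if (buf->pinned)
			continue;
		if (buf->usage_count > 0)
			buf->usage_count--;
		else
			return (int) (buf - buffers);	/* index of the victim */
	}
}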

-- 
greg




Re: [HACKERS] four minor proposals for 9.5

2014-04-17 Thread Pavel Stehule
2014-04-17 7:12 GMT+02:00 Amit Kapila amit.kapil...@gmail.com:

 On Mon, Apr 14, 2014 at 6:27 PM, Robert Haas robertmh...@gmail.com
 wrote:
  I agree.  I don't think the idea of pushing this into the
  log_line_prefix stuff as a one-off is a very good one.  Sure, we could
  wedge it in there, but we've got an existing precedent that everything
  that you can get with log_line_prefix also shows up in the CSV output
  file.  And it's easy to imagine LOTS more counters that somebody might
  want to have.  Time spent planning, time spent executing, time spent
  waiting for disk I/O, time spent returning results to client, and I'm
  sure people will think of many others.  I think this will balloon out
  of control if we don't have a more systematic design for this sort of
  thing.

 Can't we think of some infrastructure similar to what is done for
 log_duration and log_min_duration_statement?
 Currently it prints like below:
 LOG:  duration: 343.000 ms  statement: create table t1(c1 int);

 Let us say a user wants to track the lock wait time a statement has
 spent, then enable some config parameter (either log_lock_duration
 or some other convenient way)

 LOG:  lock duration: 'x' ms  statement: create table t1(c1 int);


Isn't that analogous to log_line_prefix?

We can introduce the new feature without a hard dependency on the CSV format.

I think the requirements are clear:

simply parseable (a vector of numbers is ideal)
simply activated and deactivated (maybe a list of flags in a GUC)

Regards

Pavel



 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-17 Thread Robert Haas
On Wed, Apr 16, 2014 at 8:46 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 04/17/2014 12:16 AM, Robert Haas wrote:
 On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 - A flag like BGW_UNREGISTER_ON_RESTART;

 I would be OK with this, maybe modulo the name.

 - To always unregister dynamic bgws on postmaster shm clear + restart;

 I don't particularly care for this.  Let's suppose the background
 worker is a long-running daemon, like a PG equivalent of cron.  In
 static-background worker land, the admin has to restart the cluster to
 get this going.  In dynamic-background worker land, he can load it on
 the fly.  But once he gets it up and running, he wants it to stay up
 and running, surviving crashes and everything.  That's a big part of
 the value of having a background worker interface in the first place.

 It'd be the job of the extension that provides the bgworker to make sure
 that it gets relaunched on postmaster restart.

 I tend to agree though. The problem I'm describing only causes issues
 for extensions that launch dynamic bgworkers during extension startup.

 Extensions that launch bgworkers in response to SQL commands and don't
 rely on passing state to the bgworker via shared memory shouldn't have
 to deal with restarting them on postmaster restart.


 - A way to list bgws, inspect their BackgroundWorker structs and obtain
 their handles; or

 This is certainly a good idea.

 - A way to idempotently register a bgw only if it doesn't already exist

 This is probably a good idea, too.

 I think we need _one_ solution for 9.4, and need it soon.

 The simplest is probably to flush all dynamic bgworkers. But I think
 you're probably right - that solves the problem discussed here, but is
 likely to make life harder for other use cases.

 A last-minute API addition for bgworker listing/inspection might not be
 a great idea - it's too late for it to see much testing and analysis and
 it might introduce bigger API problems than it solves.

 Duplicate-free registration might be OK, but there are some questions
 around how we'd handle differing parameters, what should be the
 determinant for uniquenes, whether we should go for idempotency or
 return/raise an error to indicate it already exists, etc. So similar
 issue with doing it at the last minute.

 To me, that says let's add a flag to allow a dynamic bgworker to be
 unregistered on postmaster restart. Seems simple and low risk.

 I'll follow up with a proposed patch, then we can spend some quality
 shed time on the flag name ;-)

I think I can live with that.  However, I tend to think that the best
solution here is really: don't try to pass pointers via the
BackgroundWorker structure, because it doesn't [ expletive ] work.
We've had several instances of that already.  When I added support for
dynamic background workers, I had to add bgw_library_name and
bgw_function_name members to that structure because bgw_main won't
work for extension code unless the library is loaded from
shared_preload_libraries AND either we're not running under
EXEC_BACKEND (i.e. Windows) or the system is kind enough to load the
shared library at the same address in both processes, in which case it
will accidentally fail to fail.  We would have had to give bgw_sighup
and bgw_sigterm the same treatment, but since they weren't really
necessary in the first place we just ripped them out instead.

worker_spi also had a bad case of this disease.  It went to elaborate
lengths to pass a pointer via bgw_main_arg, but the pointer was to
*backend-private memory*, so it was completely broken on EXEC_BACKEND
builds.  I'm not sure whether it actually failed there, or managed to
work just because the background worker backend also ran the _PG_init
hook and managed to accidentally place the same data structure at the
same address. I fixed that as part of introducing the dynamic
background worker facility; now it passes an index instead of a
pointer.
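
(For readers hitting the same trap, a sketch of the index-instead-of-pointer approach described above; this is hypothetical extension code, not worker_spi itself, and the slot number would identify an entry in a shared-memory array that both the registering backend and the worker can locate:)

#include "postgres.h"
#include "postmaster/bgworker.h"

/*
 * Register a dynamic background worker, passing a small slot index through
 * bgw_main_arg instead of a backend-private pointer, so the value is still
 * meaningful after an EXEC_BACKEND-style relaunch.
 */
static bool
demo_register_worker(int slot)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	snprintf(worker.bgw_name, BGW_MAXLEN, "demo worker %d", slot);
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "demo");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "demo_main");
	worker.bgw_main_arg = Int32GetDatum(slot);	/* an index, not a pointer */

	return RegisterDynamicBackgroundWorker(&worker, &handle);
}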

What you're complaining about here is basically another instance of
the same problem.  It's not as bad because the main shared memory
segment never moves or has any memory freed except after a
crash-and-restart cycle, so your suggested plug seems likely to be
adequate.  But it also requires that you can allocate enough space in
shared memory to pass around whatever state you need to carry around,
and space in the main shared memory segment is at a premium; storage
there also can't be freed.  Noah suggested to me a while back that we
might do better to change bgw_main_arg to something like char[64]
rather than Datum, which would allow passing a reasonable-size payload
without having to futz with shared memory.  Then, instead of passing a
pointer, you can pass the name of a shared memory region to look up
and an index into the data structure stored there, or something like
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] four minor proposals for 9.5

2014-04-17 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 We can introduce new feature without hard dependency on CSV format

Look, the long and the short of it is that there is not consensus
that this measurement is worth creating a new CSV log column for.
And from that, there is also not consensus that it's worth putting
into log_line_prefix.  Therefore, this proposal is dead.  Please
stop prolonging the argument.

regards, tom lane




Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan


On 04/16/2014 10:28 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/16/2014 07:19 PM, Tom Lane wrote:

Yeah, it would be real nice to see a self-contained test case for this.

Well, that might be hard to put together, but I did try running without
pg_stat_statements and auto_explain loaded and the error did not occur.
Not sure where that gets us in terms of deciding on a culprit.

Could we at least see the exact parameter settings for pg_stat_statements
and auto_explain?  (And any other GUCs with nondefault values?)





Here are all the settings from the run that failed:

   listen_addresses = '*'
   port = 5432
   fsync = on
   synchronous_commit = off
   checkpoint_segments = 128
   checkpoint_completion_target = 0.9
   shared_buffers = 512MB
   max_connections = 300
   work_mem = 128MB
   maintenance_work_mem = 32MB
   effective_cache_size = 16GB
   effective_io_concurrency = 2
   logging_collector = on
   log_destination = 'stderr'
   log_filename = 'postgresql-%a.log'
   log_rotation_size = 0
   log_truncate_on_rotation = on
   log_line_prefix = '%t [%p] %l: '
   log_connections = on
   log_disconnections = on
   log_statement = 'all'
   track_activity_query_size = 10240
   shared_preload_libraries = 'auto_explain,pg_stat_statements'

As you can see, auto_explain's log_min_duration hasn't been set, so it 
shouldn't be doing anything very much, I should think.



cheers

andrew





Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Robert Haas
On Thu, Apr 17, 2014 at 9:40 AM, Greg Stark st...@mit.edu wrote:
 On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas robertmh...@gmail.com wrote:
 This isn't a fundamental property of the usage-count idea; it's an
 artifact of the fact that usage count decreases are tied to eviction
 pressure rather than access pressure.  For example, suppose we made a
 rule that if the total usage counts of all buffers exceed 3 *
 NBuffers, then every time you bump the usage count of a buffer from N
 to N+1, you're required to advance the clock sweep far enough to
 decrease the reference count of a buffer by one.

 This sounds like the right way to reason about it.

 From what I remember in school the idea with the clock sweep is to set
 the usage flags to the maximum whenever the buffer is used and
 decrement (actually iirc typically shift right)  it when the clock
 sweep goes by. Ie, simulate a LRU where when the buffer is accessed it
 jumps to the head of the list and when the clock comes by it moves
 gradually down the list.

 What you're pointing out is that the clock might not come by very
 often resulting everything being at the head of the list. In that case
 I'm not clear it really matters what gets evicted though. And the cpu
 effort of running the clock n times sounds bad but doing the work
 earlier doesn't really change the amount of work being done, it just
 amortizes it over more calls.

 But if you want to do that it seems to me the way to do it is every
 time a buffer is pinned set to the maximum and then run the clock
 max_value - previous_value. So the total usage counts of all buffers
 remains constant. If that results in contention one way to reduce it
 is to do this probabilistically. Run the clock 1% of the time but run
 it 100x as much as you would normally.

 But I think you've misidentified the problem and what those other
 algorithms are trying to solve. The problem is not that Postgres will
 pick a bad buffer to evict. If all the buffers have been used since the
 last time the clock came around then they're all hot anyways and it
 doesn't really matter which one we evict. The problem is that we
 expend an inordinate amount of work finding the few non-hot buffers.
 When you have a really large amount of memory and 99.9% of it is hot
 but 0.1% is whatever random non-hot page was needed last then there's
 an obvious buffer to evict when you need a new one. But we spend a lot
 of work decrementing every hot buffer's usage count 4 times only to
 have them immediately incremented again just to find the 1 buffer
 where the usage count was 4 or 3. The goal of these algorithms that
 divide the buffers into groups is to avoid having to do so much work
 to find the colder buffers. Once the hot buffers migrate to the hot
 pool we only need to run the clock there when we find we have new hot
 pages that we want to promote. All the thrashing in the cold pool can
 be more efficient because there's many fewer pages to consider.

Well, I think Peter has proved that PostgreSQL *will* pick a bad
buffer to evict.  The proof is that when he changed the choice of
buffer to evict, he got a significant performance improvement.

I also believe this to be the case on first principles and my own
experiments.  Suppose you have a workload that fits inside
shared_buffers.  All of the usage counts will converge to 5.  Then,
somebody accesses a table that is not cached, so something's got to be
evicted.  Because all the usage counts are the same, the eviction at
this point is completely indiscriminate.  We're just as likely to kick
out a btree root page or a visibility map page as we are to kick out a
random heap page, even though the former have probably been accessed
several orders of magnitude more often.  That's clearly bad.  On
systems that are not too heavily loaded it doesn't matter too much
because we just fault the page right back in from the OS pagecache.
But I've done pgbench runs where such decisions lead to long stalls,
because the page has to be brought back in from disk, and there's a
long I/O queue; or maybe just because the kernel thinks PostgreSQL is
issuing too many I/O requests and makes some of them wait to cool
things down.

Of course, the overhead of repeated clock sweeps to push down the
usage counts isn't a great thing either.  I'm not saying that isn't a
problem.  But I think bad decisions about what to evict are also a
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Buildfarm master-next branch?

2014-04-17 Thread Andrew Dunstan


On 04/17/2014 09:17 AM, Robert Haas wrote:



In terms of improving the buildfarm infrastructure, the thing I would
most like to have is more frequent runs.  It would be great if pushing
a commit to the master repository triggered an immediate build on
every buildfarm animal so that you could see all of the failures in a
short period of time.  But that would require more resources for the
buildfarm machines, which are provided on a strictly volunteer basis,
so it's hard to see how to arrange that.



Some buildfarm owners run at pretty high frequency - I know there are 
cron jobs on some running every 15 minutes. My own Linux and FBSD 
machines run every hour. Windows builds take longer - depending on other 
use of resources they can run a couple of hours per branch. Also my two 
Windows machines doing buildfarm work are running a total of 5 animals, 
so the runs are staggered - on Windows 8 the two animals each run every 
3 hours. Note that each run potentially builds all the branches, if 
there has been some backported change, and the windows animals are set 
up so that if animal A on the same machine is running when animal B's 
run time comes around, animal B skips its scheduled run. So sometimes you 
do have to wait a bit. If someone were to provide me with a bunch of 
nice fast Windows VMs I would set them up with one animal apiece with 
frequent runs and we might get a lot better coverage. But I am tapped 
out as far as the resources I can provide go.





But the ability to easily spin up temporary branches for testing would
also be great.  Unfortunately, I suspect that only a minority of the
buildfarm owners would choose to participate, which would make it less
useful, but if we could solve that problem I'd be all in favor of it.
I'm not volunteering to do the work, though.


The buildfarm's original purpose was to give early warning of 
platform-specific problems of code we had *already* decided on. Now 
projects morph, so we might decide to do something like this. But we'd 
need to think long and hard about it. Postgres has not historically used 
short-lived branches. I don't much like Craig's idea of a long-lived 
testing branch that we're going to do commits and reverts on. If we're 
going to do something like this it would be much better to make some 
provision for short-lived topic branches, e.g. say we allowed branches 
with names like testme-platformname-featurename (platformname here 
could be a magic "all", or a comma-separated list of names such as 
"linux", "freebsd", "windows"). When testing is done, we could merge the 
branch if the testing worked out OK, or drop it if the testing proved to 
be a failure.



There would be some work to make the buildfarm client suitable for this. 
And we'd probably need a testing dashboard so as to keep the main 
dashboard page free of test branch results.


Of course, all this would be done in my copious spare time *cough*. I'm 
not sure this would be the best use of it.


cheers

andrew






Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Fabrízio de Royes Mello
On Thu, Apr 17, 2014 at 2:14 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Apr 17, 2014 at 11:41 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Hi all,
 
  Is there some reason the verbose output of pg_dump doesn't show the schema
 name?
 
  An output example of using pg_dump -Fd -j8 -v
 Specifying a target directory with -f is better here...


Yeah... I'm just showing the relevant options used... ;-)


  This database has a lot of different schemas with the same structure, and
  if I need to view the status of the dump I don't know which schema the
  table is dumped from.
 Yes this may be helpful. The attached quick'n dirty patch implements it.


Very nice... thanks!!!

I added the schema name to the following messages too:

pg_restore: processing data for table public.bar
pg_restore: processing data for table public.foo
pg_restore: processing data for table s1.bar
pg_restore: processing data for table s1.foo
pg_restore: processing data for table s2.bar
pg_restore: processing data for table s2.foo
pg_restore: processing data for table s3.bar
pg_restore: processing data for table s3.foo

And:

pg_dump: finding the columns and types of table s1.foo
pg_dump: finding the columns and types of table s1.bar
pg_dump: finding the columns and types of table s2.foo
pg_dump: finding the columns and types of table s2.bar
pg_dump: finding the columns and types of table s3.foo
pg_dump: finding the columns and types of table s3.bar
pg_dump: finding the columns and types of table public.foo
pg_dump: finding the columns and types of table public.bar

And:

pg_dump: processing data for table public.bar
pg_dump: dumping contents of table public.bar
pg_dump: processing data for table public.foo
pg_dump: dumping contents of table public.foo
pg_dump: processing data for table s1.bar
pg_dump: dumping contents of table s1.bar
pg_dump: processing data for table s1.foo
pg_dump: dumping contents of table s1.foo
pg_dump: processing data for table s2.bar
pg_dump: dumping contents of table s2.bar
pg_dump: processing data for table s2.foo
pg_dump: dumping contents of table s2.foo
pg_dump: processing data for table s3.bar
pg_dump: dumping contents of table s3.bar
pg_dump: processing data for table s3.foo
pg_dump: dumping contents of table s3.foo


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 
-	ahlog(AH, 1, "processing data for table \"%s\"\n",
-		  te->tag);
+	ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+		  AH->currSchema, te->tag);
 
 	/*
 	 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..f763a88 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table %s.%s\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table %s\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -6291,8 +6301,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		 * the output of an indexscan on pg_attribute_relid_attnum_index.
 		 */
 		if (g_verbose)
-			write_msg(NULL, "finding the columns and types of table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "finding the columns and types of table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "finding the columns and types of table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		resetPQExpBuffer(q);
 



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
 I also believe this to be the case on first principles and my own
 experiments.  Suppose you have a workload that fits inside
 shared_buffers.  All of the usage counts will converge to 5.  Then,
 somebody accesses a table that is not cached, so something's got to be
 evicted.  Because all the usage counts are the same, the eviction at
 this point is completely indiscriminate.  We're just as likely to kick
 out a btree root page or a visibility map page as we are to kick out a
 random heap page, even though the former have probably been accessed
 several orders of magnitude more often.  That's clearly bad.  On
 systems that are not too heavily loaded it doesn't matter too much
 because we just fault the page right back in from the OS pagecache.
 But I've done pgbench runs where such decisions lead to long stalls,
 because the page has to be brought back in from disk, and there's a
 long I/O queue; or maybe just because the kernel thinks PostgreSQL is
 issuing too many I/O requests and makes some of them wait to cool
 things down.

I understand now.  If there is no memory pressure, every buffer gets the
max usage count, and when a new buffer comes in, it isn't the max so it
is swiftly removed until the clock sweep has time to decrement the old
buffers.  Decaying buffers when there is no memory pressure creates
additional overhead and gets into timing issues of when to decay.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote:
   This database have a lot of different schemas with same structure and if I
   need do view the status of dump I don't know what schema the table are 
   dump
   from.
  Yes this may be helpful. The attached quick'n dirty patch implements it.
 
 
 Very nice... thanks!!!
 
 I add schema name do the following messages too:
 
 pg_restore: processing data for table public.bar
 pg_restore: processing data for table public.foo
 pg_restore: processing data for table s1.bar
 pg_restore: processing data for table s1.foo
 pg_restore: processing data for table s2.bar
 pg_restore: processing data for table s2.foo
 pg_restore: processing data for table s3.bar
 pg_restore: processing data for table s3.foo

Can you get that to _conditionally_ double-quote the strings?  In fact,
maybe we don't even need the double-quotes.  How do we double-quote
other places?
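
(One existing helper for quoting only when needed is fmtId() in pg_dump's dumputils; below is a sketch of how a message could use it. The function name is made up for illustration, and note that fmtId() returns a pointer into a shared buffer, so the first result has to be copied before calling it again:)

#include "postgres_fe.h"

#include "dumputils.h"
#include "pg_backup_archiver.h"

/*
 * Sketch: build "schema.table" with double-quotes only where the identifier
 * actually needs them, using pg_dump's existing fmtId() helper.
 */
static void
log_table_being_restored(ArchiveHandle *AH, TocEntry *te)
{
	char		qualified[512];

	snprintf(qualified, sizeof(qualified), "%s.", fmtId(AH->currSchema));
	strlcat(qualified, fmtId(te->tag), sizeof(qualified));

	ahlog(AH, 1, "processing data for table %s\n", qualified);
}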

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Buildfarm master-next branch?

2014-04-17 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/17/2014 09:17 AM, Robert Haas wrote:
 In terms of improving the buildfarm infrastructure, the thing I would
 most like to have is more frequent runs.

IMO the best single thing that could happen for the buildfarm is if
we had more critters (at least twice as many) running a wider variety of
platforms, compilers, and configuration options than there are today.
More frequent runs would come out of that automatically.

 ... But that would require more resources for the
 buildfarm machines, which are provided on a strictly volunteer basis,
 so it's hard to see how to arrange that.

I don't think we've tried hard lately to get people to sign up.  Maybe
we should ask the -advocacy crew to do something.

 But the ability to easily spin up temporary branches for testing would
 also be great.  Unfortunately, I suspect that only a minority of the
 buildfarm owners would choose to participate, which would make it less
 useful, but if we could solve that problem I'd be all in favor of it.

 ... Of course, all this would be done in my copious spare time *cough*. I'm 
 not sure this would be the best use of it.

I agree that this would not be worth the effort needed to make it happen.

regards, tom lane




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Robert Haas
On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
 I also believe this to be the case on first principles and my own
 experiments.  Suppose you have a workload that fits inside
 shared_buffers.  All of the usage counts will converge to 5.  Then,
 somebody accesses a table that is not cached, so something's got to be
 evicted.  Because all the usage counts are the same, the eviction at
 this point is completely indiscriminate.  We're just as likely to kick
 out a btree root page or a visibility map page as we are to kick out a
 random heap page, even though the former have probably been accessed
 several orders of magnitude more often.  That's clearly bad.  On
 systems that are not too heavily loaded it doesn't matter too much
 because we just fault the page right back in from the OS pagecache.
 But I've done pgbench runs where such decisions lead to long stalls,
 because the page has to be brought back in from disk, and there's a
 long I/O queue; or maybe just because the kernel thinks PostgreSQL is
 issuing too many I/O requests and makes some of them wait to cool
 things down.

 I understand now.  If there is no memory pressure, every buffer gets the
 max usage count, and when a new buffer comes in, it isn't the max so it
 is swiftly removed until the clock sweep has time to decrement the old
 buffers.  Decaying buffers when there is no memory pressure creates
 additional overhead and gets into timing issues of when to decay.

That can happen, but the real problem I was trying to get at is that
when all the buffers get up to max usage count, they all appear
equally important.  But in reality they're not.  So when we do start
evicting those long-resident buffers, it's essentially random which
one we kick out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote:
 On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian br...@momjian.us wrote:
  On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
  I also believe this to be the case on first principles and my own
  experiments.  Suppose you have a workload that fits inside
  shared_buffers.  All of the usage counts will converge to 5.  Then,
  somebody accesses a table that is not cached, so something's got to be
  evicted.  Because all the usage counts are the same, the eviction at
  this point is completely indiscriminate.  We're just as likely to kick
  out a btree root page or a visibility map page as we are to kick out a
  random heap page, even though the former have probably been accessed
  several orders of magnitude more often.  That's clearly bad.  On
  systems that are not too heavily loaded it doesn't matter too much
  because we just fault the page right back in from the OS pagecache.
  But I've done pgbench runs where such decisions lead to long stalls,
  because the page has to be brought back in from disk, and there's a
  long I/O queue; or maybe just because the kernel thinks PostgreSQL is
  issuing too many I/O requests and makes some of them wait to cool
  things down.
 
  I understand now.  If there is no memory pressure, every buffer gets the
  max usage count, and when a new buffer comes in, it isn't the max so it
  is swiftly removed until the clock sweep has time to decrement the old
  buffers.  Decaying buffers when there is no memory pressure creates
  additional overhead and gets into timing issues of when to decay.
 
 That can happen, but the real problem I was trying to get at is that
 when all the buffers get up to max usage count, they all appear
 equally important.  But in reality they're not.  So when we do start
 evicting those long-resident buffers, it's essentially random which
 one we kick out.

True.  Ideally we would have some way to know that _all_ the buffers had
reached the maximum and kick off a sweep to decrement them all.  I am
unclear how we would do that.  One odd idea would be to have a global
counter that is incremented every time a buffer goes from 4 to 5 (max)
--- when the counter equals 50% of all buffers, do a clock sweep.  Of
course, then the counter becomes a bottleneck.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Robert Haas
On Thu, Apr 17, 2014 at 10:48 AM, Bruce Momjian br...@momjian.us wrote:
  I understand now.  If there is no memory pressure, every buffer gets the
  max usage count, and when a new buffer comes in, it isn't the max so it
  is swiftly removed until the clock sweep has time to decrement the old
  buffers.  Decaying buffers when there is no memory pressure creates
  additional overhead and gets into timing issues of when to decay.

 That can happen, but the real problem I was trying to get at is that
 when all the buffers get up to max usage count, they all appear
 equally important.  But in reality they're not.  So when we do start
 evicting those long-resident buffers, it's essentially random which
 one we kick out.

 True.  Ideally we would have some way to know that _all_ the buffers had
 reached the maximum and kick off a sweep to decrement them all.  I am
 unclear how we would do that.  One odd idea would be to have a global
 counter that is incremented everytime a buffer goes from 4 to 5 (max)
 --- when the counter equals 50% of all buffers, do a clock sweep.  Of
 course, then the counter becomes a bottleneck.

Yeah, I think that's the right general line of thinking.  But it
doesn't have to be as coarse-grained as do a whole clock sweep.  It
can be, you know, for every buffer that gets incremented from 4 to 5,
run the clock sweep far enough to decrement the usage count of some
other buffer by one.  That's similar to your idea but you can do it a
bit at a time rather than having to make a complete pass over
shared_buffers all at once.
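
(A minimal sketch of that balancing rule, ignoring locking and the real freelist logic; the names are made up for illustration. Each bump that saturates a buffer pays for one decrement elsewhere, so the aggregate usage count stops growing once the pool is hot:)

#define NBUFFERS	16384
#define MAX_USAGE	5

static int	usage[NBUFFERS];
static int	sweep_hand = 0;

/* Advance the sweep hand until exactly one usage count has been decremented. */
static void
sweep_until_one_decrement(void)
{
	int			steps;

	for (steps = 0; steps < NBUFFERS; steps++)
	{
		int			i = sweep_hand;

		sweep_hand = (sweep_hand + 1) % NBUFFERS;
		if (usage[i] > 0)
		{
			usage[i]--;
			return;
		}
	}
}

/*
 * Bump a buffer's usage count on access.  When the bump saturates the
 * buffer at MAX_USAGE, pay for it by decrementing some other buffer, so
 * heavily used buffers remain distinguishable from merely resident ones.
 */
static void
bump_usage(int buf)
{
	if (usage[buf] < MAX_USAGE)
	{
		usage[buf]++;
		if (usage[buf] == MAX_USAGE)
			sweep_until_one_decrement();
	}
}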

Your other point, that the counter can become the bottleneck, is quite
right also and a major problem in this area.  I don't know how to
solve it right at the moment, but I'm hopeful that there may be a way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Andres Freund
On 2014-04-17 10:48:15 -0400, Bruce Momjian wrote:
 On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote:
  That can happen, but the real problem I was trying to get at is that
  when all the buffers get up to max usage count, they all appear
  equally important.  But in reality they're not.  So when we do start
  evicting those long-resident buffers, it's essentially random which
  one we kick out.
 
 True.  Ideally we would have some way to know that _all_ the buffers had
 reached the maximum and kick off a sweep to decrement them all.  I am
 unclear how we would do that.  One odd idea would be to have a global
 counter that is incremented everytime a buffer goes from 4 to 5 (max)
 --- when the counter equals 50% of all buffers, do a clock sweep.  Of
 course, then the counter becomes a bottleneck.

I have my doubts that we'll make the current scheme, where buffer
reclaim essentially is O(NBuffers), work much better. Especially as CPU
cache effects make such large, high frequency, accesses really
expensive.
I think we need more drastic changes.

I am *not* suggesting that we do that, but I believe it'd be possible to
implement a full LRU and be faster than today in scenarios with
nontrivial amounts of shared buffers.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Greg Stark
On Thu, Apr 17, 2014 at 10:18 AM, Robert Haas robertmh...@gmail.com wrote:
 Because all the usage counts are the same, the eviction at
 this point is completely indiscriminate.  We're just as likely to kick
 out a btree root page or a visibility map page as we are to kick out a
 random heap page, even though the former have probably been accessed
 several orders of magnitude more often.  That's clearly bad.

That's not clear at all. In that circumstance regardless of what page
you evict you're incurring precisely one page fault i/o when the page
is read back in. Incurring that i/o is bad but it's unavoidable and
it's the same badness regardless of what page it's for. The only way
to prefer one page over another is if one page won't be needed for
long enough for the page to be useful for caching this new buffer (or
mixture of buffers) for multiple accesses. If you can't do that then
it doesn't matter which buffer you use since it'll just be evicted to
read back in the original page again.



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Fabrízio de Royes Mello
On Thu, Apr 17, 2014 at 11:36 AM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote:
   This database has a lot of different schemas with the same structure,
   and if I need to view the status of the dump I don't know what schema
   the table is dumped from.
   Yes this may be helpful. The attached quick'n dirty patch implements
it.
  
 
  Very nice... thanks!!!
 
  I added the schema name to the following messages too:
 
  pg_restore: processing data for table public.bar
  pg_restore: processing data for table public.foo
  pg_restore: processing data for table s1.bar
  pg_restore: processing data for table s1.foo
  pg_restore: processing data for table s2.bar
  pg_restore: processing data for table s2.foo
  pg_restore: processing data for table s3.bar
  pg_restore: processing data for table s3.foo

 Can you get that to _conditionally_ double-quote the strings?

Sorry, I didn't understand what you mean. Is your idea to check whether the
namespace is available and, if so, not show the double quotes?


 In fact,
 maybe we don't even need the double-quotes.  How do we double-quote
 other places?


Checking that more deeply I found some other places that show the table
name and all of them are double-quoted.

$ grep 'table \\\"%s' src/bin/pg_dump/*.c
src/bin/pg_dump/common.c:write_msg(NULL, "failed sanity
check, parent OID %u of table \"%s\" (OID %u) not found\n",
src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1,
"processing data for table \"%s\".\"%s\"\n",
src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1, "table \"%s\" could
not be created, will not restore its data\n",
src/bin/pg_dump/pg_backup_db.c:warn_or_exit_horribly(AH,
modulename, "COPY failed for table \"%s\": %s",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of
table \"%s\" failed: PQgetCopyData() failed.\n", classname);
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of
table \"%s\" failed: PQgetResult() failed.\n", classname);
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "WARNING: owner of
table \"%s\" appears to be invalid\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading indexes for
table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading foreign key
constraints for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading triggers for
table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:exit_horribly(NULL,
"query produced null referenced table name for foreign key trigger \"%s\"
on table \"%s\" (OID of table: %u)\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the
columns and types of table \"%s\".\"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the
columns and types of table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:  "invalid column
numbering in table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding default
expressions of table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:  "invalid adnum
value %d for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding check
constraints for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL,
ngettext("expected %d check constraint on table \"%s\" but found %d\n",
src/bin/pg_dump/pg_dump.c:
 "expected %d check constraints on table \"%s\" but found %d\n",
src/bin/pg_dump/pg_dump.c:exit_horribly(NULL, "invalid column number %d
for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "invalid argument
string (%s) for trigger \"%s\" on table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "query to get rule \"%s\"
for table \"%s\" failed: wrong number of rows returned\n",


Just the "dumping contents of table.." message isn't double-quoted:

$ grep 'table %s' src/bin/pg_dump/*.c
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "dumping contents of
table %s\n",


So maybe we must double-quote all strings, i.e. public.foo, including
the missing one below.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Greg Stark
On Tue, Apr 15, 2014 at 7:30 PM, Peter Geoghegan p...@heroku.com wrote:
 Frankly, there doesn't need to be any research on this, because it's
 just common sense that probabilistically, leaf pages are much more
 useful than heap pages in servicing index scan queries if we assume a
 uniform distribution. If we don't assume that, then they're still more
 useful on average.

I don't think common sense is compelling. I think you need to pin
down exactly what it is about btree intermediate pages that the LRU
isn't capturing and not just argue they're more useful. The LRU is
already capturing which pages are more heavily used than others so you
need to identify what it is that makes index pages *even more* useful
than their frequency and recency of access indicates. Not just that
they're more useful than an average page.

So what I think is missing is that indexes are always accessed from
the root down to the leaf. So the most recent page accessed will
always be the leaf. And in whatever chain of pages was used to reach
the last leaf page the least recently accessed will always be the
root. But we'll need the root page again on the subsequent descent
even if it's to reach the same leaf page we kept in ram in preference
to it.

Now it doesn't *always* make sense to keep an intermediate page over
leaf pages. Imagine an index that we always do full traversals of.
We'll always descend from the root down the left-most pages and then
follow the right pointers across. All the other intermediate pages
will be cold. If we do an occasional descent probing for other keys
those leaf pages shouldn't be cached since they won't be needed again
for the common full index traversals and the next occasional probe
will probably be looking for different keys.

But if we're often probing for the same keys the last thing we want to
do is throw away one of the intermediate pages for those keys when we
could throw away a leaf page. But that's what would happen in a strict
LRU.  It's almost like what we would really want to do is mark the
pages as least recently used in the opposite order from the order
they're actually accessed when descending. Or perhaps bump the usage
count to max+1 when it's an intermediate page so that it takes one
extra cycle of decrementing before it's considered old compared to a
leaf page. A sketch of that last idea follows.
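
A toy rendering of it (made-up structs and limits, purely to illustrate
the extra notch for inner pages; this is not how bufmgr.c represents
buffers):

#include <stdbool.h>
#include <stdint.h>

#define MAX_USAGE        5
#define MAX_USAGE_INNER  6   /* hypothetical extra notch for inner index pages */

typedef struct ToyBuffer
{
    uint8_t usage;           /* clock-sweep usage count */
    bool    is_index_inner;  /* true for btree root/internal pages */
} ToyBuffer;

/*
 * On access, let inner index pages climb one notch higher than leaf and
 * heap pages, so the sweep needs one extra pass before they look as old
 * as everything else.
 */
static void
toy_bump(ToyBuffer *buf)
{
    uint8_t ceiling = buf->is_index_inner ? MAX_USAGE_INNER : MAX_USAGE;

    if (buf->usage < ceiling)
        buf->usage++;
}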


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 12:07:39PM -0300, Fabrízio de Royes Mello wrote:
  Can you get that to _conditionally_ double-quote the strings? 
 
 Sorry, I didn't understand what you mean. Is your idea to check whether the
 namespace is available and, if so, not show the double quotes?

The idea is that we only need quotes when there are odd characters in
the identifier.  We do that right now in some places, though I can't
find them in pg_dump.  I know psql does that, see quote_ident().
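
The rule itself is small; a standalone sketch of quote-only-when-needed
(ignoring the reserved-keyword check that the real quote_ident()/fmtId()
code also performs):

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print an identifier, adding double quotes only when it contains
 * characters that would not survive unquoted.  Embedded double quotes
 * are doubled, per the usual SQL rule.
 */
static void
print_ident(FILE *out, const char *name)
{
    bool safe = (name[0] != '\0' && !isdigit((unsigned char) name[0]));

    for (const char *p = name; safe && *p; p++)
    {
        if (!islower((unsigned char) *p) &&
            !isdigit((unsigned char) *p) &&
            *p != '_')
            safe = false;
    }

    if (safe)
    {
        fputs(name, out);
        return;
    }

    fputc('"', out);
    for (const char *p = name; *p; p++)
    {
        if (*p == '"')
            fputc('"', out);    /* double any embedded quote */
        fputc(*p, out);
    }
    fputc('"', out);
}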

  In fact,
  maybe we don't even need the double-quotes.  How do we double-quote
  other places?
 
 
 Checking that more deeply I found some other places that show the table name
 and all of them are double-quoted.

OK.

 Just the "dumping contents of table.." message isn't double-quoted:
 
 $ grep 'table %s' src/bin/pg_dump/*.c
 src/bin/pg_dump/pg_dump.c:            write_msg(NULL, "dumping contents of
 table %s\n",
 
 
 So maybe we must double-quote all strings, i.e. public.foo, including the
 missing one below.

No, I think double-quoting each part is the correct way.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_upgrade tablespaces

2014-04-17 Thread Bruce Momjian
On Wed, Apr 16, 2014 at 01:49:20PM -0400, Bruce Momjian wrote:
 On Sun, Jan 12, 2014 at 11:04:41PM -0500, Bruce Momjian wrote:
   In the pgsql_old installation you have symlinks pointing back to the
   current default location. As well pg_tablespace points back to
   /usr/local/pgsql/data/ The issue is that there is not actually
   anything there in the way of a tablespace. So when pg_upgrade runs
   it tries to upgrade from /usr/local/pgsql/data/tblspc_dir to
   /usr/local/pgsql/data/tblspc_dir where the first directory either
   does not exist. or if the user went ahead and created the directory
   in the new installation, is empty. What is really wanted is to
   upgrade from /usr/local/pgsql_old/data/tblspc_dir to
   /usr/local/pgsql/data/tblspc_dir. Right now the only way that
   happens is with user intervention.
  
  Right, it points to _nothing_ in the _new_ cluster.  Perhaps the
  simplest approach would be to check all the pg_tablespace locations to
  see if they point at real directories.  If not, we would have to have
  the user update pg_tablespace and the symlinks.  :-(  Actually, even in
  9.2+, those symlinks are going to point at the same nothing.  That
  would support checking the symlinks in all versions.
 
 I have developed the attached patch which checks all tablespaces to make
 sure the directories exist.  I plan to backpatch this.
 
 The reason we haven't seen this bug reported more frequently is that a
 _database_ defined in a non-existent tablespace directory already throws
 a backend error, so this check is only necessary where tables/indexes
 (not databases) are defined in non-existent tablespace directories.

Patch applied and backpatched to 9.3.  I beefed up the C comment to
explain how this can happen:

   Check that the tablespace path exists and is a directory.
   Effectively, this is checking only for tables/indexes in
   non-existent tablespace directories.  Databases located
   in non-existent tablespaces already throw a backend error.
   Non-existent tablespace directories can occur when a data directory
   that contains user tablespaces is moved as part of pg_upgrade
   preparation and the symbolic links are not updated.
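
In code form the check is essentially a stat() per tablespace location (a
standalone sketch of the idea only, reporting to stderr rather than through
pg_upgrade's own error machinery):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 0 if 'path' exists and is a directory, -1 otherwise. */
static int
check_tablespace_dir(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
    {
        fprintf(stderr, "tablespace path \"%s\" is not accessible: %s\n",
                path, strerror(errno));
        return -1;
    }
    if (!S_ISDIR(st.st_mode))
    {
        fprintf(stderr, "tablespace path \"%s\" is not a directory\n", path);
        return -1;
    }
    return 0;
}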

Thanks for the report and debugging.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 The idea is that we only need quotes when there are odd characters in
 the identifier.  We do that right now in some places, though I can't
 find them in pg_dump.  I know psql does that, see quote_ident().

I think our general style rule is that identifiers embedded in messages
are always double-quoted.  There's an exception for type names, but
not otherwise.  You're confusing the message case with printing SQL.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  The idea is that we only need quotes when there are odd characters in
  the identifier.  We do that right now in some places, though I can't
  find them in pg_dump.  I know psql does that, see quote_ident().
 
 I think our general style rule is that identifiers embedded in messages
 are always double-quoted.  There's an exception for type names, but
 not otherwise.  You're confusing the message case with printing SQL.

OK.  I was unclear if a status _display_ was a message like an error
message.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?

2014-04-17 Thread Bruce Momjian
On Wed, Apr 16, 2014 at 06:55:14PM -0400, Bruce Momjian wrote:
  Seems reasonable.  It could lead to quite a bit of log spam, I
  suppose, but the way things are now could be pretty mystifying if
  you've located your trigger file somewhere outside $PGDATA, and a
  parent directory is lacking permissions.
  
  
  +1. Since it actually indicates something that's quite broken (since with
  that you can never make the trigger work until you fix it), the log spam
  seems like it would be appropriate. (Logspam is never nice, but a single
  log line is also very easy to miss - this should log enough that you
  wouldn't)
 
 I have developed the attached patch to address this issue.

Patch applied.
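
For anyone following along without the attachment, the shape of the check
being discussed is roughly this (a standalone sketch, not the committed
diff; the server would report through ereport() rather than stderr):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Probe the promotion trigger file.  A missing file is the normal case and
 * stays silent; any other failure (e.g. EACCES on a parent directory) gets
 * logged, since promotion can never work until it is fixed.
 */
static int
trigger_file_present(const char *trigger_path)
{
    struct stat st;

    if (stat(trigger_path, &st) == 0)
        return 1;                     /* trigger file exists: promote */

    if (errno != ENOENT)
        fprintf(stderr, "could not stat trigger file \"%s\": %s\n",
                trigger_path, strerror(errno));
    return 0;
}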

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 several orders of magnitude more often.  That's clearly bad.  On
 systems that are not too heavily loaded it doesn't matter too much
 because we just fault the page right back in from the OS pagecache.

Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
long enough, the kernel will happily evict it as 'cold' from *its*
cache, leading to...

 But I've done pgbench runs where such decisions lead to long stalls,
 because the page has to be brought back in from disk, and there's a
 long I/O queue; or maybe just because the kernel thinks PostgreSQL is
 issuing too many I/O requests and makes some of them wait to cool
 things down.

Exactly this.

 Of course, the overhead of repeated clock sweeps to push down the
 usage counts isn't a great thing either.  I'm not saying that isn't a
 problem.  But I think bad decisions about what to evict are also a
 problem.

Using a bit more CPU here and there, particularly if it's done in a
background worker, or ideally multiple background workers (for each
buffer pool) would be much better than evicting a hot page that isn't in
the kernel's buffer either 'cause we've held on to it long enough that
the kernel thinks it's cold.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Fix memset usage in pgcrypto

2014-04-17 Thread Bruce Momjian
On Mon, Mar 31, 2014 at 09:03:41PM -0400, Bruce Momjian wrote:
 On Thu, Dec 26, 2013 at 03:42:12PM +0200, Marko Kreen wrote:
  http://www.viva64.com/en/b/0227/ reported that on-stack memset()s
  might be optimized away by compilers.  Fix it.
  
  * Replace memset() with px_memset()
  * Add px_memset to copy_crlf()
  * ADd px_memset to pgp-s2k.c
 
 Where are we on this patch?  Seems it needs backpatching too.

Patch backpatched through 8.4.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-04-17 Thread Bruce Momjian
On Mon, Feb 10, 2014 at 06:40:30PM +, Peter Geoghegan wrote:
 On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan p...@heroku.com wrote:
  I'm just throwing an error when locking the tuple returns
  HeapTupleInvisible, and the xmin of the tuple is our xid.
 
 I would like some feedback on this point. We need to consider how
 exactly to avoid updating the same tuple inserted by our command.
 Updating a tuple we inserted cannot be allowed to happen, not least
 because to do so causes livelock.
 
 A related consideration that I raised in mid to late January that
 hasn't been commented on is avoiding updating the same tuple twice,
 and where we come down on that with respect to where our
 responsibility to the user starts and ends. For example, SQL MERGE
 officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE
 seems not to, probably due to implementation considerations.

Where are we on this?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Greg Stark
On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost sfr...@snowman.net wrote:
 Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
 long enough, the kernel will happily evict it as 'cold' from *its*
 cache, leading to...


This is a whole nother problem.

It is worrisome that we could be benchmarking the page replacement
algorithm in Postgres and choose a page replacement algorithm that
chooses pages that performs well because it tends to evict pages that
are in the OS cache. And then one day (hopefully not too far off)
we'll fix the double buffering problem and end up with a strange
choice of page replacement algorithm.

It also means that every benchmark is super sensitive to how large
a fraction of system memory Postgres is managing. A benchmark of a
page replacement algorithm with 3GB shared buffers might perform well
compared to others on a system with 8GB or 32GB total RAM but actually
be choosing pages very poorly in normal terms and perform terribly on
a system with 4GB total RAM.

Ideally what I would like to see is instrumentation of Postgres's
buffer pinning so we can generate various test loads and then just run
the different algorithms on them and measure precisely how many page
evictions it's causing and how often it's choosing pages that
need to be read in soon after and so on. We shouldn't have to run
Postgres to get these counts at all, just run the algorithm as we read
through a text file (or database table) listing the pages being
accessed.
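
A harness along those lines can be tiny; something like the following (a
toy sketch: page numbers on stdin, one candidate policy, here a plain
clock sweep with usage counts, and hit/miss counts out):

#include <stdio.h>
#include <stdlib.h>

#define NBUFFERS  1000
#define MAX_USAGE 5

typedef struct { long page; int usage; } Slot;

int
main(void)
{
    Slot  *slots = calloc(NBUFFERS, sizeof(Slot));
    int    hand = 0;
    long   page, hits = 0, misses = 0;

    for (int i = 0; i < NBUFFERS; i++)
        slots[i].page = -1;

    /* One page number per line on stdin: the recorded access trace. */
    while (scanf("%ld", &page) == 1)
    {
        int found = -1;

        for (int i = 0; i < NBUFFERS; i++)
            if (slots[i].page == page) { found = i; break; }

        if (found >= 0)
        {
            hits++;
            if (slots[found].usage < MAX_USAGE)
                slots[found].usage++;
            continue;
        }

        misses++;
        /* clock sweep: decrement usage counts until a victim is found */
        for (;;)
        {
            if (slots[hand].usage == 0)
            {
                slots[hand].page = page;
                slots[hand].usage = 1;
                hand = (hand + 1) % NBUFFERS;
                break;
            }
            slots[hand].usage--;
            hand = (hand + 1) % NBUFFERS;
        }
    }
    printf("hits=%ld misses=%ld\n", hits, misses);
    free(slots);
    return 0;
}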

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL replay bugs

2014-04-17 Thread Heikki Linnakangas

On 04/08/2014 06:41 AM, Michael Paquier wrote:

On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


I've been playing with a little hack that records a before and after image
of every page modification that is WAL-logged, and writes the images to a
file along with the LSN of the corresponding WAL record. I set up a
master-standby replication with that hack in place in both servers, and ran
the regression suite. Then I compared the after images after every WAL
record, as written on master, and as replayed by the standby.

Assuming that adding some dedicated hooks in core, able to perform
actions before and after a page modification occurs, is not *that*
costly (well, I imagine that it is not acceptable in terms of
performance), could it be possible to get that in the shape of an
extension that could be used to test WAL record consistency? This may
be an idea to think about...


Yeah, working on it. It can live as a patch set if nothing else.

This has been very fruitful, I just committed another fix for a bug I 
found with this earlier today.


There are quite a few things that cause differences between master and 
standby. We have hint bits in many places, unused space that isn't 
zeroed etc.


Two things that are not bugs, but I'd like to change just to make this 
tool easier to maintain, and to generally clean things up:


1. When creating a sequence, we first use simple_heap_insert() to insert 
the sequence tuple, which creates a WAL record. Then we write a new 
sequence RM WAL record about the same thing. The reason is that the WAL 
record written by regular heap_insert is bogus for a sequence tuple. 
After replaying just the heap insertion, but not the other record, the 
page doesn't have the magic value indicating that it's a sequence, i.e. 
it's broken as a sequence page. That's OK because we only do this when 
creating a new sequence, so if we crash between those two records, the 
whole relation is not visible to anyone. Nevertheless, I'd like to fix 
that by using PageAddItem directly to insert the tuple, instead of 
simple_heap_insert. We have to override the xmin field of the tuple 
anyway, and we don't need any of the other services like finding the 
insert location, toasting, visibility map or freespace map updates, that 
simple_heap_insert() provides.


2. _bt_restore_page, when restoring a B-tree page split record. It adds 
tuples to the page in reverse order compared to how it's done in master. 
There is a comment noting that, and it asks "Is it worth changing just
on general principles?".  Yes, I think it is.


Any objections to changing those two?

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL replay bugs

2014-04-17 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Two things that are not bugs, but I'd like to change just to make this 
 tool easier to maintain, and to generally clean things up:

 1. When creating a sequence, we first use simple_heap_insert() to insert 
 the sequence tuple, which creates a WAL record. Then we write a new 
 sequence RM WAL record about the same thing. The reason is that the WAL 
 record written by regular heap_insert is bogus for a sequence tuple. 
 After replaying just the heap insertion, but not the other record, the 
 page doesn't have the magic value indicating that it's a sequence, i.e. 
 it's broken as a sequence page. That's OK because we only do this when 
 creating a new sequence, so if we crash between those two records, the 
 whole relation is not visible to anyone. Nevertheless, I'd like to fix 
 that by using PageAddItem directly to insert the tuple, instead of 
 simple_heap_insert. We have to override the xmin field of the tuple 
 anyway, and we don't need any of the other services like finding the 
 insert location, toasting, visibility map or freespace map updates, that 
 simple_heap_insert() provides.

 2. _bt_restore_page, when restoring a B-tree page split record. It adds 
 tuples to the page in reverse order compared to how it's done in master. 
 There is a comment noting that, and it asks "Is it worth changing just
 on general principles?".  Yes, I think it is.

 Any objections to changing those two?

Not here.  I've always suspected #2 was going to bite us someday anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 9:21 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 several orders of magnitude more often.  That's clearly bad.  On
 systems that are not too heavily loaded it doesn't matter too much
 because we just fault the page right back in from the OS pagecache.

 Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
 long enough, the kernel will happily evict it as 'cold' from *its*
 cache, leading to...

 But I've done pgbench runs where such decisions lead to long stalls,
 because the page has to be brought back in from disk, and there's a
 long I/O queue; or maybe just because the kernel thinks PostgreSQL is
 issuing too many I/O requests and makes some of them wait to cool
 things down.

 Exactly this.

Yes, I believe that's why this is so effective.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Josh Berkus
On 04/17/2014 05:39 AM, Greg Stark wrote:
 On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor rod.tay...@gmail.com wrote:
 4) Plays queries from the CSV logs starting from $TIME mimicking actual
 timing and transaction boundaries
 
 This ^^
 
 But I recall a number of previous attempts including plugins for
 general load testing systems, what happened to them?
 
 Honestly if you really want to load test properly though what you
 really want to do is deploy a copy of your entire application and feed
 it requests simulating user traffic. That results in more accurate
 representation and gives you data that's easier to act on.

Software is available which can do this.  The problem is getting the
workload in the first place.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Greg Stark (st...@mit.edu) wrote:
 On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost sfr...@snowman.net wrote:
  Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
  long enough, the kernel will happily evict it as 'cold' from *its*
  cache, leading to...
 
 This is a whole nother problem.
 
 It is worrisome that we could be benchmarking the page replacement
 algorithm in Postgres and choose a page replacement algorithm that
 chooses pages that performs well because it tends to evict pages that
 are in the OS cache. And then one day (hopefully not too far off)
 we'll fix the double buffering problem and end up with a strange
 choice of page replacement algorithm.

That's certainly possible but I don't see the double buffering problem
going away any time particularly soon and, even if it does, it's likely
to either a) mean we're just using the kernel's cache (eg: something w/
mmap, etc), or b) will involve so many other changes that this will end
up getting changed anyway.  In any case, while I think we should
document any such cache management system we employ as having this risk,
I don't think we should worry about it terribly much.

 It also means that every benchmark is super sensitive to how large
 a fraction of system memory Postgres is managing. A benchmark of a
 page replacement algorithm with 3GB shared buffers might perform well
 compared to others on a system with 8GB or 32GB total RAM but actually
 be choosing pages very poorly in normal terms and perform terribly on
 a system with 4GB total RAM.

I'm not following you here- benchmarks are already sensitive to how much
of the system's memory PG is managing (and how much ends up being
*dedicated* to PG's cache and therefore unavailable for other work).

 Ideally what I would like to see is instrumentation of Postgres's
 buffer pinning so we can generate various test loads and then just run
 the different algorithms on them and measure precisely how many page
 evictions it's causing and how often it's choosing pages that
 need to be read in soon after and so on. We shouldn't have to run
 Postgres to get these counts at all, just run the algorithm as we read
 through a text file (or database table) listing the pages being
 accessed.

Go for it.  I'd love to see that also.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 9:52 AM, Bruce Momjian br...@momjian.us wrote:
 Where are we on this?

My hope is that I can get agreement on a way forward during pgCon. Or,
at the very least, explain the issues as I see them in a relatively
accessible and succinct way to those interested.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Andres Freund
On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote:
 On 04/17/2014 09:38 PM, Stephen Frost wrote:
 * Greg Stark (st...@mit.edu) wrote:
 On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost sfr...@snowman.net wrote:
 Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
 long enough, the kernel will happily evict it as 'cold' from *its*
 cache, leading to...
 
 This is a whole nother problem.
 
 It is worrisome that we could be benchmarking the page replacement
 algorithm in Postgres and choose a page replacement algorithm that
 chooses pages that performs well because it tends to evict pages that
 are in the OS cache. And then one day (hopefully not too far off)
 we'll fix the double buffering problem and end up with a strange
 choice of page replacement algorithm.
 
 That's certainly possible but I don't see the double buffering problem
 going away any time particularly soon and, even if it does, it's likely
 to either a) mean we're just using the kernel's cache (eg: something w/
 mmap, etc), or b) will involve so many other changes that this will end
 up getting changed anyway.  In any case, while I think we should
 document any such cache management system we employ as having this risk,
 I don't think we should worry about it terribly much.
 
 Note that if we somehow come up with a page replacement algorithm that tends
 to evict pages that are in the OS cache, we have effectively solved the
 double buffering problem. When a page is cached in both caches, evicting it
 from one of them eliminates the double buffering. Granted, you might prefer
 to evict it from the OS cache instead, and such an algorithm could be bad in
 other ways. But if a page replacement algorithm happens to avoid double
 buffering, that's a genuine merit for that algorithm.

I don't think it's a good idea to try to synchronize algorithms with the
OSs. There's so much change about the caching logic in e.g. linux that
it won't stay effective for very long.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
  Note that if we somehow come up with a page replacement algorithm that tends
  to evict pages that are in the OS cache, we have effectively solved the
  double buffering problem. When a page is cached in both caches, evicting it
  from one of them eliminates the double buffering. Granted, you might prefer
  to evict it from the OS cache instead, and such an algorithm could be bad in
  other ways. But if a page replacement algorithm happens avoid double
  buffering, that's a genuine merit for that algorithm.
 
 I don't think it's a good idea to try to synchronize algorithms with the
 OSs. There's so much change about the caching logic in e.g. linux that
 it won't stay effective for very long.

There's also more than one OS...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Heikki Linnakangas

On 04/17/2014 09:38 PM, Stephen Frost wrote:

* Greg Stark (st...@mit.edu) wrote:

On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost sfr...@snowman.net wrote:

Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
long enough, the kernel will happily evict it as 'cold' from *its*
cache, leading to...


This is a whole nother problem.

It is worrisome that we could be benchmarking the page replacement
algorithm in Postgres and choose a page replacement algorithm that
chooses pages that performs well because it tends to evict pages that
are in the OS cache. And then one day (hopefully not too far off)
we'll fix the double buffering problem and end up with a strange
choice of page replacement algorithm.


That's certainly possible but I don't see the double buffering problem
going away any time particularly soon and, even if it does, it's likely
to either a) mean we're just using the kernel's cache (eg: something w/
mmap, etc), or b) will involve so many other changes that this will end
up getting changed anyway.  In any case, while I think we should
document any such cache management system we employ as having this risk,
I don't think we should worry about it terribly much.


Note that if we somehow come up with a page replacement algorithm that 
tends to evict pages that are in the OS cache, we have effectively 
solved the double buffering problem. When a page is cached in both 
caches, evicting it from one of them eliminates the double buffering. 
Granted, you might prefer to evict it from the OS cache instead, and 
such an algorithm could be bad in other ways. But if a page replacement 
algorithm happens to avoid double buffering, that's a genuine merit for 
that algorithm.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Merlin Moncure
On Thu, Apr 17, 2014 at 1:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote:
 On 04/17/2014 09:38 PM, Stephen Frost wrote:
 * Greg Stark (st...@mit.edu) wrote:
 On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost sfr...@snowman.net wrote:
 Ehhh.  No.  If it's a hot page that we've been holding in *our* cache
 long enough, the kernel will happily evict it as 'cold' from *its*
 cache, leading to...
 
 This is a whole nother problem.
 
 It is worrisome that we could be benchmarking the page replacement
 algorithm in Postgres and choose a page replacement algorithm that
 chooses pages that performs well because it tends to evict pages that
 are in the OS cache. And then one day (hopefully not too far off)
 we'll fix the double buffering problem and end up with a strange
 choice of page replacement algorithm.
 
 That's certainly possible but I don't see the double buffering problem
 going away any time particularly soon and, even if it does, it's likely
 to either a) mean we're just using the kernel's cache (eg: something w/
 mmap, etc), or b) will involve so many other changes that this will end
 up getting changed anyway.  In any case, while I think we should
 document any such cache management system we employ as having this risk,
 I don't think we should worry about it terribly much.

 Note that if we somehow come up with a page replacement algorithm that tends
 to evict pages that are in the OS cache, we have effectively solved the
 double buffering problem. When a page is cached in both caches, evicting it
 from one of them eliminates the double buffering. Granted, you might prefer
 to evict it from the OS cache instead, and such an algorithm could be bad in
 other ways. But if a page replacement algorithm happens to avoid double
 buffering, that's a genuine merit for that algorithm.

 I don't think it's a good idea to try to synchronize algorithms with the
 OSs. There's so much change about the caching logic in e.g. linux that
 it won't stay effective for very long.

No. But if you were very judicious, maybe you could hint the o/s
(posix_fadvise) about pages that are likely to stay hot in s_b, telling
it that you don't need them kept in its cache.
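
As a rough illustration, the hint would be a call of this shape (sketch
only; whether DONTNEED is really the right advice, and at what granularity
to issue it, is exactly the sort of thing that would need testing):

#include <fcntl.h>
#include <sys/types.h>

/*
 * Tell the kernel we intend to keep this block hot in shared_buffers
 * ourselves, so it need not also hold it in the page cache.  Note that
 * POSIX_FADV_DONTNEED actively drops cached data, which may or may not be
 * what a given workload wants.
 */
static void
hint_block_cached_by_us(int fd, off_t blocknum, off_t blocksize)
{
    (void) posix_fadvise(fd, blocknum * blocksize, blocksize,
                         POSIX_FADV_DONTNEED);
}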

I doubt that's necessary though -- if the postgres caching algorithm
improves such that there is a better tendency for hot pages to stay in
s_b, eventually the O/S will deschedule the page for something else
that needs it.  In other words, otherwise preventable double
buffering is really a measurement of bad eviction policy because it
manifests in volatility of frequently accessed pages.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 11:53 AM, Merlin Moncure mmonc...@gmail.com wrote:
 No. But if you were very judicious, maybe you could hint the o/s
 (posix_fadvise) about pages that are likely to stay hot in s_b, telling
 it that you don't need them kept in its cache.

Mitsumasa KONDO wrote a patch like that. I don't think the results
were that promising, but things change quickly.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
 I doubt that's necessary though -- if the postgres caching algorithm
 improves such that there is a better tendency for hot pages to stay in
 s_b, eventually the O/S will deschedule the page for something else
 that needs it.  In other words, otherwise preventable double
 buffering is really a measurement of bad eviction policy because it
 manifests in volatility of frequently accessed pages.

I wonder if it would help to actually tell the OS to read in buffers
that we're *evicting*...  On the general notion that if the OS already
has them buffered then it's almost a no-op, and if it doesn't and it's
actually a 'hot' buffer that we're gonna need again shortly, the OS will
have it.

In other words, try to make the OS more like a secondary cache to ours
by encouraging it to cache things we're evicting.
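
In fadvise terms that encouragement might look something like the call
below (a sketch under the assumption that WILLNEED is an acceptable way to
ask for the block to be pulled into the kernel cache; the obvious downside,
raised downthread, is useless read I/O when the page really has gone cold):

#include <fcntl.h>
#include <sys/types.h>

/*
 * On eviction from shared_buffers, ask the kernel to (re)fetch the block
 * into its own cache, so a still-hot page stays one soft fault away
 * instead of a full disk read.
 */
static void
nudge_block_into_os_cache(int fd, off_t blocknum, off_t blocksize)
{
    (void) posix_fadvise(fd, blocknum * blocksize, blocksize,
                         POSIX_FADV_WILLNEED);
}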

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Merlin Moncure
On Thu, Apr 17, 2014 at 2:00 PM, Stephen Frost sfr...@snowman.net wrote:
 * Merlin Moncure (mmonc...@gmail.com) wrote:
 I doubt that's necessary though -- if the postgres caching algorithm
 improves such that there is a better tendency for hot pages to stay in
 s_b, eventually the O/S will deschedule the page for something else
 that needs it.  In other words, otherwise preventable double
 buffering is really a measurement of bad eviction policy because it
 manifests in volatility of frequently accessed pages.

 I wonder if it would help to actually tell the OS to read in buffers
 that we're *evicting*...  On the general notion that if the OS already
 has them buffered then it's almost a no-op, and if it doesn't and it's
 actually a 'hot' buffer that we're gonna need again shortly, the OS will
 have it.

 In other words, try to make the OS more like a secondary cache to ours
 by encouraging it to cache things we're evicting.

 I don't think this would work unless we would keep some kind of
tracking information on the page itself which seems not worth a write
operation to do (maybe if the page is dirtied it could be snuck in
there though...).  IOW, it would only make sense to do this if we knew
that this page was likely to be read in again.  This might be true in
general on particular workloads but is probably a pretty flimsy
assumption without supporting evidence; probably better to let the O/S
deal with it.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 I wonder if it would help to actually tell the OS to read in buffers
 that we're *evicting*...  On the general notion that if the OS already
 has them buffered then it's almost a no-op, and if it doesn't and it's
 actually a 'hot' buffer that we're gonna need again shortly, the OS will
 have it.

But if it's actually gone cold, you're just forcing unnecessary read I/O,
not to mention possibly causing something slightly warmer to be lost from
kernel cache.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
  I don't think this would work unless we would keep some kind of
 tracking information on the page itself which seems not worth a write
 operation to do (maybe if the page is dirtied it could be snuck in
 there though...).  IOW, it would only make sense to do this if we knew
 that this page was likely to be read in again.  This might be true in
 general on particular workloads but is probably a pretty flimsy
 assumption without supporting evidence; probably better to let the O/S
 deal with it.

The trouble is that we're ending up hiding the information from the OS
about the frequency of utilization of that page.  You have a good point
and we wouldn't want to do this for pages that are just accessed once or
similar, but perhaps just mark a page that's reached the 'max' as having
been 'hot' and then, for those pages, advise the OS that while we're
under pressure and need to push this page out, it was once pretty hotly
used and therefore we may want it again soon.

For pages that never reach the 'max' level, we wouldn't do anything on
the assumption that those were only temporarily needed.

Just some thoughts.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  I wonder if it would help to actually tell the OS to read in buffers
  that we're *evicting*...  On the general notion that if the OS already
  has them buffered then it's almost a no-op, and if it doesn't and it's
  actually a 'hot' buffer that we're gonna need again shortly, the OS will
  have it.
 
 But if it's actually gone cold, you're just forcing unnecessary read I/O,
 not to mention possibly causing something slightly warmer to be lost from
 kernel cache.

Certainly possible- see the email I just sent about another thought
around this.

Obviously, none of these thoughts are really fully formed solutions and
are, instead, just speculation and ideas.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Merlin Moncure
On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus j...@agliodbs.com wrote:
 Hackers,

 I think 9.3 has given us evidence that our users aren't giving new
 versions of PostgreSQL substantial beta testing, or if they are, they
 aren't sharing the results with us.

 How can we make beta testing better and more effective?  How can we get
 more users to actually throw serious workloads at new versions and share
 the results?

 I've tried a couple of things over the last two years and they haven't
 worked all that well.  Since we're about to go into another beta testing
 period, we need something new.  Ideas?

I've seen lots of bugs reported and fixed in the beta period over the
years.  My take is that it's basically unrealistic to expect volunteer
beta testers to replace bona fide regression testing.

I think it's a pretty fair statement that we've had some QC issues in
the general area of replication technologies.  What this is indicating
to me is that replication needs substantially more coverage in 'make
check'.  Since I'm wishing for things, it would be nice to see an
expansion of the buildfarm so that we could [optionally] run various
performance tests as well as various replication scenarios.  Then we
could go back to users and say, please donate 'repeatable tests and
machines to run them on' and reap the long term value.

Not at all making light out of any of this...it's a huge project.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Merlin Moncure
On Thu, Apr 17, 2014 at 2:16 PM, Stephen Frost sfr...@snowman.net wrote:
 * Merlin Moncure (mmonc...@gmail.com) wrote:
  I don't think this would work unless we would keep some kind of
 tracking information on the page itself which seems not worth a write
 operation to do (maybe if the page is dirtied it could be snuck in
 there though...).  IOW, it would only make sense to do this if we knew
 that this page was likely to be read in again.  This might be true in
 general on particular workloads but is probably a pretty flimsy
 assumption without supporting evidence; probably better to let the O/S
 deal with it.

 The trouble is that we're ending up hiding the information from the OS
 about the frequency of utilization of that page.  You have a good point
 and we wouldn't want to do this for pages that are just accessed once or
 similar, but perhaps just mark a page that's reached the 'max' as having
 been 'hot' and then, for those pages, advise the OS that while we're
 under pressure and need to push this page out, it was once pretty hotly
 used and therefore we may want it again soon.

 For pages that never reach the 'max' level, we wouldn't do anything on
 the assumption that those were only temporarily needed.

yeah -- the thing is, we are already too spendy already on
supplemental write i/o (hint bits, visible bits, freezing, etc) and
likely not worth it to throw something else on the pile unless the
page is already dirty; the medium term trend in storage is that read
vs write performance is becoming increasingly asymmetric, particularly
on the random side so it's very unlikely to balance out.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
On Thursday, April 17, 2014, Merlin Moncure mmonc...@gmail.com wrote:

 yeah -- the thing is, we are already too spendy already on
 supplemental write i/o (hint bits, visible bits, freezing, etc) and
 likely not worth it to throw something else on the pile unless the
 page is already dirty; the medium term trend in storage is that read
 vs write performance is becoming increasingly asymmetric, particularly
 on the random side so it's very unlikely to balance out.


Guess I wasn't clear but I was thinking to read the page in, not do any
writing, and do it in an asynchronous way to the process doing the evicting.

Thanks,

Stephen


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Fabrízio de Royes Mello
On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   The idea is that we only need quotes when there are odd characters in
   the identifier.  We do that right now in some places, though I can't
   find them in pg_dump.  I know psql does that, see quote_ident().
 
  I think our general style rule is that identifiers embedded in messages
  are always double-quoted.  There's an exception for type names, but
  not otherwise.  You're confusing the message case with printing SQL.

 OK.  I was unclear if a status _display_ was a message like an error
 message.


The attached patch fixes the missing double-quoting in the "dumping contents of
table.." message and adds the schema name to other messages:
- "reading indexes for table \"%s\".\"%s\"\n"
- "reading foreign key constraints for table \"%s\".\"%s\"\n"
- "reading triggers for table \"%s\".\"%s\"\n"
- "finding the columns and types of table \"%s\".\"%s\"\n"
- "finding default expressions of table \"%s\".\"%s\"\n"
- "finding check constraints for table \"%s\".\"%s\"\n"

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 
-	ahlog(AH, 1, "processing data for table \"%s\"\n",
-		  te->tag);
+	ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+		  AH->currSchema, te->tag);
 
 	/*
 	 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..78ec5bf 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table \"%s\".\"%s\"\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table \"%s\"\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -4974,8 +4984,17 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading indexes for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading indexes for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading indexes for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/* Make sure we are in proper schema so indexdef is right */
 		selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name);
@@ -5340,8 +5359,17 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading foreign key constraints for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure constraint expr is qualified if
@@ -5678,8 +5706,17 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading triggers for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading triggers for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading triggers for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure regproc name is qualified if needed
@@ -6291,8 +6328,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		 * the output of an indexscan on pg_attribute_relid_attnum_index.
 		 */
 		if (g_verbose)
-			write_msg(NULL, "finding the columns and types of table 

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Merlin Moncure
On Thu, Apr 17, 2014 at 2:28 PM, Stephen Frost sfr...@snowman.net wrote:


 On Thursday, April 17, 2014, Merlin Moncure mmonc...@gmail.com wrote:

 yeah -- the thing is, we are already too spendy already on
 supplemental write i/o (hint bits, visible bits, freezing, etc) and
 likely not worth it to throw something else on the pile unless the
 page is already dirty; the medium term trend in storage is that read
 vs write performance is becoming increasingly asymmetric, particularly
 on the random side so it's very unlikely to balance out.

 Guess I wasn't clear but I was thinking to read the page in, not do any
 writing, and do it in an asynchronous way to the process doing the evicting.

no -- I got you. My point was, that's a pure guess unless you base it
on evidence recorded on the page itself.  Without that evidence,
(which requires writing), the operating system is in a better place to make
that guess, so it's probably better to defer that decision.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Stephen Frost
On Thursday, April 17, 2014, Merlin Moncure mmonc...@gmail.com wrote:

 no -- I got you. My point was, that's a pure guess unless you base it
 on evidence recorded on the page itself.  Without that evidence,
 (which requires writing), the operating system is in a better place to make
 that guess, so it's probably better to defer that decision.


Well, we'd only need that info to be stored in the buffer cache somehow-
wouldn't have to go to disk or cause more I/O, of course. My thinking was
that we could track it with the existing counter too, avoiding even that
small amount of locking to write to the buffer page.

Thanks,

Stephen


Re: [HACKERS] New windows compiler warning from 585bca39

2014-04-17 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 I've attached a tiny patch that fixes a new compiler warning on the windows
 build...

Applied, thanks.

 Perhaps the #ifndef could be placed in a nicer spot in the patch, but the
 attached should at least describe where the problem lies...

Yeah, I thought it better to make a separate declaration to wrap in
#ifndef.  pgindent is probably going to insist on adding some vertical
whitespace around the #if, and that'll look horrid if it's just in the
middle of a list of variables.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] assertion in 9.4 with wal_level=logical

2014-04-17 Thread Steve Singer

With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)

I am getting an assertion when doing a truncate via SPI when I have 
wal_level=logical.


Stack trace is below.

I am just replicating a table with normal slony (2.2); I don't need to 
establish any replication slots to get this.





(gdb) where
#0  0x7fc9b4f58295 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fc9b4f5b438 in __GI_abort () at abort.c:90
#2  0x007a10f7 in ExceptionalCondition (
conditionName=conditionName@entry=0x955d90 !(CritSectionCount == 0 
|| (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == 
CheckpointerProcess)),

errorType=errorType@entry=0x7da7b0 FailedAssertion,
fileName=fileName@entry=0x955a2e mcxt.c, 
lineNumber=lineNumber@entry=670)

at assert.c:54
#3  0x007c3090 in palloc (size=16) at mcxt.c:670
#4  0x004dd83f in mXactCacheGetById (members=0x7fff679a3d18, 
multi=58)

at multixact.c:1411
#5  GetMultiXactIdMembers (multi=58, members=members@entry=0x7fff679a3d18,
allow_old=allow_old@entry=0 '\000') at multixact.c:1080
#6  0x0049e43f in MultiXactIdGetUpdateXid (xmax=optimized out,
t_infomask=optimized out) at heapam.c:6042
#7  0x004a1ccc in HeapTupleGetUpdateXid (tuple=optimized out)
at heapam.c:6083
#8  0x007cf7d9 in HeapTupleHeaderGetCmax 
(tup=tup@entry=0x7fc9ac838e38)

at combocid.c:122
#9  0x0049eb98 in log_heap_new_cid (
relation=relation@entry=0x7fc9b5a67dc0, tup=tup@entry=0x7fff679a3ea0)
at heapam.c:7047
#10 0x004a48a5 in heap_update 
(relation=relation@entry=0x7fc9b5a67dc0,

otid=otid@entry=0x2678c6c, newtup=newtup@entry=0x2678c68, cid=26,
crosscheck=crosscheck@entry=0x0, wait=wait@entry=1 '\001',
hufd=hufd@entry=0x7fff679a4080, lockmode=lockmode@entry=0x7fff679a407c)
at heapam.c:3734
#11 0x004a5842 in simple_heap_update (
relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c,
tup=tup@entry=0x2678c68) at heapam.c:4010
#12 0x00797cf7 in RelationSetNewRelfilenode (
relation=relation@entry=0x7fc9ab270b68, freezeXid=19459,
minmulti=minmulti@entry=58) at relcache.c:2956
#13 0x0059ddde in ExecuteTruncate (stmt=0x3a, stmt@entry=0x2678a58)
at tablecmds.c:1187
#14 0x006c3870 in standard_ProcessUtility (parsetree=0x2678a58,
queryString=optimized out, context=optimized out, params=0x0,
dest=optimized out, completionTag=optimized out) at utility.c:515
#15 0x005e79d9 in _SPI_execute_plan 
(plan=plan@entry=0x7fff679a4320,

paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0,
crosscheck_snapshot=crosscheck_snapshot@entry=0x0,
read_only=read_only@entry=0 '\000',
fire_triggers=fire_triggers@entry=1 '\001', tcount=tcount@entry=0)
at spi.c:2171
#16 0x005e83c1 in SPI_execute (
---Type return to continue, or q return to quit---
src=src@entry=0x25bde90 truncate only \disorder\.\do_restock\,
read_only=0 '\000', tcount=tcount@entry=0) at spi.c:386


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 8:10 AM, Greg Stark st...@mit.edu wrote:
 I don't think common sense is compelling. I think you need to pin
 down exactly what it is about btree intermediate pages that the LRU
 isn't capturing and not just argue they're more useful. The LRU is
 already capturing which pages are more heavily used than others so you
 need to identify what it is that makes index pages *even more* useful
 than their frequency and recency of access indicates. Not just that
 they're more useful than an average page.

See example 1.1 within the LRU-K paper.

 So what I think is missing is that indexes are always accessed from
 the root down to the leaf. So the most recent page accessed will
 always be the leaf. And in whatever chain of pages was used to reach
 the last leaf page the least recently accessed will always be the
 root. But we'll need the root page again on the subsequent descent
 even if it's to reach the same leaf page we kept in ram in preference
 to it.

I can't imagine that this is much of a problem in practice. Consider
the break-down of pages within indexes when pgbench scale is 5,000, as
in my original benchmark:

[local] pg@pgbench=# with tots as (
    SELECT count(*) c, type, relname
    from (select relname, relpages, generate_series(1, relpages - 1) i
          from pg_class c join pg_namespace n on c.relnamespace = n.oid
          where relkind = 'i' and nspname = 'public') r,
         lateral (select * from bt_page_stats(relname, i)) u
    group by relname, type)
select tots.relname, relpages - 1 as non_meta_pages, c,
       c/sum(c) over (partition by tots.relname) as prop_of_index, type
from tots
join pg_class c on c.relname = tots.relname
order by 2 desc, 1, type;

        relname        | non_meta_pages |    c    |     prop_of_index      | type
-----------------------+----------------+---------+------------------------+------
 pgbench_accounts_pkey |        1370950 |    4828 | 0.00352164557423684307 | i
 pgbench_accounts_pkey |        1370950 | 1366121 | 0.99647762500455888253 | l
 pgbench_accounts_pkey |        1370950 |       1 | 0.00729421204274408257 | r
 pgbench_tellers_pkey  |            274 |     273 | 0.99635036496350364964 | l
 pgbench_tellers_pkey  |            274 |       1 | 0.00364963503649635036 | r
 pgbench_branches_pkey |             28 |      27 | 0.96428571428571428571 | l
 pgbench_branches_pkey |             28 |       1 | 0.03571428571428571429 | r
(7 rows)

Time: 14562.297 ms

Just over 99.6% of pages (leaving aside the meta page) in the big 10
GB pgbench_accounts_pkey index are leaf pages. The inner pages and
root page are at an enormous advantage. In this example, the other
indexes don't even have what would be separately classified as an
inner page (and not a root page) at all, because it's perfectly
sufficient to only have a root page to get to any one of, say, 273
leaf pages (in the case of pgbench_tellers_pkey here).
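
(The type classification above comes from pageinspect's bt_page_stats(), the
same function the query relies on; a quick single-page check, with the block
number picked arbitrarily, looks something like this:)

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT type, live_items, avg_item_size, free_size
    FROM bt_page_stats('pgbench_tellers_pkey', 1);
    -- type is 'r' for the root, 'i' for an inner page, 'l' for a leaf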

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-17 Thread Andres Freund
Hi,

On 2014-04-17 16:23:54 -0400, Steve Singer wrote:
 With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)
 
 I am getting an assertion when doing a truncate via SPI when I have
 wal_level=logical.
 
 Stack trace is below.
 
 I am just replicating a table with normal slony (2.2) I don't need to
 establish any replication slots to get this.

Uh, that's somewhat nasty... You probably only get that because of
slony's habit of share locking catalogs. Could that be?

For now, to circumvent the problem you could just revert
4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e.

I'll look into fixing it properly.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Andres Freund
On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote:
 Just over 99.6% of pages (leaving aside the meta page) in the big 10
 GB pgbench_accounts_pkey index are leaf pages.

That's a rather nice number. I knew it was big, but I'd have guessed
it'd be a percent lower.

Do you happen to have the same stat handy for a sensibly wide text or
numeric real world index? It'd be interesting to see what the worst case
there is.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-17 Thread Alvaro Herrera
Steve Singer wrote:
 With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)
 
 I am getting an assertion when doing a truncate via SPI when I have
 wal_level=logical.
 
 Stack trace is below.
 
 I am just replicating a table with normal slony (2.2) I don't need
 to establish any replication slots to get this.

For once, this looks more like a problem in logical decoding, which is
trying to assert about the tuple being updated; the assertion failing is
the one added a week ago about not palloc'ing in a critical section.

Andres told me on IM that there's another weird thing going on (which is
how the catalog tuple gets a multixact in the first place) which is that
Slony does a SELECT FOR SHARE in the catalog tuple.

One simple approach would be to just disable that particular assert when
in a critical section.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Joshua D. Drake


Hello,

Over at my quaint establishment we have been working on some plpython
work that makes use of GD. We wrote this code with the assumption (per 
the docs) that when you issued a DISCARD ALL, the GD would be cleared. 
Apparently this is not the case. The docs themselves are clearly wrong, 
and this has been brought up before multiple times (that DISCARD ALL 
doesn't do what people think it does).
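
A minimal illustration of the behaviour (assuming the plpythonu language is
installed; the function here is made up for the example):

    CREATE FUNCTION gd_counter() RETURNS int AS $$
    GD['n'] = GD.get('n', 0) + 1
    return GD['n']
    $$ LANGUAGE plpythonu;

    SELECT gd_counter();  -- 1
    SELECT gd_counter();  -- 2
    DISCARD ALL;
    SELECT gd_counter();  -- 3, not 1: the GD survived the DISCARD ALL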


Does it seem reasonable based on the docs:

DISCARD ALL:

Releases all temporary resources associated with the current session and 
resets the session to its initial state.


That we should also release the GD?

Sincerely,

Joshua D. Drake


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 1:39 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote:
 Just over 99.6% of pages (leaving aside the meta page) in the big 10
 GB pgbench_accounts_pkey index are leaf pages.

 That's a rather nice number. I knew it was big, but I'd have guessed
 it'd be a percent lower.

Yes, it's usually past 99.5% for int4. It's really bad if it's as low
as 96%, and I think that often points to what are arguably bad
indexing choices, like indexing text columns that have long text
strings.

 Do you happen to have the same stat handy for a sensibly wide text or
 numeric real world index? It'd be interesting to see what the worst case
 there is.

Yes, as it happens I do:
http://www.postgresql.org/message-id/CAM3SWZTcXrdDZSpA11qZXiyo4_jtxwjaNdZpnY54yjzq7d64=a...@mail.gmail.com

I was working off my Mouse Genome database, which is actually
real-world data used by medical researchers, stored in a PostgreSQL
database by those researchers and made available for the benefit of
other medical researchers.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-17 Thread Andres Freund
On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
 For once, this looks more like a problem in logical decoding, which is
 trying to assert about the tuple being updated; the assertion failing is
 the one added a week ago about not palloc'ing in a critical section.

It's this (older) assertion in HeapTupleHeaderGetCmax():


Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

That can allocate memory if xmax is a multixact... Does anybody have a
better idea to solve this than adding a CritSectionCount == 0 && in
there?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 1:33 PM, Peter Geoghegan p...@heroku.com wrote:
 I can't imagine that this is much of a problem in practice.

Although I will add that not caching highly useful inner pages for the
medium term, just because that index isn't being used at all for 5 minutes,
probably is very bad. Using the 4,828 buffers that it would take to
store all the inner pages (as in my large primary index example) to go
store something else is probably penny wise and pound foolish.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-17 Thread Steve Singer

On 04/17/2014 04:33 PM, Andres Freund wrote:

Hi,

On 2014-04-17 16:23:54 -0400, Steve Singer wrote:

With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4)

I am getting an assertion when doing a truncate via SPI when I have
wal_level=logical.

Stack trace is below.

I am just replicating a table with normal slony (2.2) I don't need to
establish any replication slots to get this.

Uh, that's somewhat nasty... You probably only get that because of
slony's habit of share locking catalogs. Could that be?


Yes, slony does a select from pg_catalog and pg_namespace in the stored
function just before doing the truncate.




For now, to circumvent the problem you could just revert
4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e.

I'll look into fixing it properly.

Greetings,

Andres Freund






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Josh Berkus
On 04/17/2014 01:44 PM, Joshua D. Drake wrote:
 Does it seem reasonable based on the docs:
 
 DISCARD ALL:
 
 Releases all temporary resources associated with the current session and
 resets the session to its initial state.
 
 That we should also release the GD?

It does, but that's a feature request, not a bug.  Same with pl/perl.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: iff - if

2014-04-17 Thread Andreas 'ads' Scherbaum

On 04/17/2014 12:33 PM, Nicolas Barbier wrote:

2014-04-17 Michael Paquier michael.paqu...@gmail.com:


Is there no equivalent in German? For example in French there is ssi.


gdw (genau dann, wenn)


More likely that you see

 \equiv

or:

 \leftrightarrow


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Jeff Janes
On Tue, Apr 15, 2014 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:

 Hackers,

 I think 9.3 has given us evidence that our users aren't giving new
 versions of PostgreSQL substantial beta testing, or if they are, they
 aren't sharing the results with us.


A lot of the bugs that turned up are not the kind I would expect to have
been found in most beta testing done by non-hacking users.  Weren't they
mostly around rare race conditions, crash recovery, and freezing?




 How can we make beta testing better and more effective?  How can we get
 more users to actually throw serious workloads at new versions and share
 the results?


If we are interested in positive results as well as negative, we should
change https://wiki.postgresql.org/wiki/HowToBetaTest

pgsql-hackers: bugs, questions, and successful test reports are welcome
here if you are already subscribed to pgsql-hackers. Note that
pgsql-hackers is a high-traffic mailing list with a lot of development
discussion.

So successful reports are welcome, provided that you are willing to
subscribe to a list that generates tons of noise you won't understand.
That doesn't sound all that welcoming.  (I already am subscribed, but I
still usually don't report successful tests, because "yeah, I did a bunch
of stuff, and nothing failed in an obvious way" just doesn't sound very
useful, and it is hard to get motivated to write up an exhaustive
description of a test that doesn't prove anything anyway--maybe if I did it
for a few more hours, it would have found a problem.)

If we want to know how much beta testing is really going on, perhaps we
could do a survey asking people whether they did any beta testing, and if
so whether they reported the results.  Otherwise it would be hard to
distinguish "We aren't doing enough testing" from "We do lots of testing,
but it isn't strenuous enough to find the problems, or is testing the wrong
aspects of the system."

Cheers,

Jeff


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Jan Wieck

On 04/17/14 15:16, Merlin Moncure wrote:

On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus j...@agliodbs.com wrote:

Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?


I've seen lots of bugs reported and fixed in the beta period over the
years.  My take is that it's basically unrealistic to expect volunteer
beta testers to replace bona fide regression testing.

I think it's a pretty fair statement that we've had some QC issues in
the general area of replication technologies.  What this is indicating
to me is that replication needs substantially more coverage in 'make
check'.  Since I'm wishing for things, it would be nice to see an
expansion of the buildfarm so that we could [optionally] run various
performance tests as well as various replication scenarios.  Then we
could go back to users and say, please donate 'repeatable tests and
machines to run them on' and reap the long term value.

Not at all making light out of any of this...it's a huge project.


The problem with testing replication is that it doesn't fit well into 
our standard regression testing. There are way too many moving parts as 
well as dependencies on the underlying OS and network topology.


You will discover a ton of race conditions once you actually move from
testing with multiple postmasters (so you can kill one) on the same box
to using multiple virtual machines and, instead of completely severing a
network connection, using some packet shaping/filtering to introduce
packet loss, limited bandwidth, async routing and so on. At least that
is my experience from throwing that sort of sh*t at Slony at full speed.


Not trying to discourage anyone from trying. Just saying that it doesn't 
fit into our existing regression test framework.



Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Joshua D. Drake


On 04/17/2014 02:17 PM, Josh Berkus wrote:


On 04/17/2014 01:44 PM, Joshua D. Drake wrote:

Does it seem reasonable based on the docs:

DISCARD ALL:

Releases all temporary resources associated with the current session and
resets the session to its initial state.

That we should also release the GD?


It does, but that's a feature request, not a bug.  Same with pl/perl.



Well I would argue it is a documentation bug/error and that yes, we have 
a legitimate feature request for DISCARD ALL to clear the GD.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Alvaro Herrera
Joshua D. Drake wrote:
 
 On 04/17/2014 02:17 PM, Josh Berkus wrote:
 
 On 04/17/2014 01:44 PM, Joshua D. Drake wrote:
 Does it seem reasonable based on the docs:
 
 DISCARD ALL:
 
 Releases all temporary resources associated with the current session and
 resets the session to its initial state.
 
 That we should also release the GD?
 
 It does, but that's a feature request, not a bug.  Same with pl/perl.
 
 
 Well I would argue it is a documentation bug/error and that yes, we
 have a legitimate feature request for DISCARD ALL to clear the GD.

It does sound like a legitimate feature request to me.  I don't remember if
we honored the request to add resetting of cached sequences, though; if
we didn't, this one is probably going to be tough too.

Another point is that to implement this I think there will need to be
another per-PL entry point to discard session data; are we okay with
that?  Since this probably means a new column in pg_language, we
couldn't even consider the idea of back-patching.  Unless we add a hook,
which is registered in the PL's _PG_init()?

Are we going to backpatch a doc change that says "releases all temporary
resources, except for plpython's and plperl's GD"?  Surely not ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread David G Johnston
Alvaro Herrera-9 wrote
 Are we going to backpatch a doc change that says "releases all temporary
 resources, except for plpython's and plperl's GD"?  Surely not ...

GD = Global Dictionary

I don't see why something like the following wouldn't have value.

"For those languages that make use of a Global Dictionary (or other global)
construct to maintain state across function calls, the content of these
global constructs may not be reset upon a call to DISCARD ALL - please see
your language's documentation for specifics."

Then, more attention should be paid within the language sections both to the
constructs in general and also to their limitations in the face of DISCARD ALL.

Note, the session-level GD construct is not informed of the execution of
DISCARD ALL and thus is not cleared when such is executed.

It would be helpful to note any work-around options as well.

It sounds like, generally, PostgreSQL should promise to notify each
language/session of a pending DISCARD ALL and then leave it up to the
language to define how it will respond.  The documentation should reflect
this promise/action division-of-responsibilities.

David J.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/DISCARD-ALL-Again-tp5800623p5800640.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Fabrízio de Royes Mello
On Thu, Apr 17, 2014 at 6:51 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 It does sounds a legitimate feature request to me.  I don't remember if
 we honored the request to add resetting of cached sequences, though; if
 we didn't, this one is probably going to be tough too.


+1


 Another point is that to implement this I think there will need to be
 another per-PL entry point to discard session data; are we okay with
 that?  Since this probably means a new column in pg_language, we
 couldn't even consider the idea of back-patching.  Unless we add a hook,
 which is registered in the PL's _PG_init()?


This week I had some similar trouble, but using dblink and pgbouncer.
As expected, DISCARD ALL doesn't clear the extension's resources.

I was wondering whether it would be possible for every extension to register
its own discard procedure, and then DISCARD ALL could execute all registered
extension cleanup procedures. Makes sense?
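
For illustration, the dblink case looks roughly like this (assumes the dblink
extension is installed; the connection name and conninfo are made up):

    SELECT dblink_connect('conn1', 'dbname=postgres');
    DISCARD ALL;
    SELECT dblink_get_connections();  -- still returns {conn1}; nothing was cleaned up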


 Are we going to backpatch a doc change that says releases all temporary
 resources, except for plptyhon's and plperl's GD?  Surely not ...


Maybe this doc could read something like:

"releases all temporary resources, except for extensions"

Grettings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Josh Berkus
On 04/15/2014 09:53 PM, Rod Taylor wrote:
 A documented beta test process/toolset which does the following would help:
 1) Enables full query logging
 2) Creates a replica of a production DB, record $TIME when it stops.
 3) Allow user to make changes (upgrade to 9.4, change hardware, change
 kernel settings, ...)
 4) Plays queries from the CSV logs starting from $TIME mimicking actual
 timing and transaction boundaries
 
 If Pg can make it easy to duplicate activities currently going on in
 production inside another environment, I would be pleased to fire a couple
 billion queries through it over the next few weeks.
 
 #4 should include reporting useful to the project, such as a sampling of
 queries which performed significantly worse and a few relative performance
 stats for overall execution time.

So we have some software we've been procrastinating on OSS'ing, which does:

1) Takes full query CSV logs from a running postgres instance
2) Runs them against a target instance in parallel
3) Records response times for all queries

tsung and pgreplay also do this, but have some limitations which make
them impractical for a general set of logs to replay.

What it would need is:

A) scripting around coordinated backups
B) Scripting for single-command runs, including changing pg.conf to
record data.
C) tools to *analyze* the output data, including error messages.
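
For (B), the capture side is roughly the equivalent of setting the following,
whether by editing pg.conf or, on 9.4, via ALTER SYSTEM (values are illustrative):

    ALTER SYSTEM SET log_destination = 'csvlog';
    ALTER SYSTEM SET logging_collector = on;          -- needs a server restart to take effect
    ALTER SYSTEM SET log_min_duration_statement = 0;  -- log every completed statement with its duration
    SELECT pg_reload_conf();                          -- applies the reloadable settings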

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Josh Berkus
All,

So have encountered a 2nd report of this issue, or of an issue which
sounds very similar:

- corruption in two queue tables

- the tables are written in a high-concurrency, lock-contested environment

- user uses SELECT FOR UPDATE with these tables.

- pg_stat_statements.so is loaded, but the extension is not installed

- four rows were added to the queue tables, while not being added to the
PK index.  This allowed duplicate PKs to be added.
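
A quick way to confirm that last symptom, with hypothetical table and column
names, is simply:

    SELECT id, count(*)
    FROM queue_table
    GROUP BY id
    HAVING count(*) > 1;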

Currently the user is testing not loading the pg_stat_statements.so to
see if the problem goes away.  They have a destruction test environment,
so we should be able to confirm/deny in a couple days.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-04-17 Thread Michael Paquier
On Fri, Apr 18, 2014 at 4:29 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:

 On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   The idea is that we only need quotes when there are odd characters in
   the identifier.  We do that right now in some places, though I can't
   find them in pg_dump.  I know psql does that, see quote_ident().
 
  I think our general style rule is that identifiers embedded in messages
  are always double-quoted.  There's an exception for type names, but
  not otherwise.  You're confusing the message case with printing SQL.

 OK.  I was unclear if a status _display_ was a message like an error
 message.


 The attached patch fixes the missing double-quoting in the "dumping contents of
 table ..." message and adds the schema name to other messages:
 - reading indexes for table \%s\.\%s\\n
 - reading foreign key constraints for table \%s\.\%s\\n
 - reading triggers for table \%s\.\%s\\n

 - finding the columns and types of table \%s\.\%s\\n
 - finding default expressions of table \%s\.\%s\\n
 - finding check constraints for table \%s\.\%s\\n
Cool additions. There may be a more elegant way to check if the namespace
is NULL, but I couldn't come up with one myself. So the patch may be fine.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 Does it seem reasonable based on the docs:
 DISCARD ALL:
 Releases all temporary resources associated with the current session and 
 resets the session to its initial state.
 That we should also release the GD?

There are a couple of reasons why this isn't anywhere near as
open-and-shut as you seem to think:

1. The things that DISCARD ALL resets are explicitly enumerated in its
documentation page; it is not an open-ended promise to clean up anything
anybody happens to think of.  In particular, I would object to adding any
such functionality that doesn't correspond to a more specific form of
DISCARD, RESET, or whatever.

2. While I'm no Python expert, I believe GD is just a specific instance
of a general capability for global state in Python.  Are we going to
promise that any and all user-created data inside Python goes away?
What about other PLs?  Will users thank us if this suddenly starts
happening?

3. Since PL/Python is a superuser-only language, destroying either all
global state or just GD should require superuser privileges.  It will
definitely break things to make DISCARD ALL require superuser privs.
It's not very much more attractive to say that it only destroys Python
state if issued by a superuser.

We could in fact implement #2, I imagine, by destroying and recreating
the entire language interpreter.  So I could imagine implementing a
DISCARD INTERPRETERS kind of command that would zap the current
interpreter(s) for whichever PL languages happened to feel like
cooperating with the command.  But I'm not at all sure about the
superuser-privileges aspect of that.

More: IIRC, in at least some of the PLs, the interpreters are
SQL-user-specific.  Should a non-superuser session user have the ability
to destroy state in interpreters running for other user IDs (presumably,
state created by SECURITY DEFINER functions)?  I'd guess not.

On the whole I'm not sure this is something we ought to get into.
If you really need a fresh session, maybe you should start a
fresh session.  But if we are going to get into it, I think the
permissions aspect needs some careful thought.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Joshua D. Drake


On 04/17/2014 05:24 PM, Tom Lane wrote:

Joshua D. Drake j...@commandprompt.com writes:

Does it seem reasonable based on the docs:
DISCARD ALL:
Releases all temporary resources associated with the current session and
resets the session to its initial state.
That we should also release the GD?


There are a couple of reasons why this isn't anywhere near as
open-and-shut as you seem to think:

1. The things that DISCARD ALL resets are explicitly enumerated in its
documentation page; it is not an open-ended promise to clean up anything
anybody happens to think of.


Actually, it is, unless we change the wording in the documentation,
because right now DISCARD ALL states that it will return the session
to its initial state. It does not do that.


[... snipping technical details for those who would be better at arguing 
than I ... ]




On the whole I'm not sure this is something we ought to get into.
If you really need a fresh session, maybe you should start a
fresh session.


This isn't nearly as simple as it sounds. That said, the workaround we
came up with was to have server_reset_query in pgbouncer call a
function called discard_gd() that explicitly discards the GD for that
session.
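
A rough sketch of that helper (only the name comes from our setup; the body
here is an approximation):

    CREATE OR REPLACE FUNCTION discard_gd() RETURNS void AS $$
    GD.clear()
    $$ LANGUAGE plpythonu;

pgbouncer then runs something like server_reset_query = DISCARD ALL; SELECT
discard_gd() when it hands the connection back to the pool.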



But if we are going to get into it, I think the
permissions aspect needs some careful thought.



Agreed.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 On 04/17/2014 05:24 PM, Tom Lane wrote:
 1. The things that DISCARD ALL resets are explicitly enumerated in its
 documentation page; it is not an open-ended promise to clean up anything
 anybody happens to think of.

 Actually, it is. Unless we change the wording in the documentation 

Well, we definitely need to tweak the short description of DISCARD ALL.
But the long description a bit further down the page is quite clear.
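
(For reference, the long description enumerates the resets explicitly; for the
9.4-era docs it is roughly equivalent to running:

    CLOSE ALL;
    SET SESSION AUTHORIZATION DEFAULT;
    RESET ALL;
    DEALLOCATE ALL;
    UNLISTEN *;
    SELECT pg_advisory_unlock_all();
    DISCARD PLANS;
    DISCARD SEQUENCES;
    DISCARD TEMP;

and nothing beyond that.)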

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan and...@dunslane.net wrote:
  track_activity_query_size = 10240
shared_preload_libraries = 'auto_explain,pg_stat_statements'

 As you can see, auto_explain's log_min_duration hasn't been set, so it
 shouldn't be doing anything very much, I should think.

track_activity_query_size is 10240? Not 1024?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buildfarm master-next branch?

2014-04-17 Thread Craig Ringer
On 04/17/2014 10:38 PM, Tom Lane wrote:
 IMO the best single thing that could happen for the buildfarm is if
 we had more critters (at least twice as many) running a wider variety of
 platforms, compilers, and configuration options than there are today.
 More frequent runs would come out of that automatically.

I'll be bringing up a new Windows buildfarm member once I've got a
current project knocked off. It's a pretty fast dedicated Windows Server
2012 box with a wide range of SDKs on it that can do 32-bit and 64-bit
builds.

Should help a little.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How can we make beta testing better?

2014-04-17 Thread Greg Stark
On Thu, Apr 17, 2014 at 5:26 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 A lot of the bugs that turned up are not the kind I would expect to have
 been found in most beta testing done by non-hacking users.  Weren't they
 mostly around rare race conditions, crash recovery, and freezing?

Actually I was struck by how the bugs in 9.3 were the kinds of bugs
that should have turned up pretty quickly by user testing. They
certainly turned up pretty quickly after users put their production
applications on it. They *didn't* require rare race conditions, just
certain patterns of workloads for long enough to reliably reproduce.

They were specifically *not* the kinds of bugs that regression testing
would have found. Regression testing only finds bugs you anticipate
and think to put in the specification of correct behaviour. If you had
thought of these problems you would have tested them manually and in
any case you would have seen the omissions immediately on inspecting
the code.

Crash recovery and freezing aren't rare things once you have hot
standbys everywhere and run 24x7 applications (or load tests) on your
systems. We could make freezing more frequent by having a mode that
bumps the xid by a few million randomly. That would still be pretty
hit and miss whether it happens to wrap around in any particular
state.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-04-17 Thread Amit Kapila
On Sat, Apr 12, 2014 at 1:21 PM, MauMau maumau...@gmail.com wrote:
 Hello, Amit san, Tom san,

 I'm sorry for my late response.  I've just caught up with the discussion.
 I'm almost convinced.

 Please find attached the revised patch.  I'd like to follow the idea of
 adding a switch to pg_ctl.  The newly added -e event_source sets the
 event source name for pg_ctl to use.  When -e is used with pg_ctl register,
 it will be added to the command line for Windows service (pg_ctl
 runservice).

Currently the -e option is accepted with all the options that can be provided
to pg_ctl.  Shouldn't we accept it only with options related to the service,
because that is the only case in which it will be used?  Basically, write_stderr()
will write to the event log only in the case of a service.

Another minor point is you have forgotten to remove below declaration:
+ static void get_config_value(const char *name, char *buf, int buf_size);

Sorry for the delayed response; I am not sure that I will be able to
complete the review of the patch in the next few days, as I will be on vacation.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Greg Stark
On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan p...@heroku.com wrote:
 Although I will add that not caching highly useful inner pages for the
 medium term, because that index isn't being used at all for 5 minutes
 probably is very bad. Using the 4,828 buffers that it would take to
 store all the inner pages (as in my large primary index example) to go
 store something else is probably penny wise and pound foolish.

But there could easily be 20 unused indexes for every 1 index that is
being used.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 6:50 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan p...@heroku.com wrote:
 Although I will add that not caching highly useful inner pages for the
 medium term, because that index isn't being used at all for 5 minutes
 probably is very bad. Using the 4,828 buffers that it would take to
 store all the inner pages (as in my large primary index example) to go
 store something else is probably penny wise and pound foolish.

 But there could easily be 20 unused indexes for every 1 index that is
 being used.

Sure, but then there might not be. Obviously there is a trade-off to
be made between recency and frequency. One interesting observation in
the LRU-K paper is that for their test case, a pure LFU actually works
very well, despite, as the authors acknowledge, being a terrible
algorithm in the real world. That's because their test case is so
simple, and concerns only one table/index, with a uniform
distribution.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread David G Johnston
 On 04/17/2014 05:24 PM, Tom Lane wrote:
  On the whole I'm not sure this is something we ought to get into.
  If you really need a fresh session, maybe you should start a
  fresh session.


Isn't the whole point to avoid the reconnection overhead, especially for
connection poolers?  DISCARD ALL shouldn't cause any cleanup that wouldn't
otherwise occur when a session disconnects.  True global data (not just
session global) should be excluded.

A better wording of the promise would be: discard all leaves the session
in the same state it would be in if the underlying connection were dropped
and re-established.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/DISCARD-ALL-Again-tp5800623p5800662.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Stephen Frost
* David G Johnston (david.g.johns...@gmail.com) wrote:
  On 04/17/2014 05:24 PM, Tom Lane wrote:
   On the whole I'm not sure this is something we ought to get into.
   If you really need a fresh session, maybe you should start a
   fresh session.
 
 Isn't the whole point to avoid the reconnection overhead, especially for
 connection poolers?  DISCARD ALL shouldn't cause any cleanup that wouldn't
 otherwise occur when a session disconnects.  True global data (not just
 session global) should be excluded.

Err, this is per-backend global data, so it will get cleaned up on
session disconnect.

 A better wording of the promise would be: discard all leaves the session
 in the same state it would be in if the underlying connection were dropped
 and re-established.

This is not accurate.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread Joshua D. Drake


On 04/17/2014 07:07 PM, David G Johnston wrote:


On 04/17/2014 05:24 PM, Tom Lane wrote:
  On the whole I'm not sure this is something we ought to get into.
  If you really need a fresh session, maybe you should start a
  fresh session.


Isn't the whole point to avoid the reconnection overhead, especially for
connection poolers?  DISCARD ALL shouldn't cause any cleanup that
wouldn't otherwise occur when a session disconnects.  True global data
(not just session global) should be excluded.


The GD is global to the session only (Like temp tables).



A better wording of the promise would be: discard all leaves the
session in the same state it would be in if the underlying connection
were dropped and re-established.


Except that it doesn't.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL (Again)

2014-04-17 Thread David Johnston
On Thursday, April 17, 2014, Joshua D. Drake j...@commandprompt.com wrote:


 On 04/17/2014 07:07 PM, David G Johnston wrote:


 On 04/17/2014 05:24 PM, Tom Lane wrote:
   On the whole I'm not sure this is something we ought to get into.
   If you really need a fresh session, maybe you should start a
   fresh session.


 Isn't the whole point to avoid the reconnection overhead, especially for
 connection poolers?  DISCARD ALL shouldn't cause any cleanup that
 wouldn't otherwise occur when a session disconnects.  True global data
 (not just session global) should be excluded.


 The GD is global to the session only (Like temp tables).


Yes.  Tom's response makes it sound like the proposal is to throw away the
entire language environment for the whole server (thus needing super user
privilege) so I'm pointing out that what we are discussing is not that
invasive.




 A better wording of the promise would be: discard all leaves the
 session in the same state it would be in if the underlying connection
 were dropped and re-established.


 Except that it doesn't.


But is this what you intend it to mean, by implementing these features, or
are you thinking something different?

David J.


Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan


On 04/17/2014 09:04 PM, Peter Geoghegan wrote:

On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan and...@dunslane.net wrote:

  track_activity_query_size = 10240
shared_preload_libraries = 'auto_explain,pg_stat_statements'

As you can see, auto_explain's log_min_duration hasn't been set, so it
shouldn't be doing anything very much, I should think.

track_activity_query_size is 10240? Not 1024?




yes.

I normally find 1024 is far too small.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers