Re: [HACKERS] WAL replay bugs

2014-04-18 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Any objections to changing those two?

 Not here.  I've always suspected #2 was going to bite us someday anyway.

+1


-- 
Peter Geoghegan




Re: [HACKERS] Patch: iff - if

2014-04-18 Thread Albe Laurenz
Nicolas Barbier wrote:
 2014-04-17 Michael Paquier michael.paqu...@gmail.com:
 Is there no equivalent in German? For example, in French there is "ssi".
 
 gdw (genau dann, wenn, i.e. "if and only if")

Sorry, but as a German native speaker and mathematician I have never
encountered this abbreviation.  I am familiar with "iff", though.

I think there just isn't a widely known German equivalent, and
(as Andreas has mentioned) symbols are usually used in writing.

Yours,
Laurenz Albe



Re: [HACKERS] [GENERAL] client encoding that psql command sets

2014-04-18 Thread Albe Laurenz
Bruce Momjian wrote:
 I suggest the attached documentation fix.
 
 Patch applied and backpatched to 9.3.  Thanks.

What would PostgreSQL do without Bruce, who undertakes the
Herculean task of making sure that nothing gets forgotten
or slips through the cracks?

Thanks!

Yours,
Laurenz Albe



Re: [HACKERS] New windows compiler warning from 585bca39

2014-04-18 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 04:14:55PM -0400, Tom Lane wrote:
  Perhaps the #ifndef could be placed in a nicer spot in the patch, but the
  attached should at least describe where the problem lies...
 
 Yeah, I thought it better to make a separate declaration to wrap in
 #ifndef.  pgindent is probably going to insist on adding some vertical
 whitespace around the #if, and that'll look horrid if it's just in the
 middle of a list of variables.

Uh, I don't think that is the case anymore.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I don't think we should consider changing long-established behavior in
 the back-branches.  The old behavior may not be ideal, but that
 doesn't mean it's a bug.

OK, I understand that. I'll give it up.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Minor improvement in src/backend/access/gin/README

2014-04-18 Thread Etsuro Fujita
The attached patch improves the documentation in src/backend/access/gin/README.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index 3f0c3e2..8a71d28 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -181,7 +181,7 @@ algorithms.
   GinSetPostingTree macro.
 
 * If IndexTupleHasNulls(itup) is true, the null category byte can be
-  accessed/set with GinGetNullCategory(itup) / GinSetNullCategory(itup,c)
+  accessed/set with GinGetNullCategory(itup,gs) / GinSetNullCategory(itup,gs,c)
 
 * The posting list is not present and must not be accessed.
 



[HACKERS] Minor improvement in gin_private.h

2014-04-18 Thread Etsuro Fujita
The attached patch slightly improves a comment in gin_private.h.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 81a3bee..3aa4276 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -130,7 +130,7 @@ typedef struct GinMetaPageData
 #define GinPageRightMost(page) ( GinPageGetOpaque(page)->rightlink == InvalidBlockNumber)
 
 /*
- * We use our own ItemPointerGet(BlockNumber|GetOffsetNumber)
+ * We use our own ItemPointerGet(BlockNumber|OffsetNumber)
  * to avoid Asserts, since sometimes the ip_posid isn't valid
  */
 #define GinItemPointerGetBlockNumber(pointer) \



Re: [HACKERS] Get more from indices.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I thought some more about this patch, and realized that it's more or less
 morally equivalent to allowing references to ungrouped variables when the
 query has a GROUP BY clause listing all the columns of the primary key.
 In that case the parser is effectively pretending that the GROUP BY list
 contains additional implicit entries that are functionally dependent on
 the entries that are actually there.  In this patch, what we want to do
 is recognize that trailing entries in an ORDER BY list are semantically
 no-ops and can be ignored because they are functionally dependent on
 earlier entries.

Ah, that sounds smarter than extending pathkeys. It seems preferable to me.

 Now, the reason that the parser restricts the functional dependency
 deduction to a primary key is that it wants to be able to identify a
 constraint OID that the query is dependent on to be semantically valid.
 In this case, we don't need such an OID, so just finding any old unique
 index on not-null columns is good enough.  (If someone drops the index,
 the optimization might become incorrect, but that would force replanning
 anyway.)

Agreed,

 However, this way of thinking about it shows that the patch is missing
 possible optimizations.  If we have ORDER BY a, b, c and (a,b) is the
 primary key, then including c in the ORDER BY list is semantically
 redundant, *whether or not we use an indexscan on the pkey index at all*.
 More: if we have ORDER BY a, b, c and the primary key is (b,a), we
 can still discard c from the sort requirement, even though the pkey
 index as such isn't helpful for producing the required order.

Hmm, yes, that seems a reasonable thing to expect.

 So hacking up the pathkeys attributed to the indexscan is the wrong thing.
 Rather, what we should be looking to do is decide that c is a useless
 pathkey and remove it from the query_pathkeys, much as we'd do if we found
 c = constant in WHERE.  That would allow optimization of other query
 plans besides scan-the-pkey-index plans.
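
(For illustration: a rough sketch of what removing a useless trailing
pathkey from query_pathkeys could look like.  List, PathKey, list_truncate
and the Bitmapset routines are real backend types and helpers;
pathkey_source_column() is a made-up placeholder for mapping a pathkey back
to the column it sorts by, and the includes are approximate.)

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/bitmapset.h"
#include "nodes/relation.h"

static List *
truncate_useless_pathkeys_sketch(List *pathkeys, Bitmapset *unique_cols)
{
    Bitmapset  *seen = NULL;
    int         keep = 0;
    ListCell   *lc;

    foreach(lc, pathkeys)
    {
        PathKey    *pk = (PathKey *) lfirst(lc);

        keep++;
        /* hypothetical helper: which column does this pathkey sort by? */
        seen = bms_add_member(seen, pathkey_source_column(pk));

        /*
         * Once a prefix of the pathkeys covers every column of a unique
         * index over not-null columns, the remaining entries are
         * functionally dependent on that prefix and are no-ops for
         * sorting purposes.
         */
        if (bms_is_subset(unique_cols, seen))
            break;
    }
    return list_truncate(pathkeys, keep);
}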

Ok, I am convinced that your suggestion - truncating query_pathkeys by
removing entries that turn out to be no-ops - is preferable and will
naturally have a wider effect; it seems more promising.

I won't insist on the approach this patch currently takes, but a new
patch of course can't be ready within this CF. I will agree if you
decide to mark this patch 'Returned with Feedback'. (I would feel a
little sad about 'Rejected', but you can justly do that if you think
that the patch coming up next is utterly different from this one :()

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Amit Langote
Hi,

Attached fixes a minor typo in src/backend/access/transam/recovery.conf.sample.

--
Amit


recovery-conf-sample-typo-fix.patch



Re: [HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Magnus Hagander
On Fri, Apr 18, 2014 at 12:24 PM, Amit Langote amitlangot...@gmail.comwrote:

 Hi,

 Attached fixes a minor typo in
 src/backend/access/transam/recovery.conf.sample.


Applied, except I also rewrapped the line to make it shorter. Thanks!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Amit Langote
On Fri, Apr 18, 2014 at 7:50 PM, Magnus Hagander mag...@hagander.net wrote:

 On Fri, Apr 18, 2014 at 12:24 PM, Amit Langote amitlangot...@gmail.com
 wrote:

 Hi,

 Attached fixes a minor typo in
 src/backend/access/transam/recovery.conf.sample.


 Applied, except I also rewrapped the line to make it shorter. Thanks!


Ah, thanks!




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Fri, Apr 18, 2014 at 7:27 AM, Peter Geoghegan p...@heroku.com wrote:

One idea I have for the eviction policy is to introduce an ageing factor
for each buffer and take that factor into consideration when evicting a
buffer.

Consider a case where a table is pretty huge and spread across multiple
pages. The query pattern is like a time-series pattern, i.e. a set of
rows is queried pretty frequently for some time, making the
corresponding page hot. Then the next set of rows is queried frequently,
making that page hot, and so on.

Consider a new page entering the shared buffers with refcount 1 and
usage_count 1. If that page is part of the workload described above, it
is likely that it will not be used for a considerable amount of time
after it has entered the buffers, but it will be used eventually.

Now, the current hypothetical situation is that we have three pages:

1) The page that used to be hot in the previous time window but is no
longer hot and is actually the correct candidate for eviction.
2) The currently hot page (it won't be evicted anyway for now).
3) The new page which just got in and should not be evicted, since it
can become hot soon (for this workload, it will be hot in the next time
window).

When the clocksweep algorithm runs the next time, it will see the new
buffer page as the one to be evicted (since page (1) may still have
usage_count > 0, i.e. it may be 'cooling' but not 'cool' yet).

This can be changed by introducing an ageing factor that sees how much
time the current buffer has spent in shared buffers. If the time that
the buffer has spent is large enough (relatively) and it is not hot
currently, that means it has had its chance and can be evicted. This
saves the new page (3) from being evicted, since its time in shared
buffers will not be high enough to mandate eviction, and it will be
given more chances.

Since gettimeofday() is an expensive call and hence cannot be done in
the tight loop, we can count the number of clocksweeps the current
buffer has seen (rather, survived). This gives us a rough estimate of
the relative age of the buffer.

When an eviction happens, all the candidates with refcount = 0 are
gathered; then, among them, the one with the highest ageing factor is
evicted.

Of course, there may be better ways of doing this, but I want to
highlight the possibility of introducing an ageing factor to prevent
relatively younger pages from being evicted early in the eviction
process.

The overhead isn't too big. We just need to add another attribute to the
buffer header for the number of clocksweeps seen (rather, survived) and
check it when an eviction is taking place. The existing spinlock for
buffer headers should be good enough to protect against concurrent
access; the access rules can be similar to those for usage_count.
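
A minimal standalone sketch of the idea (hypothetical structure and field
names; this is not the actual buffer manager code):

#include <stddef.h>
#include <stdint.h>

typedef struct BufSketch
{
    int         refcount;           /* pins currently held */
    int         usage_count;        /* decremented by the clock sweep */
    uint32_t    sweeps_survived;    /* the proposed ageing factor */
} BufSketch;

/*
 * One clock-sweep pass: age every buffer and, among the unpinned, cold
 * buffers, prefer evicting the one that has survived the most sweeps.
 */
static int
pick_victim(BufSketch *buf, size_t nbuffers)
{
    int     victim = -1;
    size_t  i;

    for (i = 0; i < nbuffers; i++)
    {
        if (buf[i].usage_count > 0)
            buf[i].usage_count--;   /* normal clock-sweep ageing */
        buf[i].sweeps_survived++;   /* survived another pass */

        if (buf[i].refcount == 0 && buf[i].usage_count == 0 &&
            (victim < 0 ||
             buf[i].sweeps_survived > buf[victim].sweeps_survived))
            victim = (int) i;
    }
    return victim;                  /* -1 if everything is pinned or hot */
}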

Thoughts and comments?

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-04-18 Thread MauMau

From: Amit Kapila amit.kapil...@gmail.com

Currently the -e option is accepted with all the options that can be
provided in pg_ctl.  Shouldn't we accept it only with options related to
services, because that is the only case where it will be used?
Basically, write_stderr() will write to the event log only in the case
of a service.


Thank you for your kind and patient review.  I hope this will bear fruit.

I don't find it so strange that -e is accepted by all operation modes of
pg_ctl --- pg_ctl seems to handle switches that way.  For example, -c is
only relevant to start and restart, but it is accepted by all modes.  -D is
not relevant to pg_ctl kill, but it is not rejected.  Plus, I prepared for
the future possibility that other modes of pg_ctl will use the event log.



Another minor point is you have forgotten to remove below declaration:
+ static void get_config_value(const char *name, char *buf, int buf_size);


Oh, removed.


Sorry for delayed response and I am not sure that I will be able to
complete the review of patch in next few days as I will be on vacation.


OK, I'm waiting for you.  Please have a nice vacation.

Regards
MauMau


pg_ctl_eventsrc_v8.patch



Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread MauMau

From: Josh Berkus j...@agliodbs.com

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?


I've had a look at the mail for 9.3 beta 1:

http://www.postgresql.org/message-id/517c64f4.6000...@agliodbs.com

Please excuse my misunderstanding and cheekiness.  I feel the following
points may need to be addressed:


* Existing and (more importantly) new users don't know about the beta
release.
I remember seeing news about new features of MySQL in the daily news mails
of some famous IT media, such as IDG's Computerworld, InfoWorld, and
Japanese IT industry news outlets.  On the other hand, I'm afraid I hadn't
seen news about new features of PostgreSQL until the final release was out.
Is subscribing to pgsql-announce the only way to learn of a beta release?


* Existing users are satisfied with the version they are using for their
current use cases, so they have no motivation to try a new release.
To broaden the use cases of existing users and attract new ones, more
attractive features may be necessary, such as an in-memory database,
columnar storage, MPP for scale-out, database compression, integration with
Hadoop, multi-tenant databases, Oracle compatibility (this should be very
important for winning many users in practice), etc.  I think eye-catching
features like streaming replication and materialized views are necessary to
widen PostgreSQL's use.


* Explain new features by associating them with current trends such as
cloud, mobile, social, and big data.


Regards
MauMau







Re: [HACKERS] Get more from indices.

2014-04-18 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Ok, I am convinced that your suggestion - truncating query_pathkeys by
 removing entries that turn out to be no-ops - is preferable and will
 naturally have a wider effect; it seems more promising.

 I won't insist on the approach this patch currently takes, but a new
 patch of course can't be ready within this CF. I will agree if you
 decide to mark this patch 'Returned with Feedback'. (I would feel a
 little sad about 'Rejected', but you can justly do that if you think
 that the patch coming up next is utterly different from this one :()

I marked it as returned with feedback.

regards, tom lane




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Robert Haas
On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
 For once, this looks more like a problem in logical decoding, which is
 trying to assert about the tuple being updated; the assertion failing is
 the one added a week ago about not palloc'ing in a critical section.

 It's this (older) assertion in HeapTupleHeaderGetCmax():

 
 Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

 That can allocate memory if xmax is a multixact... Does anybody have a
 better idea to solve this than adding a CritSectionCount == 0 && in
 there?

Blech.  Isn't that just nerfing the assertion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
  For once, this looks more like a problem in logical decoding, which is
  trying to assert about the tuple being updated; the assertion failing is
  the one added a week ago about not palloc'ing in a critical section.
 
  It's this (older) assertion in HeapTupleHeaderGetCmax():
 
  
  Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
 
  That can allocate memory if xmax is a multixact... Does anybody have a
  better idea to solve this than adding a CritSectionCount == 0 && in
  there?
 
 Blech.  Isn't that just nerfing the assertion?

Well, that's exactly the point.  Most of the time,
HeapTupleHeaderGetCmax gets called in a non-critical section, and we
want to run the assertion in that case.  But it's no great trouble if
the assertion is not run in the rare case where HeapTupleHeaderGetCmax
is being used to write an xlog record.

It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
but to fix that we would have to remove all allocations from
GetMultiXactIdMembers which doesn't sound feasible.
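
Spelled out, the guard under discussion would make the check a no-op inside
critical sections, roughly like this (a sketch of the idea, not the
committed form):

    /*
     * HeapTupleHeaderGetUpdateXid() may palloc when xmax is a multixact,
     * which is forbidden inside a critical section, so only run the
     * check when we are not in one.
     */
    Assert(CritSectionCount > 0 ||
           TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));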

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Andres Freund
On 2014-04-18 16:44:55 +0200, Robert Haas wrote:
 On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
  For once, this looks more like a problem in logical decoding, which is
  trying to assert about the tuple being updated; the assertion failing is
  the one added a week ago about not palloc'ing in a critical section.
 
  It's this (older) assertion in HeapTupleHeaderGetCmax():
 
  
  Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
 
  That can allocate memory if xmax is a multixact... Does anybody have a
  better idea to solve this than adding a CritSectionCount == 0 && in
  there?
 
 Blech.  Isn't that just nerfing the assertion?

Not precisely sure what you mean, but the only memory allocation in
HeapTupleHeaderGetCmax() and log_heap_new_cid() is in that Assert(), and
that's the only forbidden thing in that codepath.
Now, we could alternatively restructure the codepaths so they pass in
xmax from outside the critical section, but I had a quick look and the
risks/complications of that seem bigger than what the assertion buys us
there.
I don't have a better idea, unfortunately :(

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Peter Eisentraut
On 4/17/14, 8:24 PM, Tom Lane wrote:
 We could in fact implement #2, I imagine, by destroying and recreating
 the entire language interpreter.  So I could imagine implementing a
 DISCARD INTERPRETERS kind of command that would zap the current
 interpreter(s) for whichever PL languages happened to feel like
 cooperating with the command.

More generally, any extension could maintain any kind of cross-call
state.  plproxy, dblink, pgmemcache come to mind.  A general hook into
DISCARD might be doable, but then it's not clear how to categorize this
into DISCARD subcommands.





Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Peter Eisentraut
On 4/17/14, 4:44 PM, Joshua D. Drake wrote:
 That we should also release the GD?

In some cases, SD or GD are used to cache things.  Having the connection
pooler blow that away would defeat the point.





Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
  It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
  but to fix that we would have to remove all allocations from
  GetMultiXactIdMembers which doesn't sound feasible.
 
 I was thinking for a second we could just allocate something during
 startup based on max_connections (+ other possible backends), but I
 think that won't work because of subtransactions?

Yeah, I don't think the number of members in a multixact is bounded.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Andres Freund
On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
 It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
 but to fix that we would have to remove all allocations from
 GetMultiXactIdMembers which doesn't sound feasible.

I was thinking for a second we could just allocate something during
startup based on max_connections (+ other possible backends), but I
think that won't work because of subtransactions?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 4/17/14, 8:24 PM, Tom Lane wrote:
 We could in fact implement #2, I imagine, by destroying and recreating
 the entire language interpreter.  So I could imagine implementing a
 DISCARD INTERPRETERS kind of command that would zap the current
 interpreter(s) for whichever PL languages happened to feel like
 cooperating with the command.

 More generally, any extension could maintain any kind of cross-call
 state.  plproxy, dblink, pgmemcache come to mind.  A general hook into
 DISCARD might be doable, but then it's not clear how to categorize this
 into DISCARD subcommands.

Right.  So if we go down that path, we're basically giving up the ability
to define what DISCARD ALL does by specifying an equivalent list of
subcommands.  Maybe that's an acceptable tradeoff, but I don't like it
much.

I wonder whether we could have a DISCARD <identifier> variant where
any hooked-in extension could recognize the identifier and zap some
appropriate subset of its own state.  This would get us to a situation
where we could say DISCARD ALL is the union of all DISCARD operations,
but some of those are documented in relevant extensions' docs rather
than here.
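
From the extension side, such a hook might look roughly like this (entirely
hypothetical: no discard_hook exists today, and the names below are made up
purely to show the shape, with NULL standing in for DISCARD ALL):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    typedef void (*discard_hook_type) (const char *what);  /* assumed */
    extern discard_hook_type discard_hook;                 /* assumed */

    static discard_hook_type prev_discard_hook = NULL;

    static void
    myext_reset_state(void)
    {
        /* drop whatever cross-call state the extension keeps */
    }

    static void
    myext_discard(const char *what)
    {
        if (prev_discard_hook)
            prev_discard_hook(what);
        if (what == NULL || strcmp(what, "myext") == 0)
            myext_reset_state();
    }

    void
    _PG_init(void)
    {
        prev_discard_hook = discard_hook;
        discard_hook = myext_discard;
    }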

The permissions problem remains, of course.

regards, tom lane




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Robert Haas
On Thu, Apr 17, 2014 at 5:00 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Apr 17, 2014 at 10:18 AM, Robert Haas robertmh...@gmail.com wrote:
 Because all the usage counts are the same, the eviction at
 this point is completely indiscriminate.  We're just as likely to kick
 out a btree root page or a visibility map page as we are to kick out a
 random heap page, even though the former have probably been accessed
 several orders of magnitude more often.  That's clearly bad.

 That's not clear at all. In that circumstance regardless of what page
 you evict you're incurring precisely one page fault i/o when the page
 is read back in.

I am a bit confused by this remark.  In *any* circumstance when you
evict you're incurring precisely one page fault I/O when the page is
read back in.   That doesn't mean that the choice of which page to
evict is irrelevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Alvaro Herrera
MauMau wrote:
 From: Josh Berkus j...@agliodbs.com
 How can we make beta testing better and more effective?  How can we get
 more users to actually throw serious workloads at new versions and share
 the results?
 
 I've tried a couple of things over the last two years and they haven't
 worked all that well.  Since we're about to go into another beta testing
 period, we need something new.  Ideas?
 
 I've had a look at the mail for 9.3 beta 1:
 
 http://www.postgresql.org/message-id/517c64f4.6000...@agliodbs.com

This call for testing is interesting: note there was not a single
mention of "foreign key locking got a huge change; please test it and
see whether it still works at all for you".  I asked Josh specifically
to mention it in a followup to this message, which you can see in that
thread.  There was no reply, which I took as "this isn't a new feature
and isn't user-visible anyway, so what would be the point?"

But hey, what the hell do I know about advocacy anyway, huh?  So I shut
up.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Greg Stark
On Fri, Apr 18, 2014 at 4:14 PM, Robert Haas robertmh...@gmail.com wrote:
 I am a bit confused by this remark.  In *any* circumstance when you
 evict you're incurring precisely one page fault I/O when the page is
 read back in.   That doesn't mean that the choice of which page to
 evict is irrelevant.

But you might be evicting a page that will be needed soon or one that
won't be needed for a while. If it's not needed for a while you might
be able to avoid many page evictions by caching a page that will be
used several times.

If all the pages currently in RAM are hot -- meaning they're hot
enough that they'll be needed again before the page you're reading in
-- then they're all equally bad to evict.

I'm trying to push us away from the gut instinct that frequently used
pages are important to cache, and towards actually counting how many
I/Os we're saving. In the extreme, it's possible to simulate any cache
algorithm on a recorded list of page requests and count how many page
misses it generates, to compare it against an optimal cache algorithm.
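
As a toy example of that kind of simulation (illustrative C, not tied to
PostgreSQL): replay a recorded page-request trace against a small LRU
cache, count the misses, and compare the count against other policies run
on the same trace.

    #include <stddef.h>
    #include <stdio.h>

    #define CACHE_SLOTS 3

    static int  cache[CACHE_SLOTS];
    static int  used = 0;

    /* Returns 1 on a miss, 0 on a hit; keeps the most recent page first. */
    static int
    access_page(int page)
    {
        int     i;

        for (i = 0; i < used; i++)
        {
            if (cache[i] == page)
            {
                for (; i > 0; i--)      /* hit: move page to the front */
                    cache[i] = cache[i - 1];
                cache[0] = page;
                return 0;
            }
        }
        if (used < CACHE_SLOTS)         /* miss: grow, or drop the LRU slot */
            used++;
        for (i = used - 1; i > 0; i--)
            cache[i] = cache[i - 1];
        cache[0] = page;
        return 1;
    }

    int
    main(void)
    {
        int     trace[] = {1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2};
        size_t  n = sizeof(trace) / sizeof(trace[0]);
        size_t  i;
        int     misses = 0;

        for (i = 0; i < n; i++)
            misses += access_page(trace[i]);
        printf("misses: %d of %zu\n", misses, n);
        return 0;
    }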

-- 
greg




Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Andrew Dunstan


On 04/17/2014 10:15 AM, Andrew Dunstan wrote:


On 04/16/2014 10:28 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/16/2014 07:19 PM, Tom Lane wrote:
Yeah, it would be real nice to see a self-contained test case for this.

Well, that might be hard to put together, but I did try running without
pg_stat_statements and auto_explain loaded and the error did not occur.
Not sure where that gets us in terms of deciding on a culprit.

Could we at least see the exact parameter settings for pg_stat_statements
and auto_explain?  (And any other GUCs with nondefault values?)





Here are all the settings from the run that failed:

   listen_addresses = '*'
   port = 5432
   fsync = on
   synchronous_commit = off
   checkpoint_segments = 128
   checkpoint_completion_target = 0.9
   shared_buffers = 512MB
   max_connections = 300
   work_mem = 128MB
   maintenance_work_mem = 32MB
   effective_cache_size = 16GB
   effective_io_concurrency = 2
   logging_collector = on
   log_destination = 'stderr'
   log_filename = 'postgresql-%a.log'
   log_rotation_size = 0
   log_truncate_on_rotation = on
   log_line_prefix = '%t [%p] %l: '
   log_connections = on
   log_disconnections = on
   log_statement = 'all'
   track_activity_query_size = 10240
   shared_preload_libraries = 'auto_explain,pg_stat_statements'

As you can see, auto_explain's log_min_duration hasn't been set, so it 
shouldn't be doing anything very much, I should think.




There definitely seems to be something going on involving these two 
pre-loaded modules. With both auto_explain and pg_stat_statements 
preloaded I can reproduce the error fairly reliably. I have also 
reproduced it, but less reliably, with auto_explain alone loaded. I have 
not reproduced it with pg_stat_statements alone loaded, but Josh Berkus 
has reported that another client has experienced something very similar 
(duplicate PK errors on a non-assertion-enabled build) with 
pg_stat_statements alone loaded.


cheers

andrew







Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Greg Stark
On Fri, Apr 18, 2014 at 4:15 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
  There was no reply, which I took as this isn't a
 new feature and isn't user visible anyway, so what would be the point?

To be fair the list was pretty long already. And like regression
testing, coming up with a list of things to test is kind of beside the
point. If we knew which things had to be tested then we would have
tested them already. The point of beta testing is to have people run
their applications on it in case their applications are doing stuff we
*didn't* anticipate needing to test.

I think the take-away here is that testing for long periods of time
under load and while using the full suite of complex configurations
simultaneously that they use in production (such as hot standby
replicas to spread load) is what's important.

-- 
greg




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Joshua D. Drake


On 04/18/2014 08:01 AM, Peter Eisentraut wrote:


On 4/17/14, 4:44 PM, Joshua D. Drake wrote:

That we should also release the GD?


In some cases, SD or GD are used to cache things.  Having the connection
pooler blow that away would defeat the point.


Not on a per-session basis, although I can see your point. The GD is
supposed to be global per session. If I discard the session back to its
original state, that state predates the creation of the GD, so clearing
it is expected behavior.


JD









--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.




Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Josh Berkus
On 04/18/2014 08:15 AM, Alvaro Herrera wrote:
 and see whether it still works at all for you.  I asked Josh
 specifically to mention it in a followup to this message which you can
 see in that thread.  There was no reply, which I took as this isn't a
 new feature and isn't user visible anyway, so what would be the point?

It's not possible for a single document to be both a press release
(which that was) and a technical list of things to test.  That needs to
be two different documents ... and the technical one needs to somehow
evolve, so that we can add things to it as we discover potential problem
areas.

This does show how we're not really doing anything to publicize betas as
"please test this", though.  For that matter, our /beta page doesn't
really inspire someone to jump up and test things.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Josh Berkus
On 04/18/2014 09:42 AM, Andrew Dunstan wrote:

 There definitely seems to be something going on involving these two
 pre-loaded modules. With both auto_explain and pg_stat_statements
 preloaded I can reproduce the error fairly reliably. I have also
 reproduced it, but less reliably, with auto_explain alone loaded. I have
 not reproduced it with pg_stat_statements alone loaded, but Josh Berkus
 has reported that another client has experienced something very similar
 (duplicate PK errors on a non-assertion-enabled build) with
 pg_stat_statements alone loaded.

Correct.  However, this seems to produce the issue less reliably;
that is, it takes several hours of load testing for a duplicate PK to
show up, so I suspect that with pg_stat_statements alone it's also a
timing issue.  I'm still waiting on the results with
pg_stat_statements.so NOT loaded, to confirm that we're not actually
getting two separate bugs with similar symptoms.

The second site does not have any increase in query size.  Their only
settings for pg_s_s are:

pg_stat_statements.max = 1
pg_stat_statements.track = all

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Bruce Momjian
On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
 This can be changed by introducing an ageing factor that sees how much
 time the current buffer has spent in shared buffers. If the time that
 the buffer has spent is large enough (relatively) and it is not hot
 currently, that means it has had its chance and can be evicted. This
 saves the new page (3) from being evicted, since its time in shared
 buffers will not be high enough to mandate eviction, and it will be
 given more chances.

 Since gettimeofday() is an expensive call and hence cannot be done in
 the tight loop, we can count the number of clocksweeps the current
 buffer has seen (rather, survived). This gives us a rough estimate of
 the relative age of the buffer.

Counting clock sweeps is an interesting idea.  I think one concern was
tracking hot buffers in cases where there is no memory pressure, and
hence the clock sweep isn't running --- I am not sure how this would
help in that case.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Sat, Apr 19, 2014 at 1:07 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
  This can be changed by introducing an ageing factor that sees how much
  time the current buffer has spent in shared buffers. If the time that
  the buffer has spent is large enough (relatively) and it is not hot
  currently, that means it has had its chance and can be evicted. This
  saves the new page (3) from being evicted, since its time in shared
  buffers will not be high enough to mandate eviction, and it will be
  given more chances.

  Since gettimeofday() is an expensive call and hence cannot be done in
  the tight loop, we can count the number of clocksweeps the current
  buffer has seen (rather, survived). This gives us a rough estimate of
  the relative age of the buffer.

 Counting clock sweeps is an interesting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.


I feel that if there is no memory pressure, frankly it doesn't matter
much what gets evicted and what doesn't. The case I am specifically
targeting is when the clocksweep gets to move about a lot, i.e. high
memory pressure workloads. Of course, I may be totally wrong here.

One thing that I discussed with Merlin offline and am now concerned
about is how the actual eviction will work. We cannot traverse the
entire list to find all the buffers with refcount 0 and then do another
traversal to find the oldest one.

Any thoughts there would be appreciated.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Jason Petersen
On Apr 18, 2014, at 1:51 PM, Atri Sharma atri.j...@gmail.com wrote:

 Counting clock sweeps is an intersting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.
 


Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday 
seems silly to me: it was something quick for the prototype.

The problem with the clocksweeps is they don’t actually track the progression 
of “time” within the PostgreSQL system.

What’s wrong with using a transaction number or some similar sequence? It would 
accurately track “age” in the sense we care about: how long ago in “units of 
real work being done by the DB” something was added.

—Jason



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
 Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.



Well, AIUI, we only need the 'relative' age of buffers in relation to
the youngest buffer present. So the guy who has seen the maximum number
of clocksweeps is the guy who has been around the longest.

I do not see a need for an accurate estimate of the time spent in the
buffer for any purpose right now. It may be useful in the future,
though.

How do you get the transaction ID? By accessing a tuple on the page and
reading its XMIN?

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Peter Geoghegan
On Fri, Apr 18, 2014 at 1:11 PM, Jason Petersen ja...@citusdata.com wrote:
 Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

The gettimeofday() call doesn't need to happen in a tight loop. It can
be reasonably passed down from higher up, very probably without
consequence. The LRU-K paper actually recommends a delay of 5 seconds.
There is at least one other major database system that unambiguously
uses a wall-clock delay of 3 seconds (by default) for this exact
purpose - avoiding what the LRU-K paper calls correlated references.
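
A sketch of how the call could be hoisted out of the tight loop (assumed
names, not actual PostgreSQL code): refresh a coarse timestamp from higher
up, and treat re-references that fall inside the correlation window as a
single reference.

    #include <stdbool.h>
    #include <sys/time.h>

    #define CORRELATED_REF_SECS 5       /* the LRU-K paper's suggested delay */

    static struct timeval coarse_now;

    void
    refresh_coarse_clock(void)          /* called rarely, from higher up */
    {
        gettimeofday(&coarse_now, NULL);
    }

    bool
    counts_as_new_reference(const struct timeval *last_ref)
    {
        return coarse_now.tv_sec - last_ref->tv_sec >= CORRELATED_REF_SECS;
    }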

I'm not saying that we should go with that particular scheme for these
reasons. However, it's plainly untrue that using wall time like this
represents some kind of insurmountable scalability obstacle.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.

The LRU-K paper suggests that a scheme like this could be preferable.
I have my doubts that it can be made to work with Postgres.

-- 
Peter Geoghegan




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Bruce Momjian
On Sat, Apr 19, 2014 at 01:21:29AM +0530, Atri Sharma wrote:
 I feel that if there is no memory pressure, frankly it doesn't matter
 much what gets evicted and what doesn't. The case I am specifically
 targeting is when the clocksweep gets to move about a lot, i.e. high
 memory pressure workloads. Of course, I may be totally wrong here.

 One thing that I discussed with Merlin offline and am now concerned
 about is how the actual eviction will work. We cannot traverse the
 entire list to find all the buffers with refcount 0 and then do another
 traversal to find the oldest one.

I thought if there was memory pressure the clock sweep would run, and we
wouldn't have everything at the maximum usage count.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Interesting bug.
 On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
 I think we might be better off to get rid of toast_flatten_tuple_attribute
 and instead insist that composite Datums never contain any external toast
 pointers in the first place.

 Performance is the dominant issue here; the hacker-centric considerations you
 outlined vanish next to it.

I'm not exactly convinced by that line of argument, especially when it's
entirely unsupported by any evidence as to which implementation approach
will actually perform worse.

However, I have done some investigation into what it'd take to keep the
current approach and teach the array code to detoast composite-type array
elements properly.  The attached draft patch only fixes arrays; ranges
need work too, and I'm not sure what else might need adjustment.  But this
patch does fix the test case provided by Jan Pecek.

The main problem with this patch, as I see it, is that it'll introduce
extra syscache lookup overhead even when there are no toasted fields
anywhere.  I've not really tried to quantify how much, since that would
require first agreeing on a benchmark case --- anybody have a thought
about what that ought to be?  But in at least some of these functions,
we'll be paying a cache lookup or two per array element, which doesn't
sound promising.

This could be alleviated by caching the lookup results over longer
intervals, but I don't see any way to do that without invasive changes
to the APIs of a number of exported array functions, which doesn't seem
like a good thing for a bug fix that we need to back-patch.  I think the
back-branch fix would have to be pretty much what you see here, even if
we then make changes to reduce the costs in HEAD.

Comments?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
! 				 att[i]->attalign == 'd' &&
! 				 att[i]->attndims == 0 &&
! 				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 													  att[i]->atttypid,
! 													  att[i]->atttypmod);
! 		}
  	}
  
  	/*
--- 649,661 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
*** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
! 				 att[i]->attalign == 'd' &&
! 				 att[i]->attndims == 0 &&
! 				 !VARATT_IS_EXTENDED(values[i]))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 													  att[i]->atttypid,
! 													  att[i]->atttypmod);
! 		}
  	}
  
  	/*
--- 1390,1402 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..2d1a922 100644
*** a/src/backend/access/heap/tuptoaster.c
---