Re: [HACKERS] WAL replay bugs

2014-04-18 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 10:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Any objections to changing those two?

 Not here.  I've always suspected #2 was going to bite us someday anyway.

+1


-- 
Peter Geoghegan




Re: [HACKERS] Patch: iff - if

2014-04-18 Thread Albe Laurenz
Nicolas Barbier wrote:
 2014-04-17 Michael Paquier michael.paqu...@gmail.com:
 Is there no equivalent in German? For example, in French there is "ssi".
 
 gdw (genau dann, wenn, i.e. "if and only if")

Sorry, but as a German native speaker and mathematician I have never
encountered this abbreviation.  I am familiar with "iff", though.

I think there just isn't a widely known German equivalent, and
(as Andreas has mentioned) symbols are usually used in writing.

Yours,
Laurenz Albe



Re: [HACKERS] [GENERAL] client encoding that psql command sets

2014-04-18 Thread Albe Laurenz
Bruce Momjian wrote:
 I suggest the attached documentation fix.
 
 Patch applied and backpatched to 9.3.  Thanks.

What would PostgreSQL do without Bruce, who undertakes the
Herculean task of making sure that nothing gets forgotten
or slips through the cracks?

Thanks!

Yours,
Laurenz Albe



Re: [HACKERS] New windows compiler warning from 585bca39

2014-04-18 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 04:14:55PM -0400, Tom Lane wrote:
  Perhaps the #ifndef could be placed in a nicer spot in the patch, but the
  attached should at least describe where the problem lies...
 
 Yeah, I thought it better to make a separate declaration to wrap in
 #ifndef.  pgindent is probably going to insist on adding some vertical
 whitespace around the #if, and that'll look horrid if it's just in the
 middle of a list of variables.

Uh, I don't think that is the case anymore.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Archive recovery won't be completed on some situation.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I don't think we should consider changing long-established behavior in
 the back-branches.  The old behavior may not be ideal, but that
 doesn't mean it's a bug.

OK, I understand that. I'll give it up.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Minor improvement in src/backend/access/gin/README

2014-04-18 Thread Etsuro Fujita
The attached patch improves the documentation in src/backend/access/gin/README.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index 3f0c3e2..8a71d28 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -181,7 +181,7 @@ algorithms.
   GinSetPostingTree macro.
 
 * If IndexTupleHasNulls(itup) is true, the null category byte can be
-  accessed/set with GinGetNullCategory(itup) / GinSetNullCategory(itup,c)
+  accessed/set with GinGetNullCategory(itup,gs) / GinSetNullCategory(itup,gs,c)
 
 * The posting list is not present and must not be accessed.
 



[HACKERS] Minor improvement in gin_private.h

2014-04-18 Thread Etsuro Fujita
The attached patch slightly improves a comment in gin_private.h.

Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 81a3bee..3aa4276 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -130,7 +130,7 @@ typedef struct GinMetaPageData
 #define GinPageRightMost(page) ( GinPageGetOpaque(page)->rightlink == InvalidBlockNumber)
 
 /*
- * We use our own ItemPointerGet(BlockNumber|GetOffsetNumber)
+ * We use our own ItemPointerGet(BlockNumber|OffsetNumber)
  * to avoid Asserts, since sometimes the ip_posid isn't valid
  */
 #define GinItemPointerGetBlockNumber(pointer) \



Re: [HACKERS] Get more from indices.

2014-04-18 Thread Kyotaro HORIGUCHI
Hello,

 I thought some more about this patch, and realized that it's more or less
 morally equivalent to allowing references to ungrouped variables when the
 query has a GROUP BY clause listing all the columns of the primary key.
 In that case the parser is effectively pretending that the GROUP BY list
 contains additional implicit entries that are functionally dependent on
 the entries that are actually there.  In this patch, what we want to do
 is recognize that trailing entries in an ORDER BY list are semantically
 no-ops and can be ignored because they are functionally dependent on
 earlier entries.

Ah, that sounds smarter than extending pathkeys. It seems preferable to me.

 Now, the reason that the parser restricts the functional dependency
 deduction to a primary key is that it wants to be able to identify a
 constraint OID that the query is dependent on to be semantically valid.
 In this case, we don't need such an OID, so just finding any old unique
 index on not-null columns is good enough.  (If someone drops the index,
 the optimization might become incorrect, but that would force replanning
 anyway.)

Agreed,

 However, this way of thinking about it shows that the patch is missing
 possible optimizations.  If we have ORDER BY a, b, c and (a,b) is the
 primary key, then including c in the ORDER BY list is semantically
 redundant, *whether or not we use an indexscan on the pkey index at all*.
 More: if we have ORDER BY a, b, c and the primary key is (b,a), we
 can still discard c from the sort requirement, even though the pkey
 index as such isn't helpful for producing the required order.

Hmm, yes, that seems a reasonable thing to expect.

 So hacking up the pathkeys attributed to the indexscan is the wrong thing.
 Rather, what we should be looking to do is decide that c is a useless
 pathkey and remove it from the query_pathkeys, much as we'd do if we found
 c = constant in WHERE.  That would allow optimization of other query
 plans besides scan-the-pkey-index plans.
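
(For illustration: a rough sketch of what removing a useless trailing
pathkey from query_pathkeys could look like.  List, PathKey, list_truncate
and the Bitmapset routines are real backend types and helpers;
pathkey_source_column() is a made-up placeholder for mapping a pathkey back
to the column it sorts by, and the includes are approximate.)

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/bitmapset.h"
#include "nodes/relation.h"

static List *
truncate_useless_pathkeys_sketch(List *pathkeys, Bitmapset *unique_cols)
{
    Bitmapset  *seen = NULL;
    int         keep = 0;
    ListCell   *lc;

    foreach(lc, pathkeys)
    {
        PathKey    *pk = (PathKey *) lfirst(lc);

        keep++;
        /* hypothetical helper: which column does this pathkey sort by? */
        seen = bms_add_member(seen, pathkey_source_column(pk));

        /*
         * Once a prefix of the pathkeys covers every column of a unique
         * index over not-null columns, the remaining entries are
         * functionally dependent on that prefix and are no-ops for
         * sorting purposes.
         */
        if (bms_is_subset(unique_cols, seen))
            break;
    }
    return list_truncate(pathkeys, keep);
}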

Ok, I am convinced that your suggestion - truncating query_pathkeys by
removing entries that turn out to be no-ops - is preferable and will
naturally have a wider effect; it seems more promising.

I won't insist on the approach this patch currently takes, but a new
patch of course can't be ready within this CF. I will agree if you
decide to mark this patch 'Returned with Feedback'. (I would feel a
little sad about 'Rejected', but you can justly do that if you think
that the patch coming up next is utterly different from this one :()

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




[HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Amit Langote
Hi,

Attached fixes a minor typo in src/backend/access/transam/recovery.conf.sample.

--
Amit


recovery-conf-sample-typo-fix.patch



Re: [HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Magnus Hagander
On Fri, Apr 18, 2014 at 12:24 PM, Amit Langote amitlangot...@gmail.comwrote:

 Hi,

 Attached fixes a minor typo in
 src/backend/access/transam/recovery.conf.sample.


Applied, except I also rewrapped the line to make it shorter. Thanks!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Typo fix in src/backend/access/transam/recovery.conf.sample

2014-04-18 Thread Amit Langote
On Fri, Apr 18, 2014 at 7:50 PM, Magnus Hagander mag...@hagander.net wrote:

 On Fri, Apr 18, 2014 at 12:24 PM, Amit Langote amitlangot...@gmail.com
 wrote:

 Hi,

 Attached fixes a minor typo in
 src/backend/access/transam/recovery.conf.sample.


 Applied, except I also rewrapped the line to make it shorter. Thanks!


Ah, thanks!




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Fri, Apr 18, 2014 at 7:27 AM, Peter Geoghegan p...@heroku.com wrote:

One idea I have for the eviction policy is to introduce an ageing factor
for each buffer and take that factor into consideration when evicting a
buffer.

Consider a case where a table is pretty huge and spread across multiple
pages. The query pattern is like a time-series pattern, i.e. a set of
rows is queried pretty frequently for some time, making the
corresponding page hot. Then the next set of rows is queried frequently,
making that page hot, and so on.

Consider a new page entering the shared buffers with refcount 1 and
usage_count 1. If that page is part of the workload described above, it
is likely that it will not be used for a considerable amount of time
after it has entered the buffers, but it will be used eventually.

Now, the current hypothetical situation is that we have three pages:

1) The page that used to be hot in the previous time window but is no
longer hot and is actually the correct candidate for eviction.
2) The currently hot page (it won't be evicted anyway for now).
3) The new page which just got in and should not be evicted, since it
can become hot soon (for this workload, it will be hot in the next time
window).

When the clocksweep algorithm runs the next time, it will see the new
buffer page as the one to be evicted (since page (1) may still have
usage_count > 0, i.e. it may be 'cooling' but not 'cool' yet).

This can be changed by introducing an ageing factor that sees how much
time the current buffer has spent in shared buffers. If the time that
the buffer has spent is large enough (relatively) and it is not hot
currently, that means it has had its chance and can be evicted. This
saves the new page (3) from being evicted, since its time in shared
buffers will not be high enough to mandate eviction, and it will be
given more chances.

Since gettimeofday() is an expensive call and hence cannot be done in
the tight loop, we can count the number of clocksweeps the current
buffer has seen (rather, survived). This gives us a rough estimate of
the relative age of the buffer.

When an eviction happens, all the candidates with refcount = 0 are
gathered; then, among them, the one with the highest ageing factor is
evicted.

Of course, there may be better ways of doing this, but I want to
highlight the possibility of introducing an ageing factor to prevent
relatively younger pages from being evicted early in the eviction
process.

The overhead isn't too big. We just need to add another attribute to the
buffer header for the number of clocksweeps seen (rather, survived) and
check it when an eviction is taking place. The existing spinlock for
buffer headers should be good enough to protect against concurrent
access; the access rules can be similar to those for usage_count.
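
A minimal standalone sketch of the idea (hypothetical structure and field
names; this is not the actual buffer manager code):

#include <stddef.h>
#include <stdint.h>

typedef struct BufSketch
{
    int         refcount;           /* pins currently held */
    int         usage_count;        /* decremented by the clock sweep */
    uint32_t    sweeps_survived;    /* the proposed ageing factor */
} BufSketch;

/*
 * One clock-sweep pass: age every buffer and, among the unpinned, cold
 * buffers, prefer evicting the one that has survived the most sweeps.
 */
static int
pick_victim(BufSketch *buf, size_t nbuffers)
{
    int     victim = -1;
    size_t  i;

    for (i = 0; i < nbuffers; i++)
    {
        if (buf[i].usage_count > 0)
            buf[i].usage_count--;   /* normal clock-sweep ageing */
        buf[i].sweeps_survived++;   /* survived another pass */

        if (buf[i].refcount == 0 && buf[i].usage_count == 0 &&
            (victim < 0 ||
             buf[i].sweeps_survived > buf[victim].sweeps_survived))
            victim = (int) i;
    }
    return victim;                  /* -1 if everything is pinned or hot */
}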

Thoughts and comments?

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-04-18 Thread MauMau

From: Amit Kapila amit.kapil...@gmail.com

Currently the -e option is accepted with all the options that can be
provided in pg_ctl.  Shouldn't we accept it only with options related to
services, because that is the only case where it will be used?
Basically, write_stderr() will write to the event log only in the case
of a service.


Thank you for your kind and patient review.  I hope this will bear fruit.

I don't find it so strange that -e is accepted by all operation modes of
pg_ctl --- pg_ctl seems to handle switches that way.  For example, -c is
only relevant to start and restart, but it is accepted by all modes.  -D is
not relevant to pg_ctl kill, but it is not rejected.  Plus, I prepared for
the future possibility that other modes of pg_ctl will use the event log.



Another minor point is you have forgotten to remove below declaration:
+ static void get_config_value(const char *name, char *buf, int buf_size);


Oh, removed.


Sorry for delayed response and I am not sure that I will be able to
complete the review of patch in next few days as I will be on vacation.


OK, I'm waiting for you.  Please have a nice vacation.

Regards
MauMau


pg_ctl_eventsrc_v8.patch



Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread MauMau

From: Josh Berkus j...@agliodbs.com

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?


I've had a look at the mail for 9.3 beta 1:

http://www.postgresql.org/message-id/517c64f4.6000...@agliodbs.com

Please excuse my misunderstanding and cheekiness.  I feel the following
points may need to be addressed:


* Existing and (more importantly) new users don't know about the beta
release.
I remember seeing news about new features of MySQL in the daily news mails
of some famous IT media, such as IDG's Computerworld, InfoWorld, and
Japanese IT industry news outlets.  On the other hand, I'm afraid I hadn't
seen news about new features of PostgreSQL until the final release was out.
Is subscribing to pgsql-announce the only way to learn of a beta release?


* Existing users are satisfied with the version they are using for their
current use cases, so they have no motivation to try a new release.
To broaden the use cases of existing users and attract new ones, more
attractive features may be necessary, such as an in-memory database,
columnar storage, MPP for scale-out, database compression, integration with
Hadoop, multi-tenant databases, Oracle compatibility (this should be very
important for winning many users in practice), etc.  I think eye-catching
features like streaming replication and materialized views are necessary to
widen PostgreSQL's use.


* Explain new features by associating them with current trends such as
cloud, mobile, social, and big data.


Regards
MauMau







Re: [HACKERS] Get more from indices.

2014-04-18 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Ok, I am convinced that your suggestion - truncating query_pathkeys by
 removing entries that turn out to be no-ops - is preferable and will
 naturally have a wider effect; it seems more promising.

 I won't insist on the approach this patch currently takes, but a new
 patch of course can't be ready within this CF. I will agree if you
 decide to mark this patch 'Returned with Feedback'. (I would feel a
 little sad about 'Rejected', but you can justly do that if you think
 that the patch coming up next is utterly different from this one :()

I marked it as returned with feedback.

regards, tom lane




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Robert Haas
On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
 For once, this looks more like a problem in logical decoding, which is
 trying to assert about the tuple being updated; the assertion failing is
 the one added a week ago about not palloc'ing in a critical section.

 It's this (older) assertion in HeapTupleHeaderGetCmax():

 
 Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

 That can allocate memory if xmax is a multixact... Does anybody have a
 better idea to solve this than adding a CritSectionCount == 0 && in
 there?

Blech.  Isn't that just nerfing the assertion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Alvaro Herrera
Robert Haas wrote:
 On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
  For once, this looks more like a problem in logical decoding, which is
  trying to assert about the tuple being updated; the assertion failing is
  the one added a week ago about not palloc'ing in a critical section.
 
  It's this (older) assertion in HeapTupleHeaderGetCmax():
 
  
  Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
 
  That can allocate memory if xmax is a multixact... Does anybody have a
  better idea to solve this than adding a CritSectionCount == 0 && in
  there?
 
 Blech.  Isn't that just nerfing the assertion?

Well, that's exactly the point.  Most of the time,
HeapTupleHeaderGetCmax gets called in a non-critical section, and we
want to run the assertion in that case.  But it's no great trouble if
the assertion is not run in the rare case where HeapTupleHeaderGetCmax
is being used to write an xlog record.

It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
but to fix that we would have to remove all allocations from
GetMultiXactIdMembers which doesn't sound feasible.
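
Spelled out, the guard under discussion would make the check a no-op inside
critical sections, roughly like this (a sketch of the idea, not the
committed form):

    /*
     * HeapTupleHeaderGetUpdateXid() may palloc when xmax is a multixact,
     * which is forbidden inside a critical section, so only run the
     * check when we are not in one.
     */
    Assert(CritSectionCount > 0 ||
           TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));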

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Andres Freund
On 2014-04-18 16:44:55 +0200, Robert Haas wrote:
 On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote:
  For once, this looks more like a problem in logical decoding, which is
  trying to assert about the tuple being updated; the assertion failing is
  the one added a week ago about not palloc'ing in a critical section.
 
  It's this (older) assertion in HeapTupleHeaderGetCmax():
 
  
  Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));
 
  That can allocate memory if xmax is a multixact... Does anybody have a
  better idea to solve this than adding a CritSectionCount == 0 && in
  there?
 
 Blech.  Isn't that just nerfing the assertion?

Not precisely sure what you mean, but the only memory allocation in
HeapTupleHeaderGetCmax() and log_heap_new_cid() is in that Assert(), and
that's the only forbidden thing in that codepath.
Now, we could alternatively restructure the codepaths so they pass in
xmax from outside the critical section, but I had a quick look and the
risks/complications of that seem bigger than what the assertion buys us
there.
I don't have a better idea, unfortunately :(

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Peter Eisentraut
On 4/17/14, 8:24 PM, Tom Lane wrote:
 We could in fact implement #2, I imagine, by destroying and recreating
 the entire language interpreter.  So I could imagine implementing a
 DISCARD INTERPRETERS kind of command that would zap the current
 interpreter(s) for whichever PL languages happened to feel like
 cooperating with the command.

More generally, any extension could maintain any kind of cross-call
state.  plproxy, dblink, pgmemcache come to mind.  A general hook into
DISCARD might be doable, but then it's not clear how to categorize this
into DISCARD subcommands.





Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Peter Eisentraut
On 4/17/14, 4:44 PM, Joshua D. Drake wrote:
 That we should also release the GD?

In some cases, SD or GD are used to cache things.  Having the connection
pooler blow that away would defeat the point.





Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
  It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
  but to fix that we would have to remove all allocations from
  GetMultiXactIdMembers which doesn't sound feasible.
 
 I was thinking for a second we could just allocate something during
 startup based on max_connections (+ other possible backends), but I
 think that won't work because of subtransactions?

Yeah, I don't think the number of members in a multixact is bounded.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] assertion in 9.4 with wal_level=logical

2014-04-18 Thread Andres Freund
On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote:
 It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory,
 but to fix that we would have to remove all allocations from
 GetMultiXactIdMembers which doesn't sound feasible.

I was thinking for a second we could just allocate something during
startup based on max_connections (+ other possible backends), but I
think that won't work because of subtransactions?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 4/17/14, 8:24 PM, Tom Lane wrote:
 We could in fact implement #2, I imagine, by destroying and recreating
 the entire language interpreter.  So I could imagine implementing a
 DISCARD INTERPRETERS kind of command that would zap the current
 interpreter(s) for whichever PL languages happened to feel like
 cooperating with the command.

 More generally, any extension could maintain any kind of cross-call
 state.  plproxy, dblink, pgmemcache come to mind.  A general hook into
 DISCARD might be doable, but then it's not clear how to categorize this
 into DISCARD subcommands.

Right.  So if we go down that path, we're basically giving up the ability
to define what DISCARD ALL does by specifying an equivalent list of
subcommands.  Maybe that's an acceptable tradeoff, but I don't like it
much.

I wonder whether we could have a DISCARD <identifier> variant where
any hooked-in extension could recognize the identifier and zap some
appropriate subset of its own state.  This would get us to a situation
where we could say DISCARD ALL is the union of all DISCARD operations,
but some of those are documented in relevant extensions' docs rather
than here.
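
From the extension side, such a hook might look roughly like this (entirely
hypothetical: no discard_hook exists today, and the names below are made up
purely to show the shape, with NULL standing in for DISCARD ALL):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    typedef void (*discard_hook_type) (const char *what);  /* assumed */
    extern discard_hook_type discard_hook;                 /* assumed */

    static discard_hook_type prev_discard_hook = NULL;

    static void
    myext_reset_state(void)
    {
        /* drop whatever cross-call state the extension keeps */
    }

    static void
    myext_discard(const char *what)
    {
        if (prev_discard_hook)
            prev_discard_hook(what);
        if (what == NULL || strcmp(what, "myext") == 0)
            myext_reset_state();
    }

    void
    _PG_init(void)
    {
        prev_discard_hook = discard_hook;
        discard_hook = myext_discard;
    }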

The permissions problem remains, of course.

regards, tom lane




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Robert Haas
On Thu, Apr 17, 2014 at 5:00 PM, Greg Stark st...@mit.edu wrote:
 On Thu, Apr 17, 2014 at 10:18 AM, Robert Haas robertmh...@gmail.com wrote:
 Because all the usage counts are the same, the eviction at
 this point is completely indiscriminate.  We're just as likely to kick
 out a btree root page or a visibility map page as we are to kick out a
 random heap page, even though the former have probably been accessed
 several orders of magnitude more often.  That's clearly bad.

 That's not clear at all. In that circumstance regardless of what page
 you evict you're incurring precisely one page fault i/o when the page
 is read back in.

I am a bit confused by this remark.  In *any* circumstance when you
evict you're incurring precisely one page fault I/O when the page is
read back in.   That doesn't mean that the choice of which page to
evict is irrelevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Alvaro Herrera
MauMau wrote:
 From: Josh Berkus j...@agliodbs.com
 How can we make beta testing better and more effective?  How can we get
 more users to actually throw serious workloads at new versions and share
 the results?
 
 I've tried a couple of things over the last two years and they haven't
 worked all that well.  Since we're about to go into another beta testing
 period, we need something new.  Ideas?
 
 I've had a look at the mail for 9.3 beta 1:
 
 http://www.postgresql.org/message-id/517c64f4.6000...@agliodbs.com

This call for testing is interesting: note there was not a single
mention of "foreign key locking got a huge change; please test it and
see whether it still works at all for you".  I asked Josh specifically
to mention it in a followup to this message, which you can see in that
thread.  There was no reply, which I took as "this isn't a new feature
and isn't user-visible anyway, so what would be the point?"

But hey, what the hell do I know about advocacy anyway, huh?  So I shut
up.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Greg Stark
On Fri, Apr 18, 2014 at 4:14 PM, Robert Haas robertmh...@gmail.com wrote:
 I am a bit confused by this remark.  In *any* circumstance when you
 evict you're incurring precisely one page fault I/O when the page is
 read back in.   That doesn't mean that the choice of which page to
 evict is irrelevant.

But you might be evicting a page that will be needed soon or one that
won't be needed for a while. If it's not needed for a while you might
be able to avoid many page evictions by caching a page that will be
used several times.

If all the pages currently in RAM are hot -- meaning they're hot
enough that they'll be needed again before the page you're reading in
-- then they're all equally bad to evict.

I'm trying to push us away from the gut instinct that frequently used
pages are important to cache, and towards actually counting how many
I/Os we're saving. In the extreme, it's possible to simulate any cache
algorithm on a recorded list of page requests and count how many page
misses it generates, to compare it against an optimal cache algorithm.
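
As a toy example of that kind of simulation (illustrative C, not tied to
PostgreSQL): replay a recorded page-request trace against a small LRU
cache, count the misses, and compare the count against other policies run
on the same trace.

    #include <stddef.h>
    #include <stdio.h>

    #define CACHE_SLOTS 3

    static int  cache[CACHE_SLOTS];
    static int  used = 0;

    /* Returns 1 on a miss, 0 on a hit; keeps the most recent page first. */
    static int
    access_page(int page)
    {
        int     i;

        for (i = 0; i < used; i++)
        {
            if (cache[i] == page)
            {
                for (; i > 0; i--)      /* hit: move page to the front */
                    cache[i] = cache[i - 1];
                cache[0] = page;
                return 0;
            }
        }
        if (used < CACHE_SLOTS)         /* miss: grow, or drop the LRU slot */
            used++;
        for (i = used - 1; i > 0; i--)
            cache[i] = cache[i - 1];
        cache[0] = page;
        return 1;
    }

    int
    main(void)
    {
        int     trace[] = {1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2};
        size_t  n = sizeof(trace) / sizeof(trace[0]);
        size_t  i;
        int     misses = 0;

        for (i = 0; i < n; i++)
            misses += access_page(trace[i]);
        printf("misses: %d of %zu\n", misses, n);
        return 0;
    }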

-- 
greg




Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Andrew Dunstan


On 04/17/2014 10:15 AM, Andrew Dunstan wrote:


On 04/16/2014 10:28 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 04/16/2014 07:19 PM, Tom Lane wrote:
Yeah, it would be real nice to see a self-contained test case for this.

Well, that might be hard to put together, but I did try running without
pg_stat_statements and auto_explain loaded and the error did not occur.
Not sure where that gets us in terms of deciding on a culprit.

Could we at least see the exact parameter settings for pg_stat_statements
and auto_explain?  (And any other GUCs with nondefault values?)





Here are all the settings from the run that failed:

   listen_addresses = '*'
   port = 5432
   fsync = on
   synchronous_commit = off
   checkpoint_segments = 128
   checkpoint_completion_target = 0.9
   shared_buffers = 512MB
   max_connections = 300
   work_mem = 128MB
   maintenance_work_mem = 32MB
   effective_cache_size = 16GB
   effective_io_concurrency = 2
   logging_collector = on
   log_destination = 'stderr'
   log_filename = 'postgresql-%a.log'
   log_rotation_size = 0
   log_truncate_on_rotation = on
   log_line_prefix = '%t [%p] %l: '
   log_connections = on
   log_disconnections = on
   log_statement = 'all'
   track_activity_query_size = 10240
   shared_preload_libraries = 'auto_explain,pg_stat_statements'

As you can see, auto_explain's log_min_duration hasn't been set, so it 
shouldn't be doing anything very much, I should think.




There definitely seems to be something going on involving these two 
pre-loaded modules. With both auto_explain and pg_stat_statements 
preloaded I can reproduce the error fairly reliably. I have also 
reproduced it, but less reliably, with auto_explain alone loaded. I have 
not reproduced it with pg_stat_statements alone loaded, but Josh Berkus 
has reported that another client has experienced something very similar 
(duplicate PK errors on a non-assertion-enabled build) with 
pg_stat_statements alone loaded.


cheers

andrew







Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Greg Stark
On Fri, Apr 18, 2014 at 4:15 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
  There was no reply, which I took as this isn't a
 new feature and isn't user visible anyway, so what would be the point?

To be fair the list was pretty long already. And like regression
testing, coming up with a list of things to test is kind of beside the
point. If we knew which things had to be tested then we would have
tested them already. The point of beta testing is to have people run
their applications on it in case their applications are doing stuff we
*didn't* anticipate needing to test.

I think the take-away here is that testing for long periods of time
under load and while using the full suite of complex configurations
simultaneously that they use in production (such as hot standby
replicas to spread load) is what's important.

-- 
greg




Re: [HACKERS] DISCARD ALL (Again)

2014-04-18 Thread Joshua D. Drake


On 04/18/2014 08:01 AM, Peter Eisentraut wrote:


On 4/17/14, 4:44 PM, Joshua D. Drake wrote:

That we should also release the GD?


In some cases, SD or GD are used to cache things.  Having the connection
pooler blow that away would defeat the point.


Not on a per-session basis, although I can see your point. The GD is
supposed to be global per session. If I discard the session back to its
original state, that state predates the creation of the GD, so clearing
it is expected behavior.


JD









--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
Political Correctness is for cowards.




Re: [HACKERS] How can we make beta testing better?

2014-04-18 Thread Josh Berkus
On 04/18/2014 08:15 AM, Alvaro Herrera wrote:
 and see whether it still works at all for you.  I asked Josh
 specifically to mention it in a followup to this message which you can
 see in that thread.  There was no reply, which I took as this isn't a
 new feature and isn't user visible anyway, so what would be the point?

It's not possible for a single document to be both a press release
(which that was) and a technical list of things to test.  That needs to
be two different documents ... and the technical one needs to somehow
evolve, so that we can add things to it as we discover potential problem
areas.

This does show how we're not really doing anything to publicize betas as
"please test this", though.  For that matter, our /beta page doesn't
really inspire someone to jump up and test things.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Josh Berkus
On 04/18/2014 09:42 AM, Andrew Dunstan wrote:

 There definitely seems to be something going on involving these two
 pre-loaded modules. With both auto_explain and pg_stat_statements
 preloaded I can reproduce the error fairly reliably. I have also
 reproduced it, but less reliably, with auto_explain alone loaded. I have
 not reproduced it with pg_stat_statements alone loaded, but Josh Berkus
 has reported that another client has experienced something very similar
 (duplicate PK errors on a non-assertion-enabled build) with
 pg_stat_statements alone loaded.

Correct.  However, this seems to produce the issue less reliably;
that is, it takes several hours of load testing for a duplicate PK to
show up, so I suspect that with pg_stat_statements alone it's also a
timing issue.  I'm still waiting on the results with
pg_stat_statements.so NOT loaded, to confirm that we're not actually
getting two separate bugs with similar symptoms.

The second site does not have any increase in query size.  Their only
settings for pg_s_s are:

pg_stat_statements.max = 1
pg_stat_statements.track = all

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Bruce Momjian
On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
 This can be changed by introducing an ageing factor that sees how much
 time the current buffer has spent in shared buffers. If the time that
 the buffer has spent is large enough (relatively) and it is not hot
 currently, that means it has had its chance and can be evicted. This
 saves the new page (3) from being evicted, since its time in shared
 buffers will not be high enough to mandate eviction, and it will be
 given more chances.

 Since gettimeofday() is an expensive call and hence cannot be done in
 the tight loop, we can count the number of clocksweeps the current
 buffer has seen (rather, survived). This gives us a rough estimate of
 the relative age of the buffer.

Counting clock sweeps is an interesting idea.  I think one concern was
tracking hot buffers in cases where there is no memory pressure, and
hence the clock sweep isn't running --- I am not sure how this would
help in that case.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Sat, Apr 19, 2014 at 1:07 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
  This can be changed by introducing an ageing factor that sees how much
  time the current buffer has spent in shared buffers. If the time that
  the buffer has spent is large enough (relatively) and it is not hot
  currently, that means it has had its chance and can be evicted. This
  saves the new page (3) from being evicted, since its time in shared
  buffers will not be high enough to mandate eviction, and it will be
  given more chances.

  Since gettimeofday() is an expensive call and hence cannot be done in
  the tight loop, we can count the number of clocksweeps the current
  buffer has seen (rather, survived). This gives us a rough estimate of
  the relative age of the buffer.

 Counting clock sweeps is an interesting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.


I feel that if there is no memory pressure, frankly it doesn't matter
much what gets evicted and what doesn't. The case I am specifically
targeting is when the clocksweep gets to move about a lot, i.e. high
memory pressure workloads. Of course, I may be totally wrong here.

One thing that I discussed with Merlin offline and am now concerned
about is how the actual eviction will work. We cannot traverse the
entire list to find all the buffers with refcount 0 and then do another
traversal to find the oldest one.

Any thoughts there would be appreciated.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Jason Petersen
On Apr 18, 2014, at 1:51 PM, Atri Sharma atri.j...@gmail.com wrote:

 Counting clock sweeps is an intersting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.
 


Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday 
seems silly to me: it was something quick for the prototype.

The problem with the clocksweeps is they don’t actually track the progression 
of “time” within the PostgreSQL system.

What’s wrong with using a transaction number or some similar sequence? It would 
accurately track “age” in the sense we care about: how long ago in “units of 
real work being done by the DB” something was added.

—Jason



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
 Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.



Well, AIUI, we only need the 'relative' age of buffers in relation to
the youngest buffer present. So the guy who has seen the maximum number
of clocksweeps is the guy who has been around the longest.

I do not see a need for an accurate estimate of the time spent in the
buffer for any purpose right now. It may be useful in the future,
though.

How do you get the transaction ID? By accessing a tuple on the page and
reading its XMIN?

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Peter Geoghegan
On Fri, Apr 18, 2014 at 1:11 PM, Jason Petersen ja...@citusdata.com wrote:
 Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

The gettimeofday() call doesn't need to happen in a tight loop. It can
be reasonably passed down from higher up, very probably without
consequence. The LRU-K paper actually recommends a delay of 5 seconds.
There is at least one other major database system that unambiguously
uses a wall-clock delay of 3 seconds (by default) for this exact
purpose - avoiding what the LRU-K paper calls correlated references.
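
A sketch of how the call could be hoisted out of the tight loop (assumed
names, not actual PostgreSQL code): refresh a coarse timestamp from higher
up, and treat re-references that fall inside the correlation window as a
single reference.

    #include <stdbool.h>
    #include <sys/time.h>

    #define CORRELATED_REF_SECS 5       /* the LRU-K paper's suggested delay */

    static struct timeval coarse_now;

    void
    refresh_coarse_clock(void)          /* called rarely, from higher up */
    {
        gettimeofday(&coarse_now, NULL);
    }

    bool
    counts_as_new_reference(const struct timeval *last_ref)
    {
        return coarse_now.tv_sec - last_ref->tv_sec >= CORRELATED_REF_SECS;
    }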

I'm not saying that we should go with that particular scheme for these
reasons. However, it's plainly untrue that using wall time like this
represents some kind of insurmountable scalability obstacle.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.

The LRU-K paper suggests that a scheme like this could be preferable.
I have my doubts that it can be made to work with Postgres.

-- 
Peter Geoghegan




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Bruce Momjian
On Sat, Apr 19, 2014 at 01:21:29AM +0530, Atri Sharma wrote:
 I feel that if there is no memory pressure, frankly it doesn't matter
 much what gets evicted and what doesn't. The case I am specifically
 targeting is when the clocksweep gets to move about a lot, i.e. high
 memory pressure workloads. Of course, I may be totally wrong here.

 One thing that I discussed with Merlin offline and am now concerned
 about is how the actual eviction will work. We cannot traverse the
 entire list to find all the buffers with refcount 0 and then do another
 traversal to find the oldest one.

I thought if there was memory pressure the clock sweep would run, and we
wouldn't have everything at the maximum usage count.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)

2014-04-18 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Interesting bug.
 On Fri, Mar 28, 2014 at 04:34:41PM -0400, Tom Lane wrote:
 I think we might be better off to get rid of toast_flatten_tuple_attribute
 and instead insist that composite Datums never contain any external toast
 pointers in the first place.

 Performance is the dominant issue here; the hacker-centric considerations you
 outlined vanish next to it.

I'm not exactly convinced by that line of argument, especially when it's
entirely unsupported by any evidence as to which implementation approach
will actually perform worse.

However, I have done some investigation into what it'd take to keep the
current approach and teach the array code to detoast composite-type array
elements properly.  The attached draft patch only fixes arrays; ranges
need work too, and I'm not sure what else might need adjustment.  But this
patch does fix the test case provided by Jan Pecek.

The main problem with this patch, as I see it, is that it'll introduce
extra syscache lookup overhead even when there are no toasted fields
anywhere.  I've not really tried to quantify how much, since that would
require first agreeing on a benchmark case --- anybody have a thought
about what that ought to be?  But in at least some of these functions,
we'll be paying a cache lookup or two per array element, which doesn't
sound promising.

This could be alleviated by caching the lookup results over longer
intervals, but I don't see any way to do that without invasive changes
to the APIs of a number of exported array functions, which doesn't seem
like a good thing for a bug fix that we need to back-patch.  I think the
back-branch fix would have to be pretty much what you see here, even if
we then make changes to reduce the costs in HEAD.

Comments?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index aea9d40..48b09b8 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
*** heap_form_tuple(TupleDesc tupleDescripto
*** 649,674 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
! 				 att[i]->attalign == 'd' &&
! 				 att[i]->attndims == 0 &&
! 				 !VARATT_IS_EXTENDED(DatumGetPointer(values[i])))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 													  att[i]->atttypid,
! 													  att[i]->atttypmod);
! 		}
  	}
  
  	/*
--- 649,661 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
*** heap_form_minimal_tuple(TupleDesc tupleD
*** 1403,1428 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
- 	 *
- 	 * We can skip calling toast_flatten_tuple_attribute() if the attribute
- 	 * couldn't possibly be of composite type.  All composite datums are
- 	 * varlena and have alignment 'd'; furthermore they aren't arrays. Also,
- 	 * if an attribute is already toasted, it must have been sent to disk
- 	 * already and so cannot contain toasted attributes.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else if (att[i]->attlen == -1 &&
! 				 att[i]->attalign == 'd' &&
! 				 att[i]->attndims == 0 &&
! 				 !VARATT_IS_EXTENDED(values[i]))
! 		{
! 			values[i] = toast_flatten_tuple_attribute(values[i],
! 													  att[i]->atttypid,
! 													  att[i]->atttypmod);
! 		}
  	}
  
  	/*
--- 1390,1402 
  	 * Check for nulls and embedded tuples; expand any toasted attributes in
  	 * embedded tuples.  This preserves the invariant that toasting can only
  	 * go one level deep.
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		if (isnull[i])
  			hasnull = true;
! 		else
! 			values[i] = toast_flatten_tuple_attribute(values[i], att[i]);
  	}
  
  	/*
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 9a821d3..2d1a922 100644
*** a/src/backend/access/heap/tuptoaster.c
---