Re: [HACKERS] Create collation incorrect error code
Magnus Hagander writes: > On Thu, May 31, 2012 at 8:16 PM, Tom Lane wrote: >> What platform? > Ubuntu 12.04. FWIW, I get the same on Fedora 16. It works if I say locale='nb_NO.utf8' to prevent case-folding, so apparently case of the locale name has something to do with it ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On 31 May 2012 19:09, Sergey Koposov wrote: > On Thu, 31 May 2012, Simon Riggs wrote: > >> >> That struck me as a safe and easy optimisation. This was a problem I'd >> been trying to optimise for 9.2, so I've written a patch that appears >> simple and clean enough to be applied directly. > > > Thanks! The patch indeed improved the timings. The dropping of 100 tables in > a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec Thanks for the timing. > (it > would be nice to reduce it further though, because the dropping of 10,000 > tables still takes ~10 min). Why do you have 10,000 tables and why is it important to drop them so quickly? If it's that important, why not run the drop in parallel sessions? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Robert Haas writes: > Now, measuring time skew is potentially a useful thing to do, if we > believe that this will actually give us an accurate measurement of > what the time skew is, because there are a whole series of things that > people want to do which involve subtracting a slave timestamp from a > master timestamp. Tom has persistently rebuffed all such proposals on > the grounds that there might be time skew, so in theory we could make > those things possible by having a way to measure time skew, which this > does. Here's what we do: given a slave timestamp, add the estimated > time skew to find an equivalent master timestamp, and then subtract. > Using a method of this type would allow us to compute a *real* apply > delay. Woohoo! Unfortunately, if time synchronization IS in use, > then the system clocks are probably already synchronized three to six > orders of magnitude more precisely than what this method can measure, > so the effect of using GetReplicationTransferLatency() to adjust slave > timestamps will be to massively reduce the accuracy of such > calculations. However, I've thus far been unable to convince anyone > that this is a bad idea, so maybe this is where we're gonna end up. Hmm ... first question is do we actually care whether the clocks are synced to the millisecond level, ie what is it you'd do differently if you know that the master and slave clocks are synced more closely than you can measure at the protocol level. But if there is a reason to care, perhaps we could have a setting that says "we're using NTP, so trust the clocks to be synced"? What I object to is assuming that without any evidence, or being unable to operate correctly in an environment where it's not true. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Wed, May 30, 2012 at 6:10 PM, Sergey Koposov wrote: > On Wed, 30 May 2012, Jeff Janes wrote: > >> But anyway, is idt_match a fairly static table? If so, I'd partition >> that into 16 tables, and then have each one of your tasks join against >> a different one of those tables. That should relieve the contention >> on the index root block, and might have some other benefits as well. > > > No, idt_match is getting filled by multi-threaded copy() and then joined > with 4 other big tables like idt_phot. The result is then split into > partitions. That does make things more complicated. But could you partition it at that level and then do the joins partition-wise? I don't have much experience at data partitioning (well, I do, but the experience is with partitioning in Perl with terabytes of flat files, not in PG :) ) but I think that once you have your partitioning keys you want to apply them the same way up and down the data set. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, May 31, 2012 at 11:09 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Simon Riggs wrote: > >> >> That struck me as a safe and easy optimisation. This was a problem I'd >> been trying to optimise for 9.2, so I've written a patch that appears >> simple and clean enough to be applied directly. > > > Thanks! The patch indeed improved the timings. The dropping of 100 tables in > a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec (it > would be nice to reduce it further though, because the dropping of 10,000 > tables still takes ~10 min). I'm surprised it helped that much. I thought the most it could theoretically help would be a factor of 4. I tried the initially unlocked test, and for me it cut the time by a factor of 3. But I only have 1GB of shared_buffers at the max; I would expect it to help more at larger sizes, because there is a constant overhead not related to scanning the shared buffers which gets diluted out the larger shared_buffers is. I added to that a drop-all very similar to what Simon posted and got another factor of 3. But, if you can do this during a maintenance window, then just restarting with a much smaller shared_buffers should give you a much larger speed up than either or both of these. If I can extrapolate up to 10G from my current curve, setting it to 8MB instead would give a speed up of nearly 400 fold. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
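To put numbers behind the constant per-relation overhead discussed above: before batching, each dropped relation forced its own full scan of the buffer pool. Below is a minimal standalone sketch of the two shapes, with simplified stand-in types rather than the real bufmgr API (the real code also takes the buffer header spinlock before invalidating, which is skipped here):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { unsigned spcNode, dbNode, relNode; } RelFileNode;
    typedef struct { RelFileNode tag; bool valid; } BufferDesc;

    #define NBUFFERS (1024 * 1024)      /* e.g. 8GB of 8kB pages */
    static BufferDesc buffers[NBUFFERS];

    static bool
    rnode_equal(RelFileNode a, RelFileNode b)
    {
        return a.spcNode == b.spcNode && a.dbNode == b.dbNode &&
               a.relNode == b.relNode;
    }

    /* old shape: one full scan of the pool per dropped relation */
    static void
    drop_one_at_a_time(const RelFileNode *drops, int ndrops)
    {
        for (int d = 0; d < ndrops; d++)
            for (size_t i = 0; i < NBUFFERS; i++)
                if (buffers[i].valid && rnode_equal(buffers[i].tag, drops[d]))
                    buffers[i].valid = false;   /* invalidate */
    }

    /* batched shape: one pass over the pool, checking the whole drop
     * list per buffer -- the same comparison count, but only one walk
     * over the large, cache-unfriendly descriptor array */
    static void
    drop_batched(const RelFileNode *drops, int ndrops)
    {
        for (size_t i = 0; i < NBUFFERS; i++)
        {
            if (!buffers[i].valid)
                continue;
            for (int d = 0; d < ndrops; d++)
                if (rnode_equal(buffers[i].tag, drops[d]))
                {
                    buffers[i].valid = false;
                    break;
                }
        }
    }

    int
    main(void)
    {
        RelFileNode drops[100] = {{0}};

        drop_one_at_a_time(drops, 100);
        drop_batched(drops, 100);
        return 0;
    }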
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:50 AM, Robert Haas wrote: > > This test case is unusual because it hits a whole series of buffers > very hard. However, there are other cases where this happens on a > single buffer that is just very, very hot, like the root block of a > btree index, where the pin/unpin overhead hurts us. I think that very very hot page is also the problem here, not a whole sequence of hot pages. Most of his buffer content share lwlocks are on just two buffers, and most of his blocked buffer mapping lwlocks are on just two partitions. So I am guessing that almost all of his spin-lock contention from Pin and Unpin is also coming from those same two buffers. Why there are two buffers when there is only one index root block involved, I don't know. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 9:17 AM, Robert Haas wrote: > > Oh, ho. So from this we can see that the problem is that we're > getting huge amounts of spinlock contention when pinning and unpinning > index pages. > > It would be nice to have a self-contained reproducible test case for > this, so that we could experiment with it on other systems. I just posted a patch under subject "pgbench--new transaction type" that introduces a pgbench -P option. I think that that would do a good job of simulating unique-key look-ups on the inner side of a nested loop (which is basically what we have here) and so creating contention on index pages. Right now I don't have anything with more than 2 CPUs and 2 is not high enough to get much contention so I can't post any meaningful numbers. (pgbench -P might also be of interest in hash index investigation) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Sun, May 27, 2012 at 11:45 AM, Sergey Koposov wrote: > Hi, > > I did another test using the same data and the same code, which I've > provided before and the performance of the single thread seems to be > degrading quadratically with the number of threads. > > Here are the results:
>
> Nthreads  Time_to_execute_one_thread
>        1   8.1
>        2   7.8
>        3   8.1
>        4   9.0
>        5  10.2
>        6  11.4
>        7  13.3
>        8  16.1
>        9  19.0
>       10  21.4
>       11  23.8
>       12  27.3
>       13  30.2
>       14  32.0
>       15  34.1
>       16  37.5
>
This looks more hyperbolic than quadratic. In the higher region, it doesn't seem to be increasing at an increasing rate, but rather increasing at a constant rate of about 2.5 sec per additional backend, i.e. converging to an asymptote with a slope of 2.5. A pedantic point, perhaps, but if you are trying to infer the nature of the bottleneck from the shape of the curve, they lead to rather different sets of hypotheses. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
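A quick way to check that reading of the curve is to look at the first differences of the timings: under quadratic growth the increments themselves keep growing, whereas here they level off at roughly 2.5 s per added thread, which is what convergence to an asymptote of slope ~2.5 looks like. A trivial standalone program over the numbers in the table above:

    #include <stdio.h>

    int
    main(void)
    {
        /* per-thread execution times from the table above, in seconds */
        double t[] = {8.1, 7.8, 8.1, 9.0, 10.2, 11.4, 13.3, 16.1,
                      19.0, 21.4, 23.8, 27.3, 30.2, 32.0, 34.1, 37.5};
        int n = sizeof(t) / sizeof(t[0]);

        /* print the increment added by each extra thread */
        for (int i = 1; i < n; i++)
            printf("%2d -> %2d threads: %+.1f s\n", i, i + 1, t[i] - t[i - 1]);
        return 0;
    }

The later increments hover around 1.8-3.5 s with no upward trend, supporting the asymptotic-linear reading over the quadratic one.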
Re: [HACKERS] pgbench--new transaction type
On Sun, Jun 19, 2011 at 3:30 PM, Greg Smith wrote: > I applied Jeff's patch but changed this to address concerns about the > program getting stuck running for too long in the function: > > #define plpgsql_loops 512 > > This would be better named as "plpgsql_batch_size" or something similar > instead; the current name suggests it's how many loops to run, which is > confusing. > > My main performance concern here was whether this change really matters so > much once a larger number of clients were involved. Some of the other > things you can do to optimize single-client performance aren't as useful > with lots of them. Here's how the improvements in this mode worked for me > on a server with 4 Hyper-Threaded cores (i870); shared_buffers=256MB, > scale=100:
>
> 1 client:
> -S: 11533
> -S -M prepared: 19498
> -P: 49547
>
> 12 clients, 4 workers:
> -S: 56052
> -S -M prepared: 82043
> -P: 159443
>
> 96 clients, 8 workers:
> -S: 49940
> -S -M prepared: 76099
> -P: 137942
>
> I think this is a really nice new workload to demonstrate. One of the > things we tell people is that code works much faster when moved server-side, > but how much faster isn't always easy to show. Having this mode available > lets them see how dramatic that can be quite easily. I know I'd like to be > able to run performance tests for clients of new hardware using PostgreSQL > and tell them something like this: "With simple clients executing a > statement at a time, this server reaches 56K SELECTs/second. But using > server-side functions to execute them in larger batches it can do 159K". > > The value this provides as an alternate source for benchmark load > generation, with a very different profile for how it exercises the server, > is good too. > > Things to fix in the patch before it would be a commit candidate:
>
> - Adjust the loop size/name, per above
> - Reformat some of the longer lines to try and respect the implied right margin in the code formatting
> - Don't include the "plgsql function created." line unless in debugging mode.
> - Add the docs. Focus on how this measures how fast the database can execute SELECT statements using server-side code. An explanation that the "transaction" block size is 512 is important to share. It also needs a warning that time based runs ("-T") may have to wait for a block to finish and go beyond its normally expected end time.
> - The word "via" in the "transaction type" output description is probably not the best choice. Changing to "SELECT only using PL/pgSQL" would translate better, and follow the standard casing used for the name of that language.
Sorry it has taken me a year to get back to this patch. I have wanted to use it, and to ask other people to run it and report their results, several times recently, so I would like to get it into the core. I've attached a new patch which addresses several of your concerns, and adds the documentation. The description is much longer than the descriptions of other nearby options, which mostly just give a simple statement of what they do rather than a description of why that is useful. I don't know if that means I'm starting a good trend, or a bad one, or I'm just putting the exposition in the wrong place. In addition to showing the benefits of coding things on the server side when that is applicable, it also allows hackers to stress parts of the server code that are not easy to stress otherwise.
Cheers, Jeff (Attachment: pgbench_loop_v2.patch) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Figuring out shared buffer pressure
On Thu, May 31, 2012 at 10:11 AM, Bruce Momjian wrote: > However, this doesn't help people configure shared buffers larger (e.g. > 35%) if their working set is larger. Right now, I don't see how a user > would know this is happening. On the flip side, they might have a > smaller working set than 25% and spending the overhead of managing 1 > million shared buffers. Again, there is no way to know if that is the > case. Another important use case: downgrades. They do happen, and right now they are amazingly risky and made with limited information. Clearly a complete picture is impossible because of the reliance on kernel buffer management, but knowing the PG buffer pool occupancy and flux seems like it would be so much better than knowing nothing, and it is likely that some conservative intuition could be learned to perform relatively safe downgrades. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 3:25 PM, Merlin Moncure wrote: > Hm, couple questions: how do you determine if/when to un-nail a > buffer, and who makes that decision (bgwriter?) Well, I think some experimentation might be required, but my first thought is to tie it into buffer eviction. If we record a pin in some side array in lieu of acquiring the buffer header spinlock, then we're not bumping the usage count. So perhaps what we might do is - when we first nail the buffer, we set its usage count to something much higher than what's normally allowed, like say 50. When it gets down to 0 and somebody tries to evict it, we scan the per-backend arrays and count the number of pins we find there, and set the usage count to 5 * that number. The details could be completely wrong here. The point is that I think every so often we can just reconsider whether the buffer is still worthy of being nailed. Exactly how often we should reconsider it seems like an empirical question. > Is there a limit to > how many buffers you are allowed to nail? Probably not. The system had better be designed so that the threshold for nailing a buffer is pretty high. It should be exceedingly unlikely for more than a small percentage of buffers to meet the criteria; if it happens, then there's probably a good reason. > It seems like a much > stronger idea, but one downside I see vs the 'pin for longer' idea I > was kicking around was how to deal with stale nailed buffers and keeping > them from uncontrollably growing so that you have to either stop > nailing or forcibly evicting them. I think we're really talking about variants on the same theme. My idea is basically "pin forever", which is taking "longer" to its logical extreme, but even if it were something short of forever I think you'd still need some way to flush the stale pins. Otherwise, vacuum can get stuck, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
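To make the eviction-time reconsideration above concrete, here is a compilable sketch with entirely hypothetical names and data structures (nothing like this exists in the tree): when the clock sweep finds a nailed buffer at usage count zero, count the pins recorded in the per-backend areas, and either refresh the usage count to 5 * that number or begin un-nailing:

    #include <stdbool.h>

    #define MAX_BACKENDS 64

    typedef struct
    {
        int buf_id;
        int pincount;
    } BackendPin;

    typedef struct
    {
        /* in real life, protected by a per-backend spinlock or lwlock */
        BackendPin pins[16];
        int        npins;
    } BackendPinArea;

    static BackendPinArea backend_areas[MAX_BACKENDS];

    static int
    count_nailed_pins(int buf_id)
    {
        int total = 0;

        for (int b = 0; b < MAX_BACKENDS; b++)
            for (int i = 0; i < backend_areas[b].npins; i++)
                if (backend_areas[b].pins[i].buf_id == buf_id)
                    total += backend_areas[b].pins[i].pincount;
        return total;
    }

    /* Called when the clock sweep finds a nailed buffer at usage_count
     * == 0.  Returns true if the buffer stays nailed. */
    static bool
    reconsider_nail(int buf_id, int *usage_count)
    {
        int npins = count_nailed_pins(buf_id);

        if (npins > 0)
        {
            *usage_count = 5 * npins;   /* still hot: keep the nail */
            return true;
        }
        /* no pins recorded anywhere: start nail removal (not shown) */
        return false;
    }

    int
    main(void)
    {
        int usage = 0;

        (void) reconsider_nail(0, &usage);
        return 0;
    }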
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Kohei KaiGai : > 2012/5/31 Robert Haas : >>> If we would have an "ideal optimizer", I'd still like the optimizer to >>> wipe out redundant clauses transparently, rather than RLSBYPASS >>> permissions, because it just controls all-or-nothing stuff. >>> For example, if tuples are categorized as unclassified, classified or >>> secret, and the RLS policy is configured as: >>> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >>> 'classified')) OR (X IN ('unclassified'))), >>> superuser can see all the tuples, and alice and bob can see >>> up to classified tuples. >>> Is it really hard to wipe out the redundant condition at planner stage? >>> If current_user is obviously 'kaigai', it seems to me the left side of >>> this clause can be wiped out at the planner stage. >>> Do I consider the issue too simple? >> >> Yes. :-) >> >> There are two problems. First, if using the extended query protocol >> (e.g. pgbench -M prepared) you can prepare a statement just once and >> then execute it multiple times. In this case, stable functions cannot >> be constant-folded at plan time, because they are only guaranteed to >> remain constant for a *single* execution of the query, not for all >> executions of the query. So any optimization in this area would have >> to be limited to cases where the simple query protocol is used. I >> think that might still be worth doing, but it's a significant >> limitation, to be sure. Second, at present, there is no guarantee >> that the snapshot used for planning the query is the same as the >> snapshot used for executing the query, though commit >> d573e239f03506920938bf0be56c868d9c3416da made that happen in some >> common cases. If we were to do constant-folding of stable functions >> using the planner snapshot, it would represent a behavior change from >> previous releases. I am not clear whether that has any real-world >> consequences that we should be worried about. It seems to me that the >> path of least resistance might be to refactor the portal stuff so that >> we can provide a uniform guarantee that, when using the simple query >> protocol, the planner and executor snapshots will be the same ... but >> I might be wrong. >> > It may be an option to separate the cases into two: a situation where the > given query is executed immediately, just after optimization, and never > reused, and all others. > Even in the second situation, it may give us a better query execution > plan if we reconstruct the query plan just before the executor runs, under > the assumption that immutable / stable functions can be replaced > by constant values prior to execution. > In other words, this idea retries query optimization at EXECUTE > time, against its usual nature, to replace immutable / stable functions > by constant values, and to generate a wiser execution plan. > At least, it may make sense to have a flag on the prepared statement to > indicate whether a better plan is possible with this re-construction. > > Then, if so, we will be able to push the stuff corresponding to > RLSBYPASS into query optimization, and it would work transparently > for users. > > Isn't it feasible to implement? > If we could replace a particular term that consists only of constant values and stable / immutable functions by a parameter reference, it would let us handle the term as if it were a constant value, while the actual calculation is delayed to the executor stage.
For example, according to this idea, PREPARE p1(int) AS SELECT * FROM tbl WHERE current_user IN ('alice','bob') AND X > $1; would be internally rewritten to PREPARE p1(int) AS SELECT * FROM tbl WHERE $2 AND X > $1; and then $2 would be implicitly calculated just before execution of this prepared statement, using the same snapshot as the executor's. It seems to me a feasible idea, with a less invasive implementation for the existing planner. Does it make sense to describe the exceptional condition using a regular clause, instead of a special permission? Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Robert Haas : >> If we would have an "ideal optimizer", I'd still like the optimizer to >> wipe out redundant clauses transparently, rather than RLSBYPASS >> permissions, because it just controls all-or-nothing stuff. >> For example, if tuples are categorized as unclassified, classified or >> secret, and the RLS policy is configured as: >> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >> 'classified')) OR (X IN ('unclassified'))), >> superuser can see all the tuples, and alice and bob can see >> up to classified tuples. >> Is it really hard to wipe out the redundant condition at planner stage? >> If current_user is obviously 'kaigai', it seems to me the left side of >> this clause can be wiped out at the planner stage. >> Do I consider the issue too simple? > > Yes. :-) > > There are two problems. First, if using the extended query protocol > (e.g. pgbench -M prepared) you can prepare a statement just once and > then execute it multiple times. In this case, stable functions cannot > be constant-folded at plan time, because they are only guaranteed to > remain constant for a *single* execution of the query, not for all > executions of the query. So any optimization in this area would have > to be limited to cases where the simple query protocol is used. I > think that might still be worth doing, but it's a significant > limitation, to be sure. Second, at present, there is no guarantee > that the snapshot used for planning the query is the same as the > snapshot used for executing the query, though commit > d573e239f03506920938bf0be56c868d9c3416da made that happen in some > common cases. If we were to do constant-folding of stable functions > using the planner snapshot, it would represent a behavior change from > previous releases. I am not clear whether that has any real-world > consequences that we should be worried about. It seems to me that the > path of least resistance might be to refactor the portal stuff so that > we can provide a uniform guarantee that, when using the simple query > protocol, the planner and executor snapshots will be the same ... but > I might be wrong. > It may be an option to separate the cases into two: a situation where the given query is executed immediately, just after optimization, and never reused, and all others. Even in the second situation, it may give us a better query execution plan if we reconstruct the query plan just before the executor runs, under the assumption that immutable / stable functions can be replaced by constant values prior to execution. In other words, this idea retries query optimization at EXECUTE time, against its usual nature, to replace immutable / stable functions by constant values, and to generate a wiser execution plan. At least, it may make sense to have a flag on the prepared statement to indicate whether a better plan is possible with this re-construction. Then, if so, we will be able to push the stuff corresponding to RLSBYPASS into query optimization, and it would work transparently for users. Isn't it feasible to implement? Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
On Wed, May 30, 2012 at 12:17 PM, Fujii Masao wrote: > OTOH, I wonder whether we really need to send keepalive messages > periodically to calculate a network latency. ISTM we don't unless a network > latency varies from situation to situation so frequently and we'd like to > monitor that in almost real time. I didn't look at this patch too carefully when it was committed. Looking at it more carefully now, it looks to me like this patch does two different things. One is to add a function called GetReplicationApplyDelay(), which returns the number of milliseconds since replay was fully caught up. So if you were last caught up 5 minutes ago and you have replayed 4 minutes and 50 seconds worth of WAL during that time, this function will return 5 minutes, not 10 seconds. That is not what I would call "apply delay", which I would define as how far behind you are NOW, not how long it's been since you weren't behind at all. The second thing it does is add a function called GetReplicationTransferLatency(). The return value of this function is the difference between the slave's clock at the time the last master keepalive was processed and the master's clock at the time that keepalive was generated. I think that in practice, unless network time synchronization is in use, this is mostly going to be computing the clock skew between the master and the slave. If time synchronization is in use, then as you say it'll be a very jittery measure of master-slave network latency, which can be monitored perfectly well from outside PG. Now, measuring time skew is potentially a useful thing to do, if we believe that this will actually give us an accurate measurement of what the time skew is, because there are a whole series of things that people want to do which involve subtracting a slave timestamp from a master timestamp. Tom has persistently rebuffed all such proposals on the grounds that there might be time skew, so in theory we could make those things possible by having a way to measure time skew, which this does. Here's what we do: given a slave timestamp, add the estimated time skew to find an equivalent master timestamp, and then subtract. Using a method of this type would allow us to compute a *real* apply delay. Woohoo! Unfortunately, if time synchronization IS in use, then the system clocks are probably already synchronized three to six orders of magnitude more precisely than what this method can measure, so the effect of using GetReplicationTransferLatency() to adjust slave timestamps will be to massively reduce the accuracy of such calculations. However, I've thus far been unable to convince anyone that this is a bad idea, so maybe this is where we're gonna end up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
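For concreteness, here is a standalone sketch (hypothetical names, made-up numbers) of the arithmetic being debated above: what the standby can compute from one keepalive is the one-way latency plus the clock skew, and separating the two requires an extra assumption, such as a symmetric round trip:

    #include <stdio.h>

    typedef long long usec;     /* microseconds since some shared epoch */

    /* directly computable from one keepalive: latency + skew */
    static usec
    measured_offset(usec master_send, usec slave_recv)
    {
        return slave_recv - master_send;
    }

    /* skew estimate, if we also had a round-trip time and were willing
     * to assume the outbound and return paths take equally long */
    static usec
    estimated_skew(usec master_send, usec slave_recv, usec round_trip)
    {
        return (slave_recv - master_send) - round_trip / 2;
    }

    int
    main(void)
    {
        usec send = 1000000, recv = 1004500, rtt = 3000;

        printf("measured offset: %lld us\n", measured_offset(send, recv));
        printf("skew estimate:   %lld us (assuming a symmetric 3 ms RTT)\n",
               estimated_skew(send, recv, rtt));
        return 0;
    }

With NTP in play the true skew is typically microseconds or less, while the jitter in this measurement is on the order of network latency -- which is the three-to-six-orders-of-magnitude objection above.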
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 1:50 PM, Robert Haas wrote: > On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure wrote: >> On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov >> wrote: >>> On Thu, 31 May 2012, Robert Haas wrote: >>> Oh, ho. So from this we can see that the problem is that we're getting huge amounts of spinlock contention when pinning and unpinning index pages. It would be nice to have a self-contained reproducible test case for this, so that we could experiment with it on other systems. >>> >>> >>> I have created it a few days ago: >>> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php >>> >>> It is still valid. And I'm using exactly it to test. The only thing to >>> change is to create a two-col index and drop another index. >>> The scripts are precisely the ones I'm using now. >>> >>> The problem is that in order to see a really big slowdown (10 times slower >>> than a single thread) I've had to raise the buffers to 48g but it was slow >>> for smaller shared buffer settings as well. >>> >>> But I'm not sure how sensitive the test is to the hardware. >> >> It's not: high contention on spinlocks is going to suck no matter what >> hardware you have. I think the problem is pretty obvious now: any >> case where multiple backends are scanning the same sequence of buffers >> in a very tight loop is going to display this behavior. It doesn't >> come up that often: it takes a pretty unusual sequence of events to >> get a bunch of backends hitting the same buffer like that. >> >> Hm, I wonder if you could alleviate the symptoms by making making the >> Pin/UnpinBuffer smarter so that frequently pinned buffers could stay >> pinned longer -- kinda as if your private ref count was hacked to be >> higher in that case. It would be a complex fix for a narrow issue >> though. > > This test case is unusual because it hits a whole series of buffers > very hard. However, there are other cases where this happens on a > single buffer that is just very, very hot, like the root block of a > btree index, where the pin/unpin overhead hurts us. I've been > thinking about this problem for a while, but it hasn't made it up to > the top of my priority list, because workloads where pin/unpin is the > dominant cost are still relatively uncommon. I expect them to get > more common as we fix other problems. > > Anyhow, I do have some vague thoughts on how to fix this. Buffer pins > are a lot like weak relation locks, in that they are a type of lock > that is taken frequently, but rarely conflicts. And the fast-path > locking in 9.2 provides a demonstration of how to handle this kind of > problem efficiently: making the weak, rarely-conflicting locks > cheaper, at the cost of some additional expense when a conflicting > lock (in this case, a buffer cleanup lock) is taken. In particular, > each backend has its own area to record weak relation locks, and a > strong relation lock must scan all of those areas and migrate any > locks found there to the main lock table. I don't think it would be > feasible to adopt exactly this solution for buffer pins, because page > eviction and buffer cleanup locks, while not exactly common, are > common enough that we can't require a scan of N per-backend areas > every time one of those operations occurs. > > But, maybe we could have a system of this type that only applies to > the very hottest buffers. Suppose we introduce two new buffer flags, > BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention > on the buffer header spinlock, we set BUF_NAILED. 
> Once we do that, the buffer can't be evicted until that flag is removed, and backends > are permitted to record pins in a per-backend area protected by a > per-backend spinlock or lwlock, rather than in the buffer header. > When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL. Hm, couple of questions: how do you determine if/when to un-nail a buffer, and who makes that decision (bgwriter?) Is there a limit to how many buffers you are allowed to nail? It seems like a much stronger idea, but one downside I see vs the 'pin for longer' idea I was kicking around is how to deal with stale nailed buffers and keep them from growing uncontrollably, so that you don't have to either stop nailing or forcibly evict them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extending relations more efficiently
On Tue, 2012-05-01 at 10:08 -0400, Robert Haas wrote: > We've previously discussed the possible desirability of extending > relations in larger increments, rather than one block at a time, for > performance reasons. I attempted to determine how much performance we > could possibly buy this way, and found that, as far as I can see, the > answer is, basically, none. Another point here is that with checksums, we will want to make sure that zero pages can be treated as corrupt. That will probably involve using the WAL for extension operations, and we'll want to mitigate that cost somehow. Extending in larger chunks would probably be necessary. There are some challenges there, but I think it's worth pursuing. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
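As a rough illustration of the chunked-extension idea (plain POSIX file I/O here, not the smgr layer; the file name and chunk size are arbitrary choices for the sketch), extending by a zero-filled multi-block chunk turns 64 extension calls, each normally taken under the relation extension lock, into one:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLCKSZ        8192
    #define EXTEND_BLOCKS 64            /* extend 512kB at a time */

    int
    main(void)
    {
        char *chunk = calloc(EXTEND_BLOCKS, BLCKSZ);  /* zeroed pages */
        int   fd = open("relation.dat",
                        O_WRONLY | O_CREAT | O_APPEND, 0600);

        if (chunk == NULL || fd < 0)
        {
            perror("setup");
            return 1;
        }
        /* one syscall appends 64 pages; block-at-a-time extension
         * would issue 64 separate writes */
        if (write(fd, chunk, (size_t) EXTEND_BLOCKS * BLCKSZ) < 0)
            perror("write");
        close(fd);
        free(chunk);
        return 0;
    }

If extension is WAL-logged as suggested, a page that still reads back as all zeroes can then be treated as evidence of corruption rather than as a legitimately new page.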
Re: [HACKERS] \conninfo and SSL
On Thu, May 31, 2012 at 9:04 PM, Bruce Momjian wrote: > On startup, psql shows the SSL information: > > $ psql 'sslmode=require host=localhost' > psql (9.2beta1) > SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) > > However, \conninfo does not mention SSL: > > postgres=> \conninfo > You are connected to database "postgres" as user "postgres" on > host "localhost" at port "5432". > > Should \conninfo mention SSL? Fortunately \c shows SSL information: > > postgres=> \c > SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256) > You are now connected to database "postgres" as user "postgres". Well, \c creates a new connection, so that's not really an information command. It might not connect that trivially, depending on what authentication method you use. Including ssl info in \conninfo would be useful, I think. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] \conninfo and SSL
On startup, psql shows the SSL information:

    $ psql 'sslmode=require host=localhost'
    psql (9.2beta1)
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

However, \conninfo does not mention SSL:

    postgres=> \conninfo
    You are connected to database "postgres" as user "postgres" on host "localhost" at port "5432".

Should \conninfo mention SSL? Fortunately \c shows SSL information:

    postgres=> \c
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
    You are now connected to database "postgres" as user "postgres".

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
On Thu, May 31, 2012 at 2:13 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson >> wrote: >>> Is this something to discuss with the MinGW W64 team? > >> My viewpoint on this (which is different than Tom's) is that we're >> probably not entitled to assume anything about what the system header >> files do with respect to stat. On some systems, they might just have >> a function prototype, while others might define stat or stat() as a >> macro. It seems to me that our source code is hoping for a function >> definition rather than a macro definition and falling over when that's >> not how it is. I don't see that as very reasonable, unless we have >> some basis for believing that the OS isn't entitled to define stat as >> a macro rather than a function, and maybe not even then. > > I quote from the POSIX:2008 specification for <sys/stat.h>: > > The following shall be declared as functions and may also be > defined as macros. Function prototypes shall be provided. > ... > int stat(const char *restrict, struct stat *restrict); > > I do not believe that the standard intends the word "shall" to have any > wiggle room. I would also read this to mean that if the header defines > "stat" as a macro, that macro ought to be an alternative way of invoking > the function. There's nothing in the passage you quote that says the macro definition can't do anything other than invoke the eponymous function, but... > Now we are messing up by failing to #undef the macro > before redefining it, but if we do that and it still doesn't work, the > header is not conformant to POSIX. ...this is probably still true. >> We have >> plenty of other places where we use our own wrapper function in lieu >> of OS facilities for various reasons (e.g. BasicOpenFile) and I don't >> think adding one more is a big deal. > > Well, I think it is. Where we use a wrapper function, it's because it > has somewhat different semantics from the underlying standard function. > I do not think it's good for readability to define pgfoo() when that's > only meant to be exactly foo(). I'm especially not keen on doing that > just because one variant of MinGW has managed to break their conformance > with POSIX. In this case, I feel like we've kind of already stepped in it, because we've defined stat() to be pgwin32_safestat(), which is, in fact, not exactly stat(). So right now ISTM that a naive backend hacker might think that stat() means "the stat provided by the OS", but, on Windows, it doesn't. I wouldn't exactly describe myself as "keen" on adding wrapper functions to things like stat(), but until non-POSIX operating systems go the way of the dodo bird, I'm not sure there's any way around it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure wrote: > On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov > wrote: >> On Thu, 31 May 2012, Robert Haas wrote: >> >>> Oh, ho. So from this we can see that the problem is that we're >>> getting huge amounts of spinlock contention when pinning and unpinning >>> index pages. >>> >>> It would be nice to have a self-contained reproducible test case for >>> this, so that we could experiment with it on other systems. >> >> >> I have created it a few days ago: >> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php >> >> It is still valid. And I'm using exactly it to test. The only thing to >> change is to create a two-col index and drop another index. >> The scripts are precisely the ones I'm using now. >> >> The problem is that in order to see a really big slowdown (10 times slower >> than a single thread) I've had to raise the buffers to 48g but it was slow >> for smaller shared buffer settings as well. >> >> But I'm not sure how sensitive the test is to the hardware. > > It's not: high contention on spinlocks is going to suck no matter what > hardware you have. I think the problem is pretty obvious now: any > case where multiple backends are scanning the same sequence of buffers > in a very tight loop is going to display this behavior. It doesn't > come up that often: it takes a pretty unusual sequence of events to > get a bunch of backends hitting the same buffer like that. > > Hm, I wonder if you could alleviate the symptoms by making making the > Pin/UnpinBuffer smarter so that frequently pinned buffers could stay > pinned longer -- kinda as if your private ref count was hacked to be > higher in that case. It would be a complex fix for a narrow issue > though. This test case is unusual because it hits a whole series of buffers very hard. However, there are other cases where this happens on a single buffer that is just very, very hot, like the root block of a btree index, where the pin/unpin overhead hurts us. I've been thinking about this problem for a while, but it hasn't made it up to the top of my priority list, because workloads where pin/unpin is the dominant cost are still relatively uncommon. I expect them to get more common as we fix other problems. Anyhow, I do have some vague thoughts on how to fix this. Buffer pins are a lot like weak relation locks, in that they are a type of lock that is taken frequently, but rarely conflicts. And the fast-path locking in 9.2 provides a demonstration of how to handle this kind of problem efficiently: making the weak, rarely-conflicting locks cheaper, at the cost of some additional expense when a conflicting lock (in this case, a buffer cleanup lock) is taken. In particular, each backend has its own area to record weak relation locks, and a strong relation lock must scan all of those areas and migrate any locks found there to the main lock table. I don't think it would be feasible to adopt exactly this solution for buffer pins, because page eviction and buffer cleanup locks, while not exactly common, are common enough that we can't require a scan of N per-backend areas every time one of those operations occurs. But, maybe we could have a system of this type that only applies to the very hottest buffers. Suppose we introduce two new buffer flags, BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention on the buffer header spinlock, we set BUF_NAILED. 
Once we do that, the buffer can't be evicted until that flag is removed, and backends are permitted to record pins in a per-backend area protected by a per-backend spinlock or lwlock, rather than in the buffer header. When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL. At that point, it's no longer permissible to record new pins in the per-backend areas, but old ones may still exist. So then we scan all the per-backend areas and transfer the pins to the buffer header, or else just wait until no more exist; then, we clear both BUF_NAILED and BUF_NAIL_REMOVAL. So the pin algorithm looks like this:

    read buffer header flags (unlocked)
    if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
    {
        take buffer header spinlock
        record pin in buffer header
        release buffer header spinlock
    }
    else
    {
        take per-backend lwlock
        record pin in per-backend area
        release per-backend lwlock
        read buffer header flags (unlocked)
        if ((flags & (BUF_NAILED|BUF_NAIL_REMOVAL)) != BUF_NAILED)
        {
            take per-backend lwlock
            forget pin in per-backend area
            release per-backend lwlock
            take buffer header spinlock
            record pin in buffer header
            release buffer header spinlock
        }
    }

Due to memory ordering effects, we might see the buffer as nailed when in fact nail removal has already begun (or even, completed). We can prevent that if (1) the nail removal code sets the nail removal flag before checking the per-backend areas and (2) the pin code checks the nail removal flag AFTER recording its pin in the per-backend area.
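The removal side isn't spelled out above in code form; here is a compilable sketch of it under the same scheme (hypothetical names; real locking and memory barriers reduced to comments). The ordering matters: BUF_NAIL_REMOVAL must be visible before the per-backend areas are scanned, mirroring the re-check in the pin algorithm:

    #include <stdbool.h>

    #define BUF_NAILED        0x01
    #define BUF_NAIL_REMOVAL  0x02

    typedef struct
    {
        unsigned flags;     /* protected by the buffer header spinlock */
        int      refcount;
    } BufferHdr;

    /* drain every per-backend pin area for this buffer; stubbed out */
    static int
    collect_backend_pins(int buf_id)
    {
        (void) buf_id;
        return 0;
    }

    static void
    unnail_buffer(BufferHdr *hdr, int buf_id)
    {
        /* (1) stop new per-backend pins; needs a barrier so the flag
         * is visible before we scan the per-backend areas */
        hdr->flags |= BUF_NAIL_REMOVAL;

        /* (2) migrate pins still recorded in per-backend areas */
        hdr->refcount += collect_backend_pins(buf_id);

        /* (3) every pin is now in the header; drop both flags */
        hdr->flags &= ~(BUF_NAILED | BUF_NAIL_REMOVAL);
    }

    int
    main(void)
    {
        BufferHdr hdr = {BUF_NAILED, 0};

        unnail_buffer(&hdr, 0);
        return (int) hdr.flags;     /* 0: fully un-nailed */
    }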
Re: [HACKERS] Figuring out shared buffer pressure
On Thursday, May 31, 2012 at 19:11, Bruce Momjian wrote: > On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote: > > > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), > > > > which tells how many times we have requested a buffer. Not sure how > > > > that helps measure buffer pressure. > > > > > > Once the linked list is empty, every request for a buffer to read a > > > new page into must result in the eviction of the previous occupant > > > from this conceptual freelist buffer (except perhaps for some race > > > conditions). Isn't that what you wanted? Except that the > > > buffers_alloc does not get incremented when the StrategyGetBuffer is > > > satisfied by a ring strategy rather than the default strategy. > > > > Well, the ideal case is that I could find out how often data that is > > near to be discarded is actually needed, hence the "reclaimed" field > > that is often important for kernel memory pressure reporting on older > > operating systems. I will post an email soon about my theory of why > > buffer pressure is an important thing to report to users. > > OK, realizing there is no simple way to measure shared buffer pressure, > let me explain why I want to. > > Right now we simplistically recommend 25% of RAM for shared_buffers, with > a maximum of 8GB (512MB on Windows). This helps to be sure that there > are sufficient kernel buffers for high-write operations, and perhaps a > kernel cache larger than shared buffers. > > However, this doesn't help people configure shared buffers larger (e.g. > 35%) if their working set is larger. Right now, I don't see how a user > would know this is happening. On the flip side, they might have a > smaller working set than 25% and be paying the overhead of managing 1 > million shared buffers. Again, there is no way to know if that is the > case. > > For example, we have reports that larger shared buffers are sometimes > better, sometimes not, but there is no feedback we give the user to > explain why this is happening. My guess is that if their working set is > larger than 25% of RAM, they benefit; if not, the buffer management > overhead makes things slower. > > I feel we need to allow users to get clearer information on how active > their shared buffer cache is, perhaps allowing them to shrink/grow it as > appropriate. Asking them to blindly try different shared buffer sizes > seems suboptimal. There is also a recent thread about the usage count. Jeff suggested maybe increasing by 2 and decreasing by 1. This is really close to another idea I had but haven't tested yet: increment/decrement by 1 or 2, making even numbers ascending and odd numbers descending, so that just by looking at the usage count you can know what's going on with your buffers (more odd than even, or the reverse? building the cache, or wasting it, or ...). It should also allow finer control of eviction. Increasing/decreasing shared_buffers while PostgreSQL is up is a nice idea. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Create collation incorrect error code
On Thu, May 31, 2012 at 8:16 PM, Tom Lane wrote: > Magnus Hagander writes: >> template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); >> ERROR: could not create locale "nb_no.utf8": Success > > What platform? D'uh, sorry. Ubuntu 12.04. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create collation incorrect error code
Magnus Hagander writes: > template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); > ERROR: could not create locale "nb_no.utf8": Success What platform? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
Robert Haas writes: > On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson > wrote: >> Is this something to discuss with the MinGW W64 team? > My viewpoint on this (which is different than Tom's) is that we're > probably not entitled to assume anything about what the system header > files do with respect to stat. On some systems, they might just have > a function prototype, while others might define stat or stat() as a > macro. It seems to me that our source code is hoping for a function > definition rather than a macro definition and falling over when that's > not how it is. I don't see that as very reasonable, unless we have > some basis for believing that the OS isn't entitled to define stat as > a macro rather than a function, and maybe not even then. I quote from the POSIX:2008 specification for <sys/stat.h>: The following shall be declared as functions and may also be defined as macros. Function prototypes shall be provided. ... int stat(const char *restrict, struct stat *restrict); I do not believe that the standard intends the word "shall" to have any wiggle room. I would also read this to mean that if the header defines "stat" as a macro, that macro ought to be an alternative way of invoking the function. Now we are messing up by failing to #undef the macro before redefining it, but if we do that and it still doesn't work, the header is not conformant to POSIX. You can read this yourself at http://pubs.opengroup.org/onlinepubs/9699919799/ > We have > plenty of other places where we use our own wrapper function in lieu > of OS facilities for various reasons (e.g. BasicOpenFile) and I don't > think adding one more is a big deal. Well, I think it is. Where we use a wrapper function, it's because it has somewhat different semantics from the underlying standard function. I do not think it's good for readability to define pgfoo() when that's only meant to be exactly foo(). I'm especially not keen on doing that just because one variant of MinGW has managed to break their conformance with POSIX. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
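The #undef-before-redefine pattern Tom mentions can be shown as a small standalone program (the wrapper body here is a toy stand-in; the real pgwin32_safestat lives in src/port and does actual work). Note that a function-like macro only expands when followed by a parenthesis, so declarations like "struct stat" are unaffected:

    #include <sys/stat.h>

    static int
    pgwin32_safestat(const char *path, struct stat *buf)
    {
        /* stand-in: the real wrapper corrects fields the native stat
         * gets wrong on Windows */
        (void) path;
        (void) buf;
        return 0;
    }

    #ifdef stat
    #undef stat             /* drop any macro the system header defined,
                             * which POSIX permits it to do */
    #endif
    #define stat(path, buf) pgwin32_safestat(path, buf)

    int
    main(void)
    {
        struct stat st;     /* not expanded: "stat" not followed by '(' */

        return stat("/tmp", &st);   /* expands to pgwin32_safestat(...) */
    }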
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On Thu, 31 May 2012, Simon Riggs wrote: > That struck me as a safe and easy optimisation. This was a problem I'd been trying to optimise for 9.2, so I've written a patch that appears simple and clean enough to be applied directly. Thanks! The patch indeed improved the timings. The dropping of 100 tables in a single commit before the patch took ~ 50 seconds, now it takes ~ 5 sec (it would be nice to reduce it further though, because the dropping of 10,000 tables still takes ~10 min). Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Create collation incorrect error code
Strangeness: template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); ERROR: could not create locale "nb_no.utf8": Success Clearly it's not successful... On some runs, I get: template1=# create collation "nb_NO.utf8" (locale=nb_NO.utf8); ERROR: could not create locale "nb_no.utf8": No such file or directory DETAIL: The operating system could not find any locale data for the locale name "nb_no.utf8". Which is correct. The first one is clearly wrong. Seems there are some cases where errno isn't set properly? But from what I can tell (http://pubs.opengroup.org/onlinepubs/9699919799/functions/newlocale.html) errno should be set. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
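The defensive pattern for this class of bug, shown as a small standalone program (glibc-flavoured; the locale name is the one from the report), is to clear errno before the call and to fall back to a generic message when the call fails without setting it -- reporting strerror(errno) while errno is still 0 is exactly how "could not create locale ...: Success" gets printed:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        locale_t loc;

        errno = 0;                  /* clear any stale errno first */
        loc = newlocale(LC_ALL_MASK, "nb_no.utf8", (locale_t) 0);
        if (loc == (locale_t) 0)
        {
            if (errno != 0)
                fprintf(stderr, "could not create locale: %s\n",
                        strerror(errno));
            else
                /* failure, but the libc did not set errno */
                fprintf(stderr, "could not create locale: unknown error\n");
            return 1;
        }
        freelocale(loc);
        return 0;
    }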
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: > >> Oh, ho. So from this we can see that the problem is that we're >> getting huge amounts of spinlock contention when pinning and unpinning >> index pages. >> >> It would be nice to have a self-contained reproducible test case for >> this, so that we could experiment with it on other systems. > > > I have created it a few days ago: > http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php > > It is still valid. And I'm using exactly it to test. The only thing to > change is to create a two-col index and drop another index. > The scripts are precisely the ones I'm using now. > > The problem is that in order to see a really big slowdown (10 times slower > than a single thread) I've had to raise the buffers to 48g but it was slow > for smaller shared buffer settings as well. > > But I'm not sure how sensitive the test is to the hardware. It's not: high contention on spinlocks is going to suck no matter what hardware you have. I think the problem is pretty obvious now: any case where multiple backends are scanning the same sequence of buffers in a very tight loop is going to display this behavior. It doesn't come up that often: it takes a pretty unusual sequence of events to get a bunch of backends hitting the same buffer like that. Hm, I wonder if you could alleviate the symptoms by making the Pin/UnpinBuffer logic smarter so that frequently pinned buffers could stay pinned longer -- kinda as if your private ref count was hacked to be higher in that case. It would be a complex fix for a narrow issue though. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 7:01 PM, Tom Lane wrote: > Atri Sharma writes: >> My call to heap_form_tuple looks like: >> val1=0; >> tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); >> tuple=heap_form_tuple(tupledesc1,p1,&val1); > >> p1 is a pointer to a Datum instance which is created from a char array. > > Does that actually match the tupdesc you're using? Are you sure you > created the Datum correctly (ie, did you call the appropriate datatype > input routine)? > > BTW, the BlessTupleDesc call here seems to be pure cargo-cult > programming. It should not be necessary to bless a relation's tupdesc > (because that should be a named type already); and even if it were, > heap_form_tuple doesn't care. > > regards, tom lane Hi Tom, Thanks for the advice and help. Your diagnosis is correct: I used the correct datatype input routine and it worked like a charm. I will remove the calls to BlessTupleDesc, then? Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
Sorry, I forgot to mention: in my setup the primary and standby servers receive the same traffic, so there is no issue with a "network fault between the primary and the standby, but not between the primary and some of the clients updating it". -- View this message in context: http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710832.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
Dear Kevin, Thank you for your reply. Yes, I am writing an application using PowerShell; it's true it is not trivial, especially for a guy like me who has little knowledge of databases. You raised all the cases I am muddling through. Currently I am testing this setup: the standby server is always configured from a base backup restored from the storage server, and then streaming replication (asynchronous) is started. Base backups are taken frequently. I am sure there is some data loss during switching; I am still researching how to do it cleanly. Please suggest any good papers on this. Reddy. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Re-hot-standby-PSQL-9-1-Windows-2008-Servers-tp5710824p5710830.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Figuring out shared buffer pressure
On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote: > > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which > > > tells how many times we have requested a buffer. Not sure how that helps > > > measure buffer pressure. > > > > Once the linked list is empty, every request for a buffer to read a > > new page into must result in the eviction of the previous occupant > > from this conceptual freelist buffer (except perhaps for some race > > conditions). Isn't that what you wanted? Except that the > > buffers_alloc does not get incremented when the StrategyGetBuffer is > > satisfied by a ring strategy rather than the default strategy. > > Well, the ideal case is that I could find out how often data that is > near to be discarded is actually needed, hence the "reclaimed" field > that is often important for kernel memory pressure reporting on older > operating systems. I will post an email soon about my theory of why > buffer pressure is an important thing to report to users. OK, realizing there is no simple way to measure shared buffer pressure, let me explain why I want to. Right now we simplistically recommend 25% of RAM for shared_buffers, with a maximum of 8GB (512MB on Windows). This helps to be sure that there are sufficient kernel buffers for high-write operations, and perhaps a kernel cache larger than shared buffers. However, this doesn't help people configure shared buffers larger (e.g. 35%) if their working set is larger. Right now, I don't see how a user would know this is happening. On the flip side, they might have a smaller working set than 25% and be paying the overhead of managing 1 million shared buffers. Again, there is no way to know if that is the case. For example, we have reports that larger shared buffers are sometimes better, sometimes not, but there is no feedback we give the user to explain why this is happening. My guess is that if their working set is larger than 25% of RAM, they benefit; if not, the buffer management overhead makes things slower. I feel we need to allow users to get clearer information on how active their shared buffer cache is, perhaps allowing them to shrink/grow it as appropriate. Asking them to blindly try different shared buffer sizes seems suboptimal. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
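No such instrumentation exists today, but as a sketch of the kind of counters that could answer the question above (every name here is hypothetical), one might separate allocations served from the freelist from allocations that had to evict a valid page, plus a "reclaimed" count for evicted pages wanted again soon after. A high eviction and reclaim rate suggests the working set exceeds shared_buffers; a near-zero rate at a large setting suggests it could be shrunk:

    typedef struct
    {
        unsigned long alloc_from_freelist;  /* nothing displaced */
        unsigned long alloc_by_eviction;    /* a valid page displaced */
        unsigned long reclaimed;            /* evicted page re-read
                                             * within some window */
    } BufferPressureStats;

    static BufferPressureStats pressure;

    /* to be called from a (hypothetical) buffer allocation path */
    static void
    count_alloc(int evicted_valid_buffer)
    {
        if (evicted_valid_buffer)
            pressure.alloc_by_eviction++;
        else
            pressure.alloc_from_freelist++;
    }

    int
    main(void)
    {
        count_alloc(1);
        return pressure.alloc_by_eviction == 1 ? 0 : 1;
    }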
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, 31 May 2012, Robert Haas wrote: > Oh, ho. So from this we can see that the problem is that we're getting > huge amounts of spinlock contention when pinning and unpinning index > pages. It would be nice to have a self-contained reproducible test case > for this, so that we could experiment with it on other systems. I created it a few days ago: http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php It is still valid, and it is exactly what I'm using to test. The only thing to change is to create a two-column index and drop another index. The scripts are precisely the ones I'm using now. The problem is that in order to see a really big slowdown (10 times slower than a single thread) I've had to raise shared_buffers to 48GB, but it was slow for smaller shared buffer settings as well. I'm not sure how sensitive the test is to the hardware, though. Cheers, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hot standby PSQL 9.1 Windows 2008 Servers
chinnaobi wrote: > You mean when the primary which is going to switch its role to > standby might not have sent all the WAL records to the standby, and > if it is switched to standby it has more WAL records than the > standby which now serves as primary. Is it ?? What happens when there is a network fault between the primary and the standby, but not between the primary and some of the clients updating it? Similarly, if this is asynchronous replication, what if there have been commits on the primary which were still in the network buffer when the primary crashed? Clean automated failover is not a trivial task. If you are writing your own, it would be best to follow the steps recommended in the documentation rather closely. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 10:23 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: >> >> >> Thanks. How did you generate this perf report? It's cool, because I >> haven't figured out how to make perf generate a report that is easily >> email-able, and it seems you have. > > > I did pretty much what you have said, e.g. > attached it to running process by > perf record -g -p PID > and then > perf report -g > output > > And postgresql was compiled with cflags=-g > >> >> The only trouble is that there's no call stack information here for >> s_lock or PinBuffer, which is what I really want. It seems to have >> spit out call stack information only for the kernel functions, and not >> for user functions. > > > Yes, I forgot to clean the old binaries when recompiled with cflags=-g. > So now it is fixed. I attach the updated perf report (i.e. the first 1 > lines of it to reduce the file size). That's basically what we needed. The sequential scans driving the index scans are all simultaneously pointing at the same couple of pages in the index, which they are constantly pinning and unpinning -- the database is schizophrenically going back and forth between the key pages in the index being and not being allowed to be candidates for buffer eviction. Raising shared buffers doesn't help because it's the marking of buffers as available for eviction, not the eviction process itself, that is the problem. IOS doesn't help because it's the index relation buffers, not the heap buffers, that we are binding up on (although if the btree were fixed, it's entirely possible the problem could head right back to the heap for non-IOS scans). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
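The failure mode described here is easy to demonstrate outside the server. Below is a toy standalone C program -- not PostgreSQL code, just the same test-and-set spinlock pattern that s_lock() and the buffer-header lock use -- in which every thread "pins" and "unpins" a single shared buffer header:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8
    #define PINS_PER_THREAD 1000000

    /* one shared "buffer header": a spinlock protecting a pin count */
    static volatile int hdr_lock = 0;
    static int refcount = 0;

    static void toy_s_lock(volatile int *lock)
    {
        /* test-and-set with a read-only spin loop, like s_lock() */
        while (__sync_lock_test_and_set(lock, 1))
            while (*lock)
                ;               /* spin until it looks free, then retry */
    }

    static void toy_s_unlock(volatile int *lock)
    {
        __sync_lock_release(lock);
    }

    static void *worker(void *arg)
    {
        int i;

        for (i = 0; i < PINS_PER_THREAD; i++)
        {
            toy_s_lock(&hdr_lock);      /* "PinBuffer": bump the shared count */
            refcount++;
            toy_s_unlock(&hdr_lock);

            toy_s_lock(&hdr_lock);      /* "UnpinBuffer": drop it again */
            refcount--;
            toy_s_unlock(&hdr_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(threads[i], NULL);

        printf("final refcount: %d (should be 0)\n", refcount);
        return 0;
    }

Built with something like gcc -O2 -pthread, throughput drops sharply as the thread count rises, even though almost no work happens under the lock: the cost is cache-line ping-pong on the lock word itself, which is why s_lock dominates the profile.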
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 11:23 AM, Sergey Koposov wrote: > On Thu, 31 May 2012, Robert Haas wrote: >> >> Thanks. How did you generate this perf report? It's cool, because I >> haven't figured out how to make perf generate a report that is easily >> email-able, and it seems you have. > > I did pretty much what you have said, e.g. > attached it to running process by > perf record -g -p PID > and then > perf report -g > output Ah, interesting. I never tried sending the output to a file. > And postgresql was compiled with cflags=-g >> >> The only trouble is that there's no call stack information here for >> s_lock or PinBuffer, which is what I really want. It seems to have >> spit out call stack information only for the kernel functions, and not >> for user functions. > > Yes, I forgot to clean the old binaries when recompiled with cflags=-g. > So now it is fixed. I attach the updated perf report (i.e. the first 1 > lines of it to reduce the file size). Oh, ho. So from this we can see that the problem is that we're getting huge amounts of spinlock contention when pinning and unpinning index pages. It would be nice to have a self-contained reproducible test case for this, so that we could experiment with it on other systems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Thu, May 31, 2012 at 5:55 PM, Bruce Momjian wrote: > On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote: >> On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan wrote: >> > >> > >> > On 05/11/2012 05:32 AM, Magnus Hagander wrote: >> >> >> >> >> >> But in the interest of actually being productive - what *is* the >> >> usecase for needing a 5 minute turnaround time? I don't buy the "check >> >> what a patch looks like", because that should be done *before* the >> >> commit, not after - so it's best verified by a local docs build anyway >> >> (which will also be faster). >> >> >> >> I'm sure we can put something in with a pretty quick turnaround again >> >> without too much strain on the system, but it does, as I mentioned >> >> before, require decoupling it from the buildfarm which means it's not >> >> just tweaking a config file. >> > >> > >> > If it's of any use to you I have made some adjustments to the buildfarm >> > code >> > which would let you do *just* the docs build (and dist make if you want). >> > It >> > would still pull from git, and only do anything if there's a (relevant) >> > change. So using that to set up a machine that would run every few minutes >> > might work. Of course, building the docs can itself be fairly compute >> > intensive, so you still might not want to run every few minutes if that's a >> > limiting factor. >> >> that would definitely be useful. Compute intensive is not really a >> problem, we can easily shape the box on that (and I think we already >> do). >> >> Do you have some details of what to do and how to do it to use that, >> so Stefan can set it up for us ? ;) > > Where are we on building the development docs more frequently? Still waiting for details on how it works to set that up on the buildfarm client. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Tue, May 15, 2012 at 12:57:37PM -0400, Magnus Hagander wrote: > On Fri, May 11, 2012 at 11:44 AM, Andrew Dunstan wrote: > > > > > > On 05/11/2012 05:32 AM, Magnus Hagander wrote: > >> > >> > >> But in the interest of actually being productive - what *is* the > >> usecase for needing a 5 minute turnaround time? I don't buy the "check > >> what a patch looks like", because that should be done *before* the > >> commit, not after - so it's best verified by a local docs build anyway > >> (which will also be faster). > >> > >> I'm sure we can put something in with a pretty quick turnaround again > >> without too much strain on the system, but it does, as I mentioned > >> before, require decoupling it from the buildfarm which means it's not > >> just tweaking a config file. > > > > > > If it's of any use to you I have made some adjustments to the buildfarm code > > which would let you do *just* the docs build (and dist make if you want). It > > would still pull from git, and only do anything if there's a (relevant) > > change. So using that to set up a machine that would run every few minutes > > might work. Of course, building the docs can itself be fairly compute > > intensive, so you still might not want to run every few minutes if that's a > > limiting factor. > > that would definitely be useful. Compute intensive is not really a > problem, we can easily shape the box on that (and I think we already > do). > > Do you have some details of what to do and how to do it to use that, > so Stefan can set it up for us ? ;) Where are we on building the development docs more frequently? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 16:23, Tom Lane wrote: > Simon Riggs writes: >> In what way is it possibly destabilising? > > I'm prepared to believe that it only affects performance, but it could > be destabilizing to that. It needs proper review and testing, and the > next CF is the right environment for that to happen. It couldn't possibly be as destabilising to performance as commit_delay was in 9.1. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hash index concurrency
On Wed, May 30, 2012 at 12:21:33AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Tue, May 29, 2012 at 11:21 PM, Jeff Janes wrote: > >> 2) Only support bitmap scans and not ordinary tid scans (the way gin > >> indexes already do). > > > -1 on losing amgettuple. I regret that we lost that for GIN and I > > shall regret it more if we lose it anywhere else. > > Not sure that's all that big a deal for hash. IIRC the only reasons to > want it are for index-only scans (not possible anyway with hash) and > exclusion constraints (for which you might as well use a btree, or plain > index-supported uniqueness if hash had that). It does support uniqueness via EXCLUDE constraints, so it could have it natively with what, as far as I've been able to tell, would be a relatively small amount of coding. dfetter@dfetter:5492=# CREATE TABLE foo(i TEXT, EXCLUDE USING HASH(i WITH =)); NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_i_excl" for table "foo" CREATE TABLE dfetter@dfetter:5492=# insert into foo VALUES (1),(1); ERROR: conflicting key value violates exclusion constraint "foo_i_excl" DETAIL: Key (i)=(1) conflicts with existing key (i)=(1). Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 16:26, Peter Geoghegan wrote: > On 31 May 2012 16:23, Tom Lane wrote: >> Simon Riggs writes: >>> In what way is it possibly destabilising? >> >> I'm prepared to believe that it only affects performance, but it could >> be destabilizing to that. It needs proper review and testing, and the >> next CF is the right environment for that to happen. > > It couldn't possibly be as destabilising to performance as > commit_delay was in 9.1. Furthermore, it couldn't possibly affect performance in any way unless commit_delay is set. I've just moved the delay site so that it's only executed by the group commit leader. The leader would execute the code anyway, but now the followers don't. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
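For readers following along, the shape of the change being described is roughly the following -- a sketch using the 9.2-era names (CommitDelay, CommitSiblings, MinimumActiveBackends(), pg_usleep()), not the actual patch:

    /*
     * In the WAL flush path, executed only by the backend that has just
     * become the group commit leader -- i.e. the one that acquired
     * WALWriteLock and will flush on behalf of everyone queued behind it.
     */
    if (CommitDelay > 0 && enableFsync &&
        MinimumActiveBackends(CommitSiblings))
        pg_usleep(CommitDelay);     /* let more commits join this batch */

    /*
     * ...then recompute the flush point and do the single write/fsync,
     * which now also covers any followers that arrived during the sleep.
     */

The point of moving the sleep site is that followers no longer each pay the delay: only the leader waits, and everything it flushes afterwards benefits from the wait.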
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 12:25 PM, Tom Lane wrote: >> No, Tatsuo's patch attacks a phase dominated by latency in some >> setups. > > No, it does not. The reason it's a win is that it avoids the O(N^2) > behavior in the server. Whether the bandwidth savings is worth worrying > about cannot be proven one way or the other as long as that elephant > is in the room. > > regards, tom lane I understand that, but if the locking is fixed and made to be O(N) (and hence each table locking O(1)), then latency suddenly becomes the dominating factor. I'm thinking, though, that pg_upgrade runs locally, contrary to pg_dump backups, so in that case latency would be negligible and Tatsuo's patch inconsequential. I'm also wondering whether the ResourceOwner patch you've proposed would get negated by Tatsuo's patch, because suddenly a "portal" (IIRC) has a lot more locks than ResourceOwner could accommodate, forcing a reversal to O(N^2) behavior. In that case, that patch would in fact be detrimental... huh... way to go 180 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire writes: > On Thu, May 31, 2012 at 11:50 AM, Tom Lane wrote: >> The performance patches we applied to pg_dump over the past couple weeks >> were meant to relieve pain in situations where the big server-side >> lossage wasn't the dominant factor in runtime (ie, partial dumps). >> But this one is targeting exactly that area, which is why it looks like >> a band-aid and not a fix to me. > No, Tatsuo's patch attacks a phase dominated by latency in some > setups. No, it does not. The reason it's a win is that it avoids the O(N^2) behavior in the server. Whether the bandwidth savings is worth worrying about cannot be proven one way or the other as long as that elephant is in the room. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 14:58, Robert Haas wrote: > Fixing regressions before release is essential; improving performance > is not - especially when the improvement relates to a little-used > feature that you were proposing to get rid of two weeks ago. Yes, the fact that I wanted to get rid of commit_delay is well established - I called for its deprecation in a dedicated thread, and during my talk at pgCon. Bruce's confusion as to how that interacted with what I've been calling "new group commit" was actually what crystallised my position here: it is trying, for the most part, to do the same thing as new group commit, but in an entirely orthogonal way. Bruce's confusion actually reflected the confusion of the code. So I'm in a sense removing the overlap between what commit_delay used to do but shouldn't try to do anymore (making commits coincide, giving good benchmark results) and what new group commit now does, while preserving commit_delay's ability to trade off latency for throughput. I didn't have an answer to the question of how we might continue to offer a throughput/latency trade-off to users before, but knew that with 9.2, commit_delay was totally ineffective anyway. The realisation that it could be made effective by working with rather than against new group commit changed my mind. > It can't simultaneously be so unimportant that we should remove it altogether > and so important that it's must-fix-before-release, and if one test > can completely overturn your view of which category this falls into, > that seems like a reason for taking some more time to think it over > and, perhaps, run more tests. We don't have a lot of latitude to > maneuver at this point - anything we do now is going to go straight > out into the wild. Caution is appropriate. The patch can be justified as a way of removing the tension between new group commit and commit_delay. Removing commit_delay would also do this, but then there'd be no way to make the aforementioned trade-off that we previously offered. I suspect that if it restored the peaks and valleys of commit_delay's changes to throughput in 9.1, over and above a new group commit baseline, this would be more readily accepted. I hope the patch isn't being punished for being effective. Yes, it does offer a large boost to performance, but that happens to be incidental, unlikely though that sounds. You've called this a clever idea. I actually don't agree. I was fairly surprised that no one noticed this earlier. It is rather obviously the case that a delay that hopes to maximise the batching of commits at the expense of latency should occur only in a single leader backend that will proceed with the flush for the batch, and not within each and every backend as it commits. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Simon Riggs writes: > On 31 May 2012 15:00, Tom Lane wrote: >> If we want to finish the beta cycle in a reasonable time period and get >> back to actual development, we have to refrain from adding more >> possibly-destabilizing development work to 9.2. And that is what >> this is. > In what way is it possibly destabilising? I'm prepared to believe that it only affects performance, but it could be destabilizing to that. It needs proper review and testing, and the next CF is the right environment for that to happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote: > On Thu, May 31, 2012 at 10:50 AM, Tom Lane wrote: > > Robert Haas writes: > >> On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > >>> I'm not; Jeff Janes is. But you shouldn't be holding your breath > >>> anyway, since it's 9.3 material at this point. > > > >> I agree we can't back-patch that change, but then I think we ought to > >> consider back-patching some variant of Tatsuo's patch. Maybe it's not > >> reasonable to thunk an arbitrary number of relation names in there on > >> one line, but how about 1000 relations per LOCK statement or so? I > >> guess we'd need to see how much that erodes the benefit, but we've > >> certainly done back-branch rearrangements in pg_dump in the past to > >> fix various kinds of issues, and this is pretty non-invasive. > > > > I am not convinced either that this patch will still be useful after > > Jeff's fix goes in, ... > > But people on older branches are not going to GET Jeff's fix. FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2 would benefit because pg_upgrade uses the new cluster's pg_dumpall. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: >>> I'm not; Jeff Janes is. But you shouldn't be holding your breath >>> anyway, since it's 9.3 material at this point. > >> I agree we can't back-patch that change, but then I think we ought to >> consider back-patching some variant of Tatsuo's patch. Maybe it's not >> reasonable to thunk an arbitrary number of relation names in there on >> one line, but how about 1000 relations per LOCK statement or so? I >> guess we'd need to see how much that erodes the benefit, but we've >> certainly done back-branch rearrangements in pg_dump in the past to >> fix various kinds of issues, and this is pretty non-invasive. > > I am not convinced either that this patch will still be useful after > Jeff's fix goes in, ... But people on older branches are not going to GET Jeff's fix. > or that it provides any meaningful savings when > you consider a complete pg_dump run. Yeah, it will make the lock > acquisition phase faster, but that's not a big part of the runtime > except in very limited scenarios (--schema-only, perhaps). That is not a borderline scenario, as others have also pointed out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:50 AM, Tom Lane wrote: > The performance patches we applied to pg_dump over the past couple weeks > were meant to relieve pain in situations where the big server-side > lossage wasn't the dominant factor in runtime (ie, partial dumps). > But this one is targeting exactly that area, which is why it looks like > a band-aid and not a fix to me. No, Tatsuo's patch attacks a phase dominated by latency in some setups. That it's also becoming slow currently because of the locking cost is irrelevant; with locking sped up, the patch should only improve that phase even further. Imagine the current timeline: * = locking . = waiting *.*.**.**.***.***...*. Tatsuo's patch converts it to: *.** The locking fix would turn the timeline into: *.*.*.*.*.*.* Tatsuo's patch would turn that into: *** And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > >> I'm not; Jeff Janes is. But you shouldn't be holding your breath > >> anyway, since it's 9.3 material at this point. > > > I agree we can't back-patch that change, but then I think we ought to > > consider back-patching some variant of Tatsuo's patch. Maybe it's not > > reasonable to thunk an arbitrary number of relation names in there on > > one line, but how about 1000 relations per LOCK statement or so? I > > guess we'd need to see how much that erodes the benefit, but we've > > certainly done back-branch rearrangements in pg_dump in the past to > > fix various kinds of issues, and this is pretty non-invasive. > > I am not convinced either that this patch will still be useful after > Jeff's fix goes in, or that it provides any meaningful savings when > you consider a complete pg_dump run. Yeah, it will make the lock > acquisition phase faster, but that's not a big part of the runtime > except in very limited scenarios (--schema-only, perhaps). FYI, that is the pg_upgrade use-case, and pg_dump/restore time is reportedly taking the majority of time in many cases. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 15:00, Tom Lane wrote: > Simon Riggs writes: >> On 31 May 2012 13:16, Robert Haas wrote: >>> Frankly, I think this whole thing should be pushed to 9.3. > >> What matters is that we have a patch that provides a massive >> performance gain in write performance in just a few lines of code, and >> that should be committed to 9.2. > > I agree with Robert on this. This patch hasn't had *nearly* enough > testing to justify cramming it into 9.2 at this point. AFAIK the > claim of "massive performance gain" is based on a single test case run > by a single person, which doesn't even give me any confidence that it > doesn't break anything, much less that it's a win across the board. I agree with you. You would be mistaken if you thought that I think Peter's laptop was sufficient proof for anyone to commit something and I've already said exactly that to him. My description of "massive performance gain" is appropriate based on the measurements so far. > If we want to finish the beta cycle in a reasonable time period and get > back to actual development, we have to refrain from adding more > possibly-destabilizing development work to 9.2. And that is what > this is. In what way is it possibly destabilising? I see nothing in the patch to merit that claim, so presumably you haven't read the patch yet? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Robert Haas writes: > On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: >> I'm not; Jeff Janes is. But you shouldn't be holding your breath >> anyway, since it's 9.3 material at this point. > I agree we can't back-patch that change, but then I think we ought to > consider back-patching some variant of Tatsuo's patch. Maybe it's not > reasonable to thunk an arbitrary number of relation names in there on > one line, but how about 1000 relations per LOCK statement or so? I > guess we'd need to see how much that erodes the benefit, but we've > certainly done back-branch rearrangements in pg_dump in the past to > fix various kinds of issues, and this is pretty non-invasive. I am not convinced either that this patch will still be useful after Jeff's fix goes in, or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps). The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
On Wed, May 30, 2012 at 3:26 PM, Kohei KaiGai wrote: > My preference is RLSBYPASS permission rather than the approach > with functions that return policy clause at run-time, because it needs > to invalidate prepared statement at random timing. > In case of this function approach, the RLS policy shall be generated > on planner stage, and we cannot have any assumption to the criteria > of RLS policy. A function might generate RLS policy regarding to the > current user id. Yes, it is straightforward. The prepared statement > should be invalidate whenever current user-id got switched. > However, someone may define a function that generate RLS policy > depending on the value of "client_min_messages" for example. > Do we need to invalidate prepared statement whenever GUC get > updated? I think it is overkill. We cannot predicate all the criteria > user want to control the RLS policy using the functions. > So, RLSBYPASS permission is more simple way to limit number of > situations to invalidate prepared statements. That's a good point. > If we would have an "ideal optimizer", I'd still like the optimizer to > wipe out redundant clauses transparently, rather than RLSBYPASS > permissions, because it just controls all-or-nothing stuff. > For example, if tuples are categorized to unclassified, classified or > secret, and RLS policy is configured as: > ((current_user IN ('alice', 'bob') AND X IN ('unclassified', > 'classified')) OR (X IN 'unclassified)), > superuser can see all the tuples, and alice and bob can see > up to classified tuples. > Is it really hard to wipe out redundant condition at planner stage? > If current_user is obviously 'kaigai', it seems to me the left-side of > this clause can be wiped out at the planner stage. > Do I consider the issue too simple? Yes. :-) There are two problems. First, if using the extended query protocol (e.g. pgbench -M prepared) you can prepare a statement just once and then execute it multiple times. In this case, stable functions cannot be constant-folded at plan time, because they are only guaranteed to remain constant for a *single* execution of the query, not for all executions of the query. So any optimization in this area would have to be limited to cases where the simple query protocol is used. I think that might still be worth doing, but it's a significant limitation, to be sure. Second, at present, there is no guarantee that the snapshot used for planning the query is the same as the snapshot used for executing the query, though commit d573e239f03506920938bf0be56c868d9c3416da made that happen in some common cases. If we were to do constant-folding of stable functions using the planner snapshot, it would represent a behavior change from previous releases. I am not clear whether that has any real-world consequences that we should be worried about. It seems to me that the path of least resistance might be to refactor the portal stuff so that we can provide a uniform guarantee that, when using the simple query protocol, the planner and executor snapshots will be the same ... but I might be wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane wrote: > Claudio Freire writes: >> It's not clear whether Tom is already working on that O(N^2) fix in locking. > > I'm not; Jeff Janes is. But you shouldn't be holding your breath > anyway, since it's 9.3 material at this point. I agree we can't back-patch that change, but then I think we ought to consider back-patching some variant of Tatsuo's patch. Maybe it's not reasonable to thunk an arbitrary number of relation names in there on one line, but how about 1000 relations per LOCK statement or so? I guess we'd need to see how much that erodes the benefit, but we've certainly done back-branch rearrangements in pg_dump in the past to fix various kinds of issues, and this is pretty non-invasive. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
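A sketch of what that batching might look like on the pg_dump side -- hypothetical code: TABLES_PER_LOCK_STMT and the pre-quoted qualified_name[] array are invented for illustration, while appendPQExpBufferStr(), resetPQExpBuffer(), and ExecuteSqlStatement() are pg_dump's existing helpers:

    #define TABLES_PER_LOCK_STMT 1000   /* hypothetical batch size */

    resetPQExpBuffer(query);
    for (i = 0; i < ntups; i++)
    {
        if (i % TABLES_PER_LOCK_STMT == 0)
        {
            if (i > 0)
            {
                /* close out and send the previous batch */
                appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
                ExecuteSqlStatement(fout, query->data);
                resetPQExpBuffer(query);
            }
            appendPQExpBufferStr(query, "LOCK TABLE ");
        }
        else
            appendPQExpBufferStr(query, ", ");
        appendPQExpBufferStr(query, qualified_name[i]);
    }
    if (ntups > 0)
    {
        appendPQExpBufferStr(query, " IN ACCESS SHARE MODE");
        ExecuteSqlStatement(fout, query->data);
    }

That caps the number of round trips at roughly ntups/1000 while still taking one lock per table, so it bounds statement length without changing the locking semantics.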
Re: [HACKERS] Issues with MinGW W64
On Thu, May 31, 2012 at 8:15 AM, Johann 'Myrkraverk' Oskarsson wrote: >>>> MinGW W64's sys/stat.h #defines stat to be _stati64 and there is >>>> subsequently a compilation error in port.h: >>>> >>>> note: expected 'struct _stati64 *' but argument is of type 'struct >>>> stat *' error: conflicting types for 'pgwin32_safestat' >> >>> In this case, I really think we ought to change all backend calls >>> that hit stat() to use something like pgstat() instead. >> >> I disagree with this conclusion. That'd be an unnecessarily >> nonstandard notation, which all existing and future developers would >> have to learn. I'd rather work around this in port.h if at all >> possible. I'm not quite sure why the existing code fails, though --- >> is there a conflict between "#define stat" and "#define stat(a,b)"? > > I wouldn't know, the compiler is GCC 4.6.3 here (any 4.5+ will do I > think) so all the usual GCC macro magic should be working. > > Is this something to discuss with the MinGW W64 team? My viewpoint on this (which is different than Tom's) is that we're probably not entitled to assume anything about what the system header files do with respect to stat. On some systems, they might just have a function prototype, while others might define stat or stat() as a macro. It seems to me that our source code is hoping for a function definition rather than a macro definition and falling over when that's not how it is. I don't see that as very reasonable, unless we have some basis for believing that the OS isn't entitled to define stat as a macro rather than a function, and maybe not even then. We have plenty of other places where we use our own wrapper function in lieu of OS facilities for various reasons (e.g. BasicOpenFile) and I don't think adding one more is a big deal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
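To illustrate the wrapper idea, a minimal sketch -- the name pgstat() is the hypothetical one floated above, and pgwin32_safestat() is the existing Windows shim declared in port.h; this is not code from the tree:

    #include <sys/stat.h>

    /*
     * Route every backend stat() call through one function, so that
     * whatever the CRT does to the name (e.g. #define stat _stati64)
     * is confined to code that includes the same headers and therefore
     * sees a consistent struct type.
     */
    static int
    pgstat(const char *path, struct stat *buf)
    {
    #ifdef WIN32
        return pgwin32_safestat(path, buf);
    #else
        return stat(path, buf);
    #endif
    }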
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire writes: > It's not clear whether Tom is already working on that O(N^2) fix in locking. I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme wrote: > > OK, my fault was to assume you wanted to measure only your part, while > apparently you meant overall savings. But Tom had asked for separate > measurements if I understood him correctly. Also, that measurement of > your change would go after the O(N^2) fix. It could actually turn out > to be much more than 9% because the overall time would be reduced even > more dramatic. So it might actually be good for your fix to wait a > bit. ;-) It's not clear whether Tom is already working on that O(N^2) fix in locking. I'm asking because it doesn't seem like a complicated patch, contributors may want to get working if not ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 31, 2012 at 7:31 AM, Sergey Koposov wrote: > On Wed, 30 May 2012, Robert Haas wrote: > >> I'd really like to find out exactly where all those s_lock calls are >> coming from. Is there any way you can get oprofile to output a >> partial stack backtrace? If you have perf it's very easy, just 'perf >> record -g -a ' and then 'perf report >> -g'. > > > I repeated my test with 8 threads (without tasksetting) and with > sharedbuffers=48g (because that seemed to trigger in particular long times ~ > 80 seconds). And I attach the perf report. Thanks. How did you generate this perf report? It's cool, because I haven't figured out how to make perf generate a report that is easily email-able, and it seems you have. The only trouble is that there's no call stack information here for s_lock or PinBuffer, which is what I really want. It seems to have spit out call stack information only for the kernel functions, and not for user functions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii wrote: >> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: >>> Just for record, I rerun the test again with my single-LOCK patch, and >>> now total runtime of pg_dump is 113 minutes. >>> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). >>> >>> So far, I'm glad to see 40% time savings at this point. >> >> I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? > > What I meant was (100 * (113/188 - 1)). OK, my fault was to assume you wanted to measure only your part, while apparently you meant overall savings. But Tom had asked for separate measurements if I understood him correctly. Also, that measurement of your change would go after the O(N^2) fix. It could actually turn out to be much more than 9% because the overall time would be reduced even more dramatic. So it might actually be good for your fix to wait a bit. ;-) Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 9:51 AM, Tom Lane wrote: > Robert Haas writes: >> The one thing that still seems a little odd to me is that this caused >> a pin count to get orphaned. It seems reasonable that ignoring the >> AccessExclusiveLock could result in not-found errors trying to open a >> missing relation, and even fsync requests on a missing relation. But >> I don't see why that would cause the backend-local pin counts to get >> messed up, which makes me wonder if there really is another bug here >> somewhere. > > According to Heikki's log, the Assert was in the startup process itself, > and it happened after an error: > >> 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file >> "base/21268/32994": No such file or directory >> 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation >> base/21268/32994 >> xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 >> tuples >> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: >> 1741) >> 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was >> terminated by signal 6: Aborted > > I don't think that code is meant to recover from errors anyway, so > the fact that it fails with a pin count held isn't exactly surprising. > But it might be worth looking at exactly which on_proc_exit callbacks > are installed in the startup process and what assumptions they make. Which code isn't meant to recover from errors? > As for where the error came from in the first place, it's easy to > imagine somebody who's not got the word about the AccessExclusiveLock > reading pages of the table into buffers that have already been scanned > by the DROP. So you'd end up with orphaned buffers belonging to a > vanished table. If somebody managed to dirty them by setting hint bits > (we do allow that in HS mode no?) then later you'd have various processes > trying to write the buffer before recycling it, which seems to fit the > reported error. Right, I understand the other errors. It's just the pin count that I am a bit confused about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: >> Just for record, I rerun the test again with my single-LOCK patch, and >> now total runtime of pg_dump is 113 minutes. >> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). >> >> So far, I'm glad to see 40% time savings at this point. > > I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? What I meant was (100 * (113/188 - 1)). -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
Simon Riggs writes: > On 31 May 2012 13:16, Robert Haas wrote: >> Frankly, I think this whole thing should be pushed to 9.3. > What matters is that we have a patch that provides a massive > performance gain in write performance in just a few lines of code, and > that should be committed to 9.2. I agree with Robert on this. This patch hasn't had *nearly* enough testing to justify cramming it into 9.2 at this point. AFAIK the claim of "massive performance gain" is based on a single test case run by a single person, which doesn't even give me any confidence that it doesn't break anything, much less that it's a win across the board. If we want to finish the beta cycle in a reasonable time period and get back to actual development, we have to refrain from adding more possibly-destabilizing development work to 9.2. And that is what this is. Add it to the upcoming CF, please. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, May 31, 2012 at 8:38 AM, Peter Geoghegan wrote: > On 31 May 2012 13:16, Robert Haas wrote: >> On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: >> Frankly, I think this whole thing should be pushed to 9.3. The >> commit_delay and commit_siblings knobs suck, but they've sucked for a >> long time, and it won't kill anybody to wait another release cycle to >> fix them. We have plenty of more important things queued up for 9.3 >> already, and I don't believe there's any compelling reason to think >> that this particular thing needs preferential treatment. > > Why do you think that? Those knobs are now quite ineffective, though > we never even considered that when the group commit delay patch was > committed. The entire body of research and commentary that exists on > commit_delay has been invalidated for 9.2. If that isn't something > that needs to be addressed before release, I don't know what is. The > fact that the patch can sometimes double transaction throughput for an > absolutely trivial change, moving 2 lines of code, is also a good > reason to not bump this for another year. Fixing regressions before release is essential; improving performance is not - especially when the improvement relates to a little-used feature that you were proposing to get rid of two weeks ago. It can't simultaneously be so unimportant that we should remove it altogether and so important that it's must-fix-before-release, and if one test can completely overturn your view of which category this falls into, that seems like a reason for taking some more time to think it over and, perhaps, run more tests. We don't have a lot of latitude to maneuver at this point - anything we do now is going to go straight out into the wild. Caution is appropriate. However, rather than arguing about it, let's see if anyone else has an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
Robert Haas writes: > The one thing that still seems a little odd to me is that this caused > a pin count to get orphaned. It seems reasonable that ignoring the > AccessExclusiveLock could result in not-found errors trying to open a > missing relation, and even fsync requests on a missing relation. But > I don't see why that would cause the backend-local pin counts to get > messed up, which makes me wonder if there really is another bug here > somewhere. According to Heikki's log, the Assert was in the startup process itself, and it happened after an error: > 2012-05-26 10:44:28.587 CEST 10270 FATAL: could not open file > "base/21268/32994": No such file or directory > 2012-05-26 10:44:28.588 CEST 10270 CONTEXT: writing block 2508 of relation > base/21268/32994 > xlog redo multi-insert (init): rel 1663/21268/33006; blk 3117; 58 > tuples > TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: > 1741) > 2012-05-26 10:44:31.131 CEST 10269 LOG: startup process (PID 10270) was > terminated by signal 6: Aborted I don't think that code is meant to recover from errors anyway, so the fact that it fails with a pin count held isn't exactly surprising. But it might be worth looking at exactly which on_proc_exit callbacks are installed in the startup process and what assumptions they make. As for where the error came from in the first place, it's easy to imagine somebody who's not got the word about the AccessExclusiveLock reading pages of the table into buffers that have already been scanned by the DROP. So you'd end up with orphaned buffers belonging to a vanished table. If somebody managed to dirty them by setting hint bits (we do allow that in HS mode no?) then later you'd have various processes trying to write the buffer before recycling it, which seems to fit the reported error. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii wrote: > Just for record, I rerun the test again with my single-LOCK patch, and > now total runtime of pg_dump is 113 minutes. > 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch). > > So far, I'm glad to see 40% time savings at this point. I see only 9.6% savings (100 * (113/125 - 1)). What am I missing? Cheers robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
Atri Sharma writes: > My call to heap_form_tuple looks like: > val1=0; > tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); > tuple=heap_form_tuple(tupledesc1,p1,&val1); > p1 is a pointer to a Datum instance which is created from a char array. Does that actually match the tupdesc you're using? Are you sure you created the Datum correctly (ie, did you call the appropriate datatype input routine)? BTW, the BlessTupleDesc call here seems to be pure cargo-cult programming. It should not be necessary to bless a relation's tupdesc (because that should be a named type already); and even if it were, heap_form_tuple doesn't care. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
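For reference, the usual correct pattern in a composite-returning C function looks roughly like this -- a sketch against the 9.2-era API, with the function name and column values invented for illustration:

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "access/htup.h"        /* heap_form_tuple (9.2) */
    #include "utils/builtins.h"     /* CStringGetTextDatum */

    PG_FUNCTION_INFO_V1(make_row);

    Datum
    make_row(PG_FUNCTION_ARGS)
    {
        TupleDesc   tupdesc;
        Datum       values[2];
        bool        nulls[2] = {false, false};
        HeapTuple   tuple;

        /* the tuple descriptor comes from the declared result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a row type");

        /*
         * Each Datum is built with the appropriate conversion routine,
         * never by casting a raw char array.
         */
        values[0] = Int32GetDatum(42);
        values[1] = CStringGetTextDatum("hello");

        tuple = heap_form_tuple(BlessTupleDesc(tupdesc), values, nulls);
        PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
    }

Here BlessTupleDesc() does earn its keep, because a RECORD-returning function's descriptor may be anonymous; as noted above, it is pointless for a relation's own tupdesc.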
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 13:16, Robert Haas wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. Why do you think that? Those knobs are now quite ineffective, though we never even considered that when the group commit delay patch was committed. The entire body of research and commentary that exists on commit_delay has been invalidated for 9.2. If that isn't something that needs to be addressed before release, I don't know what is. The fact that the patch can sometimes double transaction throughput for an absolutely trivial change, moving 2 lines of code, is also a good reason to not bump this for another year. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 13:16, Robert Haas wrote: > On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: >> I've looked at this more closely now and I can see that the call to >> XLogFlush() that is made from xact_redo_commit_internal() doesn't ever >> actually flush WAL, so whether we delay or not is completely >> irrelevant. >> >> So un-agreed. No change required to patch there. > > I think Peter's suggestion of forcibly setting the delay to 0 in the > startup process is a good one, though. It's one line of code, and if > it isn't strictly necessary today, it still seems like good > future-proofing. Adding a line that does nothing is not a good idea. The Startup process flushes very, very few WAL messages, so the setting is irrelevant. > I am not very happy about the idea of renaming commit_* to > group_commit_*. It's basically a cosmetic renaming, and breaking > existing configuration files for cosmetic purposes does not seem > warranted to me, especially when the old and new names are so close. > I certainly don't think we can do that in 9.2, now that beta1 has > already shipped. Modifying the default contents of postgresql.conf > after we've shipped beta has been a historical no-no for reasons that > escape me at the moment, but IIRC they're not stupid reasons. > > Frankly, I think this whole thing should be pushed to 9.3. The > commit_delay and commit_siblings knobs suck, but they've sucked for a > long time, and it won't kill anybody to wait another release cycle to > fix them. We have plenty of more important things queued up for 9.3 > already, and I don't believe there's any compelling reason to think > that this particular thing needs preferential treatment. No problem with pushing a variable rename through to 9.3. To be honest, I don't care whether we rename them or not. What matters is that we have a patch that provides a massive performance gain in write performance in just a few lines of code, and that should be committed to 9.2. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issues with MinGW W64
Tom Lane writes: > Robert Haas writes: >> On Tue, May 29, 2012 at 9:04 AM, Johann 'Myrkraverk' Oskarsson >> wrote: >>> The header file crtdefs.h in MinGW typedefs errcode which conflicts >>> with Postgres' elog.h. > >> Eep. Maybe this is not directly relevant, but I guess my first >> question is: why is MinGW doing that? > > I concur with Robert here: your first step should be to push back on > the MinGW developers about this nonstandard intrusion on application > namespace. We've been using errcode() as a function name since 2003, > and it's never been a problem before on any platform, including > previous versions of MinGW. I have contacted the MinGW W64 team on this. > If they won't change it, then we could consider some other hack, but > that should really be the first attempt. >>> MinGW W64's sys/stat.h #defines stat to be _stati64 and there is >>> subsequently a compilation error in port.h: >>> >>> note: expected 'struct _stati64 *' but argument is of type 'struct >>> stat *' error: conflicting types for 'pgwin32_safestat' > >> In this case, I really think we ought to change all backend calls >> that hit stat() to use something like pgstat() instead. > > I disagree with this conclusion. That'd be an unnecessarily > nonstandard notation, which all existing and future developers would > have to learn. I'd rather work around this in port.h if at all > possible. I'm not quite sure why the existing code fails, though --- > is there a conflict between "#define stat" and "#define stat(a,b)"? I wouldn't know, the compiler is GCC 4.6.3 here (any 4.5+ will do I think) so all the usual GCC macro magic should be working. Is this something to discuss with the MinGW W64 team? >>> There are a series of redefined macros from the MinGW W64 CRT. >>> ... >>> And possibly some more. Do we need these redefines? > >> We probably need them somewhere, or they wouldn't have been added. >> But maybe we don't need them on the exact platform you're using. > > Can we deal with this by just wrapping each #define in #ifndef? I'll take a look and make sure the #defines end up with the same values. If so I'll attach a patch for this. -- Johann Oskarsson http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services Blog: http://my.opera.com/myrkraverk/blog/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
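The #ifndef approach would look like this for each affected definition -- illustrated with one of the errno-style macros from the win32 port; whether these are exactly the macros in conflict is an assumption:

    /* in src/include/port/win32.h, guard anything the CRT may now supply */
    #ifndef EADDRINUSE
    #define EADDRINUSE WSAEADDRINUSE
    #endif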
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Thu, May 31, 2012 at 6:19 AM, Simon Riggs wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. I think Peter's suggestion of forcibly setting the delay to 0 in the startup process is a good one, though. It's one line of code, and if it isn't strictly necessary today, it still seems like good future-proofing. I am not very happy about the idea of renaming commit_* to group_commit_*. It's basically a cosmetic renaming, and breaking existing configuration files for cosmetic purposes does not seem warranted to me, especially when the old and new names are so close. I certainly don't think we can do that in 9.2, now that beta1 has already shipped. Modifying the default contents of postgresql.conf after we've shipped beta has been a historical no-no for reasons that escape me at the moment, but IIRC they're not stupid reasons. Frankly, I think this whole thing should be pushed to 9.3. The commit_delay and commit_siblings knobs suck, but they've sucked for a long time, and it won't kill anybody to wait another release cycle to fix them. We have plenty of more important things queued up for 9.3 already, and I don't believe there's any compelling reason to think that this particular thing needs preferential treatment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 7:49 AM, Erik Rijkers wrote: > On Thu, May 31, 2012 13:14, Robert Haas wrote: >> On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >>> In my test, I run the bash code (the bits that I posted earlier) in a while >>> loop so that the >>> table >>> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >>> wasn't clear. That loop >>> is necessary; a few iterations are often successful. >> >> Yes... I let it run all night on my laptop with no errors. > > My apologies to both of you. It seems the problem was indeed solved with > that commit from Robert. > I managed to forget that I, uh... temporary, commented out the git-pull from > my build script... No problem. The one thing that still seems a little odd to me is that this caused a pin count to get orphaned. It seems reasonable that ignoring the AccessExclusiveLock could result in not-found errors trying to open a missing relation, and even fsync requests on a missing relation. But I don't see why that would cause the backend-local pin counts to get messed up, which makes me wonder if there really is another bug here somewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] slow dropping of tables, DropRelFileNodeBuffers, tas
On 30 May 2012 12:10, Heikki Linnakangas wrote: > Hmm, we do this in smgrDoPendingDeletes: > > for (i = 0; i <= MAX_FORKNUM; i++) > { > smgrdounlink(srel, i, false); > } > > So we drop the buffers for each relation fork separately, which means that > we scan the buffer pool four times. Relation forks in 8.4 introduced that > issue, and 9.1 made it worse by adding another fork for unlogged tables. > With some refactoring, we could scan the buffer pool just once. That would > help a lot. That struck me as a safe and easy optimisation. This was a problem I'd been trying to optimise for 9.2, so I've written a patch that appears simple and clean enough to be applied directly. > Also, I wonder if DropRelFileNodeBuffers() could scan the pool without > grabbing the spinlocks on every buffer? It could do an unlocked test first, > and only grab the spinlock on buffers that need to be dropped. Sounds less good and we'd need reasonable proof it actually did anything useful without being dangerous. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services dropallforks.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
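A rough sketch of what the single-scan refactoring could look like, folding in the unlocked pre-check Heikki floats above (the function name and signature are hypothetical, and InvalidateBuffer is assumed to follow bufmgr.c's convention of being entered with the buffer-header spinlock held):

void
DropRelFileNodeAllBuffers(RelFileNode rnode)
{
    int         i;

    for (i = 0; i < NBuffers; i++)
    {
        volatile BufferDesc *bufHdr = &BufferDescriptors[i];

        /*
         * Unlocked pre-check: a false positive merely costs one spinlock
         * acquisition, and the tag is rechecked under the lock below.
         */
        if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode))
            continue;

        LockBufHdr(bufHdr);

        /*
         * Note there is no test on tag.forkNum, so a single pass over
         * the pool drops the buffers of every fork of the relation,
         * instead of one full scan per fork.
         */
        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode))
            InvalidateBuffer(bufHdr);   /* releases the spinlock */
        else
            UnlockBufHdr(bufHdr);
    }
}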
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 13:14, Robert Haas wrote: > On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >> In my test, I run the bash code (the bits that I posted earlier) in a while >> loop so that the >> table >> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >> wasn't clear. That loop >> is necessary; a few iterations are often successful. > > Yes... I let it run all night on my laptop with no errors. My apologies to both of you. It seems the problem was indeed solved with that commit from Robert. I managed to forget that I, uh... temporarily, commented out the git-pull from my build script... Thanks, Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On 31 May 2012 12:14, Robert Haas wrote: > On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: >> In my test, I run the bash code (the bits that I posted earlier) in a while >> loop so that the table >> is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that >> wasn't clear. That loop >> is necessary; a few iterations are often successful. > > Yes... I let it run all night on my laptop with no errors. It looked like the correct fix to me as well, FWIW. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 5:14 PM, Heikki Linnakangas wrote: > On 31.05.2012 14:42, Atri Sharma wrote: >> >> Another thing I wanted to ask was that would you recommend building >> tuples from strings directly or converting them to Datum first and >> then build the tuples from Datum instances? > > > It depends. If you have all the values in strings already, then > BuildTupleFromCStrings() is probably the easiest. But if you have some > attributes in Datum format already, then it's probably easier and faster to > use heap_form_tuple(). > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com Performance-wise, which one would be better (I am envisioning a large set of strings to be converted to tuples)? -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On 31.05.2012 14:42, Atri Sharma wrote: Another thing I wanted to ask was that would you recommend building tuples from strings directly or converting them to Datum first and then build the tuples from Datum instances? It depends. If you have all the values in strings already, then BuildTupleFromCStrings() is probably the easiest. But if you have some attributes in Datum format already, then it's probably easier and faster to use heap_form_tuple(). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
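As a rough illustration of the two paths, assuming a two-column descriptor with an int4 and a text attribute (the values here are made up, and error handling is omitted):

#include "postgres.h"
#include "access/htup.h"    /* heap_form_tuple, heap_freetuple */
#include "funcapi.h"        /* TupleDescGetAttInMetadata, BuildTupleFromCStrings */
#include "utils/builtins.h" /* CStringGetTextDatum */

static HeapTuple
build_tuple_both_ways(TupleDesc tupdesc)
{
    char       *cstrings[2] = {"42", "hello"};
    Datum       values[2];
    bool        isnull[2] = {false, false};
    AttInMetadata *attinmeta;
    HeapTuple   tup;

    /*
     * Path 1: all values in string form; each attribute's input
     * function is invoked for us.
     */
    attinmeta = TupleDescGetAttInMetadata(tupdesc);
    tup = BuildTupleFromCStrings(attinmeta, cstrings);
    heap_freetuple(tup);    /* discard path-1 result in this sketch */

    /* Path 2: values already at hand as Datums. */
    values[0] = Int32GetDatum(42);
    values[1] = CStringGetTextDatum("hello");
    return heap_form_tuple(tupdesc, values, isnull);
}

If the inputs start life as strings anyway, both paths pay for the per-attribute input function calls, so the difference is mostly convenience; heap_form_tuple() only wins when some Datums are already available, which matches the answer above.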
Re: [HACKERS] heap_form_tuple crashing
On Thu, May 31, 2012 at 5:02 PM, Heikki Linnakangas wrote: > On 31.05.2012 13:42, Atri Sharma wrote: >> >> I am trying to call heap_form_tuple to create a tuple from a datum. >> >> My call to heap_form_tuple looks like: >> >> >> val1=0; >> tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); >> tuple=heap_form_tuple(tupledesc1,p1,&val1); >> >> >> p1 is a pointer to a Datum instance which is created from a char array. >> >> When I am running the code,the system is crashing. >> >> Please let me know what can the problem be and also what I should do >> to rectify it. > > Hard to say without seeing the full code... > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com Hi Heikki, Thanks for the reply. Another thing I wanted to ask: would you recommend building tuples from strings directly, or converting them to Datum first and then building the tuples from the Datum instances? Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap_form_tuple crashing
On 31.05.2012 13:42, Atri Sharma wrote: I am trying to call heap_form_tuple to create a tuple from a datum. My call to heap_form_tuple looks like: val1=0; tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); tuple=heap_form_tuple(tupledesc1,p1,&val1); p1 is a pointer to a Datum instance which is created from a char array. When I am running the code, the system is crashing. Please let me know what can the problem be and also what I should do to rectify it. Hard to say without seeing the full code... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 31 May 2012 11:19, Simon Riggs wrote: > I've looked at this more closely now and I can see that the call to > XLogFlush() that is made from xact_redo_commit_internal() doesn't ever > actually flush WAL, so whether we delay or not is completely > irrelevant. > > So un-agreed. No change required to patch there. So, does that clear up the question of it being acceptable to add a delay to every existing XLogFlush() call site? I think so. Aside from the outstanding question of what to rename commit_delay/commit_siblings to, and how we might want to reframe those settings in the docs, I think that's everything. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On Thu, May 31, 2012 at 1:06 AM, Erik Rijkers wrote: > In my test, I run the bash code (the bits that I posted earlier) in a while > loop so that the table > is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that > wasn't clear. That loop > is necessary; a few iterations are often successful. Yes... I let it run all night on my laptop with no errors. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: avoid heavyweight locking on hash metapage
On Thu, May 31, 2012 at 7:07 AM, Dickson S. Guedes wrote: > 2012/5/30 Robert Haas : >> I tested the effect of this by setting up a series of 5-minute >> read-only pgbench run at scale factor 300 with 8GB of shared buffers >> on the IBM POWER7 machine. > > I know it doesn't matter, but out of curiosity what OS you used? Fedora 16, Linux 3.2.6-3.fc16.ppc64 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: avoid heavyweight locking on hash metapage
2012/5/30 Robert Haas : > I tested the effect of this by setting up a series of 5-minute > read-only pgbench run at scale factor 300 with 8GB of shared buffers > on the IBM POWER7 machine. I know it doesn't matter, but out of curiosity, what OS did you use? best regards, -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] heap_form_tuple crashing
Hi all, I am trying to call heap_form_tuple to create a tuple from a datum. My call to heap_form_tuple looks like: val1=0; tupledesc1=BlessTupleDesc(node->ss.ss_currentRelation->rd_att); tuple=heap_form_tuple(tupledesc1,p1,&val1); p1 is a pointer to a Datum instance which is created from a char array. When I am running the code, the system is crashing. Please let me know what the problem could be and also what I should do to rectify it. Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
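Without the full code this is only a guess, but the usual failure with that call pattern is that heap_form_tuple() expects one Datum and one isnull flag per attribute in the descriptor, with each Datum already in the attribute's internal format; a pointer derived from a raw char array will crash for a varlena column. Assuming a single text attribute (my_char_buffer is a hypothetical name for the char array), a correct call would look roughly like:

Datum   values[1];
bool    isnull[1] = {false};

/* Convert the C string into a text Datum before forming the tuple. */
values[0] = CStringGetTextDatum(my_char_buffer);
tuple = heap_form_tuple(tupledesc1, values, isnull);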
Re: [HACKERS] GiST buffering build, bug in levelStep calculation
On Thu, May 31, 2012 at 1:36 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > I note that the calculations assume that leaf tuples and internal tuples > have similar sizes. We calculate the average leaf tuple size, and use that > to calculate the fan-out of internal pages. On some GiST opclasses, the > values stored on internal pages might be quite different from the leaf > tuples. I don't think we need to worry about that in practice, these > calculations are not very accurate anyway, but perhaps a comment would be > in order. > Perhaps we could collect per-level statistics of average tuple size? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 30 May 2012 17:19, Peter Geoghegan wrote: > On 30 May 2012 15:25, Simon Riggs wrote: >>> 1. It seems wrong to do it in xact_redo_commit_internal(). It won't >>> matter if commit_siblings>0 since there won't be any other backends >>> with transaction IDs anyway, but if commit_siblings==0 then we'll >>> sleep for no possible benefit. >> >> Agreed I've looked at this more closely now and I can see that the call to XLogFlush() that is made from xact_redo_commit_internal() doesn't ever actually flush WAL, so whether we delay or not is completely irrelevant. So un-agreed. No change required to patch there. >>> 2. Doing it in FlushBuffer() seems slightly iffy since we might be >>> sitting on a buffer lock. But maybe it's a win anyway, or just not >>> worth worrying about. >> >> Agreed. > > As I've pointed out, we cannot meaningfully skip the wait for > auxiliary processes alone (except perhaps by having commit_siblings > set sufficiently high). > >> The remaining cases aren't worth worrying about, apart from >> SlruPhysicalWritePage() which happens during visibility checks and >> needs to happen as quickly as possible also. > > I'm not so sure. It says in that function: > > /* > * We must determine the largest async-commit LSN for the > page. This > * is a bit tedious, but since this entire function is a slow > path > * anyway, it seems better to do this here than to maintain a > per-page > * LSN variable (which'd need an extra comparison in the > * transaction-commit path). > */ > >> I would say the additional contention from waiting outweighs the >> benefit of the wait in those 3 places, so skipping the wait is wise. > > MinimumActiveBackends() reports the "count backends (other than > myself) that are in active transactions", so unnecessary calls will > have to occur when we have active transactions >= CommitSiblings, not > connections >= CommitSiblings as was previously the case. > > What if we were to skip the wait during recovery only, by specially > setting CommitDelay to 0 in the start-up process? Would that satisfy > everyone's concerns about unhelpful delays? I'm not sure how this > might interact with hot standby. Hmm, that was a good idea, but as of my comments above, that isn't required or useful. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
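For reference, the delay being discussed is gated like this at each wait site; this mirrors the long-standing pattern in the commit path, though exactly which call sites get it is the open question above:

/*
 * Sleep only if a delay is configured, fsync is enabled, and at least
 * CommitSiblings other backends currently have active transactions.
 */
if (CommitDelay > 0 && enableFsync &&
    MinimumActiveBackends(CommitSiblings))
    pg_usleep(CommitDelay);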
Re: [HACKERS] GiST buffering build, bug in levelStep calculation
On 29.05.2012 23:46, Alexander Korotkov wrote: On Wed, May 30, 2012 at 12:25 AM, Tom Lane wrote: Alexander Korotkov writes: On Tue, May 29, 2012 at 11:42 PM, Tom Lane wrote: While I'm looking at this, is the first test involving effective_cache_size bulletproof either? In particular, is avgIndexTuplesPerPage clamped to be strictly greater than 1? It's based on collected statistics on already inserted tuple sizes. Since tuple sizes are measured after possible toasting, I don't see the way for avgIndexTuplesPerPage to be less than 1. Yeah, but if it could be *equal* to one, you've got a zero-divide there. avgIndexTuplesPerPage is calculated as: avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize; I think size of each index tuple must be at least few times lower than pageFreeSpace to let us create any index. Hmm, in theory, it seems possible that every leaf level index tuple would completely fill an index page. Not sure how useful such an index would be, though. On internal pages, at least, you have to fit at least two tuples on a page or you can't build a tree. I note that the calculations assume that leaf tuples and internal tuples have similar sizes. We calculate the average leaf tuple size, and use that to calculate the fan-out of internal pages. On some GiST opclasses, the values stored on internal pages might be quite different from the leaf tuples. I don't think we need to worry about that in practice, these calculations are not very accurate anyway, but perhaps a comment would be in order. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
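One way to make the calculation bulletproof against the zero-divide Tom raised would be a clamp along these lines (an illustrative sketch against the variables quoted above, not the committed fix):

avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;

/*
 * An internal page must hold at least two downlinks to form a tree,
 * so clamp the estimated fan-out; this also keeps any divisor later
 * derived from it safely away from zero.
 */
if (avgIndexTuplesPerPage < 2.0)
    avgIndexTuplesPerPage = 2.0;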
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
>>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered >>> extremely specific cases that might or might not have anything to do >>> with what you're seeing. The complainant was extremely helpful about >>> tracking down the problems: >>> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php >>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php >>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php >> >> I'm wondering if these fixes (or today's commit) include the case of >> a database with ~100,000 tables and indexes. One of my customers >> has had trouble with pg_dump for such a database; it takes over 10 >> hours. > > So I did a quick test with old PostgreSQL 9.0.2 and current (as of > commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed > database I created 100,000 tables, each with two integer > attributes, one of them a primary key. Creating the tables was > reasonably fast, as expected (18-20 minutes). This created a 1.4GB > database cluster. > > pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was a pretty > long time, as the customer complained. Now, what about current? Well, it > took 125 minutes. ps showed that most of the time was spent in the backend. > > Below is the script to create the tables. > > cnt=100000 > while [ $cnt -gt 0 ] > do > psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test > cnt=`expr $cnt - 1` > done > > p.s. You need to increase max_locks_per_transaction before running > pg_dump (I raised it to 640 in my case). Just for the record, I reran the test with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes: 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch). So far, I'm glad to see 40% time savings at this point. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/31 Yeb Havinga : > On 2012-05-30 21:26, Kohei KaiGai wrote: >> If we had an "ideal optimizer", I'd still like the optimizer to >> wipe out redundant clauses transparently, rather than RLSBYPASS >> permissions, because it just controls all-or-nothing stuff. >> For example, if tuples are categorized as unclassified, classified or >> secret, and the RLS policy is configured as: >> ((current_user IN ('alice', 'bob') AND X IN ('unclassified', >> 'classified')) OR (X IN ('unclassified'))), >> superuser can see all the tuples, and alice and bob can see >> up to classified tuples. >> Is it really hard to wipe out redundant conditions at the planner stage? >> If current_user is obviously 'kaigai', it seems to me the left side of >> this clause can be wiped out at the planner stage. > > The query's RLS policy would be simpler if the RLS policy function that > returns the WHERE clause took the user as an argument, so its result does > not contain user conditionals. > > IF current_user IN ('alice', 'bob') > THEN > RETURN X IN ('unclassified', 'classified') > ELSE > RETURN X IN ('unclassified') > END IF; > Yes, that is a happy case. But the point I'm concerned about is that the conditions used to branch between cases are not limited to the current user-id. The RLS policy is appended at planner stage, so a prepared statement needs to be invalidated whenever its prerequisites change. For example, someone may assign a function that returns an RLS policy depending on whether the current hour is an even or odd number. It implies that we cannot predict all the cases in which a prepared statement with an RLS policy must be invalidated, because of too much flexibility. Thanks, -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741
On 31.05.2012 08:06, Erik Rijkers wrote: On Thu, May 31, 2012 03:30, Robert Haas wrote: On Wed, May 30, 2012 at 6:00 PM, Erik Rijkers wrote: directory 2012-05-30 23:40:57.909 CEST 3909 CONTEXT: writing block 5152 of relation base/21268/26569 xlog redo multi-insert (init): rel 1663/21268/26581; blk 3852; 35 tuples TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line: 1741) 2012-05-30 23:40:58.006 CEST 5331 FATAL: could not open file "base/21268/26569": No such file or directory 2012-05-30 23:40:58.006 CEST 5331 CONTEXT: writing block 5153 of relation base/21268/26569 2012-05-30 23:40:59.661 CEST 3908 LOG: startup process (PID 3909) was terminated by signal 6: Aborted 2012-05-30 23:40:59.661 CEST 3908 LOG: terminating any other active server processes Hmm. I set up what I believe to be the same test case you were using, and it's not crashing for me. In fact, with the latest code, I haven't been able to produce any error at all. When I reverted my last commit, I managed to get this: ERROR: could not open relation with OID 18229 STATEMENT: select current_setting('port') port, count(*) from t ...but just once, and with no other error messages. So I'm either missing a step somewhere, or something about your system just happens to tickle this more so than on mine. In my test, I run the bash code (the bits that I posted earlier) in a while loop so that the table is CREATEd, COPYied into, and DROPped every few seconds -- perhaps that wasn't clear. That loop is necessary; a few iterations are often successful. I can test it today on a few other systems to see if it is reproducible. I could no longer reproduce it after Robert's fix; the test case has been running for about an hour now. Please triple-check that you have it applied in the standby :-). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
On 2012-05-30 21:26, Kohei KaiGai wrote: If we had an "ideal optimizer", I'd still like the optimizer to wipe out redundant clauses transparently, rather than RLSBYPASS permissions, because it just controls all-or-nothing stuff. For example, if tuples are categorized as unclassified, classified or secret, and the RLS policy is configured as: ((current_user IN ('alice', 'bob') AND X IN ('unclassified', 'classified')) OR (X IN ('unclassified'))), superuser can see all the tuples, and alice and bob can see up to classified tuples. Is it really hard to wipe out redundant conditions at the planner stage? If current_user is obviously 'kaigai', it seems to me the left side of this clause can be wiped out at the planner stage. The query's RLS policy would be simpler if the RLS policy function that returns the WHERE clause took the user as an argument, so its result does not contain user conditionals. IF current_user IN ('alice', 'bob') THEN RETURN X IN ('unclassified', 'classified') ELSE RETURN X IN ('unclassified') END IF; regards, Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers