[HACKERS] Ideas for improving Concurrency Tests

2013-03-26 Thread Amit Kapila
Ideas for improving Concurrency testing   



1. Synchronization points in server code - To have better control for
concurrency testing, define synchronization points in server code which can
be used as follows: 

heap_truncate(..)
{
    ...

    SYNC_POINT(procid, 'before_heap_open')
    rel = heap_open(rid, AccessExclusiveLock);
    relations = lappend(relations, rel);
}

exec_simple_query(..)
{
    ...

    finish_xact_command();

    SYNC_POINT(procid, 'after_finish_xact_command')

    /*
     * If there were no parsetrees, return EmptyQueryResponse message.
     */
    if (!parsetree_list)
        NullCommand(dest);
    ...
}
  


When code reaches a sync point it can either emit a signal or wait for a
signal.

Signal
    A value of a shared memory variable that will be interpreted by
    different SYNC POINTS based on its value.

Emit a signal
    Assign the value (the signal) to the shared memory variable (set a
    flag) and broadcast a global condition to wake those waiting for a
    signal.

Wait for a signal
    Loop over waiting for the global condition until the global value
    matches the wait-for signal.
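
A minimal sketch of what those two primitives could look like follows.
The shared-memory struct and the SyncPointEmit/SyncPointWait/
SyncPointBroadcast/SyncPointConditionWait names are illustrative only
(they are not existing server APIs), and a real patch would presumably
wait on a latch rather than a hand-rolled condition:

#include "postgres.h"
#include "storage/spin.h"

typedef struct SyncPointShmem
{
    slock_t     mutex;                  /* protects "signal" */
    char        signal[NAMEDATALEN];    /* last signal emitted, "" if none */
} SyncPointShmem;

static SyncPointShmem *syncShmem;       /* attached at shmem init (not shown) */

/* Emit a signal: set the shared flag and wake anyone waiting on it. */
static void
SyncPointEmit(const char *signal)
{
    SpinLockAcquire(&syncShmem->mutex);
    strlcpy(syncShmem->signal, signal, NAMEDATALEN);
    SpinLockRelease(&syncShmem->mutex);
    SyncPointBroadcast();               /* e.g. set the waiters' latches */
}

/* Wait for a signal: loop until the shared value matches the wait-for signal. */
static void
SyncPointWait(const char *wait_for)
{
    for (;;)
    {
        bool        matched;

        SpinLockAcquire(&syncShmem->mutex);
        matched = (strcmp(syncShmem->signal, wait_for) == 0);
        SpinLockRelease(&syncShmem->mutex);

        if (matched)
            break;
        SyncPointConditionWait();       /* sleep until broadcast, or pg_usleep() */
    }
}

SYNC_POINT(procid, 'name') would then check whether the named point has
been activated with WAIT_FOR or SIGNAL and call the corresponding
primitive, or do nothing if the point is not active.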

   To activate synchronization points, appropriate actions can be set.
   For example,

SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';
SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';

   The above commands activate the synchronization points named
   'before_heap_open' and 'after_finish_xact_command'.


   session s1
   step s11  {SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';}
   step s12  {Truncate tbl;}
   session s2
   step s21  {SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';}
   step s22  {Insert into tbl values(1);}

   The first activation requests the synchronization point to wait for
   another backend to emit the signal 'commit', and the second activation
   requests the synchronization point to emit the signal 'commit' when the
   process's execution runs through the synchronization point.

   The test defined above makes the Truncate wait for the Insert to finish.

2. Enhance Isolation Framework - Currently, at most one step can be waiting
at a time. Enhance the concurrency test framework (isolation tester) so that
multiple sessions can wait and then be released serially.

  This might help in generating complex deadlock scenarios.

 

 

The above ideas could be useful for improving concurrency testing and could
also help generate test cases for some of the complicated bugs for which
there is no direct test.

This work is not a patch for 9.3; I just wanted some initial feedback.

 

Feedback/Suggestions?

 

 

Reference : http://dev.mysql.com/doc/internals/en/debug-sync-facility.html

 

 

With Regards,

Amit Kapila.



Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 02:02, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 25.03.2013 15:36, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@iki.fi writes:

Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Per warning from -Wmissing-format-attribute.



Hm, this is exactly what I removed yesterday, because it makes the build
fail outright on old gcc:



The attached seems to work. With this patch, on_exit_msg_func() is gone.
There's a different implementation of exit_horribly for pg_dumpall and
pg_dump/restore. In pg_dumpall, it just calls vwrite_msg(). In
pg_dump/restore's version, the logic from parallel_exit_msg_func() is
moved directly to exit_horribly().


Seems probably reasonable, though if we're taking exit_horribly out of
dumputils.c, meseems it ought not be declared in dumputils.h anymore.
Can we put that declaration someplace else, rather than commenting it
with an apology?


Ugh, the patch I posted doesn't actually work, because dumputils.c is 
also used in psql and some scripts, so you get a linker error in those. 
psql and scripts don't use exit_horribly or many of the other functions 
in dumputils.c, so I think we should split dumputils.c into two parts 
anyway. fmtId and the other functions that are used by psql in one file, 
and the functions that are only shared between pg_dumpall and pg_dump in 
another. Then there's also functions that are used by pg_dump and 
pg_restore, but not pg_dumpall or psql.


I'll try moving things around a bit...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Albe Laurenz
Darren Duncan wrote:
 The standard defines UNIQUE on the basis of the UNIQUE predicate:
 <unique predicate> ::= UNIQUE <table subquery>
 and states:
 1) Let T be the result of the table subquery.
 2) If there are no two rows in T such that the value of each column
in one row is non-null and is not distinct
from the value of the corresponding column in the other row,
then the result of the unique predicate is
*True*; otherwise, the result of the unique predicate is *False*.

 Since an imagined zero-column query would have an empty set of
 result columns, you could with equal force argue that these columns
 satisfy the condition or not, because the members of the empty
 set have all the properties you desire.

 So I see no compelling argument that such a UNIQUE constraint
 would force a single-row table.
 
 I do see that compelling argument, and it has to do with identities.
 
 The above definition of UNIQUE predicate says that the UNIQUE predicate is
 FALSE iff, for every pair of rows in T, the 2 rows of any pair are the same.

I don't understand that sentence.
I would say that it is FALSE iff there exist two rows in T
that satisfy:
a) each column in both rows is not-null
b) each column in one of the rows is not distinct from
   the corresponding column in the other row

 Further, 2 rows are the same iff, for every corresponding column, the values 
 in
 both rows are the same.  Further, 2 such values are the same iff they are both
 not null and are mutually not distinct.
 
 So, determining if 2 rows are the same involves an iteration of dyadic logical
 AND over the predicates for each column comparison.  Now logical AND has an
 identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results
 in p for all p.  Therefore, any 2 rows with zero columns each are the 
 same.
 
 Since any 2 rows with zero columns are the same, the UNIQUE predicate is 
 FALSE
 any time there is more than 1 row in a table.
 
 Hence, a UNIQUE constraint over zero columns signifies a row-comparison
 predicate that unconditionally results in TRUE, and so no two rows at all 
 would
 be allowed in the table with that constraint at once, thus restricting the 
 table
 to at most one row.
 
 Does anyone agree or disagree with this logic?

Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.
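
Written out, the vacuous quantification that both readings lean on is
(a sketch in predicate notation, not a quotation of the standard):

    \mathrm{UNIQUE}(T) \iff \neg\,\exists\, r_1 \neq r_2 \in T :
        \forall c \in C :
            r_1.c \text{ is non-null} \wedge
            r_1.c \text{ is not distinct from } r_2.c

With C = \emptyset the inner \forall holds vacuously for every pair of
rows (giving the at-most-one-row reading), but it holds just as vacuously
with the column condition negated (giving the no-constraint reading); the
formula itself does not decide between them.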

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

I'm not sure why your comments are so confrontational here, but I
don't think it helps much. I'm happy to buy you a beer too.

As I explained clearly in my first post, this idea came about trying
to improve on the negative aspects of the checksum patch. People were
working on ideas 9 months ago to resolve this, but they have come to
nothing. I regret that; Merlin and others have worked hard to find a
way: Respect to them.

My suggestion is to implement a feature that takes 1 day to write and
needs little testing to show it works. I'm happy to pursue that path
now, or later. Deciding we need an all-singing, all-dancing solution
that will take our best men (another) 6 months of hard arguing and
implementation is by far the best way I know of killing anything and I
won't be pursuing that route. If we did have 6 months funding for
any-feature-you-like, I wouldn't spend it all on this. My bet that
nobody else will have enough patience, time and skill, let alone
unpaid leave to follow that path, stands.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-03-26 Thread Heikki Linnakangas

On 30.12.2012 08:31, Daniel Farina wrote:

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state.  That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it.  It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.


This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements view.


Pardon for not following the discussion: What exactly does the 
calls_underest field mean? I couldn't decipher it from the patch. What 
can an admin do with the value? How does it compare with just bumping up 
pg_stat_statements.max?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

Any patch in this area isn't likely to take much testing to establish
whether it improves some particular case.  The problem is what happens
to all of the other cases - and I don't believe that part needs little
testing, hence the objections (with which I agree) to doing anything
about this now.

If we want to change something in this area, we might consider
resurrecting the patch I worked on for this last year, which had, I
believe, a fairly similar mechanism of operation to what you're
proposing, and some other nice properties as well:

http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

...but I think the main reason why that never went anywhere is because
we never really had any confidence that the upsides were worth the
downsides.  Fundamentally, postponing hint bit setting (or hint bit
I/O) increases the total amount of work done by the system.  You still
end up writing the hint bits eventually, and in the meantime you do
more CLOG lookups.  Now, as a compensating benefit, you can spread the
work of writing the hint-bit updated pages out over a longer period of
time, so that no single query carries too much of the burden of
getting the bits set.  The worst-case-latency vs. aggregate-throughput
tradeoff is one with a long history and I think it's appropriate to
view this problem through that lens also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting post-mortem on a near disaster with git

2013-03-26 Thread Cédric Villemain
Le lundi 25 mars 2013 19:35:12, Daniel Farina a écrit :
 On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner
 
 ste...@kaltenbrunner.cc wrote:
  Back when we used CVS for quite a few years I kept 7 day rolling
  snapshots of the CVS repo, against just such a difficulty as this. But
  we seem to be much better organized with infrastructure these days so I
  haven't done that for a long time.
  
  well there is always room for improvement(and for learning from others)
  - but I agree that this proposal seems way overkill. If people think we
  should keep online delayed mirrors we certainly have the resources to
  do that on our own if we want though...
 
 What about rdiff-backup?  I set it up for personal use years ago
 (via the handy open source bash script backupninja) and it is
 a pretty nice no-frills point-in-time, self-expiring, file-based
 automatic backup program that works well with file synchronization
 like rsync (I rdiff-backup to one disk and rsync the entire
 rdiff-backup output to another disk).  I've enjoyed using it quite a
 bit during my own personal-computer emergencies and thought the
 maintenance required from me has been zero, and I have used it from
 time to time to restore, proving it even works.  Hardlinks can be used
 to tag versions of a file-directory tree recursively relatively
 compactly.
 
 It won't be as compact as a git-aware solution (since git tends to
 rewrite entire files, which will confuse file-based incremental
 differential backup), but the amount of data we are talking about is
 pretty small, and as far as a lowest-common-denominator tradeoff for
 use in emergencies, I have to give it a lot of praise.  The main
 advantage it has here is it implements point-in-time recovery
 operations that are easy to use and actually seem to work.  That said,
 I've mostly done targeted recoveries rather than trying to recover
 entire trees.

I have the same set up, and same feedback.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Robert Haas
On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/20/2013 04:45 PM, Brendan Jurd wrote:
 Incompatibility:
 This patch introduces an incompatible change in the behaviour of the
 aforementioned array functions -- instead of returning NULL for empty
 arrays they return meaningful values.  Applications relying on the old
 behaviour to test for emptiness may be disrupted.  One can

 As a heavy user of arrays, I support this patch as being worth the
 breakage of backwards compatibility.  However, that means it certainly
 will need to wait for 9.4 to provide adequate warning.

I expect to lose this argument, but I think this is a terrible idea.
Users really hate it when they try to upgrade and find that they, uh,
can't, because of some application-level incompatibility like this.
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.

The way to make a change like this without breaking things for users
is to introduce a new type with different behavior and gradually
deprecate the old one.  Now, maybe it doesn't seem worth doing that
for a change this small.  But if so, I think that's evidence that this
isn't worth changing in the first place, not that it's worth changing
without regard for backwards-compatibility.

Personally, I think if we're going to start whacking around the
behavior here and risk inconveniencing our users, we ought to think a
little larger.  The fundamental thing that's dictating the current
behavior is that we have arrays of between 1 and 6 dimensions all
rolled up under a single data type.  But in many cases, if not nearly
all cases, what people want is, specifically, a one-dimensional array.
 If we were going to actually bite the bullet and create separate data
types for each possible number of array dimensions... and maybe fix
some other problems at the same time... then the effort involved in
ensuring backward-compatibility might seem like time better spent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Pavel Stehule
2013/3/26 Robert Haas robertmh...@gmail.com:
 On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/20/2013 04:45 PM, Brendan Jurd wrote:
 Incompatibility:
 This patch introduces an incompatible change in the behaviour of the
 aforementioned array functions -- instead of returning NULL for empty
 arrays they return meaningful values.  Applications relying on the old
 behaviour to test for emptiness may be disrupted.  One can

 As a heavy user of arrays, I support this patch as being worth the
 breakage of backwards compatibility.  However, that means it certainly
 will need to wait for 9.4 to provide adequate warning.

 I expect to lose this argument, but I think this is a terrible idea.
 Users really hate it when they try to upgrade and find that they, uh,
 can't, because of some application-level incompatibility like this.
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

 The way to make a change like this without breaking things for users
 is to introduce a new type with different behavior and gradually
 deprecate the old one.  Now, maybe it doesn't seem worth doing that
 for a change this small.  But if so, I think that's evidence that this
 isn't worth changing in the first place, not that it's worth changing
 without regard for backwards-compatibility.

 Personally, I think if we're going to start whacking around the
 behavior here and risk inconveniencing our users, we ought to think a
 little larger.  The fundamental thing that's dictating the current
 behavior is that we have arrays of between 1 and 6 dimensions all
 rolled up under a single data type.  But in many cases, if not nearly
 all cases, what people want is, specifically, a one-dimensional array.
  If we were going to actually bite the bullet and create separate data
 types for each possible number of array dimensions... and maybe fix
 some other problems at the same time... then the effort involved in
 ensuring backward-compatibility might seem like time better spent.


I understand, but I don't agree. We have to fix the impractical design of
arrays early. ARRAY is a first-class type, so it is not possible to use
the varchar2 trick here.

If we don't want to use a GUC, what do you think about a compatibility
extension? We could overload the system functions' behaviour. That could
solve the problem with updates and migrations.

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 11:33, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

 Any patch in this area isn't likely to take much testing to establish
 whether it improves some particular case.  The problem is what happens
 to all of the other cases - and I don't believe that part needs little
 testing, hence the objections (with which I agree) to doing anything
 about this now.

 If we want to change something in this area, we might consider
 resurrecting the patch I worked on for this last year, which had, I
 believe, a fairly similar mechanism of operation to what you're
 proposing, and some other nice properties as well:

 http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
 http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

 ...but I think the main reason why that never went anywhere is because
 we never really had any confidence that the upsides were worth the
 downsides.  Fundamentally, postponing hint bit setting (or hint bit
 I/O) increases the total amount of work done by the system.  You still
 end up writing the hint bits eventually, and in the meantime you do
 more CLOG lookups.  Now, as a compensating benefit, you can spread the
 work of writing the hint-bit updated pages out over a longer period of
 time, so that no single query carries too much of the burden of
 getting the bits set.  The worst-case-latency vs. aggregate-throughput
 tradeoff is one with a long history and I think it's appropriate to
 view this problem through that lens also.

I hadn't realised so many patches existed that were similar. Hackers
is bigger these days.

Reviewing the patch, I'd say the problem is that it is basically
implementing a new automatic heuristic. We simply don't have any
evidence that any new heuristic will work for all cases, so we do
nothing.

Whether we apply my patch, yours or Merlin's, my main thought now is
that we need a user parameter to control it so it can be adjusted
according to need and not touched at all if there is no problem.

My washing machine has a wonderful feature, "15 min wash", and it works
great for the times I know I need it; but in general, the auto wash
mode works fine since often you don't care that it takes 90 minutes.
It's much easier to see that the additional user option is beneficial,
but much harder to start arguing that the default wash cycle should be
85 or 92 minutes. It'd be great if the washing machine could work out
that I need my clothes quickly and that on-this-day-only I don't care
about the thoroughness of the wash, but it can't. I don't think the
washing machine engineers are idiots for not being able to work that
out, but if they only offered a single option because they thought
they knew better than me, I'd be less than impressed.

In the same way, we need some way to say "big queries shouldn't do
cleanup" even if autovacuum ends up doing more I/O over time (though
in fact I doubt this is the case, detailed argument on other post).

So please, lets go with a simple solution now that allows users to say
what they want.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Brendan Jurd
On 26 March 2013 22:57, Robert Haas robertmh...@gmail.com wrote:
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.


We can't sensibly test for whether an array is empty.  I'd call that a
functional problem.

The NULL return from array_{length,lower,upper,ndims} is those
functions' way of saying their arguments failed a sanity check.  So we
cannot distinguish in a disciplined way between a valid, empty array,
and bad arguments.  If the zero-D implementation had been more
polished and say, array_ndims returned zero, we had provided an
array_empty function, or the existing functions threw errors for silly
arguments instead of returning NULL, then I'd be more inclined to see
your point.  But as it stands, the zero-D implementation has always
been half-baked and slightly broken, we just got used to working
around it.

Cheers,
BJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Merlin Moncure
On Tue, Mar 26, 2013 at 7:30 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 11:33, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com 
 wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

 Any patch in this area isn't likely to take much testing to establish
 whether it improves some particular case.  The problem is what happens
 to all of the other cases - and I don't believe that part needs little
 testing, hence the objections (with which I agree) to doing anything
 about this now.

 If we want to change something in this area, we might consider
 resurrecting the patch I worked on for this last year, which had, I
 believe, a fairly similar mechanism of operation to what you're
 proposing, and some other nice properties as well:

 http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
 http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

 ...but I think the main reason why that never went anywhere is because
 we never really had any confidence that the upsides were worth the
 downsides.  Fundamentally, postponing hint bit setting (or hint bit
 I/O) increases the total amount of work done by the system.  You still
 end up writing the hint bits eventually, and in the meantime you do
 more CLOG lookups.  Now, as a compensating benefit, you can spread the
 work of writing the hint-bit updated pages out over a longer period of
 time, so that no single query carries too much of the burden of
 getting the bits set.  The worst-case-latency vs. aggregate-throughput
 tradeoff is one with a long history and I think it's appropriate to
 view this problem through that lens also.

 I hadn't realised so many patches existed that were similar. Hackers
 is bigger these days.

 Reviewing the patch, I'd say the problem is that it is basically
 implementing a new automatic heuristic. We simply don't have any
 evidence that any new heuristic will work for all cases, so we do
 nothing.

 Whether we apply my patch, yours or Merlin's, my main thought now is
 that we need a user parameter to control it so it can be adjusted
 according to need and not touched at all if there is no problem.

After a night thinking about this, I'd like to make some points:

*) my patch deliberately did not 'set bits without dirty' -- with
checksums in mind as you noted (thanks for that).  I think the upside
for marking pages in that fashion anyway is overrated.

*) Any strategy that does not approximate hint bit behavior IMNSHO is
a non-starter.  By that I mean when your $condition is met so that
hint bits are not being written out, scans need to bail out of
HeapTupleSatisfiesMVCC processing with a cheap check.  If you don't do
that and rely on the transam.c guard, you've already missed the boat:
even without the clog lookup, the extra processing there will, I can
assure you, show up in profiling of repeated scans (until vacuum).

*) The case of sequential tuples with the same xid is far and away the
most important one.  In OLTP workloads hint bit i/o is minor compared
to everything else going on.  Also, OLTP workloads are probably better
handled with an hint bit check just before eviction via bgwriter vs
during scan.

*) The budget for extra work inside HeapTupleSatisfiesMVCC is
exceptionally low.  For this reason, I think your idea would be better
framed at the page level and the bailout should be measured in the
number of pages, not tuples (that way the page can send in a single
boolean to control hint bit behavior).

*) The upside of optimizing xmax processing is fairly low for most
workloads I've seen.

*) The benchmarking Amit and Hari did needs analysis.

*) For off-cycle release work that would help enable patches with
complex performance trade-offs (I'm working up another patch that has
even more compelling benefits and risks in the buffer allocator), we
desperately need a standard battery of comprehensive performance tests
and donor machines.

[HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases

2013-03-26 Thread Bernd Helmle

My current master segfaults with pg_dump when dumping a 9.1 or 9.2 database:

$ LC_ALL=en_US.utf8 pg_dump -s -p 5448
pg_dump: column number -1 is out of range 0..22
zsh: segmentation fault  LC_ALL=en_US.utf8 pg_dump -s -p 5448

The reason seems to be that getTables() in pg_dump.c forgets to select
relpages in the query for releases >= 90100. The error message comes from
PQgetvalue(res, i, i_relpages), which complains about i_relpages being -1, 
which will then return NULL...
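
For clarity, the failure pattern is roughly the following (a sketch, not
the actual pg_dump code):

int   i_relpages = PQfnumber(res, "relpages");      /* returns -1: column absent */
char *relpages   = PQgetvalue(res, i, i_relpages);  /* libpq complains "column
                                                     * number -1 is out of range"
                                                     * and returns NULL, which is
                                                     * presumably dereferenced
                                                     * further down -> segfault */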


--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 15:31, Bernd Helmle wrote:

My current master segfaults with pg_dump when dumping a 9.1 or 9.2
database:

$ LC_ALL=en_US.utf8 pg_dump -s -p 5448
pg_dump: column number -1 is out of range 0..22
zsh: segmentation fault LC_ALL=en_US.utf8 pg_dump -s -p 5448

The reason seems to be that getTables() in pg_dump.c forgets to select
relpages in the query for releases >= 90100. The error message comes
from PQgetvalue(res, i, i_relpages), which complains about i_relpages
being -1, which will then return NULL...


Thanks, fixed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 So please, lets go with a simple solution now that allows users to say
 what they want.

Simon, this is just empty posturing, as your arguments have nothing
whatsoever to do with whether the above description applies to your
patch.

More generally, the fact that a patch has some user-frobbable knob
does not mean that it's actually a good or even usable solution.  As
everybody keeps saying, testing on a wide range of use-cases would be
needed to prove that, and we don't have enough time left for such
testing in the 9.3 timeframe.  This problem needs to be attacked in
an organized and deliberate fashion, not by hacking something up under
time pressure and shipping it with minimal testing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Back-branch security updates coming next week

2013-03-26 Thread Tom Lane
The core team has received word of a seriously nasty security problem
in recent releases of Postgres.  We will be wrapping update releases
to fix this next week, following the new usual schedule of tarball
wrap Monday afternoon EDT, public announcement Thursday (4/4).

Committers are reminded that it's uncool to commit any potentially
destabilizing changes to back branches in the last day or two before
a release wrap.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 14:44, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 So please, lets go with a simple solution now that allows users to say
 what they want.

 Simon, this is just empty posturing, as your arguments have nothing
 whatsoever to do with whether the above description applies to your
 patch.

Waiting for an auto-tuned solution to *every* problem means we just
sit and watch bad things happen, knowing how to fix them for
particular cases yet not being able to do anything at all.

 More generally, the fact that a patch has some user-frobbable knob
 does not mean that it's actually a good or even usable solution.  As
 everybody keeps saying, testing on a wide range of use-cases would be
 needed to prove that, and we don't have enough time left for such
 testing in the 9.3 timeframe.  This problem needs to be attacked in
 an organized and deliberate fashion, not by hacking something up under
 time pressure and shipping it with minimal testing.

Well, it has been tackled like that and we've *all* got nowhere. No
worries, I can wait a year for that beer.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd behavior in materialized view

2013-03-26 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:

 Ping? ISTM this problem has not been fixed in HEAD yet.

It's next on my list.  The other reports seemed more serious and
more likely to be contentious in terms of the best fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ideas for improving Concurrency Tests

2013-03-26 Thread Greg Stark
On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Above ideas could be useful to improve concurrency testing and can also be
 helpful to generate test cases for some of the complicated bugs for which
 there is no direct test.

I wonder how much explicit sync points would help with testing though.
It seems like they suffer from the problem that you'll only put sync
points where you actually expect problems and not where you don't
expect them -- which is exactly where problems are likely to occur.

Wouldn't it be more useful to implicitly create sync points whenever
synchronization events like spinlocks being taken occur?
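
For instance, instead of hand-placed SYNC_POINT() calls, the existing
primitives themselves could be instrumented in a test build; a rough
sketch (USE_SYNC_POINTS and SYNC_POINT_HOOK are made-up names, not
existing macros):

/* test builds only: every spinlock acquisition becomes an implicit sync point */
#ifdef USE_SYNC_POINTS
#define SpinLockAcquire(lock) \
    do { \
        SYNC_POINT_HOOK("spinlock_acquire", (lock)); \
        S_LOCK(lock); \
    } while (0)
#endif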

And likewise explicitly listing the timing sequences to test seems
unconvincing. If we could arrange for two threads to execute every
possible interleaving of code by exhaustively trying every combination,
that would be far more convincing. Most bugs are likely to hang out in
combinations we don't see in practice -- for instance having a tuple
deleted and a new one inserted in the same slot in the time a
different transaction was context switched out.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 06:06:18PM +, Greg Stark wrote:
 On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  And we definitely looked at ARC
 
 We didn't just look at it. At least one release used it. Then patent
 issues were raised (and I think the implementation had some contention
 problems).

The problem was cache line overhead between CPUs to manage the ARC
queues.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  I think there is some very low hanging optimization fruit in the clock
  sweep loop.   first and foremost, I see no good reason why when
  scanning pages we have to spin and wait on a buffer in order to
  pedantically adjust usage_count.  some simple refactoring there could
  set it up so that a simple TAS (or even a TTAS with the first test in
   front of the cache line lock, as is done automatically on x86 IIRC)
  could guard the buffer and, in the event of any lock detected, simply
  move on to the next candidate without messing around with that buffer
   at all.   This could be construed as a 'trylock' variant of a spinlock
  and might help out with cases where an especially hot buffer is
  locking up the sweep.  This is exploiting the fact that from
  StrategyGetBuffer we don't need a *particular* buffer, just *a*
  buffer.
 
 Hm.  You could argue in fact that if there's contention for the buffer
 header, that's proof that it's busy and shouldn't have its usage count
 decremented.  So this seems okay from a logical standpoint.
 
 However, I'm not real sure that it's possible to do a conditional
 spinlock acquire that doesn't create just as much hardware-level
 contention as a full acquire (ie, TAS is about as bad whether it
 gets the lock or not).  So the actual benefit is a bit less clear.

Could we view the usage count, and if it is 5, and if there is someone
holding the lock, we just ignore the buffer and move on to the next
buffer?  Seems that could be done with no locking.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Merlin Moncure
On Tue, Mar 26, 2013 at 11:40 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  I think there is some very low hanging optimization fruit in the clock
  sweep loop.   first and foremost, I see no good reason why when
  scanning pages we have to spin and wait on a buffer in order to
  pedantically adjust usage_count.  some simple refactoring there could
  set it up so that a simple TAS (or even a TTAS with the first test in
  front of the cache line lock, as is done automatically on x86 IIRC)
  could guard the buffer and, in the event of any lock detected, simply
  move on to the next candidate without messing around with that buffer
  at all.   This could be construed as a 'trylock' variant of a spinlock
  and might help out with cases where an especially hot buffer is
  locking up the sweep.  This is exploiting the fact that from
  StrategyGetBuffer we don't need a *particular* buffer, just *a*
  buffer.

 Hm.  You could argue in fact that if there's contention for the buffer
 header, that's proof that it's busy and shouldn't have its usage count
 decremented.  So this seems okay from a logical standpoint.

 However, I'm not real sure that it's possible to do a conditional
 spinlock acquire that doesn't create just as much hardware-level
 contention as a full acquire (ie, TAS is about as bad whether it
 gets the lock or not).  So the actual benefit is a bit less clear.

 Could we view the usage count, and if it is 5, and if there is someone
 holding the lock, we just ignore the buffer and move on to the next
 buffer?  Seems that could be done with no locking.

The idea is that if someone is "holding" the lock, we completely ignore
the buffer regardless of usage.  Quotes there because we test the lock
without the cache line lock.  Now if the buffer is apparently unlocked but
returns locked once you *do* acquire cache line lock in anticipation
of refcounting, again immediately bail and go to next buffer.

I see no reason whatsoever to have the buffer allocator spin and wait on a
blocked buffer.  This is like jumping onto a merry-go-round being spun
by sadistic teenagers.
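
A rough sketch of that sweep (buf_header_trylock(), buf_header_unlock(),
clock_sweep_next() and get_victim_buffer_trylock() are made-up names, not
existing functions; the real loop lives in StrategyGetBuffer()):

static BufferDesc *
get_victim_buffer_trylock(void)
{
    for (;;)
    {
        BufferDesc *buf = clock_sweep_next();   /* advance the clock hand */

        if (!buf_header_trylock(buf))
            continue;           /* header contended: buffer is busy, skip it */

        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;         /* victim found; header lock still held */

        if (buf->usage_count > 0)
            buf->usage_count--; /* the usual clock-sweep decrement */

        buf_header_unlock(buf);
    }
}

The point is simply that a contended header is never spun on; it gets
treated the same as a busy buffer and skipped.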

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Bruce Momjian
On Tue, Mar 26, 2013 at 03:06:30PM +, Simon Riggs wrote:
  More generally, the fact that a patch has some user-frobbable knob
  does not mean that it's actually a good or even usable solution.  As
  everybody keeps saying, testing on a wide range of use-cases would be
  needed to prove that, and we don't have enough time left for such
  testing in the 9.3 timeframe.  This problem needs to be attacked in
  an organized and deliberate fashion, not by hacking something up under
  time pressure and shipping it with minimal testing.
 
 Well, it has been tackled like that and we've *all* got nowhere. No
 worries, I can wait a year for that beer.

This was the obvious result of this discussion --- it is a shame we had
to discuss this rather than working on more pressing 9.3 issues.  I also
think someone saying "I would like to apply this now" is more disruptive
than casual discussion about things like buffer count locking.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Darren Duncan

On 2013.03.26 1:40 AM, Albe Laurenz wrote:

Darren Duncan wrote:

So, determining if 2 rows are the same involves an iteration of dyadic logical
AND over the predicates for each column comparison.  Now logical AND has an
identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results
in p for all p.  Therefore, any 2 rows with zero columns each are the same.

Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE
any time there is more than 1 row in a table.

Does anyone agree or disagree with this logic?


Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.


Consider the context however.  We're talking about a UNIQUE constraint and so 
what we want to do is prevent the existence of multiple tuples in a relation 
that are the same for some defined subset of their attributes.  I would argue 
that logically, and commonsensically, two tuples with no attributes are the 
same, and hence a set of distinct tuples having zero attributes could have no 
more than one member, and so a UNIQUE constraint over zero attributes would say 
the relation can't have more than one tuple.  So unless someone wants to argue 
that two tuples with no attributes are not the same, my interpretation makes 
more sense and is clearly the one to follow. -- Darren Duncan




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 09:51, Heikki Linnakangas wrote:

On 26.03.2013 02:02, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 25.03.2013 15:36, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@iki.fi writes:

Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Per warning from -Wmissing-format-attribute.



Hm, this is exactly what I removed yesterday, because it makes the
build
fail outright on old gcc:



The attached seems to work. With this patch, on_exit_msg_func() is gone.
There's a different implementation of exit_horribly for pg_dumpall and
pg_dump/restore. In pg_dumpall, it just calls vwrite_msg(). In
pg_dump/restore's version, the logic from parallel_exit_msg_func() is
moved directly to exit_horribly().


Seems probably reasonable, though if we're taking exit_horribly out of
dumputils.c, meseems it ought not be declared in dumputils.h anymore.
Can we put that declaration someplace else, rather than commenting it
with an apology?


Ugh, the patch I posted doesn't actually work, because dumputils.c is
also used in psql and some scripts, so you get a linker error in those.
psql and scripts don't use exit_horribly or many of the other functions
in dumputils.c, so I think we should split dumputils.c into two parts
anyway. fmtId and the other functions that are used by psql in one file,
and the functions that are only shared between pg_dumpall and pg_dump in
another. Then there's also functions that are used by pg_dump and
pg_restore, but not pg_dumpall or psql.

I'll try moving things around a bit...


This is what I came up with. I created a new file, misc.c (for lack of a 
better name), for things that are shared by pg_dump and pg_restore, but 
not pg_dumpall or other programs. I moved all the parallel stuff from 
dumputils.c to parallel.c, and everything else that's not used outside 
pg_dump and pg_restore to misc.c. I moved exit_horribly() to parallel.c, 
because it needs to do things differently in parallel mode.


I still used a function pointer, not for the printf-style message 
printing routine, but for making dumputils.c independent of parallel 
mode. getThreadLocalPQExpBuffer() is now a function pointer; the default 
implementation just uses a static variable, but when pg_dump/restore 
enters parallel mode, it points the function pointer to a version that 
uses thread-local storage (on Windows).
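
Roughly like this, in simplified form (not the patch verbatim;
enter_parallel_mode_hook() is just an illustrative name, and the
thread-local version is omitted):

#include "pqexpbuffer.h"

static PQExpBuffer defaultGetLocalPQExpBuffer(void);
static PQExpBuffer parallelGetLocalPQExpBuffer(void);   /* TLS version, not shown */

/* dumputils.c always calls through this pointer, ignorant of parallelism */
PQExpBuffer (*getThreadLocalPQExpBuffer) (void) = defaultGetLocalPQExpBuffer;

/* default implementation: a single static buffer is fine single-threaded */
static PQExpBuffer
defaultGetLocalPQExpBuffer(void)
{
    static PQExpBuffer buf = NULL;

    if (buf == NULL)
        buf = createPQExpBuffer();
    else
        resetPQExpBuffer(buf);
    return buf;
}

/* called by pg_dump/pg_restore when entering parallel mode */
static void
enter_parallel_mode_hook(void)
{
    getThreadLocalPQExpBuffer = parallelGetLocalPQExpBuffer;
}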


- Heikki
*** a/src/bin/pg_dump/Makefile
--- b/src/bin/pg_dump/Makefile
***
*** 19,25  include $(top_builddir)/src/Makefile.global
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  
  KEYWRDOBJS = keywords.o kwlookup.o
--- 19,25 
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o misc.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  
  KEYWRDOBJS = keywords.o kwlookup.o
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
***
*** 14,19 
--- 14,20 
   *-
   */
  #include pg_backup_archiver.h
+ #include misc.h
  
  #include ctype.h
  
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
***
*** 53,59 
   */
  
  #include compress_io.h
! #include dumputils.h
  #include parallel.h
  
  /*--
--- 53,59 
   */
  
  #include compress_io.h
! #include misc.h
  #include parallel.h
  
  /*--
*** a/src/bin/pg_dump/dumputils.c
--- b/src/bin/pg_dump/dumputils.c
***
*** 25,45 
  extern const ScanKeyword FEScanKeywords[];
  extern const int NumFEScanKeywords;
  
- /* Globals exported by this file */
- int			quote_all_identifiers = 0;
- const char *progname = NULL;
- 
- #define MAX_ON_EXIT_NICELY20
- 
- static struct
- {
- 	on_exit_nicely_callback function;
- 	void	   *arg;
- }	on_exit_nicely_list[MAX_ON_EXIT_NICELY];
- 
- static int	on_exit_nicely_index;
- void		(*on_exit_msg_func) (const char *modulename, const char *fmt, va_list ap) = vwrite_msg;
- 
  #define supports_grant_options(version) ((version) = 70400)
  
  static bool parseAclItem(const char *item, const char *type,
--- 25,30 
***
*** 49,116  static bool parseAclItem(const char *item, const char *type,
  static char *copyAclUserName(PQExpBuffer output, char *input);
  static void AddAcl(PQExpBuffer aclbuf, const char *keyword,
  	   const char *subname);
! static PQExpBuffer getThreadLocalPQExpBuffer(void);
! 
! #ifdef WIN32
! static void shutdown_parallel_dump_utils(int code, void *unused);
! static bool parallel_init_done = false;
! static DWORD tls_index;
! static DWORD mainThreadId;
  
! static void
! shutdown_parallel_dump_utils(int code, void *unused)
! {
! 	/* Call the cleanup function 

Re: Remove invalid indexes from pg_dump Was: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Tue, Mar 19, 2013 at 9:19 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 If failures happen with CREATE INDEX CONCURRENTLY, the system will be left
 with invalid indexes. I don't think that the user would like to see invalid
 indexes of an existing system being recreated as valid after a restore.
 So why not remove invalid indexes from a dump with something like the
 patch attached?

+1

The patch looks good to me.

 This should perhaps be applied in pg_dump for versions down to 8.2, where
 CREATE INDEX CONCURRENTLY was implemented?

I think so.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Kevin Grittner
Merlin Moncure mmonc...@gmail.com wrote:

 *) For off-cycle release work that would help enable patches with
 complex performance trade-offs (I'm working up another patch that has
 even more compelling benefits and risks in the buffer allocator), we
 desperately need a standard battery of comprehensive performance tests
 and donor machines.

Such a thing would vastly reduce the time needed to work on
something like this with confidence that it would not be a disaster
for some unidentified workload.  Sure, something could still slip
though the cracks -- but they would *be* cracks, not a wide gaping
hole.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 This is what I came up with. I created a new file, misc.c (for lack
 of a better name), for things that are shared by pg_dump and
 pg_restore, but not pg_dumpall or other programs. I moved all the
 parallel stuff from dumputils.c to parallel.c, and everything else
 that's not used outside pg_dump and pg_restore to misc.c. I moved
 exit_horribly() to parallel.c, because it needs to do things
 differently in parallel mode.

Not happy with misc.c as a filename.  How about pg_dump_utils.c or
pg_dump_misc.c?  I think the comment at the top should explicitly say
that the file is intended not to be linked in pg_dumpall.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
  Is someone planning to provide additional feedback about this patch at
  some
  point?

 Yes, now that I have returned from my holidays - or well, am returning
 from them, I do plan to. But it should probably get some implementation
 level review from somebody but Fujii and me...

 Yeah, it would be good to have an extra pair of fresh eyes looking at those
 patches.

I probably don't have enough time to review the patch thoroughly. It would be
quite helpful if someone else became another reviewer of this patch.

 Please find new patches realigned with HEAD. There were conflicts with 
 commits done recently.

ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch

Regards,

-- 
Fujii Masao




Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Gavin Flower

On 27/03/13 06:14, Darren Duncan wrote:

On 2013.03.26 1:40 AM, Albe Laurenz wrote:

Darren Duncan wrote:
So, determining if 2 rows are the same involves an iteration of dyadic
logical AND over the predicates for each column comparison.  Now logical
AND has an identity value, which is TRUE, because TRUE AND p (and p AND
TRUE) results in p for all p.  Therefore, any 2 rows with zero columns
each are the same.

Since any 2 rows with zero columns are the same, the UNIQUE predicate is
FALSE any time there is more than 1 row in a table.

Does anyone agree or disagree with this logic?


Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.


Consider the context however.  We're talking about a UNIQUE constraint 
and so what we want to do is prevent the existence of multiple tuples 
in a relation that are the same for some defined subset of their 
attributes.  I would argue that logically, and commonsensically, two 
tuples with no attributes are the same, and hence a set of distinct 
tuples having zero attributes could have no more than one member, and 
so a UNIQUE constraint over zero attributes would say the relation 
can't have more than one tuple. So unless someone wants to argue that 
two tuples with no attributes are not the same, my interpretation 
makes more sense and is clearly the one to follow.

-- Darren Duncan




Hmm as a user, I would like at most one row with empty fields covered by 
a unique index.


Logical arguments to the contrary, remind me of the joke of the school 
boy who told his unlearned father that he had learnt logic and could 
prove that his father actually had 3 fish in his basket despite both 
seeing only 2 fish.  His unlearned father did not try to argue, and 
simply said: well, your mother can have the first fish, I'll have the 
second, and you, my learned son, can have the third...





[HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
Hi all!


I finally started to investigate why spoonbill stopped reporting to the
buildfarm feedback about 2 months ago.
It seems that the foreign-keys locking patch (or something committed very
close to January 23rd) broke it in a fairly annoying way - running the
buildfarm script seems to consistently stall during the isolationtester
part of the regression testing, leaving the postgresql instance running
and causing all future buildfarm runs to fail...


The process listing at that time looks like:

https://www.kaltenbrunner.cc/files/process_listing.txt


pg_stats_activity of the running instance:


https://www.kaltenbrunner.cc/files/pg_stat_activity.txt


pg_locks:

https://www.kaltenbrunner.cc/files/pg_locks.txt


backtraces of the three backends:

https://www.kaltenbrunner.cc/files/bt_20467.txt
https://www.kaltenbrunner.cc/files/bt_20897.txt
https://www.kaltenbrunner.cc/files/bt_24285.txt




Stefan




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Mar 20, 2013 at 5:42 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Here's a new version of this patch, rebased on top of the new
  pg_identify_object() stuff.  Note that the regression test doesn't work
  yet, because I didn't adjust to the new identity output definition (the
  docs need work, too).  But that's a simple change to do.  I'm leaving
  that for later.
 
 I think this is getting there.  A few things to think about:

Thanks.

 - pg_event_trigger_dropped_objects seems to assume that
 currentEventTriggerState will be pointing to the same list on every
 call.  But is that necessarily true?  I'm thinking about a case where
 someone opens a cursor in an event trigger and then tries to read from
 that cursor later in the transaction.  I think you might be able to
 crash the server that way.

Well, no, because it uses materialized return mode, so there's no next
time --- the list is constructed completely before
pg_event_trigger_dropped_objects returns.  So there's no such hole.

 - I am not wild about the idea of propagating PG_TRY/PG_CATCH blocks
 into yet more places.  On Linux-x86 they are pretty cheap because
 Linux doesn't need a system call to change the signal mask and x86 has
 few registers that must be saved-and-restored, but elsewhere this can
 be a performance problem.  Now maybe ProcessUtility is not a
 sufficiently-frequently called function for this to matter... but I'm
 not sure.  The alternative is to teach the error handling pathways
 about this in somewhat greater detail, since the point of TRY/CATCH is
 to cleanup things that the regular error handling stuff doesn't now
 about.

I tried this and it doesn't work.  The error pathways you speak about
would be the xact.c entry points to commit and abort transactions;
however, there's a problem with that because some of the commands that
ProcessUtility() deals with have their own transaction management
calls internally; so those would call CommitTransaction() and the
event trigger state would go away, and then when control gets back to
ProcessUtility there would be nothing to clean up.  I think we could
ignore the problem, or install smarts in ProcessUtility to avoid calling
event_trigger.c when one of those commands is involved -- but this seems
to me a solution worse than the problem.  So all in all I feel like the
macro on top of PG_TRY is the way to go.

Now there *is* one rather big performance problem in this patch, which
is that it turns on collection of object dropped data regardless of
there being event triggers that use the info at all.  That's a serious
drawback and we're going to get complaints about it.  So we need to do
something to fix that.

One idea that comes to mind is to add some more things to the grammar,
CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');
or some such, so that when events happen for which any triggers have
that flag enabled, *then* collecting is activated, otherwise not.  This
would be stored in a new column in pg_event_trigger (say evtsupport, a
char array much like proargmodes).  

The sequence of (ahem) events goes like this:

ProcessUtility()
  EventTriggerBeginCompleteQuery()
  EventTriggerDDLCommandStart()
EventCacheLookup()
EventTriggerInvoke()
  .. run whatever command we've been handed ...
  EventTriggerDDLCommandEnd()
EventCacheLookup()
EventTriggerInvoke()
  EventTriggerEndCompleteQuery()

So EventTriggerBeginCompleteQuery() will have to peek into the event
trigger cache for any ddl_command_end triggers that might apply, and see
if any of them has the flag for dropped objects.  If it's there, then
enable dropped object collection.
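
To make that concrete, a rough sketch of that peek is below.  EventCacheLookup()
and EVT_DDLCommandEnd are the existing event-trigger cache interfaces, but
item_wants_dropped_objects() is a hypothetical helper standing in for whatever
representation of evtsupport we end up with; this is an illustration, not
working code.

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/evtcache.h"

/*
 * Sketch only: return true if any ddl_command_end trigger has asked for
 * dropped-object information, so the caller can enable collection.
 */
static bool
any_trigger_wants_dropped_objects(void)
{
    List       *cachelist = EventCacheLookup(EVT_DDLCommandEnd);
    ListCell   *lc;

    foreach(lc, cachelist)
    {
        EventTriggerCacheItem *item = (EventTriggerCacheItem *) lfirst(lc);

        if (item_wants_dropped_objects(item))   /* hypothetical evtsupport test */
            return true;
    }
    return false;
}

EventTriggerBeginCompleteQuery() would call something like this and only switch
on dropped-object collection when it returns true.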

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Now there *is* one rather big performance problem in this patch, which
 is that it turns on collection of object dropped data regardless of
 there being event triggers that use the info at all.  That's a serious
 drawback and we're going to get complaints about it.  So we need to do
 something to fix that.

 One idea that comes to mind is to add some more things to the grammar,
 CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');

Uh ... surely we can just notice whether there's a trigger on the
object-drop event?  I don't understand why we need *yet more*
mechanism here.

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Josh Berkus

 I expect to lose this argument, but I think this is a terrible idea.
 Users really hate it when they try to upgrade and find that they, uh,
 can't, because of some application-level incompatibility like this.
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

Sure there is.   How do you distinguish between an array which is NULL
and an array which is empty?

Also, the whole array_dims is NULL thing trips up pretty much every
single PG user who uses arrays for the first time.  I'd expect when we
announce the fix, we'll find that many users were doing the wrong thing
in the first place and didn't know why it wasn't working.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-03-26 Thread Pavel Stehule
Hello all

2013/3/26 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 Where are we with this patch?  I'm a bit unclear from the discussion on
 whether it passes muster or not.  Things seem to have petered out.

 I took another look at this patch tonight.  I think the thing that is
 bothering everybody (including Pavel) is that as submitted, pl_check.c
 involves a huge amount of duplication of knowledge and code from
 pl_exec.c, and to a lesser extent pl_comp.c.  It certainly looks like a
 maintenance disaster in the making.  It doesn't bother me so much that
 pl_check.c knows about each syntactic structure in plpgsql: there are
 already four or five places you have to touch when adding new syntax.
 Rather, it's that there's so much duplication of knowledge about
 semantic details, which is largely expressed by copied-and-pasted code
 from pl_exec.c.  It seems like a safe bet that we'll sometimes miss the
 need to fix pl_check.c when we fix something in pl_exec.c.  There are
 annoying duplications from pl_comp.c as well, eg knowledge of exactly
 which magic variables are defined in trigger functions.

 Having said all that, it's not clear that we can really do any better.
 The only obvious alternative is pushing support for a checking mode
 directly into pl_exec.c, which would obfuscate and slow down that code
 to an unacceptable degree if Pavel's results at
 http://www.postgresql.org/message-id/cafj8prakujmvjpjzfsrye7+ub8jf8wtz5rkxk-0ykxme-k8...@mail.gmail.com
 are any indication.  (In that message, Pavel proposes shoveling the
 problem into the compile code instead, but that seems unlikely to work
 IMO: the main problem in pl_check.c as it stands is duplication of code
 from pl_exec.c not pl_comp.c.  So I think that path could only lead to
 duplicating the same code into pl_comp.c.)

 So question 1 is do we want to accept that this is the implementation
 pathway we're going to settle for, or are we going to hold out for a
 better one?  I'd be the first in line to hold out if I had a clue how
 to move towards a better implementation, but I have none.  Are we
 prepared to reject this type of feature entirely because of the
 code-duplication problem?  That doesn't seem attractive either.

I wrote a lot of versions, and the proposed code is redundant, but it is
the most simple and clean.

I am really against pushing the check into pl_exec, because it
significantly decreases code readability and increases the bottleneck in
CPU-intensive tests. Moreover, there is too little room for future
features - performance advising, more verbose error messages, etc.

In PL/pgPSM I used a slightly different architecture - necessary for PSM
and maybe better for PL/pgSQL too. It is a two-stage compiler - parsing
into an AST, then AST compilation. It simplifies gram.y, and the
processing order depends on depth-first iteration of the AST rather than
on bison rules. It can probably have an impact on the speed of very large
procedures - I don't see other disadvantages. With this architecture I
was able to do a lot of checks at compile stage without problems.

Most of the complexity in the current code is related to detecting record
types from expressions without evaluating the expressions. Maybe this
code could live in core or in pl_comp.c. The code for the Describe (F)
message is not very reusable. It is necessary for

DECLARE r RECORD;
FOR r IN SELECT ...
LOOP
   RAISE NOTICE '%', r.xx;
END LOOP;


 But, even granting that this implementation approach is acceptable,
 the patch does not seem close to being committable as-is: there's
 a lot of fit-and-finish work yet to be done.  To make my point I will
 just quote from one of the regression test additions:

 create or replace function f1()
 returns void as $$
 declare a1 int; a2 int;
 begin
   select 10,20 into a1;
 end;
 $$ language plpgsql;
 -- raise warning
 select plpgsql_check_function('f1()');
  plpgsql_check_function
 -
  warning:0:4:SQL statement:too many attributies for target variables
  Detail: There are less target variables than output columns in query.
  Hint: Check target variables in SELECT INTO statement
 (3 rows)

 Do we like this output format?  I don't.  The unlabeled, undocumented
 fields crammed into a single line with colon separators are neither
 readable nor useful.  If we actually need these fields, why aren't we
 splitting the output into multiple columns?  (I'm also wondering why
 the patch bothers with an option to emit this same info in XML.  Surely
 there is vanishingly small use-case for mechanical examination of this
 output.)

This format can be reduced, redesigned, or changed. It is designed like
gcc output and optimized for use from the psql console.

I tested table output - in the original CHECK statement implementation -
but it is not very friendly for showing on a monitor; it is just too
wide. There are similar arguments as for using tabular output for
EXPLAIN, although there is higher complexity and nested 

Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Now there *is* one rather big performance problem in this patch, which
  is that it turns on collection of object dropped data regardless of
  there being event triggers that use the info at all.  That's a serious
  drawback and we're going to get complaints about it.  So we need to do
  something to fix that.
 
  One idea that comes to mind is to add some more things to the grammar,
  CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');
 
 Uh ... surely we can just notice whether there's a trigger on the
 object-drop event?  I don't understand why we need *yet more*
 mechanism here.

There's no object-drop event, only ddl_command_end.  From previous
discussion I understood we didn't want a separate event, so that's what
we've been running with.

However, I think previous discussions have conflated many different
things, and we've been slowly fixing them one by one; so perhaps at this
point it does make sense to have a new object-drop event.  Let's discuss
it -- we would define it as taking place just before ddl_command_end,
and firing any time a command (with matching tag?) has called
performDeletion or performMultipleDeletions.  Does that sound okay?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I tried this and it doesn't work.  The error pathways you speak about
 would be the xact.c entry points to commit and abort transactions;
 however, there's a problem with that because some of the commands that
 ProcessUtility() deals with have their own transaction management
 calls internally; so those would call CommitTransaction() and the
 event trigger state would go away, and then when control gets back to
 ProcessUtility there would be nothing to clean up.  I think we could
 ignore the problem, or install smarts in ProcessUtility to avoid calling
 event_trigger.c when one of those commands is involved -- but this seems
 to me a solution worse than the problem.  So all in all I feel like the
 macro on top of PG_TRY is the way to go.

I see.  :-(

 Now there *is* one rather big performance problem in this patch, which
 is that it turns on collection of object dropped data regardless of
 there being event triggers that use the info at all.  That's a serious
 drawback and we're going to get complaints about it.  So we need to do
 something to fix that.

Really?  Who is going to care about that?  Surely that overhead is
quite trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I finally started to investigate why spoonbill stopped reporting to the
 buildfarm feedback about 2 months ago.
 It seems that the foreign-keys locking patch (or something commity very
 close to January 23th) broke it in a fairly annoying way - running the
 buildfarm script seems to
 consistently stall during the isolationtester part of the regression
 testing leaving the postgresql instance running causing all future
 buildfarm runs to fail...

It looks from here like the isolationtester client is what's dropping
the ball --- the backend states are unsurprising, and two of them are
waiting for a new client command.  Can you get a stack trace from the
isolationtester process?

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd dire...@gmail.com wrote:
 On 26 March 2013 22:57, Robert Haas robertmh...@gmail.com wrote:
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

 We can't sensibly test for whether an array is empty.  I'd call that a
 functional problem.

Sure you can.  Equality comparisons work just fine.

rhaas=# select '{}'::int4[] = '{}'::int4[];
 ?column?
--
 t
(1 row)

rhaas=# select '{}'::int4[] = '{1}'::int4[];
 ?column?
--
 f
(1 row)

 The NULL return from array_{length,lower,upper,ndims} is those
 functions' way of saying their arguments failed a sanity check.  So we
 cannot distinguish in a disciplined way between a valid, empty array,
 and bad arguments.  If the zero-D implementation had been more
 polished and say, array_ndims returned zero, we had provided an
 array_empty function, or the existing functions threw errors for silly
 arguments instead of returning NULL, then I'd be more inclined to see
 your point.  But as it stands, the zero-D implementation has always
 been half-baked and slightly broken, we just got used to working
 around it.

Well, you could easily change array_ndims() to error out if ARR_NDIM()
is negative or more than MAXDIM and return NULL only if it's exactly
0.  That wouldn't break backward compatibility because it would throw
an error only if fed a value that shouldn't ever exist in the first
place, short of a corrupted database.  I imagine the other functions
are amenable to similar treatment.

And if neither that nor just comparing against an empty array literal
floats your boat, adding an array_is_empty() function would let you
test for this condition without breaking backward compatibility, too.
That's overkill, I think, but it would work.
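
For illustration, a minimal sketch of that first option follows.  The function
already exists in the backend; the ereport() branch is the extra strictness being
suggested here, not current or committed behavior.

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

Datum
array_ndims(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    /* values outside 0..MAXDIM cannot occur short of corruption: complain */
    if (ARR_NDIM(v) < 0 || ARR_NDIM(v) > MAXDIM)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("invalid number of array dimensions: %d", ARR_NDIM(v))));

    /* a zero-D (empty) array still returns NULL, exactly as today */
    if (ARR_NDIM(v) == 0)
        PG_RETURN_NULL();

    PG_RETURN_INT32(ARR_NDIM(v));
}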

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread viod
Hello!

I'm an IT student, and I would like to apply for the 2013 GSoC.
I've been looking at this mailing list for a while now, and I saw a
suggestion for GSoC that particularly interested me: implementing the
K-medoids clustering in Madlib, as it is supposed to be more efficient than
the K-means algorithm.

I didn't know about these algorithms before, but I have documented myself,
and it looks quite interesting to me, and even more as I currently have
lessons (but very very simplified unfortunately).

I've got a few questions:
Won't this be a quite short project? I can't get an idea of how long it
would take me to implement this algorithm in a way that would be usable by
postgresql, but 3 months looks long for this task, doesn't it?

Someone on the IRC channel (can't remember who, sorry) told me it was used
in the KNN index. I guess this is used by pg_trgm, but are there other
modules using it currently?
And could you please give me some links explaining the internals of this
index? I've been through several articles presenting it, but none very
satisfying.

Thanks a lot in advance!


Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Now there *is* one rather big performance problem in this patch, which
  is that it turns on collection of object dropped data regardless of
  there being event triggers that use the info at all.  That's a serious
  drawback and we're going to get complaints about it.  So we need to do
  something to fix that.
 
 Really?  Who is going to care about that?  Surely that overhead is
 quite trivial.

I don't think it is, because it involves syscache lookups for each
object being dropped, many extra pallocs, etc.  Surely that's many times
bigger than the PG_TRY overhead you were worried about.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
On 03/26/2013 08:45 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I finally started to investigate why spoonbill stopped reporting to the
 buildfarm feedback about 2 months ago.
 It seems that the foreign-keys locking patch (or something commity very
 close to January 23th) broke it in a fairly annoying way - running the
 buildfarm script seems to
 consistently stall during the isolationtester part of the regression
 testing leaving the postgresql instance running causing all future
 buildfarm runs to fail...
 
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?


https://www.kaltenbrunner.cc/files/isolationtester.txt


Stefan




Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Not happy with misc.c as a filename.

We already have two misc.c files:

src/backend/utils/adt/misc.c
src/interfaces/ecpg/ecpglib/misc.c

I much prefer not to repeat the same filename in different
directories if we can avoid it.

 How about pg_dump_utils.c or pg_dump_misc.c?

Those seem reasonable to me.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Andres Freund
On 2013-03-26 13:14:53 -0700, Kevin Grittner wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  Not happy with misc.c as a filename.
 
 We already have two misc.c files:
 
 src/backend/utils/adt/misc.c
 src/interfaces/ecpg/ecpglib/misc.c
 
 I much prefer not to repeat the same filename in different
 directories if we can avoid it.
 
  How about pg_dump_utils.c or pg_dump_misc.c?
 
 Those seem reasonable to me.

I vote against including pg_ in the filename for an implementation-private
file; that seems duplicative.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-03-26 Thread Tom Lane
Using HEAD's pg_dump, I see pg_dump -s regression taking 5 seconds.
On the other hand, running the same executable against the regression
database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
broke something performance-wise.

A quick check with oprofile says it's all AllocSetCheck's fault:

samples  %image name   symbol name
883.6059  postgres AllocSetCheck
1140  1.0858  postgres base_yyparse
918   0.8744  postgres AllocSetAlloc
778   0.7410  postgres SearchCatCache
406   0.3867  postgres pg_strtok
394   0.3753  postgres hash_search_with_hash_value
387   0.3686  postgres core_yylex
373   0.3553  postgres MemoryContextCheck
256   0.2438  postgres nocachegetattr
231   0.2200  postgres ScanKeywordLookup
207   0.1972  postgres palloc

So maybe I'm nuts to care about the performance of an assert-enabled
backend, but I don't really find a 4X runtime degradation acceptable,
even for development work.  Does anyone want to fess up to having caused
this, or do I need to start tracking down what changed?

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2013 08:45 PM, Tom Lane wrote:
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?

 https://www.kaltenbrunner.cc/files/isolationtester.txt

Hmm ... isolationtester.c:584 is in the code that tries to cancel the
current permutation (test case) after realizing that it's constructed
an invalid permutation.  It looks like the preceding PQcancel() failed
for some reason, since the waiting backend is still waiting.  The
isolationtester code doesn't bother to check for an error result there,
which is kinda bad, not that it's clear what it could do about it.
Could you look at the contents of the local variable buf in the
run_permutation stack frame?  Or else try modifying the code along the
lines of

-PQcancel(cancel, buf, sizeof(buf));
+if (!PQcancel(cancel, buf, sizeof(buf)))
+  fprintf(stderr, "PQcancel failed: %s\n", buf);

and see if it prints anything interesting before hanging up.
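
For reference, a slightly fuller sketch of that error handling is below;
PQgetCancel(), PQcancel() and PQfreeCancel() are the standard libpq calls, but the
function and variable names here are made up for the example rather than taken
from isolationtester.c.

#include <stdio.h>
#include <libpq-fe.h>

/* attempt to cancel the query running on "conn", reporting any failure */
static void
cancel_with_report(PGconn *conn)
{
    PGcancel   *cancel = PQgetCancel(conn);
    char        errbuf[256];

    if (cancel == NULL)
    {
        fprintf(stderr, "could not allocate cancel request\n");
        return;
    }

    /* PQcancel returns 1 on success, 0 on failure with a message in errbuf */
    if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
        fprintf(stderr, "PQcancel failed: %s\n", errbuf);

    PQfreeCancel(cancel);
}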

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Brendan Jurd
On 27 March 2013 06:47, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd dire...@gmail.com wrote:
 We can't sensibly test for whether an array is empty.  I'd call that a
 functional problem.

 Sure you can.  Equality comparisons work just fine.

 rhaas=# select '{}'::int4[] = '{}'::int4[];

The good news is, if anybody out there is using that idiom to test for
emptiness, they will not be disrupted by the change.

Cheers,
BJ




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, you could easily change array_ndims() to error out if ARR_NDIM()
 is negative or more than MAXDIM and return NULL only if it's exactly
 0.  That wouldn't break backward compatibility because it would throw
 an error only if fed a value that shouldn't ever exist in the first
 place, short of a corrupted database.  I imagine the other functions
 are amenable to similar treatment.

I haven't looked at the patch in detail, but I thought one of the key
changes was that '{}' would now be interpreted as a zero-length 1-D
array rather than a zero-D array.  If we do that it seems a bit moot
to argue about whether we should exactly preserve backwards-compatible
behavior in array_ndims(), because the input it's looking at won't be
the same anymore anyway.

In any case, the entire point of this proposal is that the current
behavior around zero-D arrays is *broken* and we don't want to be
backwards-compatible with it anymore.  So if you wish to argue against
that opinion, do so; but it seems a bit beside the point to simply
complain that backwards compatibility is being lost.

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Andrew Dunstan


On 03/26/2013 02:50 PM, Stefan Kaltenbrunner wrote:

Hi all!


I finally started to investigate why spoonbill stopped reporting to the
buildfarm feedback about 2 months ago.
It seems that the foreign-keys locking patch (or something commity very
close to January 23th) broke it in a fairly annoying way - running the
buildfarm script seems to
consistently stall during the isolationtester part of the regression
testing leaving the postgresql instance running causing all future
buildfarm runs to fail...




There is some timeout code already in the buildfarm client. It was 
originally put there to help us when we got CVS hangs, a not infrequent 
occurrence in the early days, so it's currently only used if configured 
for the checkout phase, but it could easily be used to create a build 
timeout which would kill the whole process group if the timeout expired. 
It wouldn't work on Windows, and of course it won't solve whatever 
problem caused the hang in the first place, but it still might be worth 
doing.
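
The buildfarm client is Perl, but as a hedged illustration of the underlying
mechanism (run the build in its own process group, and kill the whole group when
the timeout expires), a POSIX C sketch might look like this:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a command with a timeout; if it expires, kill its whole process group. */
static int
run_with_timeout(char *const argv[], unsigned int timeout_secs)
{
    pid_t       child = fork();
    unsigned int waited = 0;

    if (child < 0)
        return -1;

    if (child == 0)
    {
        setpgid(0, 0);          /* child becomes leader of its own process group */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    setpgid(child, child);      /* set it from the parent too, to close the race */

    for (;;)
    {
        int         status;

        if (waitpid(child, &status, WNOHANG) == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

        if (waited++ >= timeout_secs)
        {
            kill(-child, SIGKILL);      /* negative pid: signal the whole group */
            waitpid(child, &status, 0);
            return -1;
        }
        sleep(1);
    }
}

Signalling the group (negative pid) is what catches children of children, which a
plain kill of the top-level build process would miss.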


cheers

andrew





Re: [HACKERS] patch to add \watch to psql

2013-03-26 Thread Peter Eisentraut
On 3/24/13 3:10 PM, Tom Lane wrote:
 I also concur with the complaint here
 http://www.postgresql.org/message-id/caazkufzxyj-rt1aec6s0g7zm68tdlfbbm1r6hgrbbxnz80k...@mail.gmail.com
 that allowing a minimum sleep of 0 is rather dangerous

The original watch command apparently silently corrects a delay of 0
to 0.1 seconds.
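
If psql adopted the same behavior, the clamp itself would be trivial; a hedged
sketch (hypothetical names, not psql's actual code):

#include <stdlib.h>

/*
 * Sketch only: parse the \watch argument and silently enforce a 0.1 second
 * minimum, the way watch(1) reportedly does (watch also defaults to 2 seconds).
 */
static double
parse_watch_interval(const char *arg)
{
    double      sleep_secs = arg ? atof(arg) : 2.0;

    if (sleep_secs < 0.1)
        sleep_secs = 0.1;
    return sleep_secs;
}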

 Another minor question is whether we really need the time-of-day in the
 banner,

That's also part of the original watch and occasionally useful, I think.





Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 There is some timeout code already in the buildfarm client. It was 
 originally put there to help us when we got CVS hangs, a not infrequent 
 occurrence in the early days, so it's currently only used if configured 
 for the checkout phase, but it could easily be used to create a build 
 timeout which would kill the whole process group if the timeout expired. 
 It wouldn't work on Windows, and of course it won't solve whatever 
 problem caused the hang in the first place, but it still might be worth 
 doing.

+1 --- at least then we'd get reports of failures, rather than the
current behavior where the animal just stops reporting.

regards, tom lane




Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-03-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 Using HEAD's pg_dump, I see pg_dump -s regression taking 5
 seconds.
 On the other hand, running the same executable against the regression
 database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
 broke something performance-wise.

 A quick check with oprofile says it's all AllocSetCheck's fault:

 samples  %    image name  symbol name
 8    83.6059  postgres    AllocSetCheck
 1140  1.0858  postgres    base_yyparse
 918  0.8744  postgres    AllocSetAlloc
 778  0.7410  postgres    SearchCatCache
 406  0.3867  postgres    pg_strtok
 394  0.3753  postgres    hash_search_with_hash_value
 387  0.3686  postgres    core_yylex
 373  0.3553  postgres    MemoryContextCheck
 256  0.2438  postgres    nocachegetattr
 231  0.2200  postgres    ScanKeywordLookup
 207  0.1972  postgres    palloc

 So maybe I'm nuts to care about the performance of an assert-enabled
 backend, but I don't really find a 4X runtime degradation acceptable,
 even for development work.  Does anyone want to fess up to having caused
 this, or do I need to start tracking down what changed?

I checked master HEAD for a dump of regression and got about 4
seconds.  I checked right after my initial push of matview code and
got 2.5 seconds.  I checked just before that and got 1 second. 
There was some additional pg_dump work for matviews after the
initial push which may or may not account for the rest of the time.

Investigating now.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Ignore invalid indexes in pg_dump

2013-03-26 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On top of checking indisvalid, I think that some additional checks on
 indislive and indisready are also necessary.

Those are not necessary, as an index that is marked indisvalid should
certainly also have those flags set.  If it didn't require making two
new version distinctions in getIndexes(), I'd be okay with the extra
checks; but as-is I think the maintenance pain this would add greatly
outweighs any likely value.

I've committed this in the simpler form that just adds indisvalid
checks to the appropriate version cases.

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
On 03/26/2013 09:33 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2013 08:45 PM, Tom Lane wrote:
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?
 
 https://www.kaltenbrunner.cc/files/isolationtester.txt
 
 Hmm ... isolationtester.c:584 is in the code that tries to cancel the
 current permutation (test case) after realizing that it's constructed
 an invalid permutation.  It looks like the preceding PQcancel() failed
 for some reason, since the waiting backend is still waiting.  The
 isolationtester code doesn't bother to check for an error result there,
 which is kinda bad, not that it's clear what it could do about it.
 Could you look at the contents of the local variable buf in the
 run_permutation stack frame?  Or else try modifying the code along the
 lines of
 
 -PQcancel(cancel, buf, sizeof(buf));
 +if (!PQcancel(cancel, buf, sizeof(buf)))
 +  fprintf(stderr, "PQcancel failed: %s\n", buf);
 
 and see if it prints anything interesting before hanging up.

hmm - will look into that in a bit - but I also just noticed that on the
same day spoonbill broke there was also a commit to that file
immediately before that code adding the fflush() calls.


Stefan




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 hmm - will look into that in a bit - but I also just noticed that on the
 same day spoonbill broke there was also a commit to that file
 immediately before that code adding the fflush() calls.

It's hard to see how those would be related to this symptom.  My bet
is that the new fk-deadlock test exposed some pre-existing issue in
isolationtester.  Not quite clear what yet, though.

A different line of thought is that the cancel was received by the
backend but didn't succeed in cancelling the query for some reason.

regards, tom lane




Re: [HACKERS] Ignore invalid indexes in pg_dump

2013-03-26 Thread Michael Paquier
On Wed, Mar 27, 2013 at 6:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Michael Paquier michael.paqu...@gmail.com writes:
  On top of checking indisvalid, I think that some additional checks on
  indislive and indisready are also necessary.

 Those are not necessary, as an index that is marked indisvalid should
 certainly also have those flags set.  If it didn't require making two
 new version distinctions in getIndexes(), I'd be okay with the extra
 checks; but as-is I think the maintenance pain this would add greatly
 outweighs any likely value.

 I've committed this in the simpler form that just adds indisvalid
 checks to the appropriate version cases.

Thanks.
-- 
Michael


Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Jeff Davis
On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:
 Hi,
 
 I found that the regression test failed when I created the database
 cluster with the checksum and set wal_level to archive. I think that
 there are some bugs around checksum feature. Attached is the regression.diff.

Thank you for the report. This was a significant oversight, but simple
to diagnose and fix.

There were several places that were doing something like:

   PageSetChecksumInplace
   if (use_wal)
       log_newpage
   smgrextend

Which is obviously wrong, because log_newpage sets the LSN of the page,
invalidating the checksum. We need to set the checksum after
log_newpage.
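
In other words, the safe ordering is the one below (a sketch distilled from the
patch that follows, with argument lists abbreviated from the real calls):

    /* WAL-log the page first: log_newpage() sets the page LSN ... */
    if (use_wal)
        log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);

    /* ... so the checksum must be computed afterwards, just before the write */
    PageSetChecksumInplace(page, blkno);

    smgrextend(dst, forkNum, blkno, (char *) page, true);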

Also, I noticed that copy_relation_data was doing smgrread without
validating the checksum (or page header, for that matter), so I also
fixed that.

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

Regards,
Jeff Davis
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 273,286  end_heap_rewrite(RewriteState state)
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
- 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
- 
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
--- 273,287 
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
+ 
+ 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
+ 
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
***************
*** 616,623  raw_heap_insert(RewriteState state, HeapTuple tup)
  		{
  			/* Doesn't fit, so write out the existing page */
  
- 			PageSetChecksumInplace(page, state->rs_blockno);
- 
  			/* XLOG stuff */
  			if (state->rs_use_wal)
  				log_newpage(&state->rs_new_rel->rd_node,
--- 617,622 
***************
*** 632,637  raw_heap_insert(RewriteState state, HeapTuple tup)
--- 631,639 
  			 * end_heap_rewrite.
  			 */
  			RelationOpenSmgr(state->rs_new_rel);
+ 
+ 			PageSetChecksumInplace(page, state->rs_blockno);
+ 
  			smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
  					   state->rs_blockno, (char *) page, true);
  
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 51,56 
--- 51,57 
  #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
+ #include "common/relpath.h"
  #include "executor/executor.h"
  #include "foreign/foreign.h"
  #include "miscadmin.h"
***************
*** 8902,8913  copy_relation_data(SMgrRelation src, SMgrRelation dst,
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		PageSetChecksumInplace(page, blkno);
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
--- 8903,8923 
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		if (!PageIsVerified(page, blkno))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATA_CORRUPTED),
! 					 errmsg("invalid page in block %u of relation %s",
! 							blkno,
! 							relpathbackend(src->smgr_rnode.node,
! 										   src->smgr_rnode.backend,
! 										   forkNum))));
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
+ 		PageSetChecksumInplace(page, blkno);
+ 
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this



Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Simon Riggs
On 25 March 2013 17:50, Fujii Masao masao.fu...@gmail.com wrote:

 I found that the regression test failed when I created the database
 cluster with the checksum and set wal_level to archive. I think that
 there are some bugs around checksum feature. Attached is the regression.diff.

Apologies for not responding to your original email, I must have missed that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Simon Riggs
On 26 March 2013 23:23, Jeff Davis pg...@j-davis.com wrote:

 Patch attached. Only brief testing done, so I might have missed
 something. I will look more closely later.

Thanks, I'll look at that tomorrow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-03-26 Thread Simon Riggs
On 25 March 2013 19:14, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Simon, can you comment on this?

Yes, will do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Atri Sharma
I suggested a couple of algorithms to be implemented in MADlib (apart
from K-medoids). You could pick some (or all) of them, which would
require 3 months to be completed.

As for more information on the index, you can refer to

http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

along with the postgres wiki. The wiki is the standard for anything postgres.

pg_trgm uses KNN, but I believe it uses its own implementation of the
algorithm. The idea I proposed aims at writing an implementation in
MADlib so that any client program can use the algorithm(s) in
their code directly, using MADlib functions.

Regards,

Atri

On 3/26/13, viod viod@gmail.com wrote:
 Hello!

 I'm an IT student, and I would like to apply for the 2013 GSoC.
 I've been looking at this mailing list for a while now, and I saw a
 suggestion for GSoC that particularly interested me: implementing the
 K-medoids clustering in Madlib, as it is supposed to be more efficient than
 the K-means algorithm.

 I didn't know about these algorithms before, but I have documented myself,
 and it looks quite interesting to me, and even more as I currently have
 lessons (but very very simplified unfortunately).

 I've got a few questions:
 Won't this be a quite short project? I can't get an idea of how long it
 would take me to implement this algorithm in a way that would be usable by
 postgresql, but 3 months looks long for this task, doesn't it?

 Someone on the IRC channel (can't remember who, sorry) told me it was used
 in the KNN index. I guess this is used by pg_trgm, but are there other
 modules using it currently?
 And could you please give me some links explaining the internals of this
 index? I've been through several articles presenting of it, but none very
 satisfying.

 Thanks a lot in advance!



-- 
Regards,

Atri
*l'apprenant*




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 I suggested a couple of algorithms to be implemented in MADLib(apart
 from K Medoids). You could pick some(or all) of them, which would
 require 3 months to be completed.

 As for more information on index, you can refer

 http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

 along with the postgres wiki. The wiki is the standard for anything postgres.

 pg_trgm used KNN, but I believe it uses its own implementation of the
 algorithm. The idea I proposed aims at writing an implementation in
 the MADlib so that any client program can use the algorithm(s) in
 their code directly, using MADlib functions.

I'm a bit confused as to why this is being proposed as a
Postgres-related project.  I don't even know what MADlib is, but I'm
pretty darn sure that no part of Postgres uses it.  KNNGist certainly
doesn't.

regards, tom lane




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Daniel Farina
On Tue, Mar 26, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Atri Sharma atri.j...@gmail.com writes:
 I suggested a couple of algorithms to be implemented in MADLib(apart
 from K Medoids). You could pick some(or all) of them, which would
 require 3 months to be completed.

 As for more information on index, you can refer

 http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

 along with the postgres wiki. The wiki is the standard for anything postgres.

 pg_trgm used KNN, but I believe it uses its own implementation of the
 algorithm. The idea I proposed aims at writing an implementation in
 the MADlib so that any client program can use the algorithm(s) in
 their code directly, using MADlib functions.

 I'm a bit confused as to why this is being proposed as a
 Postgres-related project.  I don't even know what MADlib is, but I'm
 pretty darn sure that no part of Postgres uses it.  KNNGist certainly
 doesn't.

It's a reasonably well established extension for Postgres for
statistical and machine learning methods.  Rather neat, but as you
indicate, it's not part of Postgres proper.

http://madlib.net/

https://github.com/madlib/madlib/

--
fdr

