date:20120317

Re: [HACKERS] incompatible pointer types with newer zlib

2012-03-17 Thread Peter Eisentraut

On tor, 2012-03-01 at 19:19 +0200, Peter Eisentraut wrote:
> I think the best fix would be to rearrange _PrintFileData() so that it
> doesn't use FH at all.  Instead, we could define a separate
> ArchiveHandle field IF that works more like OF, and then change
> ahwrite() to use that.

Here is a patch that might fix this.  I haven't been able to test this
properly, so this is just from tracing the code.  It looks like
_PrintFileData() doesn't need to use FH at all, so it could use a local
file handle variable instead.  Could someone verify this please?

diff --git i/src/bin/pg_dump/pg_backup_files.c w/src/bin/pg_dump/pg_backup_files.c
index a7fd91d..32b2a32 100644
--- i/src/bin/pg_dump/pg_backup_files.c
+++ w/src/bin/pg_dump/pg_backup_files.c
@@ -293,27 +293,32 @@ _PrintFileData(ArchiveHandle *AH, char *filename, RestoreOptions *ropt)
 {
 	char		buf[4096];
 	size_t		cnt;
+#ifdef HAVE_LIBZ
+	gzFile		fh;
+#else
+	FILE	   *fh;
+#endif
 
 	if (!filename)
 		return;
 
 #ifdef HAVE_LIBZ
-	AH->FH = gzopen(filename, "rb");
+	fh = gzopen(filename, "rb");
 #else
-	AH->FH = fopen(filename, PG_BINARY_R);
+	fh = fopen(filename, PG_BINARY_R);
 #endif
 
-	if (AH->FH == NULL)
+	if (!fh)
 		die_horribly(AH, modulename, "could not open input file \"%s\": %s\n",
 	 filename, strerror(errno));
 
-	while ((cnt = GZREAD(buf, 1, 4095, AH->FH)) > 0)
+	while ((cnt = GZREAD(buf, 1, 4095, fh)) > 0)
 	{
 		buf[cnt] = '\0';
 		ahwrite(buf, 1, cnt, AH);
 	}
 
-	if (GZCLOSE(AH->FH) != 0)
+	if (GZCLOSE(fh) != 0)
 		die_horribly(AH, modulename, "could not close data file after reading\n");
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

Hi All,

I am Atri Sharma.I am a C developer with experience in database
designing and database management systems.I have worked with
MySql,Oracle and other database systems.I have designed databases for
various projects as well.

I would be interested in developing the column reordering project by a
single ALTER command.I have an idea in mind that would solve
theproblem in a very easy way.

Please let me know how to proceed further on this.

My IRC handle is digitalknight.I work on IST.

-- 
Regards,

Atri
l'apprenant

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-17 Thread Simon Riggs

On Fri, Mar 16, 2012 at 9:11 PM, Tom Lane  wrote:

> Simon Riggs  writes:
>> 2. We assume that if values do exist that they have rows uniformly
>> distributed across the whole table like rungs on a ladder.
>
> Well, yeah.  That's sometimes wrong, but not always.  In the absence
> of evidence to the contrary, I think it's a better assumption than
> most others.

The evidence I showed proves it is x1000 worse. I'm not sure how you
say it is "better than most" or that there is no evidence.

> You are not the first person to have run into this type of problem.

Yes - if I thought it was an isolated problem I would not have brought it up.

Also, I don't bring it up because I need help; the actual problem has
workarounds. But rewriting SQL because of a planner problem is not
something that highlights our strengths.

> If it were easily solved by some minor hack, we would have done that
> long since.  The problem is to not throw the baby out with the
> bathwater.
>
> I can see an argument for downgrading the assumed effectiveness of
> restrictions that are applied as qpquals (ie, not indexquals), but
> the LIMIT node coster doesn't have easy access to the information
> needed for that, especially not in situations more complicated than
> a single-table scan.

My wish was to register this as both a common and significant bug,
whatever the solution is.

The bug appears in very simple queries, so we can't hide behind some
doubt as to the exact cause.

Few planning errors are wrong by more than 1000 times; this plan gets
worse every time the client gains new customers, so its not just bad,
its getting worse. Multiple different queries show the same symptoms,
so its common also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-17 Thread Greg Stark

On Sat, Mar 17, 2012 at 9:34 AM, Simon Riggs  wrote:
> My wish was to register this as both a common and significant bug,

It has definitely come up here before many times.

However at root the problem is part of the general class of not
understanding how two different columns are related. Postgres is
assuming they're entirely independent and therefore all the values of
x are uniformly distributed over values of y.

To plan this better in your case it would have to know that blah_id <=
72572020 is not equally likely for user_id = ANY
('{list}'::integer[]) as it is for the table as a whole.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Incorrect assumptions with low LIMITs

2012-03-17 Thread Simon Riggs

On Sat, Mar 17, 2012 at 11:33 AM, Greg Stark  wrote:
> On Sat, Mar 17, 2012 at 9:34 AM, Simon Riggs  wrote:
>> My wish was to register this as both a common and significant bug,
>
> It has definitely come up here before many times.
>
> However at root the problem is part of the general class of not
> understanding how two different columns are related. Postgres is
> assuming they're entirely independent and therefore all the values of
> x are uniformly distributed over values of y.
>
> To plan this better in your case it would have to know that blah_id <=
> 72572020 is not equally likely for user_id = ANY
> ('{list}'::integer[]) as it is for the table as a whole.

That is not the root cause in this case, though I agree that is also a
problem. The first plan is more complex because of an ORDER BY that
favours the index scan, but that's actually irrelevant to the case;
the use of blah_id is actually the user using the fact that things are
allocated in date order to avoid doing a date sort.

The problems are as I described them

(1) no account made for sparsity, and other factors leading to an
overestimate of rows (N)

(2) inappropriate assumption of the effect of LIMIT m, which causes a
costly SeqScan to appear better than an IndexScan for low m/N, when in
fact that is seldom the case.

Overestimating N in (1) inverts the problem, so that an overestimate
isn't the safe thing at all.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] github notification spam

2012-03-17 Thread Bruce Momjian

After my pg_upgrade commit yesterday, I started receiving dozens of spam
emails from github.  I am not sure if it was because I was the
committer, or because I am subscribed to the github postgres feed.

Anyway, the spam has a URL at the bottom --- if you click on
notifications on that page, you can see the spam, and if you click on
the user name, and then under the tools gear icon, you can block the
user or report them for spam (I recommend both).

Eventually you will get an email stating they are investigating the
user.  I assume they will eventually figure out how to block this, but
for now, I thought other github subscribers and committers should know
about the problem.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] EquivalenceClasses and subqueries and PlaceHolderVars, oh my

2012-03-17 Thread Tom Lane

Greg Stark  writes:
> On Fri, Mar 16, 2012 at 3:16 PM, Tom Lane  wrote:
> So I now propose reverting the earlier two patches (but not their
> regression test cases of course) and instead hacking MergeAppend plan
> building as per (2).

> As a wise man once said, "This is tricky stuff". I feel a better that
> I got stuck on this stuff when you're still trying to feel your way
> after this many go-arounds.

Well, looking back on it, I feel this was at bottom a documentation
failure.  I think that when I wrote the EquivalenceClass code, I knew
that "child" members did not have similar semantics to regular members.
But I had forgotten that when Teodor reported the MergeAppend bug,
and so misdiagnosed what I was seeing happen as being corruption of
the EC contents, when it wasn't really.  I added some documentation
around this point in the patch I committed yesterday...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Another review of URI for libpq, v7 submission

2012-03-17 Thread Marko Kreen

On Thu, Mar 15, 2012 at 11:29:31PM +0200, Alex wrote:
> https://github.com/a1exsh/postgres/commits/uri

The point of the patch is to have one string with all connection options,
in standard format, yes?  So why does not this work:

  db = PQconnectdb("postgres://localhost");

?

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Magnus Hagander

On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
> After my pg_upgrade commit yesterday, I started receiving dozens of spam
> emails from github.  I am not sure if it was because I was the
> committer, or because I am subscribed to the github postgres feed.
>
> Anyway, the spam has a URL at the bottom --- if you click on
> notifications on that page, you can see the spam, and if you click on
> the user name, and then under the tools gear icon, you can block the
> user or report them for spam (I recommend both).
>
> Eventually you will get an email stating they are investigating the
> user.  I assume they will eventually figure out how to block this, but
> for now, I thought other github subscribers and committers should know
> about the problem.

I have had this issue for well over a week by now. They don't seem
particularly keen on actually fixing it - they seem mostly happy with
having it removed from their website after the fact.

I find the more efficient solution to go into your own github account
settings under "notification center" and just turn it all off.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-17 Thread Tom Lane

Andres Freund  writes:
> On Friday, March 16, 2012 10:52:55 PM Tom Lane wrote:
>> Something else I just came across is that there are assorted places that
>> are aware that ExplainStmt contains a Query, eg setrefs.c, plancache.c,
>> and those have got to treat CreateTableAsStmt similarly.

> Hm. Is that so? As implemented in my version the planner just sees a plain 
> statement instead of a utility statement. Am I missing something?

Well, the work flow for EXPLAIN is:

parse analysis: recursively do parse analysis on contained query
plan: do nothing
execution: call planner on contained query, then optionally run it

and the reason for doing it that way is explained by
transformExplainStmt:

 * EXPLAIN is like other utility statements in that we emit it as a
 * CMD_UTILITY Query node; however, we must first transform the contained
 * query.  We used to postpone that until execution, but it's really necessary
 * to do it during the normal parse analysis phase to ensure that side effects
 * of parser hooks happen at the expected time.

ISTM that argument applies just as much to CREATE TABLE AS, especially
in view of the fact that we're restructuring the SELECT INTO case, in
which parse analysis of the SELECT certainly does happen early.  It's
also not clear to me why it wouldn't apply to COPY (SELECT ...).

I'm not going to touch the COPY (SELECT ...) issue right now, but
somebody ought to go back and check up on the exact user-visible bugs
that motivated moving EXPLAIN's parse analysis processing.  (I suspect
it had to do with plpgsql variable processing, but too lazy to go look
right now.)  If there's a plausible use case where similar bugs could
be exhibited in COPY, we're going to have to restructure that too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Magnus Hagander

On Sat, Mar 17, 2012 at 15:52, Magnus Hagander  wrote:
> On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
>> After my pg_upgrade commit yesterday, I started receiving dozens of spam
>> emails from github.  I am not sure if it was because I was the
>> committer, or because I am subscribed to the github postgres feed.
>>
>> Anyway, the spam has a URL at the bottom --- if you click on
>> notifications on that page, you can see the spam, and if you click on
>> the user name, and then under the tools gear icon, you can block the
>> user or report them for spam (I recommend both).
>>
>> Eventually you will get an email stating they are investigating the
>> user.  I assume they will eventually figure out how to block this, but
>> for now, I thought other github subscribers and committers should know
>> about the problem.
>
> I have had this issue for well over a week by now. They don't seem
> particularly keen on actually fixing it - they seem mostly happy with
> having it removed from their website after the fact.
>
> I find the more efficient solution to go into your own github account
> settings under "notification center" and just turn it all off.

As a followup since I got a question on IM - it appears to mostly be
"comments on my commits" that's causing the spam.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Bruce Momjian

On Sat, Mar 17, 2012 at 03:52:39PM +0100, Magnus Hagander wrote:
> On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
> > After my pg_upgrade commit yesterday, I started receiving dozens of spam
> > emails from github.  I am not sure if it was because I was the
> > committer, or because I am subscribed to the github postgres feed.
> >
> > Anyway, the spam has a URL at the bottom --- if you click on
> > notifications on that page, you can see the spam, and if you click on
> > the user name, and then under the tools gear icon, you can block the
> > user or report them for spam (I recommend both).
> >
> > Eventually you will get an email stating they are investigating the
> > user.  I assume they will eventually figure out how to block this, but
> > for now, I thought other github subscribers and committers should know
> > about the problem.
> 
> I have had this issue for well over a week by now. They don't seem
> particularly keen on actually fixing it - they seem mostly happy with
> having it removed from their website after the fact.
> 
> I find the more efficient solution to go into your own github account
> settings under "notification center" and just turn it all off.

Yes, I just did turn it off because it happened again this morning.  I
was a little worried I might miss legitmate email, 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Bruce Momjian

On Sat, Mar 17, 2012 at 04:05:44PM +0100, Magnus Hagander wrote:
> On Sat, Mar 17, 2012 at 15:52, Magnus Hagander  wrote:
> > On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
> >> After my pg_upgrade commit yesterday, I started receiving dozens of spam
> >> emails from github.  I am not sure if it was because I was the
> >> committer, or because I am subscribed to the github postgres feed.
> >>
> >> Anyway, the spam has a URL at the bottom --- if you click on
> >> notifications on that page, you can see the spam, and if you click on
> >> the user name, and then under the tools gear icon, you can block the
> >> user or report them for spam (I recommend both).
> >>
> >> Eventually you will get an email stating they are investigating the
> >> user.  I assume they will eventually figure out how to block this, but
> >> for now, I thought other github subscribers and committers should know
> >> about the problem.
> >
> > I have had this issue for well over a week by now. They don't seem
> > particularly keen on actually fixing it - they seem mostly happy with
> > having it removed from their website after the fact.
> >
> > I find the more efficient solution to go into your own github account
> > settings under "notification center" and just turn it all off.
> 
> As a followup since I got a question on IM - it appears to mostly be
> "comments on my commits" that's causing the spam.

How does gihub know that the commit made to the Postgres source is the
same user name as my github account name?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Magnus Hagander

On Sat, Mar 17, 2012 at 16:05, Bruce Momjian  wrote:
> On Sat, Mar 17, 2012 at 03:52:39PM +0100, Magnus Hagander wrote:
>> On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
>> > After my pg_upgrade commit yesterday, I started receiving dozens of spam
>> > emails from github.  I am not sure if it was because I was the
>> > committer, or because I am subscribed to the github postgres feed.
>> >
>> > Anyway, the spam has a URL at the bottom --- if you click on
>> > notifications on that page, you can see the spam, and if you click on
>> > the user name, and then under the tools gear icon, you can block the
>> > user or report them for spam (I recommend both).
>> >
>> > Eventually you will get an email stating they are investigating the
>> > user.  I assume they will eventually figure out how to block this, but
>> > for now, I thought other github subscribers and committers should know
>> > about the problem.
>>
>> I have had this issue for well over a week by now. They don't seem
>> particularly keen on actually fixing it - they seem mostly happy with
>> having it removed from their website after the fact.
>>
>> I find the more efficient solution to go into your own github account
>> settings under "notification center" and just turn it all off.
>
> Yes, I just did turn it off because it happened again this morning.  I
> was a little worried I might miss legitmate email,

Yeah, I'd be more worried if I was actually using all the github tools
for development. I just use it for code storage and such. The postgres
method of just using email lets the regular email tools do their job
on the spam that undoubtedly are created ther etoo, but they work much
better.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Magnus Hagander

On Sat, Mar 17, 2012 at 16:06, Bruce Momjian  wrote:
> On Sat, Mar 17, 2012 at 04:05:44PM +0100, Magnus Hagander wrote:
>> On Sat, Mar 17, 2012 at 15:52, Magnus Hagander  wrote:
>> > On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
>> >> After my pg_upgrade commit yesterday, I started receiving dozens of spam
>> >> emails from github.  I am not sure if it was because I was the
>> >> committer, or because I am subscribed to the github postgres feed.
>> >>
>> >> Anyway, the spam has a URL at the bottom --- if you click on
>> >> notifications on that page, you can see the spam, and if you click on
>> >> the user name, and then under the tools gear icon, you can block the
>> >> user or report them for spam (I recommend both).
>> >>
>> >> Eventually you will get an email stating they are investigating the
>> >> user.  I assume they will eventually figure out how to block this, but
>> >> for now, I thought other github subscribers and committers should know
>> >> about the problem.
>> >
>> > I have had this issue for well over a week by now. They don't seem
>> > particularly keen on actually fixing it - they seem mostly happy with
>> > having it removed from their website after the fact.
>> >
>> > I find the more efficient solution to go into your own github account
>> > settings under "notification center" and just turn it all off.
>>
>> As a followup since I got a question on IM - it appears to mostly be
>> "comments on my commits" that's causing the spam.
>
> How does gihub know that the commit made to the Postgres source is the
> same user name as my github account name?

They both have the same email address. Remember that git commit
authors are User Name , not just userid like back in cvs
days.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Bruce Momjian

On Sat, Mar 17, 2012 at 04:08:08PM +0100, Magnus Hagander wrote:
> On Sat, Mar 17, 2012 at 16:06, Bruce Momjian  wrote:
> > On Sat, Mar 17, 2012 at 04:05:44PM +0100, Magnus Hagander wrote:
> >> On Sat, Mar 17, 2012 at 15:52, Magnus Hagander  wrote:
> >> > On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
> >> >> After my pg_upgrade commit yesterday, I started receiving dozens of spam
> >> >> emails from github.  I am not sure if it was because I was the
> >> >> committer, or because I am subscribed to the github postgres feed.
> >> >>
> >> >> Anyway, the spam has a URL at the bottom --- if you click on
> >> >> notifications on that page, you can see the spam, and if you click on
> >> >> the user name, and then under the tools gear icon, you can block the
> >> >> user or report them for spam (I recommend both).
> >> >>
> >> >> Eventually you will get an email stating they are investigating the
> >> >> user.  I assume they will eventually figure out how to block this, but
> >> >> for now, I thought other github subscribers and committers should know
> >> >> about the problem.
> >> >
> >> > I have had this issue for well over a week by now. They don't seem
> >> > particularly keen on actually fixing it - they seem mostly happy with
> >> > having it removed from their website after the fact.
> >> >
> >> > I find the more efficient solution to go into your own github account
> >> > settings under "notification center" and just turn it all off.
> >>
> >> As a followup since I got a question on IM - it appears to mostly be
> >> "comments on my commits" that's causing the spam.
> >
> > How does gihub know that the commit made to the Postgres source is the
> > same user name as my github account name?
> 
> They both have the same email address. Remember that git commit
> authors are User Name , not just userid like back in cvs
> days.

Yes, but I am surprised they are doing email linking with Postgres
community commits.   It is a creative idea, but not something I would
think people would think of.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] github notification spam

2012-03-17 Thread Magnus Hagander

On Sat, Mar 17, 2012 at 16:39, Bruce Momjian  wrote:
> On Sat, Mar 17, 2012 at 04:08:08PM +0100, Magnus Hagander wrote:
>> On Sat, Mar 17, 2012 at 16:06, Bruce Momjian  wrote:
>> > On Sat, Mar 17, 2012 at 04:05:44PM +0100, Magnus Hagander wrote:
>> >> On Sat, Mar 17, 2012 at 15:52, Magnus Hagander  
>> >> wrote:
>> >> > On Sat, Mar 17, 2012 at 14:46, Bruce Momjian  wrote:
>> >> >> After my pg_upgrade commit yesterday, I started receiving dozens of 
>> >> >> spam
>> >> >> emails from github.  I am not sure if it was because I was the
>> >> >> committer, or because I am subscribed to the github postgres feed.
>> >> >>
>> >> >> Anyway, the spam has a URL at the bottom --- if you click on
>> >> >> notifications on that page, you can see the spam, and if you click on
>> >> >> the user name, and then under the tools gear icon, you can block the
>> >> >> user or report them for spam (I recommend both).
>> >> >>
>> >> >> Eventually you will get an email stating they are investigating the
>> >> >> user.  I assume they will eventually figure out how to block this, but
>> >> >> for now, I thought other github subscribers and committers should know
>> >> >> about the problem.
>> >> >
>> >> > I have had this issue for well over a week by now. They don't seem
>> >> > particularly keen on actually fixing it - they seem mostly happy with
>> >> > having it removed from their website after the fact.
>> >> >
>> >> > I find the more efficient solution to go into your own github account
>> >> > settings under "notification center" and just turn it all off.
>> >>
>> >> As a followup since I got a question on IM - it appears to mostly be
>> >> "comments on my commits" that's causing the spam.
>> >
>> > How does gihub know that the commit made to the Postgres source is the
>> > same user name as my github account name?
>>
>> They both have the same email address. Remember that git commit
>> authors are User Name , not just userid like back in cvs
>> days.
>
> Yes, but I am surprised they are doing email linking with Postgres
> community commits.   It is a creative idea, but not something I would
> think people would think of.

It's because we push a mirror of our git repository to github. they
don't track our actual community repository, just the mirror.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-17 Thread HuangQi

Hi, all
   I am a student of Computer Science and Applied Math in a university in
Singapore. I'm planning to join Google Summer Code 2012 on PostgreSQL. It's
quite an honor to join the postgresql hacker community.
   I have some postgresql developing experience while doing my school
project. I'm doing Final Year Project(FYP) in this year, which is focusing
on database and will be implemented in postgresql. My prof worked with some
PhD students on social network database. They designed a database schema,
called RSN (relationship on social network). My FYP is about studying the
properties of RSN and try to implement more efficient optimisation
algorithms for database on RSN. Currently I'm having some ideas on RSN.
This schema reveals a property called acyclic property, and it preserves
some special join property. The acyclic database schema was well studied,
and there are many papers on it. I expect full working on this topic in
three months would be enough to get idea as well as implementation.
Please help me check whether this could become a topic for Postgres
community. It focuses on a particular schema of social network, and this
schema, to tell the truth, is still not quite mature yet. If this topic is
not possible, I would like to check some other topics on query processing
and optimisation, and doing the project together with my FYP in this
summer. If you have some good idea on the proper topic, please recommand to
me.
I'm quite glad if you could offer me some advices. Thanks a lot for
your help!

-- 
Best Regards
Huang Qi Victor

Re: [HACKERS] github notification spam

2012-03-17 Thread Bruce Momjian

On Sat, Mar 17, 2012 at 04:40:52PM +0100, Magnus Hagander wrote:
> > Yes, but I am surprised they are doing email linking with Postgres
> > community commits.   It is a creative idea, but not something I would
> > think people would think of.
> 
> It's because we push a mirror of our git repository to github. they
> don't track our actual community repository, just the mirror.

Right, but my point is they are linking email addresses from our git
commits to registered github user email addresses.  They are sending me
email because I committed to our community tree, and happen to have a
github account with the same email address.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Andrew Dunstan




On 03/17/2012 04:39 AM, Atri Sharma wrote:

Hi All,

I am Atri Sharma.I am a C developer with experience in database
designing and database management systems.I have worked with
MySql,Oracle and other database systems.I have designed databases for
various projects as well.

I would be interested in developing the column reordering project by a
single ALTER command.I have an idea in mind that would solve
theproblem in a very easy way.

Please let me know how to proceed further on this.




You should review the history on this. See 
 
as a good place to start.



You should be aware that anything that doesn't follow the previous 
discussions is likely to be rejected. But if you can do something that 
does implement what was previously discussed, that would be awesome.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb and fsync

2012-03-17 Thread Cédric Villemain

Le vendredi 16 mars 2012 16:51:04, Andres Freund a écrit :
> On Friday, March 16, 2012 04:47:06 PM Robert Haas wrote:
> > On Fri, Mar 16, 2012 at 6:25 AM, Andres Freund  wrote:
> > >> > How are the results with sync_file_range(fd, 0, 0,
> > >> > SYNC_FILE_RANGE_WRITE)?
> > >> 
> > >> That is much faster than using fadvise. It goes down to ~2s.
> > >> 
> > >> Unfortunately, that's non-portable. Any other ideas? 6.5s a little on
> > >> the annoying side (and causes some disconcerting sounds to come from
> > >> my disk), especially when we _know_ it can be done in 2s.
> > > 
> > > Its not like posix_fadvise is actually portable. So I personally don't
> > > see a problem with that, but...
> > 
> > Well, sync_file_range only works on Linux, and will probably never
> > work anywhere else.  posix_fadvise() at least has a chance of being
> > supported on other platforms, being a standard and all that.  Though I
> > see that my Mac has neither.  :-(
> 
> I would suggest adding a wrapper function like:
> pg_hint_writeback_flush(fd, off, len);
> 
> which then is something like
> 
> #if HAVE_SYNC_FILE_RANGE
> sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
> #elseif HAVE_POSIX_FADVISE
> posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
> #else
> #endif
> 
> To my knowledge posix_fadvise currently is only supported on linux btw...

I agree with Andres.

I believe we should use sync_file_range (_before?) with linux.
And we can use posix_fadvise_dontneed on other kernels.

FADVISE_DONTNEED does start a writeback (it may decide not to do it too), but 
the primer objective of posix_fadvise_dontneed is not to make sync() faster. 
We just have writeback and sync() calls challenged together and we can face 
situation where linux does not handle that so well. (depends on linux 2.6.18 
or 32 or 3.2 or ...)


-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

Thanks Andrew.

I went through the link. Here is my idea:

I believe we can build a "middle" layer that encapsulates the physical
details from the user and gives him/her a VIEW that is actually generated by
the system itself. That means, we have a system such as:

LOGICAL LAYER
--
MIDDLE LAYER
--
PHYSICAL LAYER.

Now, the punch would be, the middle layer would be on the client. The middle
layer is more of an interface for the user which he/she can completely
customize.

Now, what is actually happening is that each component is mapped to their
physical address. I am planning to make a struct which will be the mapping
between the physical address and the logical address.I have not yet decided
on the design of the struct,but I plan an id , the type and the physical
address.When a component is used in the middle layer,it is represented in an
instantiation of the struct which is unchangeable throughout.Whenever we
need to customize the middle layer,we just need to reorder the components.

For our specific case,the reordering of table columns,we need not change the
schema or the physical ordering.All we need to do is,reorder the
components.One ALTER statement(with added customized clauses) will do the
reordering.The mapping remains constant,because,the struct instantiations of
the components remain constant,hence,the mapping remains constant and we can
access the physical address of any component at any time.

Please let me know your feedback on the idea.

Atri

-Original Message-
From: Andrew Dunstan [mailto:and...@dunslane.net] 
Sent: 17 March 2012 21:30
To: Atri Sharma
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

On 03/17/2012 04:39 AM, Atri Sharma wrote:
> Hi All,
>
> I am Atri Sharma.I am a C developer with experience in database
> designing and database management systems.I have worked with
> MySql,Oracle and other database systems.I have designed databases for
> various projects as well.
>
> I would be interested in developing the column reordering project by a
> single ALTER command.I have an idea in mind that would solve
> theproblem in a very easy way.
>
> Please let me know how to proceed further on this.
>

You should review the history on this. See 

as a good place to start.

You should be aware that anything that doesn't follow the previous 
discussions is likely to be rejected. But if you can do something that 
does implement what was previously discussed, that would be awesome.

cheers

andrew

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

Hi all,

In follow up to my last mail, attached below, for performance, we can build
a graph with edges having weight in sense of performance wise and padding
saving wise . Then , we can build a minimal spanning tree, and the tree
generated would be our order of columns . When any changes are made ,
changes in the graph will be done and minimal spanning tree would be
recomputed.

Atri

-Original Message-
From: Atri Sharma [mailto:atri.j...@gmail.com] 
Sent: 17 March 2012 22:43
To: 'Andrew Dunstan'
Cc: 'pgsql-hackers@postgresql.org'
Subject: RE: [HACKERS] Regarding column reordering project for GSoc 2012

Thanks Andrew.

I went through the link. Here is my idea:

I believe we can build a "middle" layer that encapsulates the physical
details from the user and gives him/her a VIEW that is actually generated by
the system itself. That means, we have a system such as:

LOGICAL LAYER
--
MIDDLE LAYER
--
PHYSICAL LAYER.

Now, the punch would be, the middle layer would be on the client. The middle
layer is more of an interface for the user which he/she can completely
customize.

Now, what is actually happening is that each component is mapped to their
physical address. I am planning to make a struct which will be the mapping
between the physical address and the logical address.I have not yet decided
on the design of the struct,but I plan an id , the type and the physical
address.When a component is used in the middle layer,it is represented in an
instantiation of the struct which is unchangeable throughout.Whenever we
need to customize the middle layer,we just need to reorder the components.

For our specific case,the reordering of table columns,we need not change the
schema or the physical ordering.All we need to do is,reorder the
components.One ALTER statement(with added customized clauses) will do the
reordering.The mapping remains constant,because,the struct instantiations of
the components remain constant,hence,the mapping remains constant and we can
access the physical address of any component at any time.

Please let me know your feedback on the idea.

Atri

-Original Message-
From: Andrew Dunstan [mailto:and...@dunslane.net] 
Sent: 17 March 2012 21:30
To: Atri Sharma
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

On 03/17/2012 04:39 AM, Atri Sharma wrote:
> Hi All,
>
> I am Atri Sharma.I am a C developer with experience in database
> designing and database management systems.I have worked with
> MySql,Oracle and other database systems.I have designed databases for
> various projects as well.
>
> I would be interested in developing the column reordering project by a
> single ALTER command.I have an idea in mind that would solve
> theproblem in a very easy way.
>
> Please let me know how to proceed further on this.
>

You should review the history on this. See 

as a good place to start.

You should be aware that anything that doesn't follow the previous 
discussions is likely to be rejected. But if you can do something that 
does implement what was previously discussed, that would be awesome.

cheers

andrew

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-17 Thread Tom Lane

While looking at this I also noticed that DECLARE CURSOR uses a
structure that's randomly different in yet a third way: we start
with a utility statement containing a query, and then flip that
upside down so that the SELECT Query contains a utility statement!

I have a vague feeling that I'm the one who's guilty of that hack, too.

I'm not sure that anybody cares about being able to fire command
triggers on DECLARE CURSOR, but just from a consistency standpoint it
would make sense for this to work more like EXPLAIN and CREATE TABLE AS.
So that convinces me that UtilityContainsQuery() would be a good thing,
and I'll add that to the patch.  (Actually changing DECLARE CURSOR seems
like a separate patch, though.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Greg Stark

On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas  wrote:
>> But it would mean we have about 1.7x  more runs that need to be merged
>> (for initially random data).  Considering the minimum merge order is
>> 6, that increase in runs is likely not to lead to an additional level
>> of merging, in which case the extra speed of building the runs would
>> definitely win.  But if it does cause an additional level of merge, it
>> could end up being a loss.
>
> That's true, but the real hit to the run size should be quite a bit
> less than 1.7x, because we'd also be using memory more efficiently,
> and therefore more tuples should fit.

I'm not sure I believe the 1.7x.  Keep in mind that even after
starting a new tape we can continue to read in new tuples that belong
on the first tape. So even if you have tuples that are more than N
positions out of place (where N is the size of your heap) as long as
you don't have very many you can keep writing out the first tape for
quite a while.

I suspect analyzing truly random inputs is also a bit like saying no
compression algorithm can work on average. Partly sorted data is quite
common and the tapesort algorithm would be able to do a lot of cases
in a single merge that the quicksort and merge would generate a large
number of merges for.

All that said, quicksort and merge would always do no more i/o in
cases where the total number of tuples to be sorted is significantly
less than N^2 since that would be guaranteed to be possible to process
with a single merge pass. (Where "significantly less" has something to
do with how many tuples you have to read in one i/o to be efficient).
That probably covers a lot of cases, and Postgres already has the
stats to predict when it would happen, more or less.

Fwiw when I was doing the top-k sorting I did a bunch of experiements
and came up with a rule-of-thumb that our quicksort was about twice as
fast as our heapsort. I'm not sure whether that's a big deal or not in
this case. Keep in mind that the only win would be reducing the cpu
time on a sort where every tuple was being written to disk and read
back. For most users that won't run noticeably faster, just reduce cpu
time consumption.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-17 Thread Andres Freund

On Saturday, March 17, 2012 06:45:27 PM Tom Lane wrote:
> I'm not sure that anybody cares about being able to fire command
> triggers on DECLARE CURSOR
I actually think it would make sense to explicitly not fire command triggers 
there given that DECLARE CURSOR actually potentially is somewhat performance 
relevant.

> , but just from a consistency standpoint it
> would make sense for this to work more like EXPLAIN and CREATE TABLE AS.
> So that convinces me that UtilityContainsQuery() would be a good thing,
> and I'll add that to the patch.  (Actually changing DECLARE CURSOR seems
> like a separate patch, though.)
Aggreed on that.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Alvaro Herrera


Excerpts from Atri Sharma's message of sáb mar 17 05:39:10 -0300 2012:
> Hi All,
> 
> I am Atri Sharma.I am a C developer with experience in database
> designing and database management systems.I have worked with
> MySql,Oracle and other database systems.I have designed databases for
> various projects as well.
> 
> I would be interested in developing the column reordering project by a
> single ALTER command.I have an idea in mind that would solve
> theproblem in a very easy way.

This is far too complex to be done in a GSoC project.

I don't see how having a middle layer client-side would fix anything.
You'd need to explain how it'd work, but to me that description sounds
bogus.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

HI Alvaro,
I did send a complete description to the mailing list, based on your past 
discussions. I don’t know if you got it.

Please let me know if I should re-send the mail trail.

Atri

-Original Message-
From: Alvaro Herrera [mailto:alvhe...@commandprompt.com] 
Sent: 18 March 2012 00:17
To: Atri Sharma
Cc: Pg Hackers
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012


Excerpts from Atri Sharma's message of sáb mar 17 05:39:10 -0300 2012:
> Hi All,
> 
> I am Atri Sharma.I am a C developer with experience in database
> designing and database management systems.I have worked with
> MySql,Oracle and other database systems.I have designed databases for
> various projects as well.
> 
> I would be interested in developing the column reordering project by a
> single ALTER command.I have an idea in mind that would solve
> theproblem in a very easy way.

This is far too complex to be done in a GSoC project.

I don't see how having a middle layer client-side would fix anything.
You'd need to explain how it'd work, but to me that description sounds
bogus.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Alvaro Herrera


Excerpts from Atri Sharma's message of sáb mar 17 15:48:13 -0300 2012:
> 
> HI Alvaro,
> I did send a complete description to the mailing list, based on your past 
> discussions. I don’t know if you got it.

I read it, but I don't see how is the client involved.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

Hi Alvaro,
Thanks for going through my description.


I was going through the posts,when I realised that if we give each client the 
flexibility of defining his/her own order with complete control,that would be 
probably the right way(IMHO).

I mean,if the user sees,not the physical implementation,but a customizable 
interface instead,which is mapped to the physical implementation,then,we can 
solve to some extent,the problem of logical/physical decoupling.

If you are free,I can come on IRC right now and explain my idea to you.

Atri 


-Original Message-
From: Alvaro Herrera [mailto:alvhe...@commandprompt.com] 
Sent: 18 March 2012 00:34
To: Atri Sharma
Cc: Pg Hackers
Subject: RE: [HACKERS] Regarding column reordering project for GSoc 2012


Excerpts from Atri Sharma's message of sáb mar 17 15:48:13 -0300 2012:
> 
> HI Alvaro,
> I did send a complete description to the mailing list, based on your past 
> discussions. I don’t know if you got it.

I read it, but I don't see how is the client involved.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Dave Page

On Saturday, March 17, 2012, Atri Sharma  wrote:
> Hi Alvaro,
> Thanks for going through my description.
>
>
> I was going through the posts,when I realised that if we give each client
the flexibility of defining his/her own order with complete control,that
would be probably the right way(IMHO).
>
> I mean,if the user sees,not the physical implementation,but a
customizable interface instead,which is mapped to the physical
implementation,then,we can solve to some extent,the problem of
logical/physical decoupling.

Anything done in the client bar changing the column it orders on (which
ideally should be avoided too if practical) is going to lead to pain and
suffering as some will do it, some won't, and a few might get it wrong.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

Hi Dave,

 

I believe,with the ALTER command,we plan to give the user power to modify
ordering . Don't we?

 

Atri

 

From: Dave Page [mailto:dp...@pgadmin.org] 
Sent: 18 March 2012 00:46
To: Atri Sharma
Cc: Alvaro Herrera; Pg Hackers
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

 



On Saturday, March 17, 2012, Atri Sharma  wrote:
> Hi Alvaro,
> Thanks for going through my description.
>
>
> I was going through the posts,when I realised that if we give each client
the flexibility of defining his/her own order with complete control,that
would be probably the right way(IMHO).
>
> I mean,if the user sees,not the physical implementation,but a customizable
interface instead,which is mapped to the physical implementation,then,we can
solve to some extent,the problem of logical/physical decoupling.

Anything done in the client bar changing the column it orders on (which
ideally should be avoided too if practical) is going to lead to pain and
suffering as some will do it, some won't, and a few might get it wrong.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Jaime Casanova

On Sat, Mar 17, 2012 at 2:16 PM, Atri Sharma  wrote:
> Hi Dave,
>
> I believe,with the ALTER command,we plan to give the user power to modify
> ordering . Don’t we?
>

And how is that different from a view that orders the columns as the
user wishes?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Andrew Dunstan




On 03/17/2012 03:07 PM, Atri Sharma wrote:

Hi Alvaro,
Thanks for going through my description.


I was going through the posts,when I realised that if we give each client the 
flexibility of defining his/her own order with complete control,that would be 
probably the right way(IMHO).

I mean,if the user sees,not the physical implementation,but a customizable 
interface instead,which is mapped to the physical implementation,then,we can 
solve to some extent,the problem of logical/physical decoupling.


I think your design is a complete non-starter. Please review what was 
previously discussed. If you're not going to attempt that then my strong 
advice is to try a different project.




If you are free,I can come on IRC right now and explain my idea to you.



The place to discuss designs is on the mailing lists. That's the way we 
do things. The vast majority of hackers are not to be found in IRC.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Dave Page

On Saturday, March 17, 2012, Atri Sharma  wrote:
> Hi Dave,
>
>
>
> I believe,with the ALTER command,we plan to give the user power to modify
ordering . Don’t we?

Probably, but that's not relevant to my point as far as I can see - I was
objecting to your suggestion that the "middle layer" that translates the
physical to logical ordering and vice-versa (if I understood your
description correctly) be in the client.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Andrew Dunstan

On 03/17/2012 04:42 PM, Dave Page wrote:

On Saturday, March 17, 2012, Atri Sharma > wrote:

> Hi Dave,
>
>
>
> I believe,with the ALTER command,we plan to give the user power to 
modify ordering . Don’t we?

Probably, but that's not relevant to my point as far as I can see - I 
was objecting to your suggestion that the "middle layer" that 
translates the physical to logical ordering and vice-versa (if I 
understood your description correctly) be in the client.

Mine too. We don't want a column ordering that's different for everyone. 
That's a recipe for mass confusion. We want to be able to mutate the 
ordering for everyone, and for everyone to see the same ordering. That 
means it MUST be done purely in the backend (as indeed all ALTER TABLE 
commands must be.)

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_terminate_backend for same-role

2012-03-17 Thread Noah Misch

On Fri, Mar 16, 2012 at 04:42:07PM -0700, Daniel Farina wrote:
> On Fri, Mar 16, 2012 at 3:42 PM, Noah Misch  wrote:
> > On Thu, Mar 15, 2012 at 04:14:03PM -0700, Daniel Farina wrote:
> >> I imagine the problem is a race condition whereby a pid might be
> >> reused by another process owned by another user (doesn't that also
> >> affect pg_cancel_backend?). ?Shall we just do everything using the
> >> MyCancelKey (which I think could just be called "SessionKey",
> >> "SessionSecret", or even just "Session") as to ensure we have no case
> >> of mistaken identity? Or does that end up being problematic?
> >
> > No, I think the hazard you identify here is orthogonal to the question of 
> > when
> > to authorize pg_terminate_backend(). ?As you note downthread, protocol-level
> > cancellations available in released versions already exhibit this hazard. ?I
> > wouldn't mind a clean fix for this, but it's an independent subject.
> 
> Hmm. Well, here's a patch that implements exactly that, I think,
> similar to the one posted to this thread, but not using BackendIds,
> but rather the newly-introduced "SessionId".  Would appreciate
> comments.  Because an out-of-band signaling method has been integrated
> more complex behaviors -- such as closing the
> TERM-against-SECURITY-DEFINER-FUNCTION hazard -- can be addressed.
> For now I've only attempted to solve the problem of backend ambiguity,
> which basically necessitated out-of-line information transfer as per
> the usual means.

This patch still changes the policy for pg_terminate_backend(), and it does
not fix other SIGINT senders like processCancelRequest() and ProcSleep().  If
you're concerned about PID-reuse races, audit all backend signalling.  Either
fix all such problems or propose a plan to get there eventually.  Any further
discussion of this topic needs a new subject line; mixing its consideration
with proposals to change the policy behind pg_terminate_backend() reduces the
chances of the right people commenting on these distinct questions.

Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which
the target has yet to act, the eventual outcome is a terminated process.  With
this patch, the pg_terminate_backend() becomes a no-op with this warning:

> !  ereport(WARNING,
> !  
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> !   (errmsg("process is busy responding 
> to administrative "
> !   "request")),
> !   (errhint("This is temporary, and may 
> be retried.";

That's less useful than the current behavior.

That being said, I can't get too excited about closing PID-reuse races.  I've
yet to see another program do so.  I've never seen a trouble report around
this race for any software.  Every OS I have used assigns PIDs so as to
maximize the reuse interval, which seems like an important POLA measure given
typical admin formulae like "kill `ps | grep ...`".  This defense can only
matter in fork-bomb conditions, at which point a stray signal is minor.

I do think it's worth keeping this idea in a back pocket for achieving those
"more complex behaviors," should we ever desire them.

> > Here I discussed a hazard specific to allowing pg_terminate_backend():
> > http://archives.postgresql.org/message-id/20110602045955.gc8...@tornado.gateway.2wire.net
> >
> > To summarize, user code can trap SIGINT cancellations, but it cannot trap
> > SIGTERM terminations. ?If a backend is executing a SECURITY DEFINER function
> > when another backend of the same role calls pg_terminate_backend() thereon,
> > the pg_terminate_backend() caller could achieve something he cannot achieve 
> > in
> > PostgreSQL 9.1. ?I vote that this is an acceptable loss.
> 
> I'll throw out a patch that just lets this hazard exist and see what
> happens, although it is obsoleted/incompatible with the one already
> attached.

+1.  Has anyone actually said that the PID-reuse race or the thing I mention
above should block such a patch?  I poked back through the threads I could
remember and found nothing.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-17 Thread Tom Lane

I've found a couple more issues in the CTAS patch:

1. Previous versions delivered a "SELECT n" command tag for either
spelling of the command:

regression=# select * into t1 from int8_tbl;
SELECT 6
regression=# create table t2 as select * from int8_tbl;
SELECT 6

With the patch I get

regression=# select * into t1 from int8_tbl;
SELECT 0 0
regression=# create table t2 as select * from int8_tbl;
CREATE TABLE AS

The first of these is particularly unfortunate since it's outright lying
as to the number of rows processed.

I'm not sure what we should do instead.  We have gotten push-back before
anytime we changed the command tag for an existing command (and in fact
it seems that we intentionally added the rowcount display in 9.0, which
means there are people out there who care about that functionality).
On the other hand, the traditional output for CREATE TABLE AS doesn't
seem to satisfy the principle of least astonishment.  A third
consideration is that if we are pushing CREATE TABLE AS as the preferred
spelling, people will probably complain if it omits functionality that
SELECT INTO provides; so I'm not sure that "SELECT n" in one case and
"CREATE TABLE AS" in the other would be a good idea either.  Any
opinions what to do here?

2. Historically, CREATE RULE has allowed a rule action to be SELECT INTO
(though not CREATE TABLE AS).  Currently the patch is throwing an error
for that.  This seems like something that might not be worth fixing,
though.  It's fairly hard to conceive of a use-case for such a rule,
since it would work only once before starting to throw "table already
exists" errors.  How much do we care about preserving backward
compatibility here?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Command Triggers, patch v11

2012-03-17 Thread Andres Freund

On Saturday, March 17, 2012 11:04:30 PM Tom Lane wrote:
> I've found a couple more issues in the CTAS patch:
> 
> 1. Previous versions delivered a "SELECT n" command tag for either
> spelling of the command:
> 
> regression=# select * into t1 from int8_tbl;
> SELECT 6
> regression=# create table t2 as select * from int8_tbl;
> SELECT 6
> 
> With the patch I get
> 
> regression=# select * into t1 from int8_tbl;
> SELECT 0 0
hm. Stupid me.

> regression=# create table t2 as select * from int8_tbl;
> CREATE TABLE AS
> 
> The first of these is particularly unfortunate since it's outright lying
> as to the number of rows processed.
> I'm not sure what we should do instead.  We have gotten push-back before
> anytime we changed the command tag for an existing command (and in fact
> it seems that we intentionally added the rowcount display in 9.0, which
> means there are people out there who care about that functionality).
> On the other hand, the traditional output for CREATE TABLE AS doesn't
> seem to satisfy the principle of least astonishment.  A third
> consideration is that if we are pushing CREATE TABLE AS as the preferred
> spelling, people will probably complain if it omits functionality that
> SELECT INTO provides; so I'm not sure that "SELECT n" in one case and
> "CREATE TABLE AS" in the other would be a good idea either.  Any
> opinions what to do here?
I would prefer both returning CREATE TABLE AS since thats what actually 
happens. We already document that SELECT INTO is kinda deprecated: "The 
PostgreSQL usage of SELECT INTO to represent table creation is historical. It 
is best to use CREATE TABLE AS for this purpose in new code."
I have seen code that uses the command code for selecting the, app level, 
logging. Its kinda hard to do that if a CREATE TABLE AS/SELECT INTO returns 
SELECT.

Does CTAS ommit any functionality currently? I don't see any reason not to 
support stuff there thats supported in SELECT INTO.

> 2. Historically, CREATE RULE has allowed a rule action to be SELECT INTO
> (though not CREATE TABLE AS).  Currently the patch is throwing an error
> for that.  This seems like something that might not be worth fixing,
> though.  It's fairly hard to conceive of a use-case for such a rule,
> since it would work only once before starting to throw "table already
> exists" errors.  How much do we care about preserving backward
> compatibility here?
I vote for not supporting that anymore. That being possible looks more like an 
implementation detail to me.


Thanks for looking at this!

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-17 Thread Peter Geoghegan

Is there anything that I could be doing to help bring this patch
closer to a committable state? I'm thinking of the tests in particular
- do you suppose it's acceptable to commit them more or less as-is?

The standard for testing contrib modules seems to be a bit different,
as there is a number of other cases where an impedance mistmatch with
pg_regress necessitates doing things differently. So, the sepgsql
tests, which I understand are mainly to test the environment that the
module is being built for rather than the code itself, are written as
a shellscript than uses various selinux tools. There is also a Perl
script that uses DBD::Pg to benchmark intarray, for example.

Now that we have a defacto standard python driver, something that we
didn't have a couple of years ago, it probably isn't terribly
unreasonable to keep the tests in Python. They'll still probably need
some level of clean-up, to cut back on some of the tests that are
redundant. Some of the tests are merely fuzz tests, which are perhaps
a bit questionable.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera


Excerpts from Simon Riggs's message of lun mar 05 15:28:59 -0300 2012:
> 
> On Mon, Feb 27, 2012 at 2:47 AM, Robert Haas  wrote:

> >> Regarding performance, the good thing about this patch is that if you
> >> have an operation that used to block, it might now not block.  So maybe
> >> multixact-related operation is a bit slower than before, but if it
> >> allows you to continue operating rather than sit waiting until some
> >> other transaction releases you, it's much better.
> >
> > That's probably true, although there is some deferred cost that is
> > hard to account for.  You might not block immediately, but then later
> > somebody might block either because the mxact SLRU now needs fsyncs or
> > because they've got to decode an mxid long after the relevant segment
> > has been evicted from the SLRU buffers.  In general, it's hard to
> > bound that latter cost, because you only avoid blocking once (when the
> > initial update happens) but you might pay the extra cost of decoding
> > the mxid as many times as the row is read, which could be arbitrarily
> > many.  How much of a problem that is in practice, I'm not completely
> > sure, but it has worried me before and it still does.  In the worst
> > case scenario, a handful of frequently-accessed rows with MXIDs all of
> > whose members are dead except for the UPDATE they contain could result
> > in continual SLRU cache-thrashing.
> 
> Cases I regularly see involve wait times of many seconds.
> 
> When this patch helps, it will help performance by algorithmic gains,
> so perhaps x10-100.
> 
> That can and should be demonstrated though, I agree.

BTW, the isolation tester cases have a few places that in the unpatched
code die with deadlocks and with the patched code continue without
dying.  Others are cases that block in unpatched master, and continue
without blocking in patched.  This should be enough proof that there are
"algorithmic gains" here.

There's also a test case that demostrates a fix for the problem (pointed
out by docs) that if you acquire a row lock, then a subxact upgrades it
(say by deleting the row) and the subxact aborts, the original row lock
is lost.  With the patched code, the original lock is no longer lost.

I completely agree with the idea that we need some mitigation against
repeated lookups of mxids that contain committed updates.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-03-17 Thread Greg Stark

On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas  wrote:
> 12789    28.2686  libc-2.13.so             strcoll_l
> 6802     15.0350  postgres                 text_cmp

I'm still curious how it would compare to call strxfrm and sort the
resulting binary blobs. I don't think the sortsupport stuff actually
makes this any easier though. Since using it requires storing the
binary blob somewhere I think the support would have to be baked into
tuplesort (or hacked into the sortkey as an expr that was evaluated
earlier somehow).

It's a tradeoff and not an obvious one. The binary blobs are larger
and it would mean reading and copying more data around memory. But it
would mean doing the work that strcoll_l does only n times instead of
nlogn times. That might be a pretty significant gain.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera


Excerpts from Simon Riggs's message of mar mar 06 18:33:13 -0300 2012:

> The lock modes are correct, appropriate and IMHO have meaningful
> names. No redesign required here.
> 
> Not sure about the naming of some of the flag bits however.

Feel free to suggest improvements ... I've probably seen them for too
long to find them anything but what I intended them to mean.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-17 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of mar mar 06 17:28:12 -0300 2012:
> On Tue, Mar 6, 2012 at 7:39 PM, Alvaro Herrera
>  wrote:
> 
> > We provide four levels of tuple locking strength: SELECT FOR KEY UPDATE is
> > super-exclusive locking (used to delete tuples and more generally to update
> > tuples modifying the values of the columns that make up the key of the 
> > tuple);
> > SELECT FOR UPDATE is a standards-compliant exclusive lock; SELECT FOR SHARE
> > implements shared locks; and finally SELECT FOR KEY SHARE is a super-weak 
> > mode
> > that does not conflict with exclusive mode, but conflicts with SELECT FOR 
> > KEY
> > UPDATE.  This last mode implements a mode just strong enough to implement RI
> > checks, i.e. it ensures that tuples do not go away from under a check, 
> > without
> > blocking when some other transaction that want to update the tuple without
> > changing its key.
> 
> So there are 4 lock types, but we only have room for 3 on the tuple
> header, so we store the least common/deprecated of the 4 types as a
> multixactid. Some rewording would help there.

Hmm, I rewrote that paragraph two times.  I'll try to adjust it a bit
more.

> My understanding is that all of theses workloads will change
> 
> * Users of explicit SHARE lockers will be slightly worse in the case
> of the 1st locker, but then after that they'll be the same as before.

Right.  (We're assuming that there *are* users of SHARE locks, which I'm
not sure to be a given.)

> * Updates against an RI locked table will be dramatically faster
> because of reduced lock waits

Correct.

> ...and that these previous workloads are effectively unchanged:
> 
> * Stream of RI checks causes mxacts

Yes.

> * Multi row deadlocks still possible

Yes.

> * Queues of writers still wait in the same way

Yes.

> * Deletes don't cause mxacts unless by same transaction

Yeah .. there's no way for anyone to not conflict with a FOR KEY UPDATE
lock (the strength grabbed by a delete) unless you're the same
transaction.

> > The possibility of having an update within a MultiXact means that they must
> > persist across crashes and restarts: a future reader of the tuple needs to
> > figure out whether the update committed or aborted.  So we have a 
> > requirement
> > that pg_multixact needs to retain pages of its data until we're certain that
> > the MultiXacts in them are no longer of interest.
> 
> I think the "no longer of interest" aspect needs to be tracked more
> closely because it will necessarily lead to more I/O.

Not sure what you mean here.

> If we store the LSN on each mxact page, as I think we need to, we can
> get rid of pages more quickly if we know they don't have an LSN set.
> So its possible we can optimise that more.

Hmm, I had originally thought that this was rather pointless because it
was unlikely that a segment would *never* have *all* multis not
containing updates.  But then, maybe Robert is right and there are users
out there that run a lot of RI checks and never update the masters ...
Hm.  I'm not sure that LSN tracking is the right tool to do that
optimization, however -- I mean, a single multi containing an update in
a whole segment will prevent that segment from being considered useless.

> > VACUUM is in charge of removing old MultiXacts at the time of tuple 
> > freezing.
> 
> You mean mxact segments?

Well, both.  When a tuple is frozen, we both remove its Xmin/Xmax and
any possible multi that it might have in Xmax.  That's what I really
meant above.  But also, vacuum will remove pg_multixact segments just as
it will remove pg_clog segments.

(It is possible, and probably desirable, to remove a Multi much earlier
than freezing the tuple.  The patch does not (yet) do that, however.)

> Surely we set hint bits on tuples same as now? Hope so.

We set hint bits, but if a multi contains an update, we don't set
HEAP_XMAX_COMMITTED even when the update is known committed.  I think
we could do this in some cases.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Cross-backend signals and administration (Was: Re: pg_terminate_backend for same-role)

2012-03-17 Thread Daniel Farina

This thread evolved out of an attempt to implement
pg_terminate_backend for non-superusers.  I thought -- probably
erroneously -- that the major objection to that was the known
possibility of a PID-cycling race condition, whereby a signal could be
misdirected, in the case of terminate_backend, SIGTERM.  So now this
fork of the thread is about fixing these unlikely races, and then
passing administration requests (such as "query cancel" or "die" ) as
out-of-band information via SIGUSR1, just like how LISTEN/NOTIFY and
conflict signals are passed.

To prevent ambiguity, I am using a new special number -- a 'SessionId'
-- that is guaranteed unique to all backends ever created during the
uptime of a database.  This number is currently implemented in a way
that is guessable (so it cannot be accepted from external sources),
but I actually think it may be even more useful for a number of other
uses if given a non-guessable form (like cancellation keys).  In this
respect it would fulfill pretty much the same purposes as the notion
of a "session" seen on the web.

Noah offered me these comments:
> This patch still changes the policy for pg_terminate_backend(), and it does
> not fix other SIGINT senders like processCancelRequest() and ProcSleep().  If
> you're concerned about PID-reuse races, audit all backend signalling.  Either
> fix all such problems or propose a plan to get there eventually.

Is the postmaster signaling its children intrinsically vulnerable to
PID racing?  Because it controls when it can call wait() or waitpid()
on child processes, it can unambiguously know that PIDs have not been
cycled for use.  For this reason, a credible and entirely alternate
design might be to bounce IPC requests through the postmaster, but
since postmaster is so critical I had decided not to introduce nor
change mechanics there.

The Postmaster I think keeps a private copy of cancellation keys that
are not in shared memory, if I read it properly (not 100% sure), and
uses that for cancellation requests.  This has a useful property of
allowing cancellations even in event that shared memory goes insane
(and since postmaster is typically left as last sane process of the
group I thought it wise to not have it reuse a shared-memory based
approach).

I cannot comment on ProcSleep at this time.

> Currently, when pg_terminate_backend() follows a pg_cancel_backend() on which
> the target has yet to act, the eventual outcome is a terminated process.  With
> this patch, the pg_terminate_backend() becomes a no-op with this warning:
>
>> !                      ereport(WARNING,
>> !                                      
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> !                                       (errmsg("process is busy responding 
>> to administrative "
>> !                                                       "request")),
>> !                                       (errhint("This is temporary, and may 
>> be retried.";
>
> That's less useful than the current behavior.

Yes. It could be fixed with dynamic allocation (holding more
administration requests), but for just getting a flavor of what a
solution might be like.  I wanted to avoid additional dynamic
allocation (which would necessitate a similar condition in the form of
much-less likely OOM), but at some point I think this error condition
is inevitable in some form.  I see it as akin to EAGAIN.  Right now,
administrative requests are so short (copying and clearing a handful
of words out of PGPROC) that it's unlikely that this would be a
problem in practice.

> That being said, I can't get too excited about closing PID-reuse races.  I've
> yet to see another program do so.  I've never seen a trouble report around
> this race for any software.  Every OS I have used assigns PIDs so as to
> maximize the reuse interval, which seems like an important POLA measure given
> typical admin formulae like "kill `ps | grep ...`".  This defense can only
> matter in fork-bomb conditions, at which point a stray signal is minor.
>
> I do think it's worth keeping this idea in a back pocket for achieving those
> "more complex behaviors," should we ever desire them.

I feel mostly feel this way, too: I have a non-specific itch to solve
the race -- and I'm much more interested in pg_terminate_backend, the
original subject at hand (I'll submit a patch that is more of a
policy-fix than a signal wrangling one to the old thread).

I was also interested in sketching both as solution to this problem
and bringing down some of the non-intrinsic difficulty in
experimenting with rich inter-backend communication.  For example,
query progress (itself a much more complex beast, obviously) could
probably make use of these administration requests (with an async-back
channel akin to NOTIFY).  Or printing MemoryContextStats.  Or quick
profiling toggling/performance counters on a backend run amok without
having the ability to run gdb -p and completely freezing the process.

Thoughts appreciated.

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes

On Sat, Mar 17, 2012 at 10:47 AM, Greg Stark  wrote:
> On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas  wrote:
>>> But it would mean we have about 1.7x  more runs that need to be merged
>>> (for initially random data).  Considering the minimum merge order is
>>> 6, that increase in runs is likely not to lead to an additional level
>>> of merging, in which case the extra speed of building the runs would
>>> definitely win.  But if it does cause an additional level of merge, it
>>> could end up being a loss.
>>
>> That's true, but the real hit to the run size should be quite a bit
>> less than 1.7x, because we'd also be using memory more efficiently,
>> and therefore more tuples should fit.
>
> I'm not sure I believe the 1.7x.  Keep in mind that even after
> starting a new tape we can continue to read in new tuples that belong
> on the first tape. So even if you have tuples that are more than N
> positions out of place (where N is the size of your heap) as long as
> you don't have very many you can keep writing out the first tape for
> quite a while.
>
> I suspect analyzing truly random inputs is also a bit like saying no
> compression algorithm can work on average.

I'm not sure what you mean here.  The "also" suggests the previous
discussion was about something other than the assumption of
randomness, but that discussion doesn't make much sense unless both
paragraphs are about that.

Anyway I think keeping best/worst cases in mind is certainly a good
thing to do.  I've certainly seen train wrecks unfold when people
assumed anything could be compressed without verifying it.

> Partly sorted data is quite
> common and the tapesort algorithm would be able to do a lot of cases
> in a single merge that the quicksort and merge would generate a large
> number of merges for.

This is where some common and credible corpus would be invaluable.

> All that said, quicksort and merge would always do no more i/o in
> cases where the total number of tuples to be sorted is significantly
> less than N^2 since that would be guaranteed to be possible to process
> with a single merge pass. (Where "significantly less" has something to
> do with how many tuples you have to read in one i/o to be efficient).

Unless the tuples are quite large, the number needed for efficient i/o
is quite high.

> That probably covers a lot of cases, and Postgres already has the
> stats to predict when it would happen, more or less.

Currently sort makes no attempt to use the planner stats.  Would that
be a good thing to work on?

> Fwiw when I was doing the top-k sorting I did a bunch of experiements
> and came up with a rule-of-thumb that our quicksort was about twice as
> fast as our heapsort. I'm not sure whether that's a big deal or not in
> this case.

I've been seeing around 3 fold, and that might go up more with some of
the work being done that speeds up qsort but not tape sort.

> Keep in mind that the only win would be reducing the cpu
> time on a sort where every tuple was being written to disk and read
> back. For most users that won't run noticeably faster, just reduce cpu
> time consumption.

But in many (perhaps most) tape sorts CPU time is most of it.   A lot
of tape sorts are entirely memory backed.  The tuples either never
reach disk, or are partially written but never read, instead being
served back from the file-systems cache.  By changing to a tape sort,
you are buying insurance against a bunch of other sorts also running
simultaneously, but often the insurance doesn't pay off because those
numerous other sorts don't exist and the kernel manages to keep the
few ones that do in RAM anyway.

One avenue for major surgery on the sorts would be for an entry
admission where jobs can negotiable over the memory.  Something like
allocation by halves, but with a possibility that a job could decide
it would rather wait for another sort to finish and its memory to be
freed up, than to make do with what is currently available.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes

On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas  wrote:
> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes  wrote:
>
>> Anyway, I think the logtape could use redoing.  When your tapes are
>> actually physically tape drives, it is necessary to build up runs one
>> after the other on physical tapes, because un-mounting a tape from a
>> tape drive and remounting another tape is not very feasible at scale.
>> Which then means you need to place your runs carefully, because if the
>> final merge finds that two runs it needs are back-to-back on one tape,
>> that is bad.  But with disks pretending to be tapes, you could
>> re-arrange the "runs" with just some book keeping.  Maintaining the
>> distinction between "tapes" and "runs" is pointless, which means the
>> Fibonacci placement algorithm is pointless as well.
>
> I think you're right.  It seems to me that we could just write out an
> arbitrary number of runs, one per file, ending up with files number
> 1..N.

The problem there is that none of the files can be deleted until it
was entirely read, so you end up with all the data on disk twice.  I
don't know how often people run their databases so close to the edge
on disk space that this matters, but someone felt that that extra
storage was worth avoiding.  We could still get rid of the run/tape
distinction but keep the block recycling method--they are conceptually
distinct things.   (But hopefully improve the block recycling so the
locality doesn't degrade as much as it seems to now)

> If we can do a final merge of N runs without overrunning
> work_mem, fine.  If not, we merge the first K runs (for the largest
> possible K) and write out a new run N+1.  The range of valid run
> number is now K+1..N+1.  If those can be merged in a single pass, we
> do so; otherwise we again merge the first K runs (K+1 through 2K) to
> create a new run N+2.  And so on.
>
> I am not clear, however, whether this would be any faster.  It may not
> be worth tinkering with just for the reduction in code complexity.

Yeah, I was thinking it would only be worth redoing if the current
implementation interferes with other improvements--which I think is
likely, but don't have any concrete ideas yet where it would.

...

>>> As a desirable side effect, I think it would mean
>>> that we could dispense with retail palloc and pfree altogether.  We
>>> could just allocate a big chunk of memory, copy tuples into it until
>>> it's full, using a pointer to keep track of the next unused byte, and
>>> then, after writing the run, reset the allocation pointer back to the
>>> beginning of the buffer.  That would not only avoid the cost of going
>>> through palloc/pfree, but also the memory overhead imposed by
>>> bookkeeping and power-of-two rounding.
>>
>> Wouldn't we still need an array of pointers to the start of every
>> tuple's location in the buffer?  Or else, how would qsort know where
>> to find them?
>
> Yes, we'd still need that.  I don't see any way to get around that; I
> just don't like the expense of palloc-ing so many little chunks in
> addition to that array.

Right, OK, I had thought you meant the power of two rounding of
mem_tuples, I overlooked the power of two rounding of palloc.


>
>> Also, to do this we would need to get around the 1GB allocation limit.
>>  It is bad enough that memtuples is limited to 1GB, it would be much
>> worse if the entire arena was limited to that amount.
>
> Well, we could always allocate multiple 1GB chunks and peel off pieces
> from each one in turn.

I thought of that with mem_tuples itself, but I think it would be more
difficult to do that than just fixing the allocation limit would be.
Using multiple chunks might be easier with the buffer space as you
don't need to do heap arithmetic on those.

>>> If we do want to stick with the current algorithm, there seem to be
>>> some newer techniques for cutting down on the heap maintenance
>>> overhead.  Heikki's been investigating that a bit.
>>
>> Interesting.  Is that investigation around the poor L1/L2 caching
>> properties of large heaps?  I was wondering if there might be a way to
>> give tuplesort an initial estimate of how much data there was to sort,
>> so that it could use a smaller amount of memory than the max if it
>> decided that that would lead to better caching effects.  Once you know
>> you can't do an in-memory sort, then there is no reason to use more
>> memory than the amount that lets you merge all the runs in one go.
>
> No, it's about reducing the number of comparisons needed to maintain
> the heap property.

That sounds very interesting.  I didn't know it was even theoretically
possible to do that.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Daniel Farina

On Sat, Mar 17, 2012 at 1:48 PM, Andrew Dunstan  wrote:
> Mine too. We don't want a column ordering that's different for everyone.
> That's a recipe for mass confusion. We want to be able to mutate the
> ordering for everyone, and for everyone to see the same ordering. That means
> it MUST be done purely in the backend (as indeed all ALTER TABLE commands
> must be.)

I am in agreement.  This state would have to live in the session
between client and server, so that semantic analysis could do the
right thing by looking at the environment on the backend, too.  Life
is already really difficult because of session state: consider
connection poolers, which are both common and have subtle caveats that
are close to being completely silent if one uses the wrong construct.
The only reason people get away with this in practice is because most
of them do not use constructs like temp tables and dblink in
conjunction with pooling, but if they did, the prospect of what could
happen when sessions get crossed is a nightmare (clearly, I'd like a
way to treat this, because session state is also useful, but that's
not the here and now...)

Reordering attributes totally backends-side, though, would be a
wonderful feature, and already has plenty of work just to realize the
physical/logical decoupling in the backend, so I have a lot of
enthusiasm for the project in general.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-17 Thread Daniel Farina

On Sat, Mar 17, 2012 at 8:50 AM, HuangQi  wrote:
>     I'm quite glad if you could offer me some advices. Thanks a lot for your
> help!

Thank you for your interest! However, I am a little confused precisely
what you are thinking about implementing.  Are there particular access
methods or operators that you think are useful in this problem space,
or changes to the planner?

As long as you are soliciting for suggestions, I'll make one...

One that bites me (and my organization) all the time is the lack of
the access method skip scan (also called "loose index scan").  It's a
killer for append-mostly tables that track a much smaller number of
entities than the number of records in the table, and we have a
grotesque hack to do it right now.  In the more "social" space the
problem reappears in the form of newsfeeds, so I think that work would
have good impact across a nice spectrum of users.

Another skip-related feature that would be very nice is the
SQL-standard TABLESAMPLE feature.  I wonder if the notion of a
"SkipKind" could be taught to the executor that would provide cohesion
of implementation for most feature that involve skipping a lot of the
rows in a table while continuing a scan.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-17 Thread HuangQi

(Sorry, Daniel. Forgot to cc pgsql-hackers.)
Hi, Daniel
Thanks a lot for your response.
As I can see for now, in my FYP, as the acyclic schema has the property
that it has a join tree. I will check how many join trees it has and
investigate any best option for the RSN schema. If it does have, I will
modify the planner to just use this join tree if it detects a RSN database,
then no need to use System R to search through all possible join trees. The
implementation is narrow to only RSN schema. It might be too constraint for
Postgres as a generic database system.
For 'loose index join', I googled it and find the two sites useful to
understand.
http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html
http://wiki.postgresql.org/wiki/Loose_indexscan
This is implemented in MySQL already, while Postgres only emulates the
access method. Do you have any mail thread talking about the current design
and progress?
About the second topic, so currently TABLESAMPLE is not implemented
inside Postgres? I didn't see this query before, but I googled it just now
and the query seems very weird and interesting.
http://www.fotia.co.uk/fotia/DY.18.TheTableSampleClause.aspx
Still, do you have any mail thread talking about this?
Thanks.



-- 
Best Regards
Huang Qi Victor

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

-Original Message-
From: Daniel Farina [mailto:dan...@heroku.com] 
Sent: 18 March 2012 06:30
To: Andrew Dunstan
Cc: Dave Page; Atri Sharma; Alvaro Herrera; Pg Hackers
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

On Sat, Mar 17, 2012 at 1:48 PM, Andrew Dunstan  wrote:
> Mine too. We don't want a column ordering that's different for everyone.
> That's a recipe for mass confusion. We want to be able to mutate the
> ordering for everyone, and for everyone to see the same ordering. That
means
> it MUST be done purely in the backend (as indeed all ALTER TABLE commands
> must be.)

I am in agreement.  This state would have to live in the session
between client and server, so that semantic analysis could do the
right thing by looking at the environment on the backend, too.  Life
is already really difficult because of session state: consider
connection poolers, which are both common and have subtle caveats that
are close to being completely silent if one uses the wrong construct.
The only reason people get away with this in practice is because most
of them do not use constructs like temp tables and dblink in
conjunction with pooling, but if they did, the prospect of what could
happen when sessions get crossed is a nightmare (clearly, I'd like a
way to treat this, because session state is also useful, but that's
not the here and now...)

Reordering attributes totally backends-side, though, would be a
wonderful feature, and already has plenty of work just to realize the
physical/logical decoupling in the backend, so I have a lot of
enthusiasm for the project in general.

-- 
Fdr

---

Hi All,

First of all, thanks a million for all the wonderful feedbacks. It has
really taught me a lot in just 8 hours.

I am understanding the scenario now. I will take a little of your time to
modify my original idea:

The "middle layer" still exists, but NOT on the individual client side ,
rather , on the server side. That is, we maintain the middle layer on the
server, and it is same for all the users. We can mutate the ordering, and
changes would be reflected in all the clients, since they are all accessing
the same middle layer, present on the server.

The mapping structure, according to me, should still remain the same.

Please let me know your feedback and corrections.

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Regarding column reordering project for GSoc 2012

2012-03-17 Thread Atri Sharma

-Original Message-
From: jcasa...@systemguards.com.ec [mailto:jcasa...@systemguards.com.ec] On
Behalf Of Jaime Casanova
Sent: 18 March 2012 01:04
To: Atri Sharma
Cc: Dave Page; Alvaro Herrera; Pg Hackers
Subject: Re: [HACKERS] Regarding column reordering project for GSoc 2012

On Sat, Mar 17, 2012 at 2:16 PM, Atri Sharma  wrote:
> Hi Dave,
>
> I believe,with the ALTER command,we plan to give the user power to modify
> ordering . Dont we?
>

And how is that different from a view that orders the columns as the
user wishes?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

--

Hi Jaime ,

Well, I believe, my idea is more towards the layers already present in
database systems, rather than views.

1) Views are just a window to some(or all) part of a database. There are
many restriction on views, none of which are applicable to our middle layer.
This is so because, even though the middle layer is encapsulating the
addressing and mapping details from the user , it still provides complete
access and rights to the database according to the user's privilege level.

2) Views are optional, wherever our middle layer is built right in the
database. It is not optional. User can only see the middle layer. It cannot
directly interact with the underlying layers.

Please let me know your feedback and comments.

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] double free in current HEAD's pg_dump

2012-03-17 Thread Joachim Wieland

There's a double free in the current HEAD's pg_dump. Fix attached.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2b0a5ff..57a6ccb 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** dumpBlobs(Archive *fout, void *arg)
*** 2372,2379 
  		PQclear(res);
  	} while (ntups > 0);
  
- 	PQclear(res);
- 
  	return 1;
  }
  
--- 2372,2377 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch for parallel pg_dump

2012-03-17 Thread Joachim Wieland

On Fri, Mar 16, 2012 at 12:06 AM, Robert Haas  wrote:
>> Good. The only exit handler I've seen so far is
>> pgdump_cleanup_at_exit. If there's no other one, is it okay to remove
>> all of this stacking functionality (see on_exit_nicely_index /
>> MAX_ON_EXIT_NICELY) from dumputils.c and just define two global
>> variables, one for the function and one for the arg that this function
>> would operate on (or a struct of both)?
>
> No.  That code is included by other things - like pg_dumpall - that
> don't know there's such a thing as an Archive.  But I don't see that
> as a big problem; just on_exit_nicely whatever you want.  We could
> also add on_exit_nicely_reset(), if needed, to clear the existing
> handlers.

Yes, on_exit_nicely_reset() would be what I'd need to remove all
callbacks from the parent after the fork in the child process.

I still can't find any other hooks except for pgdump_cleanup_at_exit
from pg_dump.c. I guess what you're saying is that we provide
dumputil.c to other programs but even though none of them currently
sets any exit callback, you want to keep the functionality so that
they can set multiple exit hooks in the future should the need for
them arise.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

54 matches

Mail list logo