Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-14 Thread Thomas Munro
On Fri, Sep 15, 2017 at 8:13 AM, Robert Haas  wrote:
> On Wed, Sep 13, 2017 at 5:18 AM, Alvaro Herrera  
> wrote:
>> Thinking further, I think changing patch status automatically may never
>> be a good idea; there's always going to be some amount of common sense
>> applied beforehand (such as when a conflict is just an Oid catalog
>> collision, or a typo fix in a recent commit, etc.).  Such conflicts will
>> always raise errors with a tool, but would never lead a (decent) human
>> being to change the patch status to WoA.
>
> Well, it would perhaps be fine if sending an updated patch bounced it
> right back to Needs Review.  But if things are only being auto-flipped
> in one direction that's going to get tedious.
>
> Or one could imagine having the CF app show the CI status alongside
> the existing status column.

FWIW that last thing is the idea that I've been discussing with
Magnus.  That web page I made wouldn't exist, and the "apply" and
"build" badges would appear directly on the CF app and you'd be able
to sort and filter by them etc.  The main submission status would
still be managed by humans and the new apply and build statuses would
be managed by faithful robot servants.

How exactly that would work, I don't know.  One idea is that a future
cfbot, rewritten in better Python and adopted into the pginfra
universe, pokes the CF app via an HTTPS endpoint so that it ends up
with the information in its database.
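To make the idea concrete, here is a minimal sketch of what such a status report might look like. All names here are assumptions, not an existing API: the endpoint URL, payload fields, and `X-Api-Key` header are hypothetical, since the CF app side of this doesn't exist yet.

```python
import json
import urllib.request

def build_status_payload(patch_id, apply_status, build_status, build_url):
    """Assemble the apply/build status a future cfbot might report to the CF app."""
    return {
        "patch_id": patch_id,
        "apply": apply_status,    # e.g. "passing" / "failing"
        "build": build_status,
        "build_url": build_url,
    }

def report_status(endpoint, payload, api_key):
    """POST the payload to the (hypothetical) CF app endpoint."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "X-Api-Key": api_key},  # hypothetical auth scheme
    )
    return urllib.request.urlopen(req)

# Build (but don't send) a sample report for CF entry 14/1252.
payload = build_status_payload(
    1252, "passing", "failing",
    "https://travis-ci.org/postgresql-cfbot/postgresql/branches")
```

The CF app would then merely store these fields alongside its human-managed status column, rather than letting the robot flip the submission status itself.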

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-14 Thread Robert Haas
On Wed, Sep 13, 2017 at 5:18 AM, Alvaro Herrera  wrote:
> Thinking further, I think changing patch status automatically may never
> be a good idea; there's always going to be some amount of common sense
> applied beforehand (such as when a conflict is just an Oid catalog
> collision, or a typo fix in a recent commit, etc.).  Such conflicts will
> always raise errors with a tool, but would never lead a (decent) human
> being to change the patch status to WoA.

Well, it would perhaps be fine if sending an updated patch bounced it
right back to Needs Review.  But if things are only being auto-flipped
in one direction that's going to get tedious.

Or one could imagine having the CF app show the CI status alongside
the existing status column.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-14 Thread Aleksander Alekseev
Hi Martin,

> > === Build Failed: 7 ===
> > Title: Fix the optimization to skip WAL-logging on table created in same 
> > transaction
> > Author: Martijn van Oosterhout 
> > URL: https://commitfest.postgresql.org/14/528/
> 
> I'm not the author of this patch, and the page doesn't say so.
> Something wrong with your script?

It's a bug. The authors were determined from the sender of the first
email in the corresponding thread. This is because I needed a list of
email addresses to CC the authors, but the commitfest application
doesn't show email addresses.

Sorry for the disturbance.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Tomas Vondra
Hi Aleksander,

On 09/13/2017 11:49 AM, Aleksander Alekseev wrote:
> Hi Tomas,
> 
> I appreciate your feedback, although it doesn't seem to be completely
> fair. Particularly:
> 
>> You gave everyone about 4 hours to object
> 
> This is not quite accurate, since my proposal was sent at 2017-09-11
> 09:41:32 and this thread started at 2017-09-12 14:14:55.
> 

Understood. I didn't really consider the first message to be a proposal
with a deadline, as it starts with "here's a crazy idea" and it's not
immediately clear that you intended to proceed right away, or that you
expected people to object.

The message I referenced is much clearer on this.

>> You just changed the status of 10-15% open patches. I'd expect
>> things like this to be consulted with the CF manager, yet I don't see
>> any comments from Daniel.
> 
> Agreed, that was clearly a mistake; I should have added Daniel to CC.
> Sorry I didn't do that. I've returned all affected patches to "Needs
> Review". On the bright side, while doing this I noticed that many
> patches had already been updated by their authors.
> 

Yeah.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Daniel Gustafsson
> On 13 Sep 2017, at 11:49, Aleksander Alekseev  
> wrote:
> 
> Hi Tomas,
> 
> I appreciate your feedback, although it doesn't seem to be completely
> fair.

I would like to stress one thing (and I am speaking only for myself here): this
has been feedback, not criticism.  Your initiative (and that of everyone
involved in this) is great, and automation will no doubt make the CF process
better.  We just need to fine-tune it a little to make it work for, as well as
with, the community.

cheers ./daniel



Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Aleksander Alekseev
Hi Tomas,

I appreciate your feedback, although it doesn't seem to be completely
fair. Particularly:

> You gave everyone about 4 hours to object

This is not quite accurate, since my proposal was sent at 2017-09-11
09:41:32 and this thread started at 2017-09-12 14:14:55.

> You just changed the status of 10-15% open patches. I'd expect
> things like this to be consulted with the CF manager, yet I don't see
> any comments from Daniel.

Agreed, that was clearly a mistake; I should have added Daniel to CC.
Sorry I didn't do that. I've returned all affected patches to "Needs
Review". On the bright side, while doing this I noticed that many
patches had already been updated by their authors.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-13 Thread Alvaro Herrera
Aleksander Alekseev wrote:

> Agreed, especially regarding build logs. All of this is currently only
> an experiment. I have a weird feeling that this time it won't be quite
> a successful one. If there are too many false positives, I'll just
> return the patches to "Needs Review" as before. But I strongly believe
> that after a few iterations we will find a solution that is good
> enough, and these slight inconveniences will be worth it in the long run.

Thinking further, I think changing patch status automatically may never
be a good idea; there's always going to be some amount of common sense
applied beforehand (such as when a conflict is just an Oid catalog
collision, or a typo fix in a recent commit, etc.).  Such conflicts will
always raise errors with a tool, but would never lead a (decent) human
being to change the patch status to WoA.

On the other hand, sending an email to the patch author (possibly CCing
the mailing list, incl. In-Reply-To headers as appropriate) notifying
them that there is a conflict would be very useful.  I think it would be
perfectly reasonable to automate that.  Include exactly what went wrong,
and of course the date when the problem was detected (in the email
headers), so that reviewers can see "this patch's author received a
notice and didn't act on it for two weeks" and decide to set it to WoA.
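A sketch of drafting such a notice, using Python's standard email library. The sender address, subject wording, and body text are hypothetical placeholders; the key points from the description above are the In-Reply-To/References headers (to keep the notice in the patch's thread) and the Date header (so reviewers can see when the conflict was detected).

```python
from email.message import EmailMessage
from email.utils import formatdate

def conflict_notice(author, thread_msgid, patch_title, log_excerpt,
                    cc_list=("pgsql-hackers@postgresql.org",)):
    """Draft an automated conflict notice (addresses and wording hypothetical)."""
    msg = EmailMessage()
    msg["From"] = "cfbot@postgresql.org"       # hypothetical sender
    msg["To"] = author
    msg["Cc"] = ", ".join(cc_list)
    msg["Date"] = formatdate()                 # records when the problem was detected
    msg["In-Reply-To"] = thread_msgid          # keep the notice in the patch's thread
    msg["References"] = thread_msgid
    msg["Subject"] = f"Re: {patch_title} no longer applies"
    msg.set_content(
        "Your patch no longer applies to current master.\n\n" + log_excerpt)
    return msg

notice = conflict_notice("author@example.org", "<msgid@example.org>",
                         "[PATCH] Foreign Key Arrays",
                         "error: patch failed: src/backend/...")
```

A reviewer scanning the thread later can then compare the Date header against the current date and decide whether the patch has sat unaddressed long enough to justify setting it to WoA by hand.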

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Kyotaro HORIGUCHI
Hello.  Aside from the discussion on the policy for using automated CI,
the bot seems to be having trouble applying patches.

https://travis-ci.org/postgresql-cfbot/postgresql/builds/27450
>1363  heapam.c:2502:18: error: ‘HEAP_INSERT_SKIP_WAL’ undeclared (first use in 
>this function)
>1364  if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))

These lines show that the patch was only partially applied.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Kyotaro HORIGUCHI
At Wed, 13 Sep 2017 08:13:08 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Michael Paquier
On Wed, Sep 13, 2017 at 7:39 AM, Daniel Gustafsson  wrote:
>> On 12 Sep 2017, at 23:54, Tomas Vondra  wrote:
>> With all due respect, it's hard not to see this as a disruption of the
>> current CF. I agree automating the patch processing is a worthwhile
>> goal, but we're not there yet and it seems somewhat premature.
>>
>> Let me explain why I think so:
>>
>> (1) You just changed the status of 10-15% open patches. I'd expect
>> things like this to be consulted with the CF manager, yet I don't see
>> any comments from Daniel. Considering he's been at the Oslo PUG meetup
>> today, I doubt he was watching hackers very closely.
>
> Correct, I’ve been travelling and running a meetup today so had missed this on
> -hackers.

FWIW, I tend to think that the status of a patch ought to be changed
either after a direct look at the patch itself or by the author,
depending on how the discussion goes, not by automatic processing. Or
at least there should be more delay to allow people to object, as some
patches can be applied manually but do not apply automatically because
of naming issues. There are also people sending test patches that make
Postgres fail on purpose; for example, see the replication slot issue
where a past segment could not be retained because the beginning of a
record was not tracked correctly on the receiver side. That patch can
make the recovery tests fail, but we want them to fail, to reproduce
the wanted failure easily.

>> (2) You gave everyone about 4 hours to object, ending 3PM UTC, which
>> excludes about one whole hemisphere where it's either too early or too
>> late for people to respond. I'd say waiting for >24 hours would be more
>> appropriate.
>
> Agreed.

Definitely. Any batch updates have to involve the CFM authorization at
least, in this case Daniel.
-- 
Michael




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Daniel Gustafsson
> On 12 Sep 2017, at 23:54, Tomas Vondra  wrote:
> 
> On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:
>> Hello, hackers!
>> 
>> Thanks to the work of Thomas Munro now we have a CI for the patches on
>> the commitfest [1]. Naturally there is still room for improvement, but
>> in any case it's much, much better than nothing.
>> 
>> After a short discussion [2] we agreed (or at least no one objected)
>> to determine the patches that don't apply / don't compile / don't
>> pass regression tests and have "Needs Review" status, change the
>> status of these patches to "Waiting on Author" and write a report
>> (this one) with a CC to the authors. As we all know, we are short on
>> reviewers and this action will save them a lot of time. Here [3] you
>> can find a script I've been using to find such patches.
>> 
>> I rechecked the list manually and did my best to exclude the patches
>> that were updated recently or that depend on other patches. However,
>> there is still a chance that your patch got onto the list by mistake.
>> In that case, please let me know.
> 
> With all due respect, it's hard not to see this as a disruption of the
> current CF. I agree automating the patch processing is a worthwhile
> goal, but we're not there yet and it seems somewhat premature.
> 
> Let me explain why I think so:
> 
> (1) You just changed the status of 10-15% open patches. I'd expect
> things like this to be consulted with the CF manager, yet I don't see
> any comments from Daniel. Considering he's been at the Oslo PUG meetup
> today, I doubt he was watching hackers very closely.

Correct, I’ve been travelling and running a meetup today so had missed this on
-hackers.

> (2) You gave everyone about 4 hours to object, ending 3PM UTC, which
> excludes about one whole hemisphere where it's either too early or too
> late for people to respond. I'd say waiting for >24 hours would be more
> appropriate.

Agreed.

> I object to changing the patch status merely based on the script output.
> It's a nice goal, but we need to do the legwork first, otherwise it'll
> be just annoying and disrupting.

I too fear that automating the state change will move patches away from “Needs
review” in too many cases unless there is a manual inspection step.  Colliding
on Oids in pg_proc comes to mind as a case where the patch won’t build, but the
reviewer can trivially fix that locally and keep reviewing.

> I suggest we inspect the reported patches manually, investigate whether
> the failures are legitimate or not, and eliminate as many false
> positives as possible. Once we are happy with the accuracy, we can
> enable it again.

This seems to summarize the sentiment in the other thread: this is absolutely a
step in the right direction; we just need to tweak it with human knowledge
before it can be made fully automatic, to avoid false positives.  The last
thing we want is for the community to consider CF status changes/updates to be
crying wolf; there are few enough reviewers as it is.

cheers ./daniel



Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Tomas Vondra
On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:
> Hello, hackers!
> 
> Thanks to the work of Thomas Munro now we have a CI for the patches on
> the commitfest [1]. Naturally there is still room for improvement, but
> in any case it's much, much better than nothing.
> 
> After a short discussion [2] we agreed (or at least no one objected)
> to determine the patches that don't apply / don't compile / don't
> pass regression tests and have "Needs Review" status, change the
> status of these patches to "Waiting on Author" and write a report
> (this one) with a CC to the authors. As we all know, we are short on
> reviewers and this action will save them a lot of time. Here [3] you
> can find a script I've been using to find such patches.
> 
> I rechecked the list manually and did my best to exclude the patches
> that were updated recently or that depend on other patches. However,
> there is still a chance that your patch got onto the list by mistake.
> In that case, please let me know.

With all due respect, it's hard not to see this as a disruption of the
current CF. I agree automating the patch processing is a worthwhile
goal, but we're not there yet and it seems somewhat premature.

Let me explain why I think so:

(1) You just changed the status of 10-15% open patches. I'd expect
things like this to be consulted with the CF manager, yet I don't see
any comments from Daniel. Considering he's been at the Oslo PUG meetup
today, I doubt he was watching hackers very closely.

(2) You gave everyone about 4 hours to object, ending 3PM UTC, which
excludes about one whole hemisphere where it's either too early or too
late for people to respond. I'd say waiting for >24 hours would be more
appropriate.

(3) The claim that "no one objected" is somewhat misleading, I guess. I
myself objected to automating this yesterday, and AFAICS Thomas Munro
shares the opinion that we're not ready to automate it.

(4) You say you rechecked the list manually - can you elaborate what you
checked? Per reports from others, some patches seem to "not apply"
merely because "git apply" is quite strict. Have you actually tried
applying / compiling the patches yourself?

(5) I doubt "does not apply" is actually sufficient to move the patch to
"waiting on author". For example my statistics patch was failing to
apply merely due to 821fb8cdbf lightly touching the SGML docs, changing
"type" to "kind" on a few places. Does that mean the patch can't get any
reviews until the author fixes it? Hardly. But after switching it to
"waiting on author" that's exactly what's going to happen, as people are
mostly ignoring patches in that state.

(6) It's generally a good idea to send a message the patch thread
whenever the status is changed, otherwise the patch authors may not
notice the change for a long time. I don't see any such messages,
certainly not in "my" patch thread.

I object to changing the patch status merely based on the script output.
It's a nice goal, but we need to do the legwork first, otherwise it'll
be just annoying and disrupting.

I suggest we inspect the reported patches manually, investigate whether
the failures are legitimate or not, and eliminate as many false
positives as possible. Once we are happy with the accuracy, we can
enable it again.


kind regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 2:55 AM, Andreas Karlsson  wrote:
> On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:
>>
>> Title: Foreign Key Arrays
>> Author: Mark Rofail
>> URL:https://commitfest.postgresql.org/14/1252/
>
>
> I am currently reviewing this one and it applies, compiles, and passes the
> test suite. It could be the compilation warnings that make the system
> think it failed, but I could not find the log of the failed build.

I guess you didn't run "make check-world", because it crashes in the
contrib regression tests:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/274732512

Sorry that the build logs are a bit hard to find at the moment.
Starting from http://commitfest.cputube.org/ if you click the
"build|failing" badge you'll land at
https://travis-ci.org/postgresql-cfbot/postgresql/branches and then
you have to locate the right branch, in this case commitfest/14/1252,
and then click the latest build link which (in this case) looks like
"# 4603 failed".  Eventually I'll have time to figure out how to make
the "build|failing" badge take you there directly...   Eventually I'll
also teach it how to dump a backtrace out of gdb when the tests leave a
smouldering core.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Aleksander Alekseev
Hi Andreas,

On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:
> > Title: Foreign Key Arrays
> > Author: Mark Rofail
> > URL:https://commitfest.postgresql.org/14/1252/
> 
> I am currently reviewing this one and it applies, compiles, and passes the
> test suite. It could be the compilation warnings that make the system
> think it failed, but I could not find the log of the failed build.

Thanks for your feedback. I'm changing the status of this patch back to
"Needs Review" and adding it to the exclude list until we figure out
what went wrong.

> We want to be welcoming to new contributors, so until we have a reliable CI
> server which can provide easy-to-read build logs I am against changing the
> status of any patches solely based on the result of the CI server. I think
> it should be used as a complementary tool until the community deems it to be
> good enough.

Agreed, especially regarding build logs. All of this is currently only
an experiment. I have a weird feeling that this time it won't be quite a
successful one. If there are too many false positives, I'll just return
the patches to "Needs Review" as before. But I strongly believe that
after a few iterations we will find a solution that is good enough, and
these slight inconveniences will be worth it in the long run.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Andreas Karlsson

On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:

Title: Foreign Key Arrays
Author: Mark Rofail
URL:https://commitfest.postgresql.org/14/1252/


I am currently reviewing this one and it applies, compiles, and passes
the test suite. It could be the compilation warnings that make the
system think it failed, but I could not find the log of the failed build.


We want to be welcoming to new contributors, so until we have a reliable
CI server which can provide easy-to-read build logs I am against
changing the status of any patches solely based on the result of the CI
server. I think it should be used as a complementary tool until the
community deems it to be good enough.


Andreas




[HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Aleksander Alekseev
Hello, hackers!

Thanks to the work of Thomas Munro now we have a CI for the patches on
the commitfest [1]. Naturally there is still room for improvement, but
in any case it's much, much better than nothing.

After a short discussion [2] we agreed (or at least no one objected) to
determine the patches that don't apply / don't compile / don't pass
regression tests and have "Needs Review" status, change the status of
these patches to "Waiting on Author" and write a report (this one) with
a CC to the authors. As we all know, we are short on reviewers and this
action will save them a lot of time. Here [3] you can find a script I've
been using to find such patches.
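The selection described above boils down to a simple filter. This is only a sketch of that logic, not the actual script from [3]; the field names (`status`, `applies`, `builds`, `tests_pass`) are assumptions about what the CI results would expose.

```python
def patches_to_flag(patches):
    """Select entries in 'Needs Review' that fail to apply, build, or pass tests."""
    return [p for p in patches
            if p["status"] == "Needs Review"
            and not (p["applies"] and p["builds"] and p["tests_pass"])]

# A few illustrative entries (CF entry ids from this report, results made up).
sample = [
    {"id": 1178, "status": "Needs Review", "applies": False,
     "builds": False, "tests_pass": False},
    {"id": 1235, "status": "Waiting on Author", "applies": False,
     "builds": False, "tests_pass": False},   # already WoA: not touched
    {"id": 1267, "status": "Needs Review", "applies": True,
     "builds": True, "tests_pass": True},     # green: not touched
]
flagged = patches_to_flag(sample)
```

Patches already in "Waiting on Author" or with green CI results are left alone; only "Needs Review" entries with at least one failure would be reported.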

I rechecked the list manually and did my best to exclude the patches
that were updated recently or that depend on other patches. However,
there is still a chance that your patch got onto the list by mistake.
In that case, please let me know.

Unless there are any objections, I'm going to re-run this script once or
twice a week during the commitfest and write reports like this one.

Here is the list:

=== Apply Failed: 25 ===
Title: 64-bit transaction identifiers
Author: Tianzhou Chen  
(it's a bug - actually Alexander Korotkov)
URL: https://commitfest.postgresql.org/14/1178/

Title: Add and report the new "in_hot_standby" GUC pseudo-variable.
Author: Elvis Pranskevichus 
URL: https://commitfest.postgresql.org/14/1090/

Title: allow mix of composite and scalar variables in target list
Author: Pavel Stehule 
URL: https://commitfest.postgresql.org/14/1139/

Title: Allow users to specify multiple tables in VACUUM commands
Author: "Bossart, Nathan" 
URL: https://commitfest.postgresql.org/14/1131/

Title: Controlling toast_tuple_target to tune rows >2kB
Author: Simon Riggs 
URL: https://commitfest.postgresql.org/14/1284/

Title: Convert join OR clauses into UNION queries
Author: Jim Nasby 
URL: https://commitfest.postgresql.org/14/1001/

Title: faster partition pruning in planner
Author: Amit Langote 
URL: https://commitfest.postgresql.org/14/1272/

Title: Fix race condition between SELECT and ALTER TABLE NO INHERIT
Author: Kyotaro HORIGUCHI 
URL: https://commitfest.postgresql.org/14/1259/

Title: Gather speed-up
Author: Rafia Sabih 
URL: https://commitfest.postgresql.org/14/1156/

Title: Hash partitioning
Authors: Yugo Nagata , "yang...@highgo.com" 

URL: https://commitfest.postgresql.org/14/1059/

Title: Improve compactify_tuples and PageRepairFragmentation
Author: Sokolov Yura 
URL: https://commitfest.postgresql.org/14/1138/

Title: Improve eval_const_expressions
Author: Heikki Linnakangas 
URL: https://commitfest.postgresql.org/14/1136/

Title: Incremental sort
Author: Alexander Korotkov 
URL: https://commitfest.postgresql.org/14/1124/

Title: libpq: Support for Apple Secure Transport SSL library
Author: Daniel Gustafsson 
URL: https://commitfest.postgresql.org/14/1228/

Title: multivariate MCV lists and histograms
Author: Tomas Vondra 
URL: https://commitfest.postgresql.org/14/1238/

Title: New function for tsquery creation
Author: Victor Drobny 
URL: https://commitfest.postgresql.org/14/1202/

Title: pg_basebackup has stricter tablespace rules than the server
Author: Pierre Ducroquet 
URL: https://commitfest.postgresql.org/14/1125/

Title: Range Merge Join
Author: Jeff Davis 
URL: https://commitfest.postgresql.org/14/1106/

Title: Retire polyphase merge
Author: Heikki Linnakangas 
URL: https://commitfest.postgresql.org/14/1267/

Title: Subscription code improvements
Author: Petr Jelinek 
URL: https://commitfest.postgresql.org/14/1248/

Title: Support arrays over domain types
Author: Tom Lane 
URL: https://commitfest.postgresql.org/14/1235/

Title: Support target_session_attrs=read-only and eliminate the added 
round-trip to detect session status.
Author: Elvis Pranskevichus 
URL: https://commitfest.postgresql.org/14/1148/

Title: taking stdbool.h into use
Author: Peter Eisentraut 
URL: https://commitfest.postgresql.org/14/1242/

Title: UPDATE of partition key
Author: Amit Khandekar 
URL: https://commitfest.postgresql.org/14/1023/

Title: Write Amplification Reduction Method (WARM)
Author: Pavan Deolasee 
URL: https://commitfest.postgresql.org/14/775/


=== Build Failed: 7 ===
Title: Fix the optimization to skip WAL-logging on table created in same 
transaction
Author: Martijn van Oosterhout 
URL: 

Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-14 Thread Magnus Hagander
On Sun, Aug 13, 2017 at 11:43 PM, Robert Haas  wrote:

> On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
> >> I'd vote for including this in v10.  There doesn't seem to be any
> >> downside to this: it's a no brainer to avoid our exploding hash table
> >> case when we can see it coming.
> >
> > Anybody else want to vote that way?  For myself it's getting a bit late
> > in the beta process to be including inessential changes, but I'm willing
> > to push it to v10 not just v11 if there's multiple people speaking for
> > that.
>
> I'd vote for waiting until v11.  I think it's too late to be doing
> things that might change good plans into bad ones or visca versa;
> that's a recipe for having to put out 10.1 and 10.2 a little quicker
> than I'd like.
>

+1 for waiting until v11 with it.

We have plenty enough other things that could end up needing a quick
post-release release, and those are things that have received at least
some testing...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Andres Freund
On 2017-08-13 17:43:10 -0400, Robert Haas wrote:
> On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
> >> I'd vote for including this in v10.  There doesn't seem to be any
> >> downside to this: it's a no brainer to avoid our exploding hash table
> >> case when we can see it coming.
> >
> > Anybody else want to vote that way?  For myself it's getting a bit late
> > in the beta process to be including inessential changes, but I'm willing
> > to push it to v10 not just v11 if there's multiple people speaking for
> > that.
> 
> I'd vote for waiting until v11.  I think it's too late to be doing
> things that might change good plans into bad ones or visca versa;
> that's a recipe for having to put out 10.1 and 10.2 a little quicker
> than I'd like.

Similar here, there doesn't seem to be that much urgency.

- Andres




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Robert Haas
On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
>> I'd vote for including this in v10.  There doesn't seem to be any
>> downside to this: it's a no brainer to avoid our exploding hash table
>> case when we can see it coming.
>
> Anybody else want to vote that way?  For myself it's getting a bit late
> in the beta process to be including inessential changes, but I'm willing
> to push it to v10 not just v11 if there's multiple people speaking for
> that.

I'd vote for waiting until v11.  I think it's too late to be doing
things that might change good plans into bad ones or vice versa;
that's a recipe for having to put out 10.1 and 10.2 a little quicker
than I'd like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:
>> 1. check-hash-bucket-size-against-work_mem-2.patch from
>> https://www.postgresql.org/message-id/13698.1487283...@sss.pgh.pa.us

> +1

> I'd vote for including this in v10.  There doesn't seem to be any
> downside to this: it's a no brainer to avoid our exploding hash table
> case when we can see it coming.

Anybody else want to vote that way?  For myself it's getting a bit late
in the beta process to be including inessential changes, but I'm willing
to push it to v10 not just v11 if there's multiple people speaking for
that.

regards, tom lane




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Gavin Flower

On 13/08/17 16:19, Thomas Munro wrote:

On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:

[...]


I'd vote for including this in v10.  There doesn't seem to be any
downside to this: it's a no brainer to avoid our exploding hash table
case when we can see it coming.


But explosions are fun!
< ducks, and runs away VERY rapidly>

More seriously:
Up to now I'd been avoiding hash indexes, so great news!


Cheers,
Gavin





Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-12 Thread Thomas Munro
On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:
> I have some patches sitting around in my workspace that I think are
> non-controversial, and so I was considering just pushing them once
> the tree opens for v11 development.  If anyone thinks they need
> further review, I'll put them into the September commitfest, but
> otherwise we might as well skip the overhead.  These are:
>
> 1. check-hash-bucket-size-against-work_mem-2.patch from
> https://www.postgresql.org/message-id/13698.1487283...@sss.pgh.pa.us
>
> That discussion sort of trailed off, but there wasn't really anyone
> saying not to commit it, and no new ideas have surfaced.

+1

I'd vote for including this in v10.  There doesn't seem to be any
downside to this: it's a no brainer to avoid our exploding hash table
case when we can see it coming.
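The "exploding hash table" case the patch guards against can be pictured with a back-of-the-envelope sketch. This is my own illustration, not PostgreSQL planner code; the names `WORK_MEM_KB`, `TUPLE_OVERHEAD`, `hash_join_fits`, and `choose_batches`, and all the numbers, are assumptions:

```python
# Hypothetical sketch: before committing to a hash join, compare the
# estimated in-memory hash table size against work_mem.  If no sane
# batch count can make a batch fit, the planner should avoid the hash
# join rather than let the hash table explode at execution time.

WORK_MEM_KB = 4 * 1024          # assumed work_mem setting (4 MB)
TUPLE_OVERHEAD = 48             # assumed per-tuple bookkeeping bytes


def hash_join_fits(inner_rows: int, avg_width: int, num_batches: int = 1) -> bool:
    """Is one batch of the inner relation's hash table expected to fit?"""
    table_bytes = inner_rows * (avg_width + TUPLE_OVERHEAD)
    return table_bytes / num_batches <= WORK_MEM_KB * 1024


def choose_batches(inner_rows: int, avg_width: int, max_batches: int = 1024) -> int:
    """Smallest power-of-two batch count that fits, or -1 if none does."""
    n = 1
    while n <= max_batches:
        if hash_join_fits(inner_rows, avg_width, n):
            return n
        n *= 2
    return -1  # would explode: planner should pick another join method


print(choose_batches(10_000, 100))      # small inner rel: 1 batch
print(choose_batches(10_000_000, 400))  # huge inner rel: -1, give up
```

The point of doing this at plan time is exactly what Thomas says: the bad case is visible in the estimates, so there is no reason to discover it only at execution.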

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-11 Thread Tom Lane
Robert Haas  writes:
> On Fri, Aug 11, 2017 at 11:24 AM, Tom Lane  wrote:
>> 3. remove-pgbench-option-ordering-constraint.patch from
>> https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us
>> 
>> That one was already informally reviewed by two people, so I don't
>> think it needs another look.

> I'd vote for putting this fix into v10, but maybe that's just me.

... OK by me, any objections?

regards, tom lane




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-11 Thread Robert Haas
On Fri, Aug 11, 2017 at 11:24 AM, Tom Lane  wrote:
> 3. remove-pgbench-option-ordering-constraint.patch from
> https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us
>
> That one was already informally reviewed by two people, so I don't
> think it needs another look.

I'd vote for putting this fix into v10, but maybe that's just me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Patches I'm thinking of pushing shortly

2017-08-11 Thread Tom Lane
I have some patches sitting around in my workspace that I think are
non-controversial, and so I was considering just pushing them once
the tree opens for v11 development.  If anyone thinks they need
further review, I'll put them into the September commitfest, but
otherwise we might as well skip the overhead.  These are:

1. check-hash-bucket-size-against-work_mem-2.patch from
https://www.postgresql.org/message-id/13698.1487283...@sss.pgh.pa.us

That discussion sort of trailed off, but there wasn't really anyone
saying not to commit it, and no new ideas have surfaced.

2. simplify-simple-expresssion-checking.patch from
https://www.postgresql.org/message-id/2257.1498412...@sss.pgh.pa.us

This is the patch to get rid of plpgsql's exec_simple_check_node()
as Robert suggested.  It's never been anything but a maintenance
nuisance.

3. remove-pgbench-option-ordering-constraint.patch from
https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us

That one was already informally reviewed by two people, so I don't
think it needs another look.

regards, tom lane




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-28 Thread Robert Haas
On Mon, Jan 28, 2013 at 8:39 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
> On 15.01.2013 21:03, Tom Lane wrote:
>> Robert Haas robertmh...@gmail.com writes:
>>> On the third hand, the fact that a table is OCDR is recorded in
>>> backend-local storage somewhere, and that storage (unlike the
>>> relcache) had better be reliable.  So maybe there's some way to
>>> finesse it that way.
>
>> Hm, keep a flag in that storage you mean?  Yeah, that could possibly
>> work.
>
> Sounds reasonable.
>
> Until someone gets around to write a patch along those lines, I'm inclined
> to apply this one liner:
>
> *** a/src/backend/commands/tablecmds.c
> --- b/src/backend/commands/tablecmds.c
> ***************
> *** 10124,10130 **** PreCommit_on_commit_actions(void)
>   				/* Do nothing (there shouldn't be such entries, actually) */
>   				break;
>   			case ONCOMMIT_DELETE_ROWS:
> ! 				oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
>   				break;
>   			case ONCOMMIT_DROP:
>   				{
> --- 10124,10136 ----
>   				/* Do nothing (there shouldn't be such entries, actually) */
>   				break;
>   			case ONCOMMIT_DELETE_ROWS:
> ! 				/*
> ! 				 * If this transaction hasn't accessed any temporary relations,
> ! 				 * we can skip truncating ON COMMIT DELETE ROWS tables, as
> ! 				 * they must still be empty.
> ! 				 */
> ! 				if (MyXactAccessedTempRel)
> ! 					oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
>   				break;
>   			case ONCOMMIT_DROP:
>   				{
>
> We already have that MyXactAccessedTempRel global flag.  Just checking that
> should cover many common cases.

+1 for that.  I'm actually unconvinced that we need to do any better
than that in general.  But certainly that seems like a good first
step.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-15 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
> On Mon, Jan 14, 2013 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I think this is unacceptable on its face.  It essentially supposes that
>> relcache entries are reliable storage.  They are not.

> Would it be acceptable if we inverted the meaning of the struct member, and
> named it rd_rows_not_inserted?  When registering an ON COMMIT action, we
> can set this member to true, and set it to false when inserting a row into
> it.  The pre-commit hook will truncate any relation that doesn't have this
> member set to true.

> With that in place, even if the relcache entry is discarded midway through
> the transaction, the cleanup code will truncate the relation, preserving
> the correct behaviour.

Well, that would fail in the safe direction, but it just seems
excessively ugly and hard-to-understand.  Given the field demand for
this optimization (which so far as I've noticed is nil), I'm not
convinced we need to do this.

regards, tom lane




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-15 Thread Robert Haas
On Tue, Jan 15, 2013 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Gurjeet Singh singh.gurj...@gmail.com writes:
>> On Mon, Jan 14, 2013 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> I think this is unacceptable on its face.  It essentially supposes that
>>> relcache entries are reliable storage.  They are not.
>
>> Would it be acceptable if we inverted the meaning of the struct member, and
>> named it rd_rows_not_inserted?  When registering an ON COMMIT action, we
>> can set this member to true, and set it to false when inserting a row into
>> it.  The pre-commit hook will truncate any relation that doesn't have this
>> member set to true.
>
>> With that in place, even if the relcache entry is discarded midway through
>> the transaction, the cleanup code will truncate the relation, preserving
>> the correct behaviour.
>
> Well, that would fail in the safe direction, but it just seems
> excessively ugly and hard-to-understand.  Given the field demand for
> this optimization (which so far as I've noticed is nil), I'm not
> convinced we need to do this.

Yep, this is also why I prefer the approach of setting a global
variable.  The demand for this is not *precisely* zero or it wouldn't
be on the TODO list, but the global seems like it would head off the
worst of the damage without requiring any fiddling with the relcache.

On the third hand, the fact that a table is OCDR is recorded in
backend-local storage somewhere, and that storage (unlike the
relcache) had better be reliable.  So maybe there's some way to
finesse it that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
> On the third hand, the fact that a table is OCDR is recorded in
> backend-local storage somewhere, and that storage (unlike the
> relcache) had better be reliable.  So maybe there's some way to
> finesse it that way.

Hm, keep a flag in that storage you mean?  Yeah, that could possibly
work.

regards, tom lane




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-14 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
> Please find attached two patches, implementing two different approaches to
> fix the issue of COMMIT truncating empty TEMP tables that have the ON
> COMMIT DELETE ROWS attribute.

> v2.patch: This approach introduces a boolean 'rd_rows_inserted' in the
> RelationData struct, and sets it to true for every TEMP table in
> which a row is inserted.  During commit, we avoid truncating those OCDR temp
> tables that haven't been inserted into in this transaction.

I think this is unacceptable on its face.  It essentially supposes that
relcache entries are reliable storage.  They are not.  There are some
places where we rely on relcache entries preserving state information
for optimization purposes --- but in this case, discarding a relcache
entry would result in visibly incorrect behavior, not just some
performance loss.

regards, tom lane




Re: [HACKERS] Patches for TODO item: Avoid truncating empty OCDR temp tables on COMMIT

2013-01-14 Thread Gurjeet Singh
On Mon, Jan 14, 2013 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Gurjeet Singh singh.gurj...@gmail.com writes:
>> Please find attached two patches, implementing two different approaches to
>> fix the issue of COMMIT truncating empty TEMP tables that have the ON
>> COMMIT DELETE ROWS attribute.

>> v2.patch: This approach introduces a boolean 'rd_rows_inserted' in the
>> RelationData struct, and sets it to true for every TEMP table in
>> which a row is inserted.  During commit, we avoid truncating those OCDR temp
>> tables that haven't been inserted into in this transaction.

> I think this is unacceptable on its face.  It essentially supposes that
> relcache entries are reliable storage.  They are not.  There are some
> places where we rely on relcache entries preserving state information
> for optimization purposes --- but in this case, discarding a relcache
> entry would result in visibly incorrect behavior, not just some
> performance loss.

Would it be acceptable if we inverted the meaning of the struct member, and
named it rd_rows_not_inserted?  When registering an ON COMMIT action, we
can set this member to true, and set it to false when inserting a row into
it.  The pre-commit hook will truncate any relation that doesn't have this
member set to true.

With that in place, even if the relcache entry is discarded midway through
the transaction, the cleanup code will truncate the relation, preserving
the correct behaviour.
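The fail-safe property of the inverted flag can be made concrete with a toy model. This is my own illustration, not PostgreSQL code; the two functions merely model the two flag polarities under the scenario where the relcache entry (and with it the flag) is discarded mid-transaction:

```python
# Toy model of the two flag polarities.  A relcache entry can be
# discarded mid-transaction, which loses any flag stored in it; the
# question is what the pre-commit cleanup then does.

def commit_with_rd_rows_inserted(rows, flag_survives):
    """v2.patch polarity: truncate only if the 'rows inserted' flag is set."""
    rd_rows_inserted = len(rows) > 0
    if not flag_survives:              # relcache discarded: flag reset
        rd_rows_inserted = False
    # If the flag was lost, inserted rows wrongly survive the commit.
    return [] if rd_rows_inserted else rows


def commit_with_rd_rows_not_inserted(rows, flag_survives):
    """Inverted polarity: skip the truncate only if 'no rows inserted' is set."""
    rd_rows_not_inserted = len(rows) == 0
    if not flag_survives:              # flag lost => default to truncating
        rd_rows_not_inserted = False
    return rows if rd_rows_not_inserted else []


rows = ["a", "b"]                      # rows inserted this transaction
# With the flag lost, the first polarity silently keeps stale rows (a
# correctness bug); the inverted one merely truncates an already-known
# table once more (correct, just slower).
print(commit_with_rd_rows_inserted(rows, flag_survives=False))      # ['a', 'b']
print(commit_with_rd_rows_not_inserted(rows, flag_survives=False))  # []
```

This is exactly the "fail in the safe direction" trade-off Tom notes in his reply: losing the inverted flag costs an unnecessary truncation, never a wrong result.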
-- 
Gurjeet Singh

http://gurjeet.singh.im/


Re: [HACKERS] patches that could use additional reviewers

2011-02-10 Thread Noah Misch
[Cc: trimmed]

On Wed, Feb 09, 2011 at 01:45:11PM -0500, Robert Haas wrote:
> A few other ones that could use more reviewers include:
>
> key locks

I'll take a look at this one.



Re: [HACKERS] patches that could use additional reviewers

2011-02-10 Thread Robert Haas
On Wed, Feb 9, 2011 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote:
> A few other ones that could use more reviewers include:

I've just corrected the status of a few patches in the CommitFest
application.  In particular, I set the following back to Needs Review.

SQL/MED - postgresql_fdw
Self-tuning checkpoint sync spread
determining client_encoding from client locale

If anyone can jump in, that would be great.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] patches that could use additional reviewers

2011-02-09 Thread Robert Haas
On Wed, Feb 9, 2011 at 1:35 PM, Stephen Frost sfr...@snowman.net wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Feb 9, 2011 at 1:09 PM, David E. Wheeler da...@kineticode.com wrote:
>>> Frankly, I think you should surrender some of those 14 and cajole some
>>> other folks to take on more.
>
>> Happily...  only trouble is, I suck at cajoling.  Even my begging is
>> distinctly sub-par.
>
>> Plase?
>
> Erm, I've been through the commitfest app a couple of different times,
> but have ignored things which are marked 'Needs Review' when there's a
> reviewer listed...
>
> If there are patches where you're marked as the reviewer but you don't
> have time to review them or want help, take your name off as a reviewer
> for them and/or speak up and explicitly ask for help.  I'm not going to
> start reviewing something if I think someone else is already working on
> it..

Of the fourteen I signed up for, 10 are now marked Committed or
Returned with Feedback.  Of the remaining four, there are two that
could use more eyes:

MULTISET functions
Change pg_last_xlog_receive_location not to move backwards

A few other ones that could use more reviewers include:

range types
key locks
widen scale factor limit from pgbench

And your patch could probably use another reviewer too, if anyone else
is looking for stuff to help with:

log_csv_fields ; add current_role log option

And there are a few patches with no reviewer at all.

PL/Python invalidate composite argument functions
PL/Python tracebacks
contrib/btree_gist  (submitted very late)
SQL/MED - file_fdw

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patches that could use additional reviewers

2011-02-09 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> Of the fourteen I signed up for, 10 are now marked Committed or
> Returned with Feedback.  Of the remaining four, there are two that
> could use more eyes:
>
> MULTISET functions

I'll work on this one.

> Change pg_last_xlog_receive_location not to move backwards

I'll take a look at this one too, but I'm not that familiar with the
xlog code, etc, so I'm not sure if I'll be able to comment on
correctness...

> A few other ones that could use more reviewers include:
>
> range types
> key locks

If I can get through the others, I'll try and come back and look at
these.

> widen scale factor limit from pgbench

I was already starting to look at this one, actually. :)

> And your patch could probably use another reviewer too, if anyone else
> is looking for stuff to help with:
>
> log_csv_fields ; add current_role log option

Not sure if it counts if I review it. ;)

> And there are a few patches with no reviewer at all.
>
> PL/Python invalidate composite argument functions
> PL/Python tracebacks

I thought from the other threads that we had someone working on the
PL/Python patches..? :/

> contrib/btree_gist  (submitted very late)

Looks like this one might just be committable w/o additional review, but
if it's still hanging around, I might be able to help.

> SQL/MED - file_fdw

Ditto on this.

Alright, I've marked myself as a reviewer for the ones I'll look at in
the next couple days.  The others are up for grabs for others, any
takers on additional reviewers for them?

Thanks,

Stephen




Re: [HACKERS] patches that could use additional reviewers

2011-02-09 Thread Bernd Helmle



--On 9. February 2011 13:45:11 -0500 Robert Haas robertmh...@gmail.com wrote:

> Of the fourteen I signed up for, 10 are now marked Committed or
> Returned with Feedback.  Of the remaining four, there are two that
> could use more eyes:


I'd happily jump in and look into one of those, but before the middle of
next week I really have no spare time to come up with something :(


--
Thanks

Bernd



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-29 Thread Peter Eisentraut
On Wed, 2009-10-28 at 12:51 -0700, Jeff Davis wrote:
> Trying to develop and document a set of standardized, stable hash
> functions covering a wide range of possible use cases sounds like it may
> be better served by an extension.

I suspect that some of the participants in this thread have PL/Proxy in
mind.  PL/Proxy should probably ship its own set of hash functions.

If we ever get built-in partitioning by hash (see thread nearby and most
previous ones like it), we should also think about providing a hash
function that doesn't change output over versions and is independent of
hash index implementation concerns.
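Why a version-stable hash matters for partitioning can be shown in a few lines. This is my own illustration, not PostgreSQL's `hashtext`: the two stand-in hash functions (CRC32 and a hand-rolled 32-bit FNV-1a) simulate a pre-upgrade and post-upgrade hash, each individually fine, but disagreeing on where keys live:

```python
# If partition routing is done as h(key) % N and h() changes between
# versions, rows written under the old h() end up "missing" after an
# upgrade: lookups route to a different partition than the data.
import zlib

NUM_PARTITIONS = 4

def hash_old(key: str) -> int:          # stand-in for the pre-upgrade hash
    return zlib.crc32(key.encode())

def hash_new(key: str) -> int:          # stand-in: 32-bit FNV-1a
    h = 2166136261
    for b in key.encode():
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h

keys = [f"user{i}" for i in range(1000)]
moved = [k for k in keys
         if hash_old(k) % NUM_PARTITIONS != hash_new(k) % NUM_PARTITIONS]
print(f"{len(moved)} of {len(keys)} keys now route to a different partition")
```

Since the two functions are unrelated, roughly three quarters of the keys change partitions, which is why PL/Proxy-style setups cannot tolerate a silent change of the built-in hash.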




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-29 Thread Hannu Krosing
On Thu, 2009-10-29 at 09:47 +0200, Peter Eisentraut wrote:
> On Wed, 2009-10-28 at 12:51 -0700, Jeff Davis wrote:
>> Trying to develop and document a set of standardized, stable hash
>> functions covering a wide range of possible use cases sounds like it may
>> be better served by an extension.
>
> I suspect that some of the participants in this thread have PL/Proxy in
> mind.

Yes, I think that pl/proxy is the prime example of such use.

> PL/Proxy should probably ship its own set of hash functions.

Or maybe we could just extract the hashes from some version of stock
postgresql (say 8.3) and then make those available in contrib under the
name stable_hashes?

> If we ever get built-in partitioning by hash (see thread nearby and most
> previous ones like it), we should also think about providing a hash
> function that doesn't change output over versions and is independent of
> hash index implementation concerns.

And maybe even documented ;)

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-29 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
> Or maybe we could just extract the hashes from some version of stock
> postgresql (say 8.3) and then make those available in contrib under the
> name stable_hashes?

A better name would be wishful_thinking ... contrib does not have
control over some of the main risk factors, such as changes to the
internal representation of datatypes.

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-29 Thread Peter Eisentraut
On Thu, 2009-10-29 at 09:32 -0400, Tom Lane wrote:
> Hannu Krosing ha...@2ndquadrant.com writes:
>> Or maybe we could just extract the hashes from some version of stock
>> postgresql (say 8.3) and then make those available in contrib under the
>> name stable_hashes?
>
> A better name would be wishful_thinking ... contrib does not have
> control over some of the main risk factors, such as changes to the
> internal representation of datatypes.

Good point.  But the internal layout of data types popular for use as
hash keys changes rarely.  The hash function change, however, is a
showstopper for everyone concerned.




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-29 Thread Hannu Krosing
On Thu, 2009-10-29 at 09:32 -0400, Tom Lane wrote:
> Hannu Krosing ha...@2ndquadrant.com writes:
>> Or maybe we could just extract the hashes from some version of stock
>> postgresql (say 8.3) and then make those available in contrib under the
>> name stable_hashes?
>
> A better name would be wishful_thinking ... contrib does not have
> control over some of the main risk factors, such as changes to the
> internal representation of datatypes.

My understanding was that contrib is actually the only possibility to
have something maintained in sync with core PostgreSQL.

   regards, tom lane
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-02-11 at 11:22 -0500, Tom Lane wrote:
> Asko Oja asc...@gmail.com writes:
>> Did this change hashtext() visible to users? We have been using it quite
>> widely for partitioning our databases. If so then it should be marked quite
>> visibly in release notes as there might be others who will be hit by this.
>
> The hash functions are undocumented, have changed in the past, and are
> likely to change again in the future.  If you are using them in a way
> that depends on them to give the same answers across versions, you'd
> better stop.

Is at least the fact that they are undocumented, have changed in the
past, and are likely to change again in the future documented ?

I'm sure this is something that has hit unwary users in the past and
will hit again in the future, so some words about it in the doc's would
be appropriate.

A search for hashtext on
http://www.postgresql.org/docs/8.4/interactive/index.html returned no
results, so I guess even their "undocumented, will surprise you" status
is not documented.

Hashing is a quite fundamental thing in computing, so I was quite
surprised to find out it had silently changed. 

I had never checked the docs for hash functions, but I had assumed, that
internal functions are prefixed by pg_ and anything else is public, free
to use functionality.

Changing hash functions also makes in-place upgrades a lot harder, as
they can't be done incrementally anymore for tables which use hash
indexes.


   regards, tom lane
 
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
> I had never checked the docs for hash functions, but I had assumed, that
> internal functions are prefixed by pg_ and anything else is public, free
> to use functionality.

Sure, it's free to use.  It's not free to assume that we promise never
to change it.

> Changing hash functions also makes in-place upgrades a lot harder, as
> they can't be done incrementally anymore for tables which use hash
> indexes.

Hash indexes are so far from being production-grade that this argument
is not significant.

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Kenneth Marshall
On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote:
> Hannu Krosing ha...@2ndquadrant.com writes:
>> I had never checked the docs for hash functions, but I had assumed, that
>> internal functions are prefixed by pg_ and anything else is public, free
>> to use functionality.
>
> Sure, it's free to use.  It's not free to assume that we promise never
> to change it.
>
>> Changing hash functions also makes in-place upgrades a lot harder, as
>> they can't be done incrementally anymore for tables which use hash
>> indexes.
>
> Hash indexes are so far from being production-grade that this argument
> is not significant.
>
> regards, tom lane

In addition, the change from 8.3 -> 8.4 to store only the hash and not
the value in the index means that a reindex would be required in any event.

Cheers,
Ken



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Tom Lane
Kenneth Marshall k...@rice.edu writes:
> On Wed, Oct 28, 2009 at 03:31:17PM -0400, Tom Lane wrote:
>> Hash indexes are so far from being production-grade that this argument
>> is not significant.

> In addition, the change from 8.3 -> 8.4 to store only the hash and not
> the value in the index means that a reindex would be required in any event.

Indeed, and I fully expect there will be some more on-disk format
changes required before we get to the point where hash indexes are
actually interesting for production.  If we start insisting that they
be in-place-upgradable now, we will pretty much guarantee that they
never become useful enough to justify the restriction :-(

(As examples, the hash bucket size probably needs revisiting,
and we ought to think very hard about whether we shouldn't switch
to 64-bit hash values.  And that's not even considering some of the
more advanced suggestions that have been made.)

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Jeff Davis
On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote:
> Is at least the fact that they are undocumented, have changed in the
> past, and are likely to change again in the future documented ?

That's a little confusing to me: how do we document that something is
undocumented? And where do we stop?

 Hashing is a quite fundamental thing in computing, so I was quite
 surprised to find out it had silently changed. 

There are many reasons to use a hash, and we don't want people to use
these functions for the wrong purpose. I have seen people use a
performance hash for security purposes before, and I had to demonstrate
some hash collisions to show why that was a bad idea. So, if we do
provide documented functions, it should be done carefully.

Trying to develop and document a set of standardized, stable hash
functions covering a wide range of possible use cases sounds like it may
be better served by an extension.

Regards,
Jeff Davis 




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-10-28 at 12:51 -0700, Jeff Davis wrote:
> On Wed, 2009-10-28 at 21:09 +0200, Hannu Krosing wrote:
>> Is at least the fact that they are undocumented, have changed in the
>> past, and are likely to change again in the future documented ?
>
> That's a little confusing to me: how do we document that something is
> undocumented? And where do we stop?

My previous e-mail message documents that the undocumentedness is
undocumented, so no need to go any further here ;)

Though undocumented, the hash functions are easily discoverable by doing

\df *hash*

in psql.

>> Hashing is a quite fundamental thing in computing, so I was quite
>> surprised to find out it had silently changed.
>
> There are many reasons to use a hash, and we don't want people to use
> these functions for the wrong purpose.

I don't think that not documenting a hash function helps here at all.

> I have seen people use a
> performance hash for security purposes before, and I had to demonstrate
> some hash collisions to show why that was a bad idea.

I've seen people use CRC32 as a hash and then hit a collision in 15 tries
with quite large keys.
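How cheaply a 32-bit hash collides can be demonstrated directly. This is my own illustration using Python's `zlib.crc32` (not a PostgreSQL hash): by the birthday bound, a collision among random inputs is expected after only tens of thousands of values, nowhere near 2^32:

```python
# Birthday-paradox demo: feed random 16-byte inputs to CRC32 until two
# distinct inputs share a checksum.  With a 32-bit output, the first
# collision typically shows up after ~80,000 inputs.
import random
import zlib

random.seed(42)                       # deterministic run
seen = {}
collision = None
for _ in range(500_000):              # far more than enough attempts
    data = random.getrandbits(128).to_bytes(16, "big")
    h = zlib.crc32(data)
    if h in seen and seen[h] != data:
        collision = (seen[h], data, h)
        break
    seen[h] = data

a, b, h = collision
print(f"collision after {len(seen)} inputs: crc32={h:#010x}")
```

This is exactly why a performance hash must never be treated as collision-free, let alone used where a cryptographic hash is required.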

> So, if we do provide documented functions, it should be done carefully.

Any user-visible behavior change should be done carefully, even if the
original behavior is not documented.

A careful upgrade of the hashxxx functions would have kept the old
functions, introduced the new ones with a _v2 suffix, and then used
those in the appropriate places; then kept the old ones for a few
versions, with maybe a deprecation warning, and then moved them to
contrib for a few more versions.

Doing it this way could leave them undocumented and still not break
people's applications in mysterious ways.

> Trying to develop and document a set of standardized, stable hash
> functions covering a wide range of possible use cases sounds like it may
> be better served by an extension.

I guess there are enough security/crypto/strong hashes in the pgcrypto
package, so that should not be a problem.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-10-28 Thread Hannu Krosing
On Wed, 2009-10-28 at 15:31 -0400, Tom Lane wrote:
> Hannu Krosing ha...@2ndquadrant.com writes:
>> I had never checked the docs for hash functions, but I had assumed, that
>> internal functions are prefixed by pg_ and anything else is public, free
>> to use functionality.
>
> Sure, it's free to use.  It's not free to assume that we promise never
> to change it.
>
>> Changing hash functions also makes in-place upgrades a lot harder, as
>> they can't be done incrementally anymore for tables which use hash
>> indexes.
>
> Hash indexes are so far from being production-grade that this argument
> is not significant.

AFAIK in-place upgrade is also not quite production-grade, so this was
meant as a forward-looking note for next time the hashxxx functions will
change.

   regards, tom lane
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





[HACKERS] Patches for static check on geo_ops.c

2009-08-27 Thread Paul Matthews
Grzegorz Jaskiewicz's wonderful static checker coughed up 5 errors in
geo_ops.c.  None of them of any particular excitement or of earth
shattering nature.  A patch is attached below that should correct these.
(The more little issues we eliminate, the more the large ones will stand
out.)

At line 3131 value stored into 'dist' variable is never referenced again.
At line 3014 value stored into 'dist' variable is never referenced again.
At line 2942 value stored into 'd' variable is never referenced again.
At line 2953 value stored into 'd' variable is never referenced again.
At line 2993 value stored into 'd' variable is never referenced again.


? patchfile_clang
Index: src/backend/utils/adt/geo_ops.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/geo_ops.c,v
retrieving revision 1.103
diff -c -r1.103 geo_ops.c
*** src/backend/utils/adt/geo_ops.c 28 Jul 2009 09:47:59 -0000  1.103
--- src/backend/utils/adt/geo_ops.c 27 Aug 2009 09:31:30 -0000
***************
*** 2939,2945 ****
        memcpy(&point, &l1->p[1], sizeof(Point));
    }
  
!   if ((d = dist_ps_internal(&l2->p[0], l1)) < dist)
    {
        result = DatumGetPointP(DirectFunctionCall2(close_ps,
                                    PointPGetDatum(&l2->p[0]),
--- 2939,2945 ----
        memcpy(&point, &l1->p[1], sizeof(Point));
    }
  
!   if (dist_ps_internal(&l2->p[0], l1) < dist)
    {
        result = DatumGetPointP(DirectFunctionCall2(close_ps,
                                    PointPGetDatum(&l2->p[0]),
***************
*** 2950,2956 ****
                                    LsegPGetDatum(l2)));
    }
  
!   if ((d = dist_ps_internal(&l2->p[1], l1)) < dist)
    {
        result = DatumGetPointP(DirectFunctionCall2(close_ps,
                                    PointPGetDatum(&l2->p[1]),
--- 2950,2956 ----
                                    LsegPGetDatum(l2)));
    }
  
!   if (dist_ps_internal(&l2->p[1], l1) < dist)
    {
        result = DatumGetPointP(DirectFunctionCall2(close_ps,
                                    PointPGetDatum(&l2->p[1]),
***************
*** 2990,2996 ****
    point.x = box->low.x;
    point.y = box->high.y;
    statlseg_construct(&lseg, &box->low, &point);
!   dist = d = dist_ps_internal(pt, &lseg);
  
    statlseg_construct(&seg, &box->high, &point);
    if ((d = dist_ps_internal(pt, &seg)) < dist)
--- 2990,2996 ----
    point.x = box->low.x;
    point.y = box->high.y;
    statlseg_construct(&lseg, &box->low, &point);
!   dist = dist_ps_internal(pt, &lseg);
  
    statlseg_construct(&seg, &box->high, &point);
    if ((d = dist_ps_internal(pt, &seg)) < dist)
***************
*** 3011,3017 ****
    statlseg_construct(&seg, &box->high, &point);
    if ((d = dist_ps_internal(pt, &seg)) < dist)
    {
-       dist = d;
        memcpy(&lseg, &seg, sizeof(lseg));
    }
  
--- 3011,3016 ----
***************
*** 3128,3134 ****
    statlseg_construct(&seg, &box->high, &point);
    if ((d = lseg_dt(lseg, &seg)) < dist)
    {
-       dist = d;
        memcpy(&bseg, &seg, sizeof(bseg));
    }
  
--- 3127,3132 ----



Re: [HACKERS] Patches for static check on geo_ops.c

2009-08-27 Thread Grzegorz Jaskiewicz


On 27 Aug 2009, at 10:46, Paul Matthews wrote:


Grzegorz Jaskiewicz's wonderful static checker coughed up 5 errors in
geo_ops.c.  None of them is of any particular excitement or of an
earth-shattering nature. A patch is attached below that should correct
these. (The more little issues we eliminate, the more the large ones
will stand out.)


I am flattered, but I am merely a user of it, running it against
postgresql's source. The checker is written by the wonderful llvm
developers, so you should thank them (and Apple, for sponsoring
development of the checker).





Re: [HACKERS] Patches for static check on geo_ops.c

2009-08-27 Thread Tom Lane
Paul Matthews p...@netspace.net.au writes:
> Grzegorz Jaskiewicz's wonderful static checker coughed up 5 errors in
> geo_ops.c.  None of them is of any particular excitement or of an
> earth-shattering nature. A patch is attached below that should correct these.
> (The more little issues we eliminate, the more the large ones will stand
> out.)

> At line 3131 value stored into 'dist' variable is never referenced again.
> At line 3014 value stored into 'dist' variable is never referenced again.
> At line 2942 value stored into 'd' variable is never referenced again.
> At line 2953 value stored into 'd' variable is never referenced again.
> At line 2993 value stored into 'd' variable is never referenced again.

I've applied the first three of these changes, but not the last two
(the 'dist' assignments).  clang seems to have a tin ear for style :-(.
It's failing to notice that we have several similar code blocks in
sequence in these two places, and making the last one different from the
rest would decrease code readability and modifiability.  I'm happy if
the compiler optimizes away useless stores, but I don't really think
it should presume to dictate code style to us on that basis.

BTW, if we did apply those changes, I suppose clang would immediately
start complaining that the preceding assignments to 'd' are useless.
So by the time we'd made it happy, those code blocks would look quite
different from their mates.
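
The tradeoff can be seen in a stripped-down sketch (hypothetical helper names, not the actual geo_ops.c code): the final `dist = d` keeps the last block symmetric with its siblings, but the checker rightly observes that the store is never read.

```c
#include <assert.h>

/* Hypothetical distance helpers standing in for dist_ps_internal(). */
static double dist_a(void) { return 3.0; }
static double dist_b(void) { return 1.0; }
static double dist_c(void) { return 2.0; }

/* Three parallel "closest candidate" blocks in the style flagged by the
 * checker: in the last block the store into dist is dead, but removing
 * it would make the last block differ from its siblings. */
int closest(void)
{
    double d, dist;
    int which = 0;

    dist = dist_a();

    if ((d = dist_b()) < dist)
    {
        dist = d;
        which = 1;
    }

    if ((d = dist_c()) < dist)
    {
        dist = d;               /* dead store: dist is never read again */
        which = 2;
    }

    return which;
}
```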

regards, tom lane



Re: [HACKERS] Patches for static check on geo_ops.c

2009-08-27 Thread Paul Matthews
Tom Lane wrote:
 I've applied the first three of these changes, but not the last two
 (the 'dist' assignments).  clang seems to have a tin ear for style :-(.
 It's failing to notice that we have several similar code blocks in
 sequence in these two places, and making the last one different from the
 rest would decrease code readability and modifiability.

   
<voice=Maxwell Smart>Ah! The old programming via copy-and-paste
trick</voice>.

Maybe clang's ear for style isn't that bad after all.
:-)



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-11 Thread Teodor Sigaev

But the real bottom line is: if autovacuum is working properly, it
should clean up the index before the list ever gets to the point where
it'd be sane to turn off indexscans.  So I don't see why we need to hack
the planner for this at all.  If any hacking is needed, it should be
in the direction of making sure autovacuum puts sufficient priority
on this task.


Autovacuum will start if a table has a GIN index with fastupdate=on and the
number of inserted tuples since the last vacuum > autovacuum_vacuum_threshold.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-02-11 Thread Asko Oja
Did this change make hashtext() visibly different to users? We have been using
it quite widely for partitioning our databases. If so, it should be marked quite
visibly in the release notes, as there might be others who will be hit by this.

regards
Asko

On Mon, Feb 9, 2009 at 11:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Kenneth Marshall k...@rice.edu writes:
>> I have updated the patch posted by Jeff Davis on January 9th
>> to include the micro-patch above as well as updated the polymorphism
>> regressions tests. This applies cleanly to the latest CVS pull.
>
> Applied --- thanks for being persistent about resolving the doubts on this.
>
> One thing that apparently neither of you realized was that the
> polymorphism results were varying between bigendian and littleendian
> machines; I suppose you are using different hardware and that's why you
> didn't agree on what the results should be.
>
> Since we already agreed we were going to tolerate endianness dependence
> in the hash functions, I fixed that by adding some ORDER BYs.
>
> regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-02-11 Thread Tom Lane
Asko Oja asc...@gmail.com writes:
> Did this change make hashtext() visibly different to users? We have been using
> it quite widely for partitioning our databases. If so, it should be marked quite
> visibly in the release notes, as there might be others who will be hit by this.

The hash functions are undocumented, have changed in the past, and are
likely to change again in the future.  If you are using them in a way
that depends on them to give the same answers across versions, you'd
better stop.
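
For anyone partitioning on a hash today, one defensive option is for the application or partitioning layer to carry its own well-specified hash instead of calling the server's internal one. A minimal sketch using 32-bit FNV-1a (an illustrative choice; `partition_for` is a made-up helper, not anything PostgreSQL ships):

```c
#include <stdint.h>
#include <stddef.h>

/* 32-bit FNV-1a: a small, well-specified hash an application can pin so
 * that partition assignments stay stable across server upgrades, unlike
 * the server-internal hashtext(). */
uint32_t fnv1a(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;           /* FNV offset basis */

    while (len--)
    {
        h ^= *p++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Map a key onto one of n partitions. */
unsigned partition_for(const void *key, size_t len, unsigned n)
{
    return fnv1a(key, len) % n;
}
```

Because the algorithm is fully specified, the same key maps to the same partition regardless of which database version computes it.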

regards, tom lane



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-09 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
> On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
>> Well, there's nothing to force that plan to be invalidated when the
>> state of the pending list changes, is there?
>
> Would it be unreasonable to invalidate cached plans during the pending
> list cleanup?

If the pending list cleanup is done by VACUUM then such an invalidation
already happens (VACUUM forces it after updating pg_class.reltuples/
relpages).  What's bothering me is the lack of any reasonable mechanism
for invalidating plans in the other direction, ie when the list grows
past the threshold where this code wants to turn off indexscans.  Since
the threshold depends on parameters that can vary across sessions, you'd
more or less have to send a global invalidation after every addition to
the list, in case that addition put it over the threshold in some other
session's view.  That's unreasonably often, in my book.

Also, as mentioned earlier, I'm pretty down on the idea of a threshold
where indexscans suddenly turn off entirely; that's not my idea of how
the planner ought to work.

But the real bottom line is: if autovacuum is working properly, it
should clean up the index before the list ever gets to the point where
it'd be sane to turn off indexscans.  So I don't see why we need to hack
the planner for this at all.  If any hacking is needed, it should be
in the direction of making sure autovacuum puts sufficient priority
on this task.

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-02-09 Thread Tom Lane
Kenneth Marshall k...@rice.edu writes:
> I have updated the patch posted by Jeff Davis on January 9th
> to include the micro-patch above as well as updated the polymorphism
> regressions tests. This applies cleanly to the latest CVS pull.

Applied --- thanks for being persistent about resolving the doubts on this.

One thing that apparently neither of you realized was that the
polymorphism results were varying between bigendian and littleendian
machines; I suppose you are using different hardware and that's why you
didn't agree on what the results should be.

Since we already agreed we were going to tolerate endianness dependence
in the hash functions, I fixed that by adding some ORDER BYs.
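
The endianness dependence comes from hashing memory word by word: the same four bytes form different 32-bit words on the two byte orders, so a word-at-a-time hash yields different results for identical input. A self-contained sketch (the helper names are illustrative):

```c
#include <stdint.h>

/* Assemble the same 4 bytes as a little-endian word... */
uint32_t load_le(const unsigned char b[4])
{
    return (uint32_t) b[0] | ((uint32_t) b[1] << 8) |
           ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
}

/* ...and as a big-endian word. A hash that consumes these words sees
 * different inputs on the two architectures, which is why the regression
 * tests needed ORDER BY rather than relying on hash-determined row order. */
uint32_t load_be(const unsigned char b[4])
{
    return (uint32_t) b[3] | ((uint32_t) b[2] << 8) |
           ((uint32_t) b[1] << 16) | ((uint32_t) b[0] << 24);
}
```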

regards, tom lane



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Teodor Sigaev

I looked at this a little bit --- it needs proofreading (VACUUME?).

Sorry, VACUUME fixed



Do we really need an additional column in pgstat table entries in
order to store something that looks like it can be derived from the
other columns?  The stats tables are way too big already.


It's not derivable, because vacuum resets n_inserted_tuples to zero, but it
doesn't reset tuples_inserted, tuples_updated, or tuples_hot_updated. So
n_inserted_tuples is only calculable from the other columns until the first
vacuum occurs.





Also, I really think it's a pretty bad idea to make index cost
estimation depend on the current state of the index's pending list
--- that state seems far too transient to base plan choices on.


I asked for that. v0.23 used statistics data by calling
pg_stat_get_fresh_inserted_tuples(), so we could revert to that.

It's possible to add pending list information to IndexOptInfo, if it's 
acceptable.


It's particularly got to be nuts to turn off indexscans entirely
if the pending list is too full.  Having some lossy pages might
not be great but I don't believe it can be so bad that you should
go to a seqscan all the time.


It's impossible to return a lossy page via the amgettuple interface, although
with amgetbitmap it works well: GIN will not emit an error even if the bitmap
becomes lossy.


In the attached version, gincostestimate will disable the index scan if the
estimated number of tuples matched in the pending list is greater than the
non-lossy limit of the tidbitmap. That estimate is the product of
indexSelectivity and the number of tuples in the pending list.
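
Schematically, that rule reduces to a single comparison (the names below are illustrative, not the actual gincostestimate code):

```c
#include <stdbool.h>

/* Schematic version of the rule: estimate how many pending-list tuples
 * a scan would match, and refuse the index scan when that estimate
 * exceeds the number of exact (non-lossy) entries the tidbitmap can
 * hold. All inputs are hypothetical planner-supplied numbers. */
bool gin_scan_allowed(double index_selectivity,
                      double pending_tuples,
                      double max_nonlossy_tuples)
{
    double matched_in_pending = index_selectivity * pending_tuples;

    return matched_in_pending <= max_nonlossy_tuples;
}
```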


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.26.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis
On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote:
> Also, I really think it's a pretty bad idea to make index cost
> estimation depend on the current state of the index's pending list
> --- that state seems far too transient to base plan choices on.

I'm confused by this. Don't we want to base the plan choice on the most
current data, even if it is transient?

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas
On Wed, Feb 4, 2009 at 1:39 PM, Jeff Davis pg...@j-davis.com wrote:
> On Mon, 2009-02-02 at 20:38 -0500, Tom Lane wrote:
>> Also, I really think it's a pretty bad idea to make index cost
>> estimation depend on the current state of the index's pending list
>> --- that state seems far too transient to base plan choices on.
>
> I'm confused by this. Don't we want to base the plan choice on the most
> current data, even if it is transient?
>
> Regards,
> Jeff Davis

Well, there's nothing to force that plan to be invalidated when the
state of the pending list changes, is there?

...Robert



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Jeff Davis
On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
> Well, there's nothing to force that plan to be invalidated when the
> state of the pending list changes, is there?

Would it be unreasonable to invalidate cached plans during the pending
list cleanup?

Anyway, it just strikes me as strange to expect a plan to be a good plan
for very long. Can you think of an example where we applied this rule
before?

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] GIN improvements

2009-02-04 Thread Robert Haas
On Wed, Feb 4, 2009 at 4:23 PM, Jeff Davis pg...@j-davis.com wrote:
> On Wed, 2009-02-04 at 14:40 -0500, Robert Haas wrote:
>> Well, there's nothing to force that plan to be invalidated when the
>> state of the pending list changes, is there?
>
> Would it be unreasonable to invalidate cached plans during the pending
> list cleanup?
>
> Anyway, it just strikes me as strange to expect a plan to be a good plan
> for very long. Can you think of an example where we applied this rule
> before?

Well, I am not an expert on this topic.

But, plans for prepared statements and statements within PL/pgsql
functions are cached for the lifetime of the session, which in some
situations could be quite long.

I would think that invalidating significantly more often would be bad
for performance.

...Robert



Re: [HACKERS] [PATCHES] GIN improvements

2009-02-02 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
> I'm very sorry, but v0.24 has a silly bug with an uninitialized value :(.
> A new version is attached.

I looked at this a little bit --- it needs proofreading (VACUUME?).

Do we really need an additional column in pgstat table entries in
order to store something that looks like it can be derived from the
other columns?  The stats tables are way too big already.

Also, I really think it's a pretty bad idea to make index cost
estimation depend on the current state of the index's pending list
--- that state seems far too transient to base plan choices on.
It's particularly got to be nuts to turn off indexscans entirely
if the pending list is too full.  Having some lossy pages might
not be great but I don't believe it can be so bad that you should
go to a seqscan all the time.

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-27 Thread Kenneth Marshall
On Sun, Jan 25, 2009 at 10:27:03PM -0600, Kenneth Marshall wrote:
 On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote:
  Jeff Davis pg...@j-davis.com writes:
   I ran 5 times on both old and new code, eliminating the top and bottom
   and taking the average of the remaining 3, and I got a 6.9% performance
   improvement with the new code.
  
  The question that has been carefully evaded throughout the discussion
  of this patch is whether the randomness of the hash result is decreased,
  and if so what is that likely to cost us in performance of the various
  hash-dependent algorithms.  I would still like to have an answer to that
  before we make a change to gain marginal performance improvement in
  the hash function itself (which is already shown to be barely measurable
  in the total context of a hash-dependent operation...)
  
  regards, tom lane
  
 
 Dear Hackers and Reviewers,
 
 In response to Tom's questioning the randomness of the hash_any
 when the mixing functions are split into two, the first used when
 sequentially processing the input and the second for the final
 mix, I have generated a more detailed analysis of the two hash_any
 functions.
 
 First, one change to the 11/2008 patch: keylen is added to a, b and
 c initially so we do not need to add it later on. This is the
 micro-diff:
 -
 
 --- hashfunc.c_TWOMIX   2009-01-22 14:07:34.0 -0600
 +++ hashfunc.c_TWOMIX2  2009-01-22 14:17:32.0 -0600
 @@ -336,7 +336,6 @@
  
 /* handle the last 11 bytes */
 k = (const unsigned char *) ka;
 -   c += keylen;
  #ifdef WORDS_BIGENDIAN
 switch (len)
 {
 @@ -439,7 +438,6 @@
 }
  
 /* handle the last 11 bytes */
 -   c += keylen;
  #ifdef WORDS_BIGENDIAN
 switch (len)/* all the case statements 
 fall through */
  
 -
 
 The remainder of this document will use the runs from my initial results
 broken out using various power-of-two bucket sizes to simulate our actual
 use in PostgreSQL as the number of items hashed increases and we use more
 and more bits of our hash to identify the appropriate bucket. I have run
 each test twice, once with our current hash_any() with the single mix()
 function and then a second time using my patch from the November commitfest
 plus the patch above to produce a new hash_any() with two separate mixing
 functions mix() and final(). For each run I have generated a sequence of
 unique inputs, up to the number of buckets, hashed them with the hash
 functions (both old and new), then I calculate the expected number of
 collisions p(n) using the Poisson formula for each number of buckets,
 where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and
 2**26. For my initial run, I used a string consisting of the letter 'a'
 followed by the integer representation of the numbers from 0 to the
 (number of buckets - 1):
 
 1) a<uint32> (i.e. a1,a0002...)
 Number of buckets: 65536
 Total number of items: 65536
 Expected number with n items: 24109 24109 12054 4018 1004 200 33 4
 Actual number mix():  24044 24172 12078 4036 980 186 30 10
 Actual number mix()/final():  24027 24232 12060 3972 1001 207 31 5 1
 
 Number of buckets: 262144
 Total number of items: 262144
 Expected number with n items: 96437 96437 48218 16072 4018 803 133 19 2
 Actual number mix():  96224 96730 48240 15951 4094 744 143 17 1
 Actual number mix()/final():  96335 96646 48071 16128 4018 801 122 21 2
 
 Number of buckets: 1048576
 Total number of items: 1048576
 Expected number with n items: 385749 385749 192874 64291 16072 3214 535 76 9
 Actual number mix():  385716 385596 193243 64115 16053 3285 478 77 12 
 1
 Actual number mix()/final():  385955 385016 193789 63768 16259 3190 511 79 8 1
 
 Number of buckets: 4194304
 Total number of items: 4194304
 Expected number with n items: 1542998 1542998 771499 257166 64291 12858 2143 
 306 38
 Actual number mix():  1542536 1543489 771351 25 63830 12847 2123 
 326 19 5 1
 Actual number mix()/final():  1541828 1544429 772072 256178 64579 12774 2129 
 288 22 5
 
 Number of buckets: 16777216
 Total number of items: 16777216
 Expected number with n items: 6171992 6171992 3085996 1028665 257166 51433 
 8572 1224 153
 Actual number mix():  6170866 6174079 3085912 1027140 257808 51385 
 8638 1219 146 23
 Actual number mix()/final():  6172058 6171991 3086279 1027916 257535 51465 
 8554 1243 149 23 3
 
 Number of buckets: 67108864
 Total number of items: 67108864
 Expected number with n items: 24687971 24687971 12343985 4114661 1028665 
 205733 34288 4898 612
 Actual number mix():  24686110 24690897 12344232 4113515 1028232 
 205682 34546 4942 629 72 7
 Actual number mix()/final():  24708515 24669248 12333034 4114796 1034256 
 208424 34888 5023 598 77 5
 
 Here is a 

Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-25 Thread Kenneth Marshall
On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote:
> Jeff Davis pg...@j-davis.com writes:
>> I ran 5 times on both old and new code, eliminating the top and bottom
>> and taking the average of the remaining 3, and I got a 6.9% performance
>> improvement with the new code.
>
> The question that has been carefully evaded throughout the discussion
> of this patch is whether the randomness of the hash result is decreased,
> and if so what is that likely to cost us in performance of the various
> hash-dependent algorithms.  I would still like to have an answer to that
> before we make a change to gain marginal performance improvement in
> the hash function itself (which is already shown to be barely measurable
> in the total context of a hash-dependent operation...)
>
> regards, tom lane

Dear Hackers and Reviewers,

In response to Tom's questioning the randomness of the hash_any
when the mixing functions are split into two, the first used when
sequentially processing the input and the second for the final
mix, I have generated a more detailed analysis of the two hash_any
functions.
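
For readers without the patch at hand, the split being analyzed has this overall shape: a cheap mix() applied per 12-byte chunk while consuming input, and one heavier final() after the last chunk. The sketch below is illustrative only; the rotation constants and seeding are not the actual hashfunc.c values.

```c
#include <stdint.h>

#define ROT(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* Cheap inner mix, applied once per 12-byte chunk. */
#define MIX(a, b, c) \
    do { \
        a -= c; a ^= ROT(c, 4);  c += b; \
        b -= a; b ^= ROT(a, 6);  a += c; \
        c -= b; c ^= ROT(b, 8);  b += a; \
    } while (0)

/* Heavier final mix, applied once after the last chunk. */
#define FINAL(a, b, c) \
    do { \
        c ^= b; c -= ROT(b, 14); \
        a ^= c; a -= ROT(c, 11); \
        b ^= a; b -= ROT(a, 25); \
        c ^= b; c -= ROT(b, 16); \
    } while (0)

/* Toy hash over an array of 32-bit words in the hash_any() style:
 * seed a, b, c with a constant plus the length, MIX per chunk of
 * three words, then FINAL once at the end. */
uint32_t toy_hash(const uint32_t *k, int nwords)
{
    uint32_t a, b, c;

    a = b = c = 0x9e3779b9u + (uint32_t) nwords;
    while (nwords >= 3)
    {
        a += k[0]; b += k[1]; c += k[2];
        MIX(a, b, c);
        k += 3;
        nwords -= 3;
    }
    switch (nwords)             /* fold in the trailing words */
    {
        case 2: b += k[1];      /* fall through */
        case 1: a += k[0];
    }
    FINAL(a, b, c);
    return c;
}
```

The question under test is whether replacing "MIX everywhere" with "MIX per chunk plus one FINAL" degrades the bucket distribution, which the tables below measure empirically.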

First, one change to the 11/2008 patch: keylen is added to a, b and
c initially so we do not need to add it later on. This is the
micro-diff:
-

--- hashfunc.c_TWOMIX   2009-01-22 14:07:34.0 -0600
+++ hashfunc.c_TWOMIX2  2009-01-22 14:17:32.0 -0600
@@ -336,7 +336,6 @@
 
/* handle the last 11 bytes */
k = (const unsigned char *) ka;
-   c += keylen;
 #ifdef WORDS_BIGENDIAN
switch (len)
{
@@ -439,7 +438,6 @@
}
 
/* handle the last 11 bytes */
-   c += keylen;
 #ifdef WORDS_BIGENDIAN
switch (len)/* all the case statements fall 
through */
 
-

The remainder of this document will use the runs from my initial results
broken out using various power-of-two bucket sizes to simulate our actual
use in PostgreSQL as the number of items hashed increases and we use more
and more bits of our hash to identify the appropriate bucket. I have run
each test twice, once with our current hash_any() with the single mix()
function and then a second time using my patch from the November commitfest
plus the patch above to produce a new hash_any() with two separate mixing
functions mix() and final(). For each run I have generated a sequence of
unique inputs, up to the number of buckets, hashed them with the hash
functions (both old and new), then I calculate the expected number of
collisions p(n) using the Poisson formula for each number of buckets,
where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and
2**26. For my initial run, I used a string consisting of the letter 'a'
followed by the integer representation of the numbers from 0 to the
(number of buckets - 1):

1) a<uint32> (i.e. a1,a0002...)
Number of buckets: 65536
Total number of items: 65536
Expected number with n items: 24109 24109 12054 4018 1004 200 33 4
Actual number mix():  24044 24172 12078 4036 980 186 30 10
Actual number mix()/final():  24027 24232 12060 3972 1001 207 31 5 1

Number of buckets: 262144
Total number of items: 262144
Expected number with n items: 96437 96437 48218 16072 4018 803 133 19 2
Actual number mix():  96224 96730 48240 15951 4094 744 143 17 1
Actual number mix()/final():  96335 96646 48071 16128 4018 801 122 21 2

Number of buckets: 1048576
Total number of items: 1048576
Expected number with n items: 385749 385749 192874 64291 16072 3214 535 76 9
Actual number mix():  385716 385596 193243 64115 16053 3285 478 77 12 1
Actual number mix()/final():  385955 385016 193789 63768 16259 3190 511 79 8 1

Number of buckets: 4194304
Total number of items: 4194304
Expected number with n items: 1542998 1542998 771499 257166 64291 12858 2143 
306 38
Actual number mix():  1542536 1543489 771351 25 63830 12847 2123 
326 19 5 1
Actual number mix()/final():  1541828 1544429 772072 256178 64579 12774 2129 
288 22 5

Number of buckets: 16777216
Total number of items: 16777216
Expected number with n items: 6171992 6171992 3085996 1028665 257166 51433 8572 
1224 153
Actual number mix():  6170866 6174079 3085912 1027140 257808 51385 8638 
1219 146 23
Actual number mix()/final():  6172058 6171991 3086279 1027916 257535 51465 8554 
1243 149 23 3

Number of buckets: 67108864
Total number of items: 67108864
Expected number with n items: 24687971 24687971 12343985 4114661 1028665 205733 
34288 4898 612
Actual number mix():  24686110 24690897 12344232 4113515 1028232 205682 
34546 4942 629 72 7
Actual number mix()/final():  24708515 24669248 12333034 4114796 1034256 208424 
34888 5023 598 77 5

Here is a second run with number of items = (number of buckets)/2:
Number of buckets: 65536
Total number of items: 32768
Expected number with n items: 39749 19874 4968 828 103 10
Actual 

Re: [HACKERS] [PATCHES] GIN improvements

2009-01-23 Thread Teodor Sigaev

I'm very sorry, but v0.24 has a silly bug with an uninitialized value :(.
A new version is attached.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.25.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-22 Thread Teodor Sigaev
BTW, gincostestimate could use that information for cost estimation, but is
index opening and metapage reading in amcostestimate acceptable?


That sounds reasonable to me. I think that's what the index-specific
cost estimators are for. 


Done.


Do you expect a performance impact?


I'm afraid of that and will test tomorrow. But the statistics from the index are exact.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.24.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Teodor Sigaev
- after the limit is reached, force cleanup of the pending list by calling
gininsertcleanup. Not very good, because users will sometimes see a huge
execution time for a simple insert, although users who run a huge update should
be satisfied.


I have difficulty choosing between these ways. It seems to me the second way is
better: if a user sees very long insertion times, then (auto)vacuum on his
installation should be tweaked.

I agree that the second solution sounds better to me.



Done. Now GIN counts the number of pending tuples and pages and stores them on
the metapage. Index cleanup can start during normal insertion in two cases:

- the number of pending tuples is too high to keep a guaranteed non-lossy tidbitmap
- the pending pages' content doesn't fit into work_mem.

BTW, gincostestimate could use that information for cost estimation, but is
index opening and metapage reading in amcostestimate acceptable?




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.23.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-21 Thread Jeff Davis
On Wed, 2009-01-21 at 15:06 +0300, Teodor Sigaev wrote:
> Done. Now GIN counts the number of pending tuples and pages and stores them on
> the metapage. Index cleanup can start during normal insertion in two cases:
> - the number of pending tuples is too high to keep a guaranteed non-lossy tidbitmap
> - the pending pages' content doesn't fit into work_mem.

Great, thanks. I will take a look at this version tonight.

Because time is short, I will mark it as Ready for committer review
now. I think all of the major issues have been addressed, and I'll just
be looking at the code and testing it.

> BTW, gincostestimate could use that information for cost estimation, but is
> index opening and metapage reading in amcostestimate acceptable?

That sounds reasonable to me. I think that's what the index-specific
cost estimators are for. Do you expect a performance impact?

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] GIN improvements

2009-01-20 Thread Jeff Davis
On Mon, 2009-01-19 at 19:53 +0300, Teodor Sigaev wrote:
> I see only two guaranteed solutions to the problem:
> - after the limit is reached, force normal index inserts. One of the motivations
> for the patch was the frequent question from users: why is an update of a whole
> table with a GIN index so slow? So this way would not resolve that question.
> - after the limit is reached, force cleanup of the pending list by calling
> gininsertcleanup. Not very good, because users will sometimes see a huge
> execution time for a simple insert, although users who run a huge update should
> be satisfied.
>
> I have difficulty choosing between these ways. It seems to me the second way is
> better: if a user sees very long insertion times, then (auto)vacuum on his
> installation should be tweaked.
 

I agree that the second solution sounds better to me.

With the new Visibility Map, it's more reasonable to run VACUUM more
often, so those that are inserting single tuples at a time should not
encounter the long insert time.

I'm still looking at the rest of the patch.

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev

Changes:
 Results of the pending list's scan are now placed directly into the resulting
tidbitmap. This saves cycles spent filtering results and reduces memory usage.
It also means we no longer need to check the lossiness of the tbm.




Is this a 100% bulletproof solution, or is it still possible for a query
to fail due to the pending list? It relies on the stats collector, so
perhaps in rare cases it could still fail?

Yes :(


Can you explain why the tbm must not be lossy?


The problem with a lossy tbm has two aspects:
 - the amgettuple interface has no way to work with a page-wide result instead
   of an exact ItemPointer; amgettuple cannot return just a block number the
   way amgetbitmap can.
 - because of a concurrent vacuum process: while we scan the pending list, its
   content could be transferred into the regular structure of the index, and
   then we would find the same tuple twice. Again, amgettuple has no
   protection from that; only amgetbitmap has. So we need to filter results
   from the regular GIN structure by the results from the pending list, and
   for that filtering we cannot use a lossy tbm.

v0.21 prevents that failure on a call of gingetbitmap, because now all results
are collected in a single resulting tidbitmap.
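As a toy illustration of the dedup problem described above (a Python sketch, not PostgreSQL code; TIDs are modeled as hypothetical (block, offset) pairs): with exact TIDs, set semantics deduplicate pending-list matches against main-index matches precisely, but once the bitmap degrades to page granularity, a pending-list match on a lossy page can no longer be checked for duplication.

```python
# Sketch: why filtering pending-list results needs an exact (non-lossy) bitmap.
# A TID is modeled as a (block, offset) pair. Names are illustrative only.

def dedup_exact(main_tids, pending_tids):
    """With exact TIDs on both sides, duplicates are removed precisely."""
    return pending_tids | main_tids

def dedup_lossy(main_lossy_blocks, pending_tids):
    """With a lossy bitmap we only know *blocks*, not offsets, so a
    pending-list TID on a lossy block cannot be checked for duplication:
    we must either drop it (losing a match) or keep it (a duplicate)."""
    return [t for t in pending_tids if t[0] in main_lossy_blocks]

main = {(1, 1), (1, 2), (2, 5)}
pending = {(1, 2), (3, 7)}   # (1, 2) was concurrently moved out of the pending list

assert dedup_exact(main, pending) == {(1, 1), (1, 2), (2, 5), (3, 7)}
# With a lossy bitmap covering block 1 we can no longer tell whether
# pending TID (1, 2) is already present in the main result:
assert dedup_lossy({1}, pending) == [(1, 2)]
```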





Also, can you clarify why a large update can cause a problem? In the


If the query looks like
UPDATE tbl SET col=... WHERE col ... and the planner chooses a GIN index scan
over col, then there is a chance that the pending list grows past the
non-lossy limit.




previous discussion, you suggested that it force normal index inserts
after a threshold based on work_mem:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php


I see only two guaranteed solutions to the problem:
- after the limit is reached, force normal index inserts. One motivation for
the patch was the frequent question from users: why is an update of a whole
table with a GIN index so slow? So this way will not resolve that question.
- after the limit is reached, force a cleanup of the pending list by calling
gininsertcleanup. Not very good, because users will sometimes see a huge
execution time for a simple insert, although users who run a huge update
should be satisfied.


I have difficulty choosing between the two. The second way seems better to
me: if a user gets a very long insertion time, then (auto)vacuum on his
installation should be tweaked.
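A tiny sketch of the two policies under discussion (illustrative Python, with a hypothetical limit constant; in the patch the limit is derived from work_mem): either a full pending list makes further inserts bypass it, or the insert that finds it full pays for flushing the whole list.

```python
PENDING_LIMIT = 4  # hypothetical; the real limit is derived from work_mem

def insert(pending, key, policy):
    """Returns (new_pending, action).
    'normal-insert': bypass the pending list once it is full.
    'cleanup': the insert that finds the list full flushes everything."""
    if len(pending) < PENDING_LIMIT:
        return pending + [key], "queued"
    if policy == "normal-insert":
        return pending, "inserted-directly"
    # policy == "cleanup": one unlucky insert pays the whole cleanup cost
    return [key], "flushed-%d-entries" % len(pending)

full = list(range(PENDING_LIMIT))
assert insert(full, 99, "normal-insert") == (full, "inserted-directly")
assert insert(full, 99, "cleanup") == ([99], "flushed-4-entries")
```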



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.21.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera
Teodor Sigaev wrote:
 New version. Changes:
  - synced with current CVS

I notice you added a fillfactor reloption in ginoptions, but does it
really make sense?  I recall removing it because the original code
contained a comment that says "this is here because default_reloptions
wants it", but it has no effect.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev

I notice you added a fillfactor reloption in ginoptions, but does it
really make sense?  I recall removing it because the original code
contained a comment that says this is here because default_reloptions
wants it, but it has no effect.


I didn't change the recognition of the fillfactor value, although GIN doesn't
use it for now.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Alvaro Herrera
Teodor Sigaev wrote:
 I notice you added a fillfactor reloption in ginoptions, but does it
 really make sense?  I recall removing it because the original code
 contained a comment that says this is here because default_reloptions
 wants it, but it has no effect.

 I didn't change a recognition of fillfactor value, although GIN doesn't 
 use it for now.

I suggest you take StdRdOptions out of the GinOptions struct, and leave
fillfactor out of ginoptions.  I don't think there's much point in
supporting options that don't actually do anything.  If the user tries
to set fillfactor for a gin index, he will get an error.  Which is a
good thing IMHO.

-- 
Alvaro Herrera                      http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Teodor Sigaev wrote:
 I didn't change a recognition of fillfactor value, although GIN doesn't 
 use it for now.

 I suggest you take StdRdOptions out of the GinOptions struct, and leave
 fillfactor out of ginoptions.  I don't think there's much point in
 supporting options that don't actually do anything.  If the user tries
 to set fillfactor for a gin index, he will get an error.  Which is a
 good thing IMHO.

+1 ... appearing to accept an option that doesn't really do anything is
likely to confuse users.  We didn't have much choice in the previous
incarnation of reloptions, but I think now we should throw errors when
we can.

regards, tom lane



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-19 Thread Teodor Sigaev



I suggest you take StdRdOptions out of the GinOptions struct, and leave
fillfactor out of ginoptions.  I don't think there's much point in
supporting options that don't actually do anything.  If the user tries
to set fillfactor for a gin index, he will get an error.  Which is a
good thing IMHO.

Oh, I see. Fixed.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


fast_insert_gin-0.22.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] GIN improvements

2009-01-18 Thread Jeff Davis
On Fri, 2009-01-16 at 15:39 +0300, Teodor Sigaev wrote:
  START_CRIT_SECTION();
  ...
  l = PageAddItem(...);
  if (l == InvalidOffsetNumber)
  elog(ERROR, "failed to add item to index page in \"%s\"",
   RelationGetRelationName(index));
  
  It's no use using ERROR, because it will turn into PANIC, which is
 I did that similarly to other GIN/GiST places. BTW, BTree directly emits
 PANIC if PageAddItem fails.
 

I'd still prefer PANIC over an ERROR that will always turn into a PANIC.
I'll leave it as you did, though.

  
  4. Heikki mentioned:
  http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php
  
  To make things worse, a query will fail if all the matching 
  fast-inserted tuples don't fit in the non-lossy tid bitmap.
  
  That issue still remains, correct? Is there a resolution to that?
 
 Now gincostestimate can forbid an index scan via disable_cost (see Changes).
 Of course, it doesn't prevent failure in the case of a large update (for
 example), but it does in most cases. BTW, because of the sequential scan of
 the pending list, the cost of the scan grows quickly and the index scan
 becomes non-optimal.

Is this a 100% bulletproof solution, or is it still possible for a query
to fail due to the pending list? It relies on the stats collector, so
perhaps in rare cases it could still fail?

It might be surprising though, that after an UPDATE and before a VACUUM,
the gin index just stops working (if work_mem is too low). For many
use-cases, if GIN is not used, it's just as bad as the query failing,
because it would be so slow.

Can you explain why the tbm must not be lossy?

Also, can you clarify why a large update can cause a problem? In the
previous discussion, you suggested that it force normal index inserts
after a threshold based on work_mem:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] GIN improvements

2009-01-16 Thread Teodor Sigaev

New version. Changes:
 - synced with current CVS
 - added all your changes
 - autovacuum will run if fast update mode is turned on and
   the fresh-tuple trigger is fired
 - gincostestimate now tries to calculate the cost of scanning the pending
   pages. It sets disable_cost if it believes that the tidbitmap will become
   lossy. So tidbitmap has a new method: estimating the maximum number of
   tuples with guaranteed non-lossy mode.
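A back-of-envelope sketch of that cost-estimate guard (illustrative Python with made-up numbers, not the gincostestimate code): the sequential pending-list scan is charged in full, and if the estimated pending-list tuple count exceeds the guaranteed non-lossy tidbitmap capacity, the estimate jumps to a disable cost so the planner avoids the index scan.

```python
# Sketch of the disable_cost guard described above. All numbers hypothetical.
DISABLE_COST = 1.0e10   # same idea as PostgreSQL's disable_cost constant

def gin_cost(base_cost, pending_pages, tuples_per_page, max_nonlossy_tuples):
    # The pending list is scanned sequentially, so its pages add to the cost
    # in full -- this is why the scan quickly becomes non-optimal anyway.
    cost = base_cost + pending_pages * 1.0
    if pending_pages * tuples_per_page > max_nonlossy_tuples:
        return DISABLE_COST  # tidbitmap is believed to become lossy
    return cost

assert gin_cost(100.0, 10, 50, 100000) == 110.0          # small pending list
assert gin_cost(100.0, 10000, 50, 100000) == DISABLE_COST # would go lossy
```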



START_CRIT_SECTION();
...
l = PageAddItem(...);
if (l == InvalidOffsetNumber)
elog(ERROR, "failed to add item to index page in \"%s\"",
 RelationGetRelationName(index));

It's no use using ERROR, because it will turn into PANIC, which is

I did that similarly to other GIN/GiST places. BTW, BTree directly emits PANIC
if PageAddItem fails.




4. Heikki mentioned:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php

To make things worse, a query will fail if all the matching 
fast-inserted tuples don't fit in the non-lossy tid bitmap.


That issue still remains, correct? Is there a resolution to that?


Now gincostestimate can forbid an index scan via disable_cost (see Changes).
Of course, it doesn't prevent failure in the case of a large update (for
example), but it does in most cases. BTW, because of the sequential scan of
the pending list, the cost of the scan grows quickly and the index scan
becomes non-optimal.




5. I attached a newer version merged with HEAD.

Thank you


6. You defined:

#define GinPageHasFullRow(page) ( GinPageGetOpaque(page)->flags & GIN_LIST_FULLROW )


Fixed



7.  I don't understand this chunk of code:

How can (!ItemPointerEquals(pos->item, item)) ever happen?

And how can (scanGetCandidate(scan, pos) == false) ever happen? Should
that be an Assert() instead?

If those can happen during normal operation, then we need a better error
message there.


It should be an assert, but then assert-enabled and assert-disabled code would
differ :(. In both cases scanGetCandidate() should be called, but in
assert-enabled code we need to check the return value and pos->item.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



fast_insert_gin-0.20.gz
Description: Unix tar archive



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Kenneth Marshall
On Fri, Jan 09, 2009 at 02:00:39PM -0800, Jeff Davis wrote:
 On Fri, 2009-01-09 at 14:29 -0600, Kenneth Marshall wrote:
  Jeff,
  
  Thanks for the review. I would not really expect any differences in hash
  index build times other than normal noise variances. The most definitive
  benchmark that I have seen was done with my original patch submission
  in March posted by Luke of Greenplum:
  
We just applied this and saw a 5 percent speedup on a hash aggregation
 query with four columns in a 'group by' clause run against a single
 TPC-H table.
  
  I wonder if they would be willing to re-run their test? Thanks again.
 
 Separating mix() and final() should have some performance benefit,
 right?
 
 Regards,
   Jeff Davis
 
 
Yes, it does, but the results can be swamped by other latencies in the
code path. Tests such as Tom's benchmark of the underlying functions are
needed to isolate the timings effectively, or a benchmark like Greenplum's
that will benefit from a more efficient function.

Ken



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Jeff Davis
On Sat, 2009-01-10 at 11:06 -0600, Kenneth Marshall wrote:
  Separating mix() and final() should have some performance benefit,
  right?
  
 Yes, it does but the results can be swamped by other latencies in the
 code path. Tests such as Tom's benchmark of the underlying functions is
 needed to isolate the timings effectively or a benchmark like Greenplum's
 that will benefit from a more efficient function.
 

Ok. I isolated the function itself by just doing:

-- 10 million rows of random()::text
EXPLAIN ANALYZE SELECT hashtext(t) FROM randomtext;

I ran 5 times on both old and new code, eliminating the top and bottom
and taking the average of the remaining 3, and I got a 6.9% performance
improvement with the new code.

I tried quickly with a few other data types and got similar results.
It's obviously a small microbenchmark, but that's good enough for me. 
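The run-selection method described above (5 runs, drop the fastest and slowest, average the middle 3) is just a trimmed mean; a small sketch with made-up timings:

```python
# Sketch of the benchmark methodology above: 5 runs, discard the top and
# bottom, average the remaining 3. Timings below are hypothetical.

def trimmed_mean(runs_ms):
    assert len(runs_ms) == 5
    middle = sorted(runs_ms)[1:-1]   # drop fastest and slowest run
    return sum(middle) / len(middle)

old_runs = [10.2, 10.0, 10.4, 10.1, 11.0]
new_runs = [9.5, 9.4, 9.6, 9.3, 10.1]
speedup = 1.0 - trimmed_mean(new_runs) / trimmed_mean(old_runs)

assert trimmed_mean(old_runs) == (10.1 + 10.2 + 10.4) / 3
assert speedup > 0  # new code faster on these hypothetical numbers
```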

Thanks!

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I ran 5 times on both old and new code, eliminating the top and bottom
 and taking the average of the remaining 3, and I got a 6.9% performance
 improvement with the new code.

The question that has been carefully evaded throughout the discussion
of this patch is whether the randomness of the hash result is decreased,
and if so what is that likely to cost us in performance of the various
hash-dependent algorithms.  I would still like to have an answer to that
before we make a change to gain marginal performance improvement in
the hash function itself (which is already shown to be barely measurable
in the total context of a hash-dependent operation...)

regards, tom lane



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Jeff Davis
On Sat, 2009-01-10 at 13:36 -0500, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  I ran 5 times on both old and new code, eliminating the top and bottom
  and taking the average of the remaining 3, and I got a 6.9% performance
  improvement with the new code.
 
 The question that has been carefully evaded throughout the discussion
 of this patch is whether the randomness of the hash result is decreased,
 and if so what is that likely to cost us in performance of the various
 hash-dependent algorithms.  I would still like to have an answer to that
 before we make a change to gain marginal performance improvement in
 the hash function itself (which is already shown to be barely measurable
 in the total context of a hash-dependent operation...)
 

In:
http://archives.postgresql.org/message-id/20081104202655.gp18...@it.is.rice.edu

Ken showed that the number of hash collisions is the same in the old and
the new code for his test data. Not a direct measurement of randomness,
but it's some kind of indication.

I'm not an authority on either hash functions or statistics, so I'll
have to defer to Ken or someone else to actually prove that the
randomness is maintained.

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Gregory Stark
Tom Lane t...@sss.pgh.pa.us writes:

 Jeff Davis pg...@j-davis.com writes:
 I ran 5 times on both old and new code, eliminating the top and bottom
 and taking the average of the remaining 3, and I got a 6.9% performance
 improvement with the new code.

 The question that has been carefully evaded throughout the discussion
 of this patch is whether the randomness of the hash result is decreased,

In fairness that doesn't seem to be the case. The original patch was posted
with such an analysis using cracklib and raw binary data:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/105675

  marginal performance improvement in the hash function itself (which is
 already shown to be barely measurable in the total context of a
 hash-dependent operation...)

If it's a 6% gain in the speed of Hash Join or HashAggregate it would be very
interesting. But I gather it's a 6% speedup in the time spent actually in the
hash function. Is that really where much of our time is going? If it's 10% of
the total time to execute one of these nodes then we're talking about a 0.6%
optimization...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Jeff Davis
On Sat, 2009-01-10 at 13:57 -0500, Gregory Stark wrote:
 But I gather it's a 6% speedup in the time spent actually in the
 hash function.

That's correct. I was not able to detect any difference at all between
the old and new code using HashAggregate, which is where most of my
testing effort went:

http://archives.postgresql.org/message-id/1231531455.25019.75.ca...@jdavis

(see test results attached to that message, if you're interested)

I tested with two data distributions and 6 data types.

The 6-10% speedup I saw was a super-micro benchmark testing the hash
function, and I didn't spend much time with that.

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Kenneth Marshall
On Sat, Jan 10, 2009 at 10:56:15AM -0800, Jeff Davis wrote:
 On Sat, 2009-01-10 at 13:36 -0500, Tom Lane wrote:
  Jeff Davis pg...@j-davis.com writes:
   I ran 5 times on both old and new code, eliminating the top and bottom
   and taking the average of the remaining 3, and I got a 6.9% performance
   improvement with the new code.
  
  The question that has been carefully evaded throughout the discussion
  of this patch is whether the randomness of the hash result is decreased,
  and if so what is that likely to cost us in performance of the various
  hash-dependent algorithms.  I would still like to have an answer to that
  before we make a change to gain marginal performance improvement in
  the hash function itself (which is already shown to be barely measurable
  in the total context of a hash-dependent operation...)
  
 
 In:
 http://archives.postgresql.org/message-id/20081104202655.gp18...@it.is.rice.edu
 
 Ken showed that the number of hash collisions is the same in the old and
 the new code for his test data. Not a direct measurement of randomness,
 but it's some kind of indication.
 
 I'm not an authority on either hash functions or statistics, so I'll
 have to defer to Ken or someone else to actually prove that the
 randomness is maintained.
 
 Regards,
   Jeff Davis
 
First, while I am not an expert by any means, basic statistics will give you
the probability of a collision when packing N items into M buckets chosen at
random. In all of my tests, I used 1.6M items in 2^32 buckets. If the bit
mixing is complete and random, the number of collisions should be close to
the same no matter what arrangement of bits is used for the inputs. I think
my test cases indicate quite clearly that the new hash function is as
independent of bit order as the older functions, in that the number of bucket
collisions is basically the same for all layouts. I have the test harness and
can test any other input that you would like me to. Second, the author of the
basic hash function (old and new) has performed more extensive testing and
has shown that the new functions are better at randomizing bits than his
original function. I will try to run a micro-benchmark of my original March
patch and of the incremental approach that resulted from my November patch
tomorrow.
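For reference, the expected collision count for random hashing can be worked out with the standard birthday approximation; the ~298 figure below is a back-of-envelope baseline for 1.6M items in 2^32 buckets, not one of the measured numbers from the tests.

```python
# Birthday-bound baseline: expected number of colliding pairs when hashing
# N random items into M buckets is approximately N*(N-1)/(2*M).

N = 1600000     # items, as in the tests described above
M = 2 ** 32     # buckets (32-bit hash values)

expected_collisions = N * (N - 1) / (2 * M)

# A truly random 32-bit hash of 1.6M items should show roughly 300
# bucket collisions; counts far above this would indicate poor mixing.
assert 298 < expected_collisions < 299
```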

Cheers,
Ken






Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-10 Thread Kenneth Marshall
On Sat, Jan 10, 2009 at 01:57:27PM -0500, Gregory Stark wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 
  Jeff Davis pg...@j-davis.com writes:
  I ran 5 times on both old and new code, eliminating the top and bottom
  and taking the average of the remaining 3, and I got a 6.9% performance
  improvement with the new code.
 
  The question that has been carefully evaded throughout the discussion
  of this patch is whether the randomness of the hash result is decreased,
 
 In fairness that doesn't seem to be the case. The original patch was posted
 with such an analysis using cracklib and raw binary data:
 
 http://article.gmane.org/gmane.comp.db.postgresql.devel.general/105675
 
   marginal performance improvement in the hash function itself (which is
  already shown to be barely measurable in the total context of a
  hash-dependent operation...)
 
 If it's a 6% gain in the speed of Hash Join or HashAggregate it would be very
 interesting. But I gather it's a 6% speedup in the time spent actually in the
 hash function. Is that really where much of our time is going? If it's 10% of
 the total time to execute one of these nodes then we're talking about a 0.6%
 optimization...
 

The Greenplum test did show a 5% increase in performance with the replacement
functions in March.

Regards,
Ken



Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-09 Thread Jeff Davis
On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: 
 Dear PostgreSQL developers,
 
 I am re-sending this to keep this last change to the
 internal hash function on the radar.
 

Hi Ken,

A few comments:

1. New patch with very minor changes attached.

2. I reverted the change you made to indices.sgml. We still don't use
WAL for hash indexes, and in my opinion we should continue to discourage
their use until we do use WAL. We can add back in the comment that hash
indexes are suitable for large keys if we have some results to show
that.

3. There was a regression test failure in union.sql because the ordering
of the results was different. I updated the regression test.

4. Hash functions affect a lot more than hash indexes, so I ran through
a variety of tests that use a HashAggregate plan. Test setup and results
are attached. These results show no difference between the old and the
new code (about 0.1% better).

5. The hash index build time shows some improvement. The new code won in
every instance in which there were a lot of duplicates in the table
(100 distinct values, 50K of each) by around 5%.

The new code appeared to be the same or slightly worse in the case of
hash index builds with few duplicates (100 distinct values, 5 of
each). The difference was about 1% worse, which is probably just noise.

Note: I'm no expert on hash functions. Take all of my tests with a grain
of salt.

I would feel a little better if I saw at least one test that showed
better performance of the new code on a reasonable-looking distribution
of data. The hash index build that you showed only took a second or two
-- it would be nice to see a test that lasted at least a minute.

Regards,
Jeff Davis




test_results.tar.gz
Description: application/compressed-tar
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index 96d5643..8a236b5 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -200,39 +200,94 @@ hashvarlena(PG_FUNCTION_ARGS)
  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
  *
- * In the current code, we have adopted an idea from Bob's 2006 update
- * of his hash function, which is to fetch the data a word at a time when
- * it is suitably aligned.  This makes for a useful speedup, at the cost
- * of having to maintain four code paths (aligned vs unaligned, and
- * little-endian vs big-endian).  Note that we have NOT adopted his newer
- * mix() function, which is faster but may sacrifice some randomness.
+ * In the current code, we have adopted Bob's 2006 update of his hash
+ * which fetches the data a word at a time when it is suitably aligned.
+ * This makes for a useful speedup, at the cost of having to maintain
+ * four code paths (aligned vs unaligned, and little-endian vs big-endian).
+ * It also uses two separate mixing functions, mix() and final(), instead
+ * of a slower multi-purpose function.
  */
 
 /* Get a bit mask of the bits set in non-uint32 aligned addresses */
 #define UINT32_ALIGN_MASK (sizeof(uint32) - 1)
+#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
 
 /*--
  * mix -- mix 3 32-bit values reversibly.
- * For every delta with one or two bits set, and the deltas of all three
- * high bits or all three low bits, whether the original value of a,b,c
- * is almost all zero or is uniformly distributed,
- * - If mix() is run forward or backward, at least 32 bits in a,b,c
- *	 have at least 1/4 probability of changing.
- * - If mix() is run forward, every bit of c will change between 1/3 and
- *	 2/3 of the time.  (Well, 22/100 and 78/100 for some 2-bit deltas.)
+ *
+ * This is reversible, so any information in (a,b,c) before mix() is
+ * still in (a,b,c) after mix().
+ *
+ * If four pairs of (a,b,c) inputs are run through mix(), or through
+ * mix() in reverse, there are at least 32 bits of the output that
+ * are sometimes the same for one pair and different for another pair.
+ * This was tested for:
+ * * pairs that differed by one bit, by two bits, in any combination
+ *   of top bits of (a,b,c), or in any combination of bottom bits of
+ *   (a,b,c).
+ * * differ is defined as +, -, ^, or ~^.  For + and -, I transformed
+ *   the output delta to a Gray code (a^(a>>1)) so a string of 1's (as
+ *   is commonly produced by subtraction) look like a single 1-bit
+ *   difference.
+ * * the base values were pseudorandom, all zero but one bit set, or
+ *   all zero plus a counter that starts at zero.
+ * 
+ * This does not achieve avalanche.  There are input bits of (a,b,c)
+ * that fail to affect some output bits of (a,b,c), especially of a.  The
+ * most thoroughly mixed value is c, but it doesn't really even achieve
+ * avalanche in c. 
+ * 
+ * This allows some parallelism.  Read-after-writes are good at doubling
+ * the number of bits affected, so the goal of mixing pulls in the opposite
+ * direction as the goal of parallelism.  I did what 
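As a side note on the rot() building block in the comment above (which comes from Bob Jenkins's lookup3 design), the reversibility property mix() claims is easy to demonstrate: each rotate/add/xor step has an exact inverse. An illustrative Python sketch, not the patch code:

```python
# Sketch: 32-bit rotation as in lookup3's rot(x,k), plus a demonstration
# that one mix()-style step (a -= c; a ^= rot(c,4); c += b) is reversible,
# so no information in (a,b,c) is lost.

MASK = 0xFFFFFFFF

def rot(x, k):
    """32-bit left rotation."""
    return ((x << k) | (x >> (32 - k))) & MASK

def step(a, b, c):
    a = (a - c) & MASK
    a ^= rot(c, 4)
    c = (c + b) & MASK
    return a, b, c

def unstep(a, b, c):
    # Undo step() exactly, in reverse order.
    c = (c - b) & MASK
    a ^= rot(c, 4)
    a = (a + c) & MASK
    return a, b, c

orig = (0x12345678, 0x9ABCDEF0, 0x0F1E2D3C)
assert unstep(*step(*orig)) == orig   # fully reversible
```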

Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-09 Thread Kenneth Marshall
On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote:
 On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: 
  Dear PostgreSQL developers,
  
  I am re-sending this to keep this last change to the
  internal hash function on the radar.
  
 
 Hi Ken,
 
 A few comments:
 
 1. New patch with very minor changes attached.
 
 2. I reverted the change you made to indices.sgml. We still don't use
 WAL for hash indexes, and in my opinion we should continue to discourage
 their use until we do use WAL. We can add back in the comment that hash
 indexes are suitable for large keys if we have some results to show
 that.
 
 3. There was a regression test failure in union.sql because the ordering
 of the results was different. I updated the regression test.
 
 4. Hash functions affect a lot more than hash indexes, so I ran through
 a variety of tests that use a HashAggregate plan. Test setup and results
 are attached. These results show no difference between the old and the
 new code (about 0.1% better).
 
 5. The hash index build time shows some improvement. The new code won in
 every instance in which a there were a lot of duplicates in the table
 (100 distinct values, 50K of each) by around 5%.
 
 The new code appeared to be the same or slightly worse in the case of
 hash index builds with few duplicates (100 distinct values, 5 of
 each). The difference was about 1% worse, which is probably just noise.
 
 Note: I'm no expert on hash functions. Take all of my tests with a grain
 of salt.
 
 I would feel a little better if I saw at least one test that showed
 better performance of the new code on a reasonable-looking distribution
 of data. The hash index build that you showed only took a second or two
 -- it would be nice to see a test that lasted at least a minute.
 
 Regards,
   Jeff Davis
 
 

Jeff,

Thanks for the review. I would not really expect any differences in hash
index build times other than normal noise variances. The most definitive
benchmark that I have seen was done with my original patch submission
in March posted by Luke of Greenplum:

  We just applied this and saw a 5 percent speedup on a hash aggregation
   query with four columns in a 'group by' clause run against a single
   TPC-H table.

I wonder if they would be willing to re-run their test? Thanks again.

Regards,
Ken




Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2009-01-09 Thread Jeff Davis
On Fri, 2009-01-09 at 14:29 -0600, Kenneth Marshall wrote:
 Jeff,
 
 Thanks for the review. I would not really expect any differences in hash
 index build times other than normal noise variances. The most definitive
 benchmark that I have seen was done with my original patch submission
 in March posted by Luke of Greenplum:
 
   We just applied this and saw a 5 percent speedup on a hash aggregation
query with four columns in a 'group by' clause run against a single
TPC-H table.
 
 I wonder if they would be willing to re-run their test? Thanks again.

Separating mix() and final() should have some performance benefit,
right?

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-29 Thread Simon Riggs

On Mon, 2008-12-29 at 15:06 +0900, Fujii Masao wrote:

 This seems to have not been fixed yet in the latest patch.
 
 http://archives.postgresql.org/message-id/494ff631.90...@enterprisedb.com
 recovery-infra-separated-again-1.patch

I'll add it to my issues-reported list so we can check for regressions.

Thanks,

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-28 Thread Fujii Masao
Hi,

On Tue, Nov 18, 2008 at 12:39 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Mon, 2008-11-17 at 16:18 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  @@ -3845,6 +3850,52 @@ sigusr1_handler(SIGNAL_ARGS)
 
  PG_SETMASK(BlockSig);
 
  +   if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_START))
  +   {
  +   Assert(pmState == PM_STARTUP);
  +
  +   /*
  +* Go to shutdown mode if a shutdown request was pending.
  +*/
  +   if (Shutdown > NoShutdown)
  +   {
  +   pmState = PM_WAIT_BACKENDS;
  +   /* PostmasterStateMachine logic does the rest */
  +   }
  +   else
  +   {
  +   /*
  +* Startup process has entered recovery
  +*/
  +   pmState = PM_RECOVERY;


 Hmm, I smell a race condition here:

 1. Startup process goes into consistent state, and signals postmaster
 2. Startup process finishes WAL replay and dies
 3. Postmaster wakes up in reaper(), notices that the startup process
 died, and goes into PM_RUN mode.
 4. The above signal handler for postmaster is run, causing an assertion
 failure, or putting postmaster back into PM_RECOVERY mode if assertions
 are disabled.

 Highly unlikely in practice, given how much code needs to run in the
 startup process between signaling the postmaster and exiting, but it
 seems theoretically possible. Do we care, and if we do, how can we fix it?

 Might be possible - it does depend on the sequence of actions, it's true.
 Agree it's not likely to happen, except as the result of another bug.

 I'll change it to a test for

 if (pmState == PM_STARTUP)
pmState = PM_RECOVERY;

 The assertion was mainly for documentation; it's not protecting anything
 critical (IIRC).

This seems to have not been fixed yet in the latest patch.

http://archives.postgresql.org/message-id/494ff631.90...@enterprisedb.com
recovery-infra-separated-again-1.patch

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-23 Thread Simon Riggs

On Mon, 2008-12-22 at 22:18 +0200, Heikki Linnakangas wrote:

 I think it's useful to review the infra part of the patch separately, 
 so I split it out of the big patch again. I haven't looked at this in 
 detail yet, but it compiles and passes regression tests.

OK, thanks, much appreciated.

The patches were fairly distinct in their features, though choosing an
exact split line could be done in a number of ways.

I'll look through this in more detail today and get back to you.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2008-12-22 Thread Kenneth Marshall
Dear PostgreSQL developers,

I am re-sending this to keep this last change to the
internal hash function on the radar.

Ken

Sorry about the delay for this update to the new hash
index implementation. I was trying to get the WAL logging
in place and forgot to post the actual patch. The WAL
for hash indexes will need to wait for 8.5, but I did
want to add back in the piece of the Bob Jenkins 2006
hash function that was stripped out of the initial
patch on application due to concerns about the randomness
of the resulting hash values. Here is a re-post of my
initial findings comparing the old/new Jenkins hash
from lookup2 and lookup3. I have added a third column
containing the results for the hash_any() resulting
from the attached patch as well as simple timing test
for a DB reindex both before and after patching.

Also attached is a simple documentation patch updating
the note attached to the hash index description.

Regards,
Ken

Hi,

I have finally had a chance to do some investigation on
the performance of the old hash mix() function versus
the updated mix()/final() in the new hash function. Here
is a table of my current results for both the old and the
new hash function. In this case cracklib refers to the
cracklib-dict containing 1648379 unique words massaged
in various ways to generate input strings for the hash
functions. The result is the number of collisions in the
hash values generated.

hash input                             old   new   newv2
----------                             ---   ---   -----
cracklib                               338   316   338
cracklib x 2 (i.e. clibclib)           305   319   300
cracklib x 3 (clibclibclib)            323   329   315
cracklib x 10                          302   310   329
cracklib x 100                         350   335   298
cracklib x 1000                        314   309   315
cracklib x 100 truncated to char(100)  311   327   320

uint32 from 1-1648379                  309   319   347
(uint32 1-1948379)*256                 309   314   304
(uint32 1-1948379)*16                  310   314   324
auint32 (i.e. a1,a0002...)             320   321   312

uint32uint32 (i.e. uint64)             321   287   309

The different result columns are old = Jenkins 1996 hash
function(lookup2.c), new = Jenkins 2006 hash function
(lookup3.c), and newv2 = adaptation of current hash_any()
to incorporate the separate mix()/final() functions. As
you can see from the results, splitting the mix() and final()
apart does not result in any perceptible loss of randomness
in the hash assignment. I also ran a crude timing for a
reindex of the following database:

CREATE TABLE dict (word text);
CREATE INDEX wordhash ON dict USING hash (word);
INSERT INTO dict (word) VALUES('s;lhkjdpyoijxfg;lktjgh;sdlhkjo');
INSERT INTO dict (SELECT MAX(word)||MAX(word) FROM dict);
... (21 times)

REINDEX TABLE
...

The average time to reindex the table using our current
hash_any() without the separate mix()/final() was 1696ms
and 1482ms with the separate mix()/final() stages giving
almost 13% better performance for this stupid metric.
--- indices.sgml2008-10-13 14:40:06.0 -0500
+++ indices.sgml.NEW2008-11-04 12:42:35.0 -0600
@@ -190,13 +190,11 @@
 
   <note>
    <para>
-    Testing has shown <productname>PostgreSQL</productname>'s hash
-    indexes to perform no better than B-tree indexes, and the
-    index size and build time for hash indexes is much worse.
-    Furthermore, hash index operations are not presently WAL-logged,
+    <productname>PostgreSQL</productname>'s hash indexes provide
+    the fast O(1) lookups, even for very large objects.
+    Hash index operations are not presently WAL-logged,
     so hash indexes might need to be rebuilt with <command>REINDEX</>
-    after a database crash.
-    For these reasons, hash index use is presently discouraged.
+    after a database crash.
    </para>
   </note>
 
--- hashfunc.c.ORIG 2008-09-03 13:07:14.0 -0500
+++ hashfunc.c.NEW  2008-11-04 08:36:16.0 -0600
@@ -200,39 +200,94 @@
  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
  *
- * In the current code, we have adopted an idea from Bob's 2006 update
- * of his hash function, which is to fetch the data a word at a time when
- * it is suitably aligned.  This makes for a useful speedup, at the cost
- * of having to maintain four code paths (aligned vs unaligned, and
- * little-endian vs big-endian).  Note that we have NOT adopted his newer
- * mix() function, which is faster but may sacrifice some randomness.
+ * In the current code, we have adopted Bob's 2006 update of his hash
+ * which fetches the data a word at a time when it is suitably aligned.
+ * This makes for a useful speedup, at the cost of having to maintain
+ * four code paths (aligned vs unaligned, and little-endian vs big-endian).
 + * It also uses two separate mixing functions 

Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2008-12-22 Thread Alvaro Herrera
Kenneth Marshall wrote:
 Dear PostgreSQL developers,
 
 I am re-sending this to keep this last change to the
 internal hash function on the radar.

Please add it to the commitfest wiki page,
http://wiki.postgresql.org/wiki/CommitFest_2008-11

Thanks

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-22 Thread Fujii Masao
On Tue, Dec 23, 2008 at 5:18 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Simon Riggs wrote:

 On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:

 Simon Riggs escribió:

 Please let me know how I can make the reviewer's job easier. Diagrams,
 writeups, whatever. Thanks,

 A link perhaps?

 There is much confusion on this point for which I'm very sorry.

 I originally wrote infra patch to allow it to be committed separately
 in the Sept commitfest, to reduce size of the forthcoming hotstandby
 patch. That didn't happen (no moans there) so the eventual hotstandby
 patch includes all of what was the infra patch, plus the new code.

 So currently there is no separate infra patch. The two line items on
 the CommitFest page are really just one large project. I would be in
 favour of removing the infra lines from the CommitFest page.

 I think it's useful to review the infra part of the patch separately, so I
 split it out of the big patch again. I haven't looked at this in detail yet,
 but it compiles and passes regression tests.

Super! I will fix the synch rep code based on this patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-17 Thread Simon Riggs

On Thu, 2008-11-20 at 10:10 +, Simon Riggs wrote:
 On Thu, 2008-11-20 at 15:19 +0530, Pavan Deolasee wrote:
 
  Do you intend to split the patch into smaller pieces ? The latest hot
  standby patch is almost 10K+ lines. Splitting that would definitely
  help the review process.
 
 If it helps you, then I'll do it. Hang on an hour or so.

I've posted a slightly subdivided patch now via Wiki.

Putting infrastructure and hot standby together was fairly easy, but
splitting them apart has not been, and I was unable to complete that
after a lot of hacking. 

If you wouldn't mind looking at the major subsystems some more, I'm
happy to attempt some further parceling to make it easier for you to
review. I'm not completely certain that infra vs. hot standby is a good
split point anyway.

Please let me know how I can make the reviewer's job easier. Diagrams,
writeups, whatever. Thanks,

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-17 Thread Alvaro Herrera
Simon Riggs escribió:

 Please let me know how I can make the reviewer's job easier. Diagrams,
 writeups, whatever. Thanks,

A link perhaps?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-17 Thread Simon Riggs

On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:
 Simon Riggs escribió:
 
  Please let me know how I can make the reviewer's job easier. Diagrams,
  writeups, whatever. Thanks,
 
 A link perhaps?

There is much confusion on this point for which I'm very sorry.

I originally wrote infra patch to allow it to be committed separately
in the Sept commitfest, to reduce size of the forthcoming hotstandby
patch. That didn't happen (no moans there) so the eventual hotstandby
patch includes all of what was the infra patch, plus the new code.

So currently there is no separate infra patch. The two line items on
the CommitFest page are really just one large project. I would be in
favour of removing the infra lines from the CommitFest page. 

Of course we can consider hotstandby patch in parts, but
deconstructing it into wholly separate patches doesn't make much sense
now and would raise many questions about why certain code exists with no
apparent function or why certain design choices were made.

If you were to review a part of this, I might ask that you look at the
changes to XidInMVCCSnapshot(), GetSnapshotData() and
AssignTransactionId(), which relate specifically to subtransaction
handling. Comments explain the new approach.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS][PATCHES] odd output in restore mode

2008-12-15 Thread Bruce Momjian
Martin Zaun wrote:
 4. Issue: missing break in switch, silent override of '-l' argument?
 
 This behaviour has been in there before and is not addressed by the
 patch: The user-selected Win32 mklink command mode is never applied
 due to a missing 'break' in CustomizableInitialize():
 
  switch (restoreCommandType)
  {
  case RESTORE_COMMAND_WIN32_MKLINK:
  SET_RESTORE_COMMAND(mklink, WALFilePath, xlogFilePath);
  case RESTORE_COMMAND_WIN32_COPY:
  SET_RESTORE_COMMAND(copy, WALFilePath, xlogFilePath);
  break;

I have added the missing 'break' to CVS HEAD;  thanks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS][PATCHES] odd output in restore mode

2008-12-15 Thread Magnus Hagander
Bruce Momjian wrote:
 Martin Zaun wrote:
 4. Issue: missing break in switch, silent override of '-l' argument?

  This behaviour has been in there before and is not addressed by the
 patch: The user-selected Win32 mklink command mode is never applied
 due to a missing 'break' in CustomizableInitialize():

  switch (restoreCommandType)
  {
  case RESTORE_COMMAND_WIN32_MKLINK:
  SET_RESTORE_COMMAND(mklink, WALFilePath, xlogFilePath);
  case RESTORE_COMMAND_WIN32_COPY:
  SET_RESTORE_COMMAND(copy, WALFilePath, xlogFilePath);
  break;
 
 I have added the missing 'break' to CVS HEAD;  thanks.

Why no backpatch to 8.3? Seems like a clear bugfix to me.

//Magnus



