Re: [HACKERS] GIN improvements part 1: additional information
On Thu, Jan 23, 2014 at 9:27 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:17 PM, Alexander Korotkov wrote:

We already spent a lot of time on compression. Now we need to figure out the result we want to see. I spent quite a long time debugging the varbyte encoding without segments. Finally, it passed very many tests without any problems. Now, it is just a piece of junk. I'm afraid that we will have to reimplement everything from scratch another two or three times because the code doesn't look perfect. For sure, that's incompatible with getting something into 9.4.

That's a bit harsh :-). But yes, I understand what you're saying. It's quite common for large patches like this to be rewritten several times before being committed; you just don't know what works best until you've tried many designs.

Remember we also have the subsequent fast-scan patch, which is badly needed for hstore and other applications. I propose making the final decisions now and concentrating our forces on producing a committable patch with those decisions, and not changing them even if somebody has an idea that would improve code readability 100-fold and potential extensibility 1000-fold. I propose the following decisions: 1) leave the uncompressed area, allowing duplicates in it; 2) don't introduce Items on the page.

Well, I got the insertions to work now without the uncompressed area, so in the spirit of Just Getting This Damn Patch Committed, I'm going to go ahead with that. We can add the uncompressed area later if performance testing shows it to be necessary. And I agree about the Items-on-page idea - we can come back to that too, still in the 9.4 timeframe if necessary, but probably not.

So, committed. It's the same design as in the most recent patch I posted, but with a bunch of bugs fixed, and cleaned up all over. I'm going to move to the fast scan patch now, but please do test and review the committed version to see if I missed something.

Great! Thanks a lot!

The assertion in dataPlaceToPageLeafRecompress is too strong: a page can contain GinDataLeafMaxContentSize bytes. A patch is attached. My test suite doesn't run correctly; I'm debugging it now.

ISTM I found the bug. ginVacuumPostingTreeLeaf re-encodes only some segments. The others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy can lead to page corruption, depending on the platform. Another solution would be to re-palloc the segments in ginVacuumPostingTreeLeaf.

--
With best regards, Alexander Korotkov.

gin-memmove-fix.patch
Description: Binary data
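For readers following along, here is a minimal, self-contained C sketch (not the GIN code itself) of why overlapping copies need memmove: with memcpy, overlapping source and destination regions are undefined behavior, which is exactly the platform-dependent corruption Alexander describes.

```c
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/*
	 * Shift the tail of a buffer two bytes to the left, the same kind of
	 * in-place move dataPlaceToPageLeafRecompress performs on segments.
	 * Source and destination overlap, so memmove is required; memcpy
	 * would be undefined behavior here and can silently corrupt the data
	 * on some platforms.
	 */
	char		buf[16] = "AABBCCDD";

	memmove(buf, buf + 2, strlen(buf + 2) + 1);
	printf("%s\n", buf);		/* prints "BBCCDD" */
	return 0;
}
```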
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem.

Minor quibble: I get a compiler warning with the patch applied.

relcache.c: In function 'RememberToFreeTupleDescAtEOX':
relcache.c:2317:3: warning: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

Thanks, that was a wart, now fixed. v16 attached.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

reduce_lock_levels.v16.patch
Description: Binary data
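For anyone unfamiliar with the warning: ISO C90 requires all declarations in a block to precede the first statement, and PostgreSQL builds with -Wdeclaration-after-statement to enforce that. A hypothetical illustration (not the actual relcache.c code):

```c
#include <stdio.h>

static void
warns(void)
{
	puts("a statement");
	int			i = 42;		/* warning: ISO C90 forbids mixed declarations and code */

	printf("%d\n", i);
}

static void
clean(void)
{
	int			i = 42;		/* declaration hoisted above the first statement */

	puts("a statement");
	printf("%d\n", i);
}

int
main(void)
{
	warns();
	clean();
	return 0;
}
```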
Re: [HACKERS] GIN improvements part 1: additional information
On 01/24/2014 10:03 AM, Alexander Korotkov wrote: ISTM I found the bug. ginVacuumPostingTreeLeaf re-encodes only some segments. The others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy can lead to page corruption, depending on the platform. Another solution would be to re-palloc the segments in ginVacuumPostingTreeLeaf.

Good catch. Thanks, committed, changing the memcpy to memmove. We will have to switch to pallocing everything in the future, if we make leafRepackItems smarter so that it doesn't rewrite all the segments after the first modified one.

- Heikki
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 12:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: [...] Good catch. Thanks, committed, changing the memcpy to memmove. We will have to switch to pallocing everything in the future, if we make leafRepackItems smarter so that it doesn't rewrite all the segments after the first modified one.

OK. What about the earlier assertion fix?

--
With best regards, Alexander Korotkov.
Re: [HACKERS] extension_control_path
Sergey Muraviov sergey.k.murav...@gmail.com writes: I can't apply the patch.

Did you try using the `patch`(1) command (e.g. `patch -p1 < file.patch` from the top of the source tree)? The PostgreSQL project policy is not to use the git format when sending patches to the mailing list, preferring the context diff format. So you need to resort to using the basic patch commands rather than the modern git tooling.

See also: http://wiki.postgresql.org/wiki/Submitting_a_Patch

  Patches must be in a format which provides context (eg: Context Diff); 'normal' or 'plain' diff formats are not acceptable.

The following email might be useful for you: http://www.postgresql.org/message-id/CAOR=d=0q0dal0bnztsddnwpgm5ejkxuykj7m+qsqbr728eo...@mail.gmail.com

Regards,
--
Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views
Hi all

I've hit an interesting wrinkle and am interested in opinions. By integrating updatable security barrier view support with row-security, it has become a lot harder to detect and prevent infinite recursion (including mutual recursion) in row-security policy expansion.

The code is attached, but it's not an independent patch, so it's way easier to grab it from git: g...@github.com:ringerc/postgres.git, branch rls-9.4-upd-sb-views (subject to rebase), or tag rls-9.4-upd-sb-views-v1. (It only contains half the old row-security patch; I'll rebase the docs, tests, etc. on top of this once it's working properly.)

I've integrated the updatable security barrier view support into RLS by injecting securityQuals in subquery_planner() just before preprocess_rowmarks. (I'm still thinking about some inheritance-related aspects of that, but that's for later.)

The problem is that this causes infinite recursion: the securityQuals get expanded into a subquery over the original RTE that had the row-security policy on it. Then subquery_planner is re-entered when processing the subquery, a row-security qual is found on the target RTE, and ... *boom*.

Fixing this is not as simple as suppressing expansion of row-security policy when processing a security barrier subquery created by a row-security policy, as it is desirable to respect the row-security policy of *other* tables that get referenced in the expanded row-security qual. If we just record the relid of the relation a subquery was expanded from and avoid expanding that inside the generated subquery, we prevent simple linear recursion, but not mutual recursion between two or more rels with row-security policy.

KaiGai's row-security patch handles this because it does its own recursive expansion, so (like the rewriter) it can keep a breadcrumb trail and detect when it is repeating a path. That's not so practical when the row-security code tags RTEs with policy and updatable s.b. views then goes and expands them.

So. Options.

1. Come up with a reasonable way to track recursive row-security expansion, detect infinite recursion, and bail out. Likely to involve a new global structure with planner/optimizer lifetime that gets checked and maintained by apply_row_security_rangetbl.

2. Solve the linear recursion case by storing a relid that should not get expanded for security quals when processing a subquery. Say "don't do that" and expect stack-exceeded crashes in the mutual recursion case. Seems user-hostile, incomplete, and likely to break people's systems when they really don't expect it.

3. Disregard row-security policy on referenced tables when expanding row-security qualifiers. There's precedent here in foreign keys, which ignore row-security policy, but I don't think this is at all appealing.

4. Magic?

Unless someone has a suggestion for #4, I'll be going with #1. I'd appreciate tips on doing this in a sane and efficient manner if anyone has them. I'll be reading over the rewriter's infinite loop protection and that of KaiGai's RLS patch for ideas in the mean time, and will give it a good go.

BTW, I feel like we should be letting the rewriter do this job; it's good at dealing with recursion problems already. That won't work so long as security barrier qual expansion happens during planning, not rewrite, though - and we've already explored the fun problems with doing upd. s.b. qual expansion in rewrite.
--
Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

From 4f8ed71e4cf18db7a4285da9b9c9f8f7f1030e32 Mon Sep 17 00:00:00 2001
From: Craig Ringer cr...@2ndquadrant.com
Date: Fri, 24 Jan 2014 16:18:40 +0800
Subject: [PATCH] RLS: First attempt to integrate with updatable s.b. views

Enters infinite recursion because the rls-protected table gets expanded
and the new RTE gets checked for row-security quals, gets securityQuals
added, and ... infinite recursion!
---
 src/backend/optimizer/plan/planner.c     |   7 ++
 src/backend/optimizer/prep/prepunion.c   |   6 +
 src/backend/optimizer/util/Makefile      |   3 +-
 src/backend/optimizer/util/rowsecurity.c | 185 +++
 src/include/optimizer/rowsecurity.h      |  27 +
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/optimizer/util/rowsecurity.c
 create mode 100644 src/include/optimizer/rowsecurity.h

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 80940ea..8cdd580 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -388,6 +388,13 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	}
 
 	/*
+	 * Check RTEs for row-security policies and set securityQuals on the
+	 * RTE if a policy is found. This must happen before inherited table
+	 * expansion so that the quals get copied to the child rels.
+	 */
+	apply_row_security_policy(root);
+
+	/*
 	 * Preprocess RowMark
Re: [HACKERS] GIN improvements part 1: additional information
On 01/24/2014 10:53 AM, Alexander Korotkov wrote: OK. What about the earlier assertion fix?

Ah right, fixed that too now.

- Heikki
Re: [HACKERS] pg_upgrade: make the locale comparison more tolerating
Hello, I started reviewing the patch. I went through the original mail thread to understand the need for the fix and the actual problem: http://www.postgresql.org/message-id/20121002155857.ge30...@momjian.us

The patch uses pg_valid_server_encoding() to compare the encodings, which looks more correct. I did a code walk-through and it looks good to me. I tested the patch on CentOS and it's working fine. I am not very knowledgeable about other OSes, so from that perspective it would be good to have others' views.

Here are my comments about the patch:
.) Patch applies cleanly on master (using patch -p1)
.) Compilation completes successfully
.) Code walk-through done; the logic looks good
.) Manual testing worked well for me

To test the issue, I set the old database locale to en_US.utf8 and the new database locale to en_US.UTF-8. With this, pg_upgrade failed with "lc_collate cluster values do not match: old en_US.utf8, new en_US.UTF-8". With the patch, pg_upgrade runs fine.

Regards,
Re: [HACKERS] [patch] Client-only installation on Windows
From: Andrew Dunstan and...@dunslane.net Is there any reason why pgbench is listed in @client_program_files as well as @client_contribs? AFAICT it should only be in the latter.

Thank you for reviewing the patch. Yes, you are right. I removed pgbench from @client_program_files. In addition, I added some documentation, as well as modifying the usage message at the end of install.pl. I'll update the CommitFest entry shortly.

Regards MauMau

client_only_install_win_v2.patch
Description: Binary data
[HACKERS] [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR
Hello, my customer reported the following problem on Windows. I'm afraid this is a serious problem, and I think we should provide a fix in the next minor release. I've attached a fix, and I would like it to be back-ported.

[Problem]
The customer is using 64-bit PostgreSQL 9.1.x on Windows Server 2012 R2. He found the following messages output repeatedly (about once every 3 minutes) in the event log:

LOG: could not reserve shared memory region (addr=0127) for child 1340: 487
LOG: could not fork autovacuum worker process: A blocking operation was interrupted by a call to WSACancelBlockingCall.

I found the same problem reported by another user here:

Re: PG 924, Windows 2012, error code 487
http://www.postgresql.org/message-id/2c0926abd16bb641a8e2f11a549200425471425...@phxccrprd01.adprod.bmc.com

In my customer's case, the failed process was an autovacuum worker, so there was no visible impact on their applications. However, as the above mail suggests, this can happen with any backend process.

[Cause]
This is caused by the improved ASLR (Address Space Layout Randomization) in Windows 8/2012. That was analyzed in the following mail last year, but the problem was not addressed:

windows 8 RTM compatibility issue (could not reserve shared memory region for child)
http://www.postgresql.org/message-id/5046caeb.4010...@grammatech.com

[Fix]
Disable ASLR when building PostgreSQL modules. This eliminated the log output in my customer's environment. I'll add this to the CommitFest if necessary.

Regards MauMau

disable_ASLR.patch
Description: Binary data
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
From: MauMau maumau...@gmail.com OK, I added several lines for this. Please check the attached patch.

I'm sorry, I attached the old patch as v5 in my previous mail. Attached to this mail is the correct one. I'll update the CommitFest entry soon.

Regards MauMau

pg_ctl_eventsrc_v5.patch
Description: Binary data
Re: [HACKERS] Infinite recursion in row-security based on updatable s.b. views
On 24 January 2014 09:04, Craig Ringer cr...@2ndquadrant.com wrote: [...] So. Options. 1. Come up with a reasonable way to track recursive row-security expansion, detect infinite recursion, and bail out. [...] 2. Solve the linear recursion case by storing a relid that should not get expanded for security quals when processing a subquery. [...] 3. Disregard row-security policy on referenced tables when expanding row-security qualifiers. [...] 4. Magic?

My first thought is to add a boolean flag to RangeTblEntry (similar to the inh flag) to say whether RLS expansion is requested for that RTE, and then set it to false on each RTE as you add new RLS quals to its securityQuals.

In addition, when adding RLS quals to an RTE, I think they should be fully and recursively expanded immediately, before setting the new flag to false and moving on --- think recursively calling the rewriter to expand view references in the new RLS qual, and expand_security_qual() to expand any additional RLS quals in the securityQuals list --- with loop detection there.
I'm not pretending that's going to be easy, but there are a couple of existing pieces of code to borrow ideas from. Doing it this way should make it possible to do the loop detection without any global structures.

Regards, Dean
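To make the shape of Dean's suggestion concrete, here is a rough C sketch. The rowsec_done flag and the relid_stack breadcrumb list are invented names for illustration; this is not code from either patch.

```c
/*
 * Hypothetical sketch only: expand row-security quals for one RTE,
 * detecting mutual recursion via a breadcrumb list of relation OIDs.
 * rowsec_done is an imagined per-RTE flag, per Dean's suggestion.
 */
static void
expand_rls_quals(PlannerInfo *root, RangeTblEntry *rte, List *relid_stack)
{
	if (rte->rtekind != RTE_RELATION || rte->rowsec_done)
		return;

	if (list_member_oid(relid_stack, rte->relid))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
				 errmsg("infinite recursion detected in row-security policy for \"%s\"",
						get_rel_name(rte->relid))));

	relid_stack = lappend_oid(relid_stack, rte->relid);

	/* ... look up the policy, append its qual to rte->securityQuals ... */

	rte->rowsec_done = true;	/* never expand this RTE a second time */

	/*
	 * ... then recurse into any relations referenced by the freshly added
	 * qual, passing relid_stack down so A -> B -> A is caught above.
	 */
}
```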
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 1:19 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/24/2014 10:53 AM, Alexander Korotkov wrote: OK. What about the earlier assertion fix? Ah right, fixed that too now.

Good, now my test suite passes. The results are as follows.

Time of operations:

         event         |     period
-----------------------+------------------
 index_build           | 00:01:45.709546
 index_build_recovery  | 00:00:09
 index_update          | 00:05:45.732214
 index_update_recovery | 00:01:17
 search_new            | 00:24:02.191049
 search_updated        | 00:26:54.122852
(6 rows)

Index sizes:

     label     |   size
---------------+-----------
 new           | 387547136
 after_updates | 610877440
(2 rows)

Before compression, the results were the following.

Time of operations:

         event         |     period
-----------------------+------------------
 index_build           | 00:01:51.116722
 index_build_recovery  | 00:00:08
 index_update          | 00:05:12.124758
 index_update_recovery | 00:01:43
 search_new            | 00:23:44.281424
 search_updated        | 00:25:41.96251
(6 rows)

Index sizes:

     label     |    size
---------------+------------
 new           | 884514816
 after_updates | 1595252736
(2 rows)

So, we can see a little regression. However, I'm not sure if it's very significant.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] Recovery to backup point
Hi, Heikki-san,

From: MauMau maumau...@gmail.com From: Heikki Linnakangas hlinnakan...@vmware.com After some refactoring and fixing bugs in the existing code, I came up with the attached patch. I called the option simply recovery_target, with the only allowed value of immediate. IOW, if you want to stop recovery as early as possible, you add recovery_target='immediate' to recovery.conf. Now that we have four different options to set the recovery target with, I rearranged the docs slightly. How does this look to you?

I'm almost comfortable with your patch. There are two comments:

C1. The following parts seem to have been mistakenly taken from my patch. These are not necessary for your patch, are they?

I'm going to attach the new revision of the patch soon, which is almost entirely based on yours. All I modified was the removal of the parts mentioned above. I confirmed that the original problem could be solved. Thanks.

Regards MauMau

recover_to_backup_v2.patch
Description: Binary data
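For reference, a minimal recovery.conf exercising the option under discussion might look like this (the archive path is a placeholder):

```
restore_command = 'cp /path/to/archive/%f "%p"'
# Stop recovery as soon as a consistent state is reached, i.e. at the
# end of the backup from which this cluster was restored.
recovery_target = 'immediate'
```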
Re: [HACKERS] Postgresql for cygwin - 3rd
On 01/24/2014 01:20 AM, Marco Atzeri wrote: AFAICT the regression is in Cygwin. The buildfarm passes because it's using an oldish Cygwin release, 1.7.7, rather than the current 1.7.27. I have brought the regression to the attention of the Cygwin people in the past, but without response. Which issue? During my package tests I have only two issues: tsearch ... FAILED, and the test prepared_xacts must be skipped as it never completes.

Those two issues need to be fixed. And yes, they are regressions from my Cygwin 1.7.7 setup, where they pass consistently, just about every day. See http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=brolga&br=HEAD

You don't get to choose which regression tests you're going to pass and which you're not. Disabling the tests or providing nonsensical results files is unacceptable. This is a Cygwin behavior issue and needs to be fixed.

cheers andrew
Re: [HACKERS] extension_control_path
On Fri, Jan 24, 2014 at 6:57 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: [...] The PostgreSQL project policy is not to use the git format when sending patches to the mailing list, preferring the context diff format. [...]

Would be nice if we could use the git apply command... :-)

--
Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/14/2014 05:35 PM, Alexander Korotkov wrote: Attached version is rebased against the last version of packed posting lists.

Thanks! I think we're missing a trick with multi-key queries. We know that when multiple scan keys are used, they are ANDed together, so we can do the skip optimization even without the new tri-state consistent function.

To get started, I propose the three attached patches. These only implement the optimization for the multi-key case, which doesn't require any changes to the consistent functions and hence no catalog changes. Admittedly this isn't anywhere near as useful in practice as the single key case, but let's go for the low-hanging fruit first. This nevertheless introduces some machinery that will be needed by the full patch anyway.

I structured the code somewhat differently than your patch. There is no separate fast-path for the case where the optimization applies. Instead, I'm passing the advancePast variable all the way down to where the next batch of items is loaded from the posting tree. keyGetItem is now responsible for advancing the entry streams, and the logic in scanGetItem has been refactored so that it advances advancePast aggressively, as soon as one of the key streams lets us conclude that no items before a certain point can match. scanGetItem might yet need to be refactored when we get to the full preconsistent check stuff, but one step at a time.

The first patch is the most interesting one, and contains the scanGetItem changes. The second patch allows seeking to the right segment in a posting tree page, and the third allows starting the posting tree scan from the root when skipping items (instead of just following the right-links).

Here are some simple performance test results, demonstrating the effect of each of these patches. This is a best-case scenario. I don't think these patches have any adverse effects even in the worst-case scenario, although I haven't actually tried hard to measure that.

I used this to create a test table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 1000) g;
-- Add another tuple with the 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:

postgres=# select count(*) from foo where intarr @> array[0] and intarr @> array[1];
 count
-------
     2
(1 row)

I measured the time that query takes, and the number of pages hit, using explain (analyze, buffers true):

 patches       | time (ms) | buffers
---------------+-----------+---------
 unpatched     | 650       | 1316
 patch 1       | 0.52      | 1316
 patches 1+2   | 0.50      | 1316
 patches 1+2+3 | 0.13      | 15

So, the second patch isn't doing much in this particular case. But it's trivial, and I think it will make a difference in other queries where you have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I think they should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem.

Good, thanks! Now I can reimplement fast scan on top of these patches.

PS. I find it a bit surprising that in your patch, you're completely bailing out if there are any partial-match keys involved. Is there some fundamental reason for that, or is it just not implemented?

Just not implemented.
I think there are two possible approaches to handle it: 1) handle partial-match keys like OR on matching keys; 2) implement keyAdvancePast for the bitmap. The first approach seems useful with a low number of keys. Probably, we should implement automatic switching between them.

--
With best regards, Alexander Korotkov.
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
From: Fujii Masao masao.fu...@gmail.com I think that 77035fa8a92d8c39f4c689e54f46813f203f09a8 fixed this problem. Please check that.

I looked through your patch. I'm fine with the PQhostaddr() part; I hadn't noticed the problem when hostaddr was passed to psql on Windows.

OTOH, regarding PQhost(), I think we had better take my patch. connectOptions2() computes and sets derived values in PGconn, so that PGconn's members hold the values which are actually used for the connection. To follow that rule, PGconn->pghost may as well contain localhost on machines without UNIX domain sockets. Plus, if we want to refer to the actual connection target elsewhere in the libpq implementation than PQhost(), we can just use PGconn's members. If we do so, we don't have to change PQhost(). Could you replace the patch?

Regards MauMau
Re: [HACKERS] Proposal for CSN based snapshots
On 11th June 2013, Markus Wanner wrote:

Agreed. Postgres-R uses a CommitOrderId, which is very similar in concept, for example. Do you think having this snapshot scheme would be helpful for Postgres-R? Yeah, it could help to reduce the patch size, after a rewrite to use such a CSN.

Or why do you need to tell apart aborted from in-progress transactions by CSN? I need to detect aborted transactions so they can be discarded during the eviction process; otherwise the sparse array will fill up. They could also be filtered out by cross-referencing uncommitted slots with the procarray. Having the abort case do some additional work to make xid assignment cheaper looks like a good tradeoff. I see.

The sparse buffer needs to be at least big enough to fit CSN slots for the xids of all active transactions and non-overflowed subtransactions. At the current limit of PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer, or 203KB total in the default configuration.

A CSN is 8 bytes and the XID 4, resulting in 12 bytes per slot. So I guess the given 16 bytes includes alignment to 8-byte boundaries. Sounds good. 8-byte alignment for CSNs is needed for atomicity, if not something else. Oh, right, atomic writes.

I think the size could be cut in half by using a base value for CSNs, if we assume that no xid is active for longer than 2B transactions, as is currently the case. I didn't want to include that complication in the first iteration, so I didn't verify whether it would have any gotchas. In Postgres-R, I effectively used a 32-bit order id which wraps around. In that case, I guess adjusting the base value would get tricky. Wrapping could probably be used as well, instead.

The number of times each cache line can be invalidated is bounded by 8. Hm.. good point.

We are also planning to implement CSN based snapshots. So I am curious to know whether any further development is happening on this. If not, then what is the reason? Am I missing something?

Thanks and Regards, Kumar Rajeev Rastogi
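To sanity-check the arithmetic in the quoted discussion, here is a hypothetical slot layout in C (the struct is illustrative, not the patch's actual definition): a 4-byte xid followed by an 8-byte-aligned CSN pads to 16 bytes, and 65 slots per backend times 100 backends gives roughly the quoted 101.6KB.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Hypothetical sparse-buffer slot: 4-byte xid, 4 bytes padding, 8-byte CSN */
typedef struct CSNSlot
{
	TransactionId	xid;	/* 4 bytes */
	/* the compiler inserts 4 bytes of padding here */
	uint64_t		csn;	/* 8 bytes, aligned for atomic writes */
} CSNSlot;

int
main(void)
{
	size_t		per_backend = sizeof(CSNSlot) * (64 + 1);	/* subxids + xid */

	printf("slot size: %zu bytes\n", sizeof(CSNSlot));		/* 16 */
	printf("100 backends: %zu bytes (~101.6KB)\n", per_backend * 100);
	return 0;
}
```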
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
On Fri, Jan 24, 2014 at 9:00 PM, MauMau maumau...@gmail.com wrote: [...] OTOH, regarding PQhost(), I think we had better take my patch. connectOptions2() computes and sets derived values in PGconn, so that PGconn's members hold the values which are actually used for the connection. To follow that rule, PGconn->pghost may as well contain localhost on machines without UNIX domain sockets.

I'm not sure if we should follow that rule. As far as I read the libpq code, at least connectFailureMessage() and connectDBStart() do the same things as PQhost() does now.

Regards,
--
Fujii Masao
Re: [HACKERS] Postgresql for cygwin - 3rd
On 24/01/2014 12:56, Andrew Dunstan wrote: [...] And yes, they are regressions from my Cygwin 1.7.7 setup, where they pass consistently, just about every day. See http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=brolga&br=HEAD

1.7.7 is 3.5 years old. In the meantime we added 64-bit support and moved to gcc 4.8.x.

You don't get to choose which regression tests you're going to pass and which you're not. Disabling the tests or providing nonsensical results files is unacceptable. This is a Cygwin behavior issue and needs to be fixed.

Distributing a broken binary that crashes after a standard rebase is also not acceptable; it is even worse. Your farm is not testing this case, I presume. Until I took over, there was NO recent cygwin package in the distribution at all, so do not complain that my binary is not perfect. I already know, but for me almost working is better than NOT available or BROKEN.

I am available to work on the test regressions, but I am not a postgresql expert, so do not expect all the work to come from my side.

Regards Marco
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
From: Fujii Masao masao.fu...@gmail.com On Wed, Jan 22, 2014 at 6:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Thanks! The patch looks good to me. Attached is the updated version of the patch; I added the comments.

Thank you very much. Your comments look great. I tested some recovery situations, and confirmed that WAL segments were kept/removed as intended. I'll update the CommitFest entry with this patch.

Sorry for reacting so slowly, but I'm not sure I like this patch. It's a quite useful property that all the WAL files that are needed for recovery are copied into pg_xlog, even when restoring from archive, even when not doing cascading replication. It guarantees that you can restart the standby, even if the connection to the archive is lost for some reason. I intentionally changed the behavior for archive recovery too, when it was introduced for cascading replication. Also, I think it's good that the behavior does not depend on whether cascading replication is enabled - it's a quite subtle difference. So, IMHO this is not a bug, it's a feature.

Yep. I understand the benefit for standby recovery.

To solve the original problem of running out of disk space in archive recovery, I wonder if we should perform restartpoints more aggressively. We intentionally don't trigger restartpoints by checkpoint_segments, only checkpoint_timeout, but I wonder if there should be an option for that. That's an option. MauMau, did you try simply reducing checkpoint_timeout while doing recovery?

The problem is, we might not be able to perform restartpoints more aggressively even if we reduce checkpoint_timeout on the server doing archive recovery, because the frequency of restartpoints depends not only on checkpoint_timeout but also on the checkpoints which happened while the server was running.

I haven't tried reducing checkpoint_timeout. I don't think we can take that approach, because we cannot suggest an appropriate checkpoint_timeout to users. That is, what checkpoint_timeout setting could we suggest so that WAL doesn't accumulate in pg_xlog/ more than in 9.1? In addition, as Fujii-san said, it doesn't seem we can control restartpoints completely. Plus, if we caused restartpoints more frequently, the recovery would take (much?) longer, because shared buffers would be flushed more frequently.

So, how about just removing the AllowCascadeReplication() condition from the patch? That allows WAL to accumulate in pg_xlog/ during standby recovery but not during archive recovery.

Regards MauMau
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Jan24, 2014, at 08:47, Dean Rasheed dean.a.rash...@gmail.com wrote: I think it should probably be broken up. It might be overly ambitious to try to get all of this committed during this commitfest, and in any case, I suspect that the committer would probably choose to commit it in stages. Perhaps something like:

Patch 1 - Basic support for inverse transition functions, CREATE AGGREGATE support and doc updates. This should include test cases to validate that the underlying executor changes are correct, by defining custom aggregates such as sum(int) and array_agg() using inverse transition functions.

Patch 2 - Add built-in inverse transition functions for count, sum(int), and friends.

Patch 3, 4... - Other related groups of built-in aggregates. By this point, it should be a fairly mechanical process.

Splitting it up this way now should help to focus on getting patch 1 correct, without being distracted by all the other aggregates that may or may not usefully be made to have inverse transition functions. I think the value of the feature has been proved, and it is good to see that it can be applied to so many aggregates, but let's not try to do it all at once.

Working on that now, will post individual patches later today.

best regards, Florian Pflug
Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows
From: Fujii Masao masao.fu...@gmail.com On Fri, Jan 24, 2014 at 9:00 PM, MauMau maumau...@gmail.com wrote: [...] I'm not sure if we should follow that rule. As far as I read the libpq code, at least connectFailureMessage() and connectDBStart() do the same things as PQhost() does now.

Yes, I found them ugly, too. Maybe we can refactor them in a separate patch.

Regards MauMau
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 23.01.14 02:14, Jim Nasby wrote: On 1/19/14, 5:51 PM, Dave Chinner wrote: Postgres is far from being the only application that wants this; many people resort to tmpfs because of this: https://lwn.net/Articles/499410/ Yes, we covered the possibility of using tmpfs much earlier in the thread, and came to the conclusion that temp files can be larger than memory, so tmpfs isn't the solution here. :)

Although... instead of inventing new APIs and foisting this work onto applications, perhaps it would be better to modify tmpfs such that it can handle a temp space that's larger than memory... possibly backing it with X amount of real disk and allowing it/the kernel to decide when to passively move files out of the in-memory tmpfs and onto disk.

This is exactly what I'd expect from a file system that's suitable for tmp purposes. The current tmpfs should rather have been named memfs, since it lacks a dedicated disk backing store.

Regards, Andreas
Re: [HACKERS] proposal: hide application_name from other users
On Thu, Jan 23, 2014 at 2:01 AM, Greg Stark st...@mit.edu wrote: On Wed, Jan 22, 2014 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote: Probably Heroku has some more specific exploit case to be concerned about here; if so, might I suggest taking it up with the -security list?

I don't think there's a specific vulnerability that needs to be kept secret here. Here's an example. I just created a new hobby database which is on a multi-tenant cluster and ran select * from pg_stat_activity. Here are two of the more interesting examples:

463752 | de5nmf0gbii3u5 | 32250 | 463751 | qspfkgrwgqtbcu | unicorn worker[1] -p 30390 -c ./config/unicorn.rb || | | | | | | || insufficient privilege
463752 | de5nmf0gbii3u5 | 32244 | 463751 | qspfkgrwgqtbcu | unicorn worker[0] -p 30390 -c ./config/unicorn.rb || | | | | | | || insufficient privilege

Note that the contents of the ARGV array are being set by the unicorn task queuing library. It knows it's making this information visible to other users with shell access on this machine. But the decision to stuff the ARGV information into the application_name is being made by the Pg driver. Neither is under the control of the application author, who may not even be aware this is happening. Neither component has the complete information to make a competent decision about whether this information is safe to be in application_name or not. Note that the query is showing as "insufficient privilege" even though it is listed in the ps output -- the same ps output that is listing the unicorn ARGV that is being shown in the application_name.

You might say that the Pg gem is at fault for making a blanket policy decision for applications that the ARGV is safe to show to other database users, but realistically it's so useful to see this information for your own connections that it's probably the right decision. Without it, it's awfully hard to tell which worker is on which connection. It would just be nice to be able to treat application_name the same as query.

I would say that yes, this is clearly broken in the Pg gem. I can see it having such a default, but not allowing an override...

The application can of course issue a SET application_name, assuming there is a hook somewhere in the system that will run after the connection has been established. I've had customers use that many times in java based systems, for example, but I don't know enough about the pg gem, or unicorn, to have a clue whether anything like it exists there. This is also a good way to track how connections are used throughout a pooled system where the same connection might be used for different things at different times.

What actually happens if you set the application_name in the connection string in that environment? Does it override it with its own default? If so, the developers there clearly need to be taught about fallback_application_name. And what happens if you set it in PGAPPNAME?

Long term, I agree we should really have some way of controlling these permissions in a more fine-grained way, but blanket-hiding application_name for non-superusers seems like a bad solution that still only fixes a small part of the problem.

--
Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Change authentication error message (patch)
Bruce Momjian br...@momjian.us writes: On Thu, Jan 23, 2014 at 10:39:34PM -0500, Tom Lane wrote: I'm not convinced that this improves anything. The problem might not in fact be either of the things you mention, in which case the new message is outright misleading. Also, what of the policy stated in the header comment for the function you're hacking, ie that we intentionally don't reveal the precise cause of the failure to the client?

Well, the only solution then would be to add some weasel words like "perhaps expired password", but that seems so rare that I doubt it would apply very often, and it seems like an odd suggestion. We could go with: password authentication failed for user \"%s\": perhaps invalid or expired password. We did have two threads on this issue in the past 12 months, so I figured we should try to do something.

I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful.

What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case, for fear of log-spam complaints, but certainly the third case and maybe the first one. Or, possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log().

regards, tom lane
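A sketch of what that last suggestion could look like, hedged: the status codes here are invented, but ereport() with errdetail_log() is the real mechanism for attaching server-log-only detail to a client-visible message.

```c
/* Hypothetical status returned by md5_crypt_verify(), for illustration */
switch (status)
{
	case PASSWORD_EXPIRED:		/* invented constant */
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_PASSWORD),
				 errmsg("password authentication failed for user \"%s\"",
						port->user_name),
				 /* goes to the postmaster log only, never to the client */
				 errdetail_log("User \"%s\" has an expired password.",
							   port->user_name)));
		break;

	default:
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_PASSWORD),
				 errmsg("password authentication failed for user \"%s\"",
						port->user_name)));
}
```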
Re: [HACKERS] pg_upgrade: make the locale comparison more tolerating
Rushabh, really sorry, I had to re-create the patch, and thanks a lot for looking at it! Looking at the patch once again, I see that there were at least two problems. Firstly, I used the equivalent_locale function also on the encoding values. Even if that should not cause bugs (as it should result in strncasecmp anyway), it was not pretty. The second problem was assuming that locale specifier A is not longer than locale specifier B. Comparisons like 'en_US.utf8' with 'en_US_.utf8' would succeed. The bug resulting from this mistake is probably not a real one, but it is not nice anyway. The cleaned-up patch is attached, Pavel

From 35b9f600b592db24bb0e25d168bc5955087d65df Mon Sep 17 00:00:00 2001
From: Pavel Raiskup prais...@redhat.com
Date: Sat, 21 Dec 2013 01:27:01 +0100
Subject: [PATCH] pg_upgrade: make the locale comparison more tolerating

Locale strings specified like 'cs_CZ.utf8' and 'cs_CZ.UTF-8' should be
treated as equivalent.  Not taking these as equivalents caused failures
during major server upgrade (when the server machine has a slightly
different encoding spelling than the not yet migrated data stack).  The
workaround for that was changing the system locale to match the
previous locale string.

With this commit, the comparison is done in two phases.  First the
encoding part of the locale string (if any) is compared, and then the
rest of the string.  Before the encoding parts are compared, they are
decoded into the precisely defined codes from 'enum pg_enc'.  This
should make the comparison stable even for completely different
spellings of the same encoding (e.g. 'latin2' and 'iso 8859-2').

References: 3356208.rhzgij6...@nb.usersys.redhat.com
            20121002155857.ge30...@momjian.us
---
 contrib/pg_upgrade/check.c | 58 +++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index a706708..2adefb2
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
***************
*** 9,14 ****
--- 9,15 ----
  
  #include "postgres_fe.h"
  
+ #include "mb/pg_wchar.h"
  #include "pg_upgrade.h"
  
*** set_locale_and_encoding(ClusterInfo *clu
*** 393,398 ****
--- 394,450 ----
  	PQfinish(conn);
  }
  
+ /*
+  * equivalent_encoding()
+  *
+  * Best effort encoding comparison.  Return true only if the encodings
+  * are both correctly spelled and they are equivalent.
+  */
+ static bool
+ equivalent_encoding(const char *chara, const char *charb)
+ {
+ 	int	enca = pg_valid_server_encoding(chara);
+ 	int	encb = pg_valid_server_encoding(charb);
+ 
+ 	if (enca < 0 || encb < 0)
+ 		return false;
+ 
+ 	return (enca == encb);
+ }
+ 
+ /*
+  * equivalent_locale()
+  *
+  * Best effort locale comparison.  Return false if we are not 100% sure
+  * the locales are equivalent.
+  */
+ static bool
+ equivalent_locale(const char *loca, const char *locb)
+ {
+ 	int	lencmp;
+ 	const char *chara = strrchr(loca, '.');
+ 	const char *charb = strrchr(locb, '.');
+ 
+ 	if (!chara || !charb)
+ 		/* not both locale strings contain an encoding part */
+ 		return (pg_strcasecmp(loca, locb) == 0);
+ 	chara++;
+ 	charb++;
+ 
+ 	if (!equivalent_encoding(chara, charb))
+ 		return false;
+ 
+ 	/*
+ 	 * We know the encoding part is equivalent.  So now compare only the
+ 	 * locale identifier (e.g. the en_US part of en_US.utf8).
+ 	 */
+ 	lencmp = chara - loca;
+ 	if (lencmp != charb - locb)
+ 		return false;
+ 
+ 	return (pg_strncasecmp(loca, locb, lencmp) == 0);
+ }
  
  /*
   * check_locale_and_encoding()
*** check_locale_and_encoding(ControlData *o
*** 409,421 ****
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (pg_strcasecmp(oldctrl->lc_collate, newctrl->lc_collate) != 0)
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_collate, newctrl->lc_collate);
! 	if (pg_strcasecmp(oldctrl->lc_ctype, newctrl->lc_ctype) != 0)
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (pg_strcasecmp(oldctrl->encoding, newctrl->encoding) != 0)
  		pg_fatal("encoding cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->encoding, newctrl->encoding);
  }
--- 461,473 ----
  	 * They also often use inconsistent hyphenation, which we cannot fix, e.g.
  	 * UTF-8 vs. UTF8, so at least we display the mismatching values.
  	 */
! 	if (!equivalent_locale(oldctrl->lc_collate, newctrl->lc_collate))
  		pg_fatal("lc_collate cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_collate, newctrl->lc_collate);
! 	if (!equivalent_locale(oldctrl->lc_ctype, newctrl->lc_ctype))
  		pg_fatal("lc_ctype cluster values do not match:  old \"%s\", new \"%s\"\n",
  				 oldctrl->lc_ctype, newctrl->lc_ctype);
! 	if (!equivalent_encoding(oldctrl->encoding, newctrl->encoding))
Re: [HACKERS] proposal: hide application_name from other users
On Fri, Jan 24, 2014 at 6:46 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 23, 2014 at 2:01 AM, Greg Stark st...@mit.edu wrote: [...] You might say that the Pg gem is at fault for making a blanket policy decision for applications that the ARGV is safe to show to other database users, but realistically it's so useful to see this information for your own connections that it's probably the right decision. [...] It would just be nice to be able to treat application_name the same as query.

I would say that yes, this is clearly broken in the Pg gem. I can see it having such a default, but not allowing an override...

Uhm, it does allow an override, as I said before.

The application can of course issue a SET application_name [...] What actually happens if you set the application_name in the connection string in that environment? Does it override it with its own default? If so, the developers there clearly need to be taught about fallback_application_name.

It can be overridden using any of these methods. It does in fact use fallback_application_name when it defaults to $0, see https://bitbucket.org/ged/ruby-pg/src/6c2444dc63e17eb695363993e8887cc5d67750bc/lib/pg/connection.rb?at=default#cl-46

And what happens if you set it in PGAPPNAME?
It works fine:

```
PGAPPNAME=this_is_a_custom_app_name ruby -w -rpg -e "PG.connect(dbname: 'hgmnz', host: 'localhost').exec('SELECT application_name FROM pg_stat_activity') { |res| res.each { |row| puts row.values_at('application_name') } }"
this_is_a_custom_app_name
```

-Harold
Re: [HACKERS] Minmax indexes
On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet.

This patch doesn't appear to have been submitted to any CommitFest. Is this still a feature undergoing research, then?

--
Thom
Re: [HACKERS] Changeset Extraction v7.1
On Thu, Jan 23, 2014 at 6:32 PM, Andres Freund and...@2ndquadrant.com wrote:

I also wonder if we should use the terminology "attach" instead of "acquire"; that pairs more naturally with "release". Then the message, if we want more than an assert, might be "this backend is already attached to a replication slot". I went with Acquire/Release because our locking code does so, and it seemed sensible to be consistent. I don't have strong feelings about it.

Yeah, but I think a slot is not really the same thing as a lock. Acquire/release might be OK. In some of my recent code I used attach/detach, which feels a little more natural to me for something like this, so I lean in that direction.

Unfortunately not. Inside the walsender there's currently no LWLockReleaseAll() for ERRORs, since commands aren't run inside a transaction command... But maybe I should have fixed this by adding the release to WalSndErrorCleanup() instead? That'd still leave the problematic case that currently we try to delete a replication slot inside a CATCH block when we fail while initializing the rest of logical replication... But I guess adding it would be a good idea independent of that.

+1. I think that if we can't rely on error handling to clean up the same things everywhere, it's going to be a mess. People won't be able to keep track of which error cleanup is engaged in which code paths, and screw-ups will result when old code paths are called from new call sites.

We could also do a StartTransactionCommand(), but I'd rather not; that currently prevents code in that vicinity from doing anything it shouldn't via various Assert()s in existing code. Like what? I mean, I'm OK with having a partial error-handling environment if that's all we need, but I think it's a bad plan to the extent that the code here needs to be aware of error-handling differences versus expectations elsewhere in our code.

This doesn't need the LWLockRelease either. It does need the SpinLockRelease, but as I think I noted previously, the right way to write this is probably:

SpinLockAcquire(&slot->mutex);
was_active = slot->active;
slot->active = true;
SpinLockRelease(&slot->mutex);
if (was_active)
    ereport(...);
MyReplicationSlot = slot;

That's not really simpler tho? But if you prefer I can go that way. It avoids a branch while holding the lock, and it puts the SpinLockAcquire/Release pair much closer together, so it's easier to visually verify that the lock is released in all cases.

ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and the comment "Provide interlock against concurrent recomputations" doesn't seem adequate to me. I guess the idea here is that we regard ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and ReplicationSlotCtl->data_xmin, but if that's the idea then we only need to hold the lock during the time when we actually update those values, not the loop where we compute them. There's a comment someplace else to that end, but yes, that's essentially the idea. I decided to take it during the whole recomputation because we also take ProcArrayLock when creating a new decoding slot and initially setting ->catalog_xmin. That's not strictly required, but it seemed simpler that way, and the code shouldn't be very hot. The code that initially computes the starting value for catalog_xmin when creating a new decoding slot has to take ProcArrayLock to be safe; that's why I thought it'd be convenient to always use it for those values.

I don't really see why it's simpler that way.
It's clearer what the point of the lock is if you only hold it for the operations that need to be protected by that lock. In all other cases where we modify *_xmin we're only increasing it, which doesn't need a lock (HS feedback never has taken one, and GetSnapshotData() modifies ->xmin while holding a shared lock); the only potential danger is a slight delay in increasing the overall value. Right. We might want to comment such places. Also, if that's the design, maybe they should be part of PROC_HDR *ProcGlobal rather than here. It seems weird to have some of the values protected by ProcArrayLock live in a completely different data structure managed almost entirely by some other part of the system. Don't we already have cases of that? I seem to remember so. If you prefer having them there, I am certainly fine with doing that. This way they aren't allocated if slots are disabled, but it's just two TransactionIds. Let's go for it, unless we think of a reason not to. It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during logical decoding) needs to become patch #1, because it doesn't make any sense to apply this before we do that. Well, the slot code and the slot support for streaming rep are independent of it and don't use it. So they easily can come before it. But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet.
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
On Tue, Jan 14, 2014 at 06:04:47PM +0900, Kyotaro HORIGUCHI wrote: I proposed 3 types of solution, but one of them - tweaking the Equivalence Class (Type 1) - was dismissed because of inappropriate manipulation of the Equivalence Class. The rest do essentially the same thing - flattening appendrels - at different timings. In type 2, the process is implemented as a generic appendrel flattening function and applied after expand_inherited_tables() in subquery_planner. In type 3, the equivalent process is combined into expand_inherited_rtentry(). Type 2 loops once over the whole appendrel list (in most cases) so it might be more efficient than type 3, which searches the parent appendrel for each appended one. Type 3 is the more appropriate design assuming that appendrel trees deeper than 2 levels will be generated only by expand_inherited_tables(). Let's focus on type 3; Tom and I both found it most promising. The attached two patches are rebased to current 9.4dev HEAD, and make check at the topmost directory and in src/test/isolation passes without error. One bug was found and fixed on the way. It was an assertion failure caused by a probably-unexpected type conversion introduced by collapse_appendrels(), which leads to an implicit whole-row cast from some valid reltype to an invalid reltype (0) in adjust_appendrel_attrs_mutator(). What query demonstrated this bug in the previous type 2/3 patches? - unionall_inh_idx_typ3_v4_20140114.patch You have not addressed my review comments from November: http://www.postgresql.org/message-id/20131123073913.ga1008...@tornado.leadboat.com Specifically, these: [transvar_merge_mutator()] has roughly the same purpose as adjust_appendrel_attrs_mutator(), but it propagates the change to far fewer node types. Why is this case so much simpler? The translated_vars of parent_appinfo may bear mostly-arbitrary expressions. ... I get this warning: prepunion.c: In function `expand_inherited_rtentry': prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type ... Approaches (2) and (3) leave the inheritance parent with rte->inh == true despite no AppendRelInfo pointing to it as their parent. Until now, expand_inherited_rtentry() has been careful to clear rte->inh in such cases. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
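As an illustration of the behavior this patch set targets (a minimal sketch; the table names are invented for the example, and the planner output is what the patch aims for, not a claim about any committed behavior):

    -- An inheritance parent with indexed children:
    create table hist (id int, payload text);
    create table hist_2013 (primary key (id)) inherits (hist);
    create table hist_2014 (primary key (id)) inherits (hist);
    create table cur (id int primary key, payload text);

    -- The UNION ALL branch over the inheritance parent yields a nested
    -- appendrel, which is what currently defeats index use:
    explain select * from
      (select * from hist union all select * from cur) u
    where id = 42;

With the appendrel flattening discussed above, the expectation is index scans on the child primary-key indexes under a single Append node, rather than sequential scans.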
Re: [HACKERS] extension_control_path
Hi Now patch applies cleanly and works. :-) But I have some notes: 1. There is an odd underscore character in functions find_in_extension_control_path and list_extension_control_paths: \extension_control__path\ 2. If we have several versions of one extension in different directories (which are listed in extension_control_path parameter) then we get strange output from pg_available_extensions and pg_available_extension_versions views (Information about extension, whose path is at the beginning of the list, is duplicated). And only one version of the extension can be created. See examples: /extensions/ ├── postgis-2.0.4 │ ├── postgis--2.0.4.sql │ └── postgis.control └── postgis-2.1.1 ├── postgis--2.1.1.sql └── postgis.control = postgresql.conf: extension_control_path = '/extensions/postgis-2.0.4:/extensions/postgis-2.1.1' postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.0.4 | | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.0.4 | | PostGIS geometry, geography, and raster spatial types and functions (2 rows) postgres=# table pg_catalog.pg_available_extension_versions; name | version | installed | superuser | relocatable | schema | requires | comment -+-+---+---+-++--+- postgis | 2.0.4 | f | t | t || | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.0.4 | f | t | t || | PostGIS geometry, geography, and raster spatial types and functions (2 rows) = postgresql.conf: extension_control_path = '/extensions/postgis-2.1.1:/extensions/postgis-2.0.4' postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.1.1 | | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.1.1 | | PostGIS geometry, geography, and raster spatial types and functions (2 rows) postgres=# create extension postgis; CREATE EXTENSION postgres=# SELECT PostGIS_version(); postgis_version --- 2.1 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (1 row) postgres=# table pg_catalog.pg_available_extensions; name | default_version | installed_version | comment -+-+---+- postgis | 2.1.1 | 2.1.1 | PostGIS geometry, geography, and raster spatial types and functions postgis | 2.1.1 | 2.1.1 | PostGIS geometry, geography, and raster spatial types and functions (2 rows) 3. It would be fine to see an extension control path in pg_available_extensions and pg_available_extension_versions views (in separate column or within of extension name). 4. Perhaps the CREATE EXTENSION command should be improved to allow creation of the required version of the extension. So we can use different versions of extensions in different databases. PS Sorry for my English. 2014/1/24 Fabrízio de Royes Mello fabriziome...@gmail.com On Fri, Jan 24, 2014 at 6:57 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Sergey Muraviov sergey.k.murav...@gmail.com writes: I can't apply the patch. Did you try using the `patch`(1) command? The PostgreSQL project policy is to not use the git format when sending patches to the mailing list, prefering the context diff format. So you need to resort to using the basic patch commands rather than the modern git tooling. See also: http://wiki.postgresql.org/wiki/Submitting_a_Patch Patches must be in a format which provides context (eg: Context Diff); 'normal' or 'plain' diff formats are not acceptable. Would be nice if we can use git apply command... 
:-) -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello -- Best regards, Sergey Muraviov
Re: [HACKERS] Why do we let autovacuum give up?
On 01/23/2014 07:22 PM, Alvaro Herrera wrote: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. Anything which depends on a timing-based feedback loop is going to be hopeless. Saying autovac shouldn't run if load is high sounds like a simple statement, until you actually try to implement it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.1
On 2014-01-24 12:10:50 -0500, Robert Haas wrote: Unfortunately not. Inside the walsender there's currently no LWLockReleaseAll() for ERRORs since commands aren't run inside a transaction command... But maybe I should have fixed this by adding the release to WalSndErrorCleanup() instead? That'd still leave the problematic case that currently we try to delete a replication slot inside a CATCH when we fail while initializing the rest of logical replication... But I guess adding it would be a good idea independent of that. +1. I think that if we can't rely on error handling to clean up the same things everywhere, it's gonna be a mess. People won't be able to keep track of which error cleanup is engaged in which code paths, and screw-ups will result when old code paths are called from new call sites. Ok, I'll additionally add it there. We could also do a StartTransactionCommand() but I'd rather not; that currently prevents code in that vicinity from doing anything it shouldn't via various Assert()s in existing code. Like what? I mean, I'm OK with having a partial error-handling environment if that's all we need, but I think it's a bad plan to the extent that the code here needs to be aware of error-handling differences versus expectations elsewhere in our code. Catalog lookups, building a snapshot, xid assignment, whatnot. All that shouldn't happen in the locations creating/dropping a slot. I think we should at some point separate parts of the error handling out of xact.c. Currently it's repeated slightly differently over lots of places (check e.g. the callsites for LWLockReleaseAll); that's not robust. But that's a project for another day. Note that the actual decoding *does* happen inside a TransactionCommand, as it'd be failure-prone to copy all the cleanup logic, and we need to have most of the normal cleanup code. I don't really see why it's simpler that way. It's clearer what the point of the lock is if you only hold it for the operations that need to be protected by that lock. In all other cases where we modify *_xmin we're only increasing it, which doesn't need a lock (HS feedback never has taken one, and GetSnapshotData() modifies ->xmin while holding a shared lock); the only potential danger is a slight delay in increasing the overall value. Right. We might want to comment such places. Don't we already have cases of that? I seem to remember so. If you prefer having them there, I am certainly fine with doing that. This way they aren't allocated if slots are disabled, but it's just two TransactionIds. Let's go for it, unless we think of a reason not to. ok on those counts. It's pretty evident that what's currently patch #4 (only peg the xmin horizon for catalog tables during logical decoding) needs to become patch #1, because it doesn't make any sense to apply this before we do that. Well, the slot code and the slot support for streaming rep are independent of it and don't use it. So they easily can come before it. But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet. If you think it will make stuff more manageable I can try separating all lines dealing with catalog_xmin into another patch, as long as data_xmin doesn't have to be renamed. That said, I don't really think it's a big problem that the division hasn't been made; essentially the meaning is different, even if we don't take advantage of it yet. 
data_xmin is there for streaming replication where we need to prevent all removal, catalog_xmin is there for changeset extraction. I have zero confidence that it's OK to treat fsync() as an operation that won't fail. Linux documents EIO as a plausible error return, for example. (And really, how could it not?) But quite fundamentally, having the most basic persistence building block fail is something you can't really handle safely. Note that issue_xlog_fsync() has always treated that as a PANIC (and, I wager, always will). I don't recall many complaints about that for WAL. All of the ones I found in a quick search were like "oh, the fs invalidated my fd because of corruption" - and they were few. Calling a slot old or new looks liable to cause problems. Maybe change those names to contain a character not allowed in a slot name, if we're going to keep doing it that way. I wondered about making them plain files as well, but given the need for a directory independent from this I don't really see the advantage; we'll need to handle them anyway during cleanup. Yeah, sure, but if it makes for fewer in-between states, it might be worth it. I don't think it'd make anything simpler with the new version of the code. Agreed? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services --
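To make the walsender cleanup proposal concrete, the fix being discussed would amount to roughly the following in WalSndErrorCleanup() (a sketch under the assumption that the patch's ReplicationSlotRelease() is the release entry point; this is not the committed code):

    /* in WalSndErrorCleanup(), during ERROR recovery in the walsender */
    LWLockReleaseAll();
    if (MyReplicationSlot != NULL)
        ReplicationSlotRelease();

That would keep ERROR cleanup in the walsender consistent with what transaction abort provides everywhere else, which is the property Robert is arguing for above.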
Re: [HACKERS] Change authentication error message (patch)
I wrote: I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful. What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case for fear of log-spam complaints, but certainly the third case and maybe the first one. Or possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log(). Attached is a proposed patch that does exactly that. This just addresses the cases mentioned above; once the infrastructure is in place, there might be more things that would be worth logging using this mechanism. regards, tom lane diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 882dc8f..1974b57 100644 *** a/src/backend/libpq/auth.c --- b/src/backend/libpq/auth.c *** *** 38,46 * */ static void sendAuthRequest(Port *port, AuthRequest areq); ! static void auth_failed(Port *port, int status); static char *recv_password_packet(Port *port); ! static int recv_and_check_password_packet(Port *port); /* --- 38,46 * */ static void sendAuthRequest(Port *port, AuthRequest areq); ! static void auth_failed(Port *port, int status, char *logdetail); static char *recv_password_packet(Port *port); ! static int recv_and_check_password_packet(Port *port, char **logdetail); /* *** ClientAuthentication_hook_type ClientAut *** 207,216 * in use, and these are items that must be presumed known to an attacker * anyway. * Note that many sorts of failure report additional information in the ! * postmaster log, which we hope is only readable by good guys. */ static void ! auth_failed(Port *port, int status) { const char *errstr; int errcode_return = ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION; --- 207,217 * in use, and these are items that must be presumed known to an attacker * anyway. * Note that many sorts of failure report additional information in the ! * postmaster log, which we hope is only readable by good guys. In ! * particular, if logdetail isn't NULL, we send that string to the log. */ static void ! auth_failed(Port *port, int status, char *logdetail) { const char *errstr; int errcode_return = ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION; *** auth_failed(Port *port, int status) *** 273,286 } if (port-hba) ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name), ! errdetail_log(Connection matched pg_hba.conf line %d: \%s\, port-hba-linenumber, port-hba-rawline))); ! else ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name))); /* doesn't return */ } --- 274,294 } if (port-hba) ! { ! char *cdetail; ! ! cdetail = psprintf(_(Connection matched pg_hba.conf line %d: \%s\), ! port-hba-linenumber, port-hba-rawline); ! if (logdetail) ! logdetail = psprintf(%s\n%s, logdetail, cdetail); ! else ! logdetail = cdetail; ! } ! ! ereport(FATAL, ! (errcode(errcode_return), ! errmsg(errstr, port-user_name), ! logdetail ? 
errdetail_log(%s, logdetail) : 0)); /* doesn't return */ } *** void *** 294,299 --- 302,308 ClientAuthentication(Port *port) { int status = STATUS_ERROR; + char *logdetail = NULL; /* * Get the authentication method to use for this frontend/database *** ClientAuthentication(Port *port) *** 507,518 (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), errmsg(MD5 authentication is not supported when \db_user_namespace\ is enabled))); sendAuthRequest(port, AUTH_REQ_MD5); ! status = recv_and_check_password_packet(port); break; case uaPassword: sendAuthRequest(port, AUTH_REQ_PASSWORD); ! status = recv_and_check_password_packet(port); break; case uaPAM: --- 516,527 (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), errmsg(MD5 authentication is not supported when \db_user_namespace\ is enabled))); sendAuthRequest(port, AUTH_REQ_MD5); ! status = recv_and_check_password_packet(port, logdetail); break; case uaPassword: sendAuthRequest(port, AUTH_REQ_PASSWORD); ! status = recv_and_check_password_packet(port, logdetail);
Re: [HACKERS] Minmax indexes
Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On 24 January 2014 17:53, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. Alles klar. Thanks -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we let autovacuum give up?
On Fri, Jan 24, 2014 at 2:44 PM, Josh Berkus j...@agliodbs.com wrote: On 01/23/2014 07:22 PM, Alvaro Herrera wrote: If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards --- and that was the end of it. Anything which depends on a timing-based feedback loop is going to be hopeless. Saying autovac shouldn't run if load is high sounds like a simple statement, until you actually try to implement it. Exactly. But people tuning autovac down are doing exactly that: trying to tune autovac down to background-only work. They *must* then launch foreground vacuums, at times they deem sensible, when doing that. So the problem is not people tuning down autovacuum, but them forgetting to vacuum explicitly after doing so. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
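For what it's worth, the workflow described above is simple to express; the gap is operational discipline, not syntax (busy_table is a stand-in name for this example):

    -- Take one hot table out of autovacuum's hands:
    ALTER TABLE busy_table SET (autovacuum_enabled = false);

    -- ...which obliges the admin to schedule an explicit replacement,
    -- e.g. from cron during a quiet window:
    VACUUM (ANALYZE) busy_table;

Forgetting the second half is exactly the failure mode being discussed.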
Re: [HACKERS] Minmax indexes
On Fri, Jan 24, 2014 at 2:54 PM, Thom Brown t...@linux.com wrote: On 24 January 2014 17:53, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Thom Brown wrote: On 8 November 2013 20:11, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Erik Rijkers wrote: On Thu, September 26, 2013 00:34, Erik Rijkers wrote: On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch] I have the impression it's not quite working correctly. Here's a version 7 of the patch, which fixes these bugs and adds opclasses for a bunch more types (timestamp, timestamptz, date, time, timetz), courtesy of Martín Marqués. It's also been rebased to apply cleanly on top of today's master branch. I have also added a selectivity function, but I'm not positive that it's very useful yet. This patch doesn't appear to have been submitted to any Commitfest. Is this still a feature undergoing research then? It's still a planned feature, but I didn't have time to continue work for 2014-01. What's the status? I believe I have more than a use for minmax indexes, and wouldn't mind lending a hand if it's within my grasp. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On 01/24/2014 01:58 PM, Alexander Korotkov wrote: On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In summary, these are fairly small patches, and useful on their own, so I think these should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem. Good, thanks! Now I can reimplement fast scan based on these patches. I hope we're not wasting effort doing the same thing, but I was also hacking on that; here's what I got. It applies on top of the previous set of patches. I decided to focus on the ginget.c changes, and worry about the catalog changes later. Instead, I added an abstraction for calling the ternary consistent function, with a shim implementation that checks if there is only one UNKNOWN argument, and tries the regular boolean consistent function both ways for the UNKNOWN argument. That's the same strategy we were already using when dealing with a key with one lossy page, so I refactored that to also use the new ternary consistent function. That abstraction can be used to do other optimizations in the future. For example, building the truth table like I suggested a long time ago, or simply checking for some common cases, like if the consistent function implements plain AND logic. Or caching the results of expensive consistent functions. And obviously, that is where the call to the opclass-provided tri-state consistent function will go to. Then, I rewrote keyGetItem to make use of the ternary consistent function, to perform the pre-consistent check. That is the core logic from your patch. Currently (ie. without the patch), we loop through all the entries, and advance them to the next item pointer > advancePast, and then perform the consistent check for the smallest item among those. With the patch, we first determine the smallest item pointer among the entries with curItem > advancePast, and call that minItem. The others are considered as unknown, as they might contain an item X where advancePast < X < minItem. Normally, we would have to load the next item from that entry to determine if X exists, but we can skip it if the ternary pre-consistent function says that X cannot match anyway. In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. (hmm, I should put the above paragraphs in a comment in the patch) This isn't exactly the same structure as in your patch, but I found the concept easier to understand when written this way. I did not implement the sorting of the entries. It seems like a sensible thing to do, but I'd like to see a test case that shows the difference before bothering with it. If we do it, a binary heap probably makes more sense than keeping the array fully sorted. There's one tradeoff here that should be mentioned: In most cases, it's extremely cheap to fetch the next item from an entry stream. We load a page worth of items into the array, so it's just a matter of pulling the next item from the array. 
Instead of trying to refute such items based on other entries, would it be better to load them and call the consistent function the normal way for them? Refuting might slash all the entries in one consistent check, but OTOH, when refuting fails, the consistent check was a waste of cycles. If we only tried to refute items when the alternative would be to load a new page, there would be less chance of a performance regression from this patch. Anyway, how does this patch look to you? Did I get the logic correct? Do you have some test data and/or queries that you could share easily? - Heikki From eb9c6a202cbb0ab03181cb19a434deb6082da497 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Thu, 23 Jan 2014 23:08:43 +0200 Subject: [PATCH 1/1] Add the concept of a ternary consistent check, and use it to skip entries. When we have loaded the next item from some, but not all, entries in a scan, it might be possible to prove that there cannot be any matches with a smaller item pointer coming from the other entries. In that case, we can fast-forward those entries to the smallest item among the already-fetched sources. There is no support for opclass-defined ternary consistent functions yet, but there is a shim function that calls the regular, boolean, consistent function both ways, when only one input is unknown. Per the concept by Alexander Korotkov --- src/backend/access/gin/Makefile | 2 +-
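To illustrate the shim strategy described in the message above in isolation (a simplified, hypothetical sketch; the names and signature here are invented for the example and are not the attached patch's actual function):

    #include <stdbool.h>

    typedef enum
    {
        GIN_FALSE = 0,
        GIN_TRUE = 1,
        GIN_MAYBE = 2
    } GinTernaryValue;

    /*
     * Resolve a ternary consistent check using only a boolean consistent
     * function, for the case of exactly one UNKNOWN input: run the check
     * with the unknown entry set both false and true.  If the outcome is
     * the same either way, the unknown input cannot matter; otherwise the
     * answer is MAYBE and the caller must fetch the real entry.
     */
    static GinTernaryValue
    shimTriConsistent(bool (*boolConsistent) (bool *check, void *arg),
                      bool *check, int unknownIdx, void *arg)
    {
        bool    res_false, res_true;

        check[unknownIdx] = false;
        res_false = boolConsistent(check, arg);
        check[unknownIdx] = true;
        res_true = boolConsistent(check, arg);

        if (res_false == res_true)
            return res_false ? GIN_TRUE : GIN_FALSE;
        return GIN_MAYBE;
    }

This is the same both-ways trick the lossy-page handling already used, which is why it could be refactored onto the common abstraction.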
Re: [HACKERS] new json funcs
On 01/22/2014 12:49 PM, Andrew Dunstan wrote: On 01/21/2014 06:21 PM, Marko Tiikkaja wrote: Hi Andrew, On 1/18/14, 10:05 PM, I wrote: But I'll continue with my review now that this has been sorted out. Sorry about the delay. I think the API for the new functions looks good. They are all welcome additions to the JSON family. The implementation side looks reasonable to me. I'm not sure there's need to duplicate so much code, though. E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease. Finally, (as I'm sure you know already), docs are still missing. Marking the patch Waiting on Author for the time being. New patch attached. Main change is I changed json_populate_record/json_to_record to call a common worker function, and likewise with json_populate_recordset/json_to_recordset. We're still finalizing the docs - should be ready in the next day or so. OK, here's the patch, this time with docs, thanks to Merlin Moncure and Josh Berkus for help with that. I want to do some more wordsmithing around json_to_record{set} and json_populate_record{set}, but I think this is close to being committable as is. cheers andrew diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c0a75de..d20e0ea 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10300,6 +10300,136 @@ table2-mapping entryliteraljson_typeof('-123.4')/literal/entry entryliteralnumber/literal/entry /row + row + entry + indexterm + primaryjson_build_array/primary + /indexterm + literaljson_build_array(VARIADIC any)/literal + /entry + entrytypejson/type/entry + entry + Builds a heterogeneously typed json array out of a variadic agument list. + /entry + entryliteralSELECT json_build_array(1,2,'3',4,5);/literal/entry + entry +programlisting + json_build_array +--- + [1, 2, 3, 4, 5] + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_build_object/primary + /indexterm + literaljson_build_object(VARIADIC any)/literal + /entry + entrytypejson/type/entry + entry + Builds a JSON array out of a variadic agument list. By convention, the object is + constructed out of alternating name/value arguments. + /entry + entryliteralSELECT json_build_object('foo',1,'bar',2);/literal/entry + entry +programlisting + json_build_object + + {foo : 1, bar : 2} + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_object/primary + /indexterm + literaljson_object(text[])/literal + /entry + entrytypejson/type/entry + entry + Builds a JSON object out of a text array. The array must have exactly one dimension + with an even number of members, in which case they are taken as alternating name/value + pairs, or two dimensions with such that each inner array has exactly two elements, which + are taken as a name/value pair. + /entry + entryliteralselect * from json_object('{a, 1, b, def, c, 3.5}') or literalselect * from json_object('{{a, 1},{b, def},{c, 3.5}}')/literal/literal/entry + entry +programlisting + json_object +--- + {a : 1, b : def, c : 3.5} + /programlisting + /entry + /row + row + entry + literaljson_object(keys text[], values text[])/literal + /entry + entrytypejson/type/entry + entry + The two argument form of JSON object takes keys and values pairwise from two separate + arrays. In all other respects it is identical to the one argument form. 
+ /entry + entryliteralselect * from json_object('{a, b}', '{1,2}');/literal/entry + entry +programlisting + json_object + + {a : 1, b : 2} + /programlisting + /entry + /row + row + entry + indexterm + primaryjson_to_record/primary + /indexterm + literaljson_to_record(json, nested_as_text bool)/literal + /entry + entrytyperecord/type/entry + entry + json_to_record returns an arbitrary record from a JSON object. As with all functions + returning 'record', the caller must explicitly define the structure of the record + when making the call. The input JSON must be an object, not a scalar or an array. + If nested_as_text is true, the function coerces nested complex elements to text. + Also, see notes below on columns and types. + /entry + entryliteralselect * from json_to_record('{a:1,b:[1,2,3],c:bar}',true) as x(a int, b text, d text) /literal/entry + entry +programlisting + a |b| d
[HACKERS] LIKE INCLUDING CONSTRAINTS is broken
It seems CREATE TABLE ... (LIKE INCLUDING CONSTRAINTS) doesn't work cleanly when there's also regular inheritance; my guess is that attnums get messed up at some point after the constraints are generated. Here's a trivial test case: create table b (b1 int unique check (b1 100)); CREATE TABLE c (c1 int not null references b (b1)); create table d (d1 int, d2 point not null); create table a (a1 int not null, a2 text primary key, a3 timestamptz(6), like b including constraints, like c) inherits (d); You can see the broken state: alvherre=# \d [ab] Table «public.a» Column | Type | Modifiers -+-+--- d1 | integer | d2 | point | not null a1 | integer | not null a2 | text| not null a3 | timestamp(6) with time zone | b1 | integer | c1 | integer | not null Indexes: a_pkey PRIMARY KEY, btree (a2) Check constraints: b_b1_check CHECK (a2 100) Inherits: d Table «public.b» Column | Type | Modifiers -+-+--- b1 | integer | Indexes: b_b1_key UNIQUE CONSTRAINT, btree (b1) Check constraints: b_b1_check CHECK (b1 100) Referenced by: TABLE c CONSTRAINT c_c1_fkey FOREIGN KEY (c1) REFERENCES b(b1) Notice how the CHECK constraint in table b points to column b1, but in table a it is mentioning column a2, even though that one is not even of the correct datatype. In fact if you try an insert, you get a weird error message: alvherre=# insert into a (d2, a2, a1, c1) values ('(1, 0)', '1', 1, 1); ERROR: attribute 4 has wrong type DETAIL: Table has type text, but query expects integer. If I take out the INHERITS clause in table a, the error disappears. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery to backup point
On 01/24/2014 01:37 PM, MauMau wrote: Hi, Heikki-san, From: MauMau maumau...@gmail.com From: Heikki Linnakangas hlinnakan...@vmware.com After some refactoring and fixing bugs in the existing code, I came up with the attached patch. I called the option simply recovery_target, with the only allowed value of immediate. IOW, if you want to stop recovery as early as possible, you add recovery_target='immediate' to recovery.conf. Now that we have four different options to set the recovery target with, I rearranged the docs slightly. How does this look to you? I'm almost comfortable with your patch. There are two comments: C1. The following parts seem to be mistakenly taken from my patch. These are not necessary for your patch, are they? I'm going to attach the new revision of the patch soon, which is almost entirely based on yours. All I modified was to remove the parts mentioned above. I confirmed that the original problem could be solved. Thanks. Thanks, committed! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
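For anyone who wants to try the committed behavior, the recovery.conf usage is minimal (the restore_command path here is a placeholder):

    # recovery.conf
    restore_command = 'cp /mnt/server/archivedir/%f "%p"'
    recovery_target = 'immediate'

Recovery then ends as soon as a consistent state is reached, instead of replaying all available WAL.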
Re: [HACKERS] new json funcs
For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. On 24 January 2014 10:26, Andrew Dunstan and...@dunslane.net wrote: On 01/22/2014 12:49 PM, Andrew Dunstan wrote: On 01/21/2014 06:21 PM, Marko Tiikkaja wrote: Hi Andrew, On 1/18/14, 10:05 PM, I wrote: But I'll continue with my review now that this has been sorted out. Sorry about the delay. I think the API for the new functions looks good. They are all welcome additions to the JSON family. The implementation side looks reasonable to me. I'm not sure there's need to duplicate so much code, though. E.g. json_to_recordset is almost identical to json_populate_recordset, and json_to_record has a bit of the same disease. Finally, (as I'm sure you know already), docs are still missing. Marking the patch Waiting on Author for the time being. New patch attached. Main change is I changed json_populate_record/json_to_record to call a common worker function, and likewise with json_populate_recordset/json_to_recordset. We're still finalizing the docs - should be ready in the next day or so. OK, here's the patch, this time with docs, thanks to Merlin Moncure and Josh Berkus for help with that. I want to do some more wordsmithing around json_to_record{set} and json_populate_record{set}, but I think this is close to being committable as is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - Fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks: =# select pg_stat_reset_shared('popo'); ERROR: 22023: unrecognized reset target: popo HINT: Target must be bgwriter or archiver A new version is attached containing those fixes. I played also with the patch and pgbench, simulating some archive failures and successes while running pgbench and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for committer. I refactored the patch further. * Remove ArchiverStats global variable to simplify pgarch.c. * Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required. I have some review comments: +s.archived_wals, +s.last_archived_wal, +s.last_archived_wal_time, +s.failed_attempts, +s.last_failed_wal, +s.last_failed_wal_time, The column names of pg_stat_archiver look not consistent at least to me. What about the followings? archive_count last_archived_wal last_archived_time fail_count last_failed_wal last_failed_time I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ +TimestampTz last_archived_wal_timestamp;/* last archival success */ +PgStat_Counter failed_attempts; +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved in failure */ Some hackers don't like the increase of the size of the statsfile. In order to reduce the size as much as possible, we should use the exact size of WAL file here instead of MAXFNAMELEN? Regards, -- Fujii Masao *** a/doc/src/sgml/monitoring.sgml --- b/doc/src/sgml/monitoring.sgml *** *** 270,275 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 270,283 /row row + entrystructnamepg_stat_archiver/indextermprimarypg_stat_archiver/primary/indexterm/entry + entryOne row only, showing statistics about the +WAL archiver process's activity. See +xref linkend=pg-stat-archiver-view for details. + /entry + /row + + row entrystructnamepg_stat_bgwriter/indextermprimarypg_stat_bgwriter/primary/indexterm/entry entryOne row only, showing statistics about the background writer process's activity. 
See *** *** 648,653 postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re --- 656,718 /para /note + table id=pg-stat-archiver-view xreflabel=pg_stat_archiver +titlestructnamepg_stat_archiver/structname View/title + +tgroup cols=3 + thead + row + entryColumn/entry + entryType/entry + entryDescription/entry + /row + /thead + + tbody + row + entrystructfieldarchived_wals//entry + entrytypebigint/type/entry + entryNumber of WAL files that have been successfully archived/entry + /row + row + entrystructfieldlast_archived_wal//entry + entrytypetext/type/entry + entryName of the last WAL file successfully archived/entry + /row + row + entrystructfieldlast_archived_wal_time//entry + entrytypetimestamp with time zone/type/entry + entryTime of the last successful archive operation/entry + /row + row + entrystructfieldfailed_attempts//entry + entrytypebigint/type/entry + entryNumber of failed attempts for archiving WAL files/entry + /row + row + entrystructfieldlast_failed_wal//entry + entrytypetext/type/entry + entryName of the WAL file of the last failed archival operation/entry + /row + row + entrystructfieldlast_failed_wal_time//entry + entrytypetimestamp with time zone/type/entry + entryTime of the last failed archival operation/entry + /row + row + entrystructfieldstats_reset//entry + entrytypetimestamp with time zone/type/entry + entryTime at which these statistics were last reset/entry + /row + /tbody +/tgroup + /table + + para +The structnamepg_stat_archiver/structname view will always have a +single row, containing
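For reference, with the view as defined in this version of the patch, monitoring and resetting would look like this (column names follow the patch as posted, not necessarily whatever is finally committed):

    SELECT archived_wals, last_archived_wal, last_archived_wal_time,
           failed_attempts, last_failed_wal, last_failed_wal_time
    FROM pg_stat_archiver;

    SELECT pg_stat_reset_shared('archiver');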
Re: [HACKERS] Standalone synchronous master
ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
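For concreteness, the parameter form under discussion (adapted from the example in the patch's own docs, with quoting restored) is used like this:

    SELECT *
    FROM json_to_record('{"a":1,"b":[1,2,3],"c":"bar"}', true)
         AS x(a int, b text, d text);

Because nested_as_text is true, the array under "b" comes back coerced to text, and d is NULL since the input object has no such key. The naming question above is exactly whether that second argument should instead be spelled as a separate _text function.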
Re: [HACKERS] Standalone synchronous master
On 01/24/2014 12:47 PM, Heikki Linnakangas wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian br...@momjian.us wrote: Is everyone else OK with this approach? Updated patch attached. Hi, I started to look at this patch and i found that it fails an assertion as soon as you run a VACUUM FULL after a lazy VACUUM even if those are on unrelated relations. For example in an assert-enabled build with the regression database run: VACUUM customer; [... insert here whatever commands you like or nothing at all ...] VACUUM FULL tenk1; TRAP: FailedAssertion(!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)) || (ExceptionalCondition(!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer), (FailedAssertion), visibilitymap.c, 260), 0, (heapBuf) != 0 )), File: visibilitymap.c, Line: 260) LOG: server process (PID 25842) was terminated by signal 6: Aborted DETAIL: Failed process was running: vacuum FULL customer; LOG: terminating any other active server processes trace: (gdb) bt #0 0x7f9a3d00d475 in *__GI_raise (sig=optimized out) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x7f9a3d0106f0 in *__GI_abort () at abort.c:92 #2 0x00777597 in ExceptionalCondition ( conditionName=conditionName@entry=0x7cd3b8 !(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)) || (ExceptionalCondition(\!((heapBuf) = NBuffers (heapBuf) = -NLocBuffer)\, (\Fai..., errorType=errorType@entry=0x7b0730 FailedAssertion, fileName=fileName@entry=0x7cd105 visibilitymap.c, lineNumber=lineNumber@entry=260) at assert.c:54 #3 0x004a7d99 in visibilitymap_set (rel=rel@entry=0x7f9a3da56a00, heapBlk=heapBlk@entry=0, heapBuf=heapBuf@entry=0, recptr=recptr@entry=0, vmBuf=220, cutoff_xid=2) at visibilitymap.c:260 #4 0x004a33e5 in update_page_vm (relation=0x7f9a3da56a00, page=page@entry=0x1868b18 , blkno=0) at rewriteheap.c:702 #5 0x004a3668 in raw_heap_insert (state=state@entry=0x1849e98, tup=tup@entry=0x184f208) at rewriteheap.c:641 #6 0x004a3b8b in rewrite_heap_tuple (state=state@entry=0x1849e98, old_tuple=old_tuple@entry=0x1852a50, new_tuple=new_tuple@entry=0x184f208) at rewriteheap.c:433 #7 0x0055c373 in reform_and_rewrite_tuple (tuple=tuple@entry=0x1852a50, oldTupDesc=oldTupDesc@entry=0x7f9a3da4d350, newTupDesc=newTupDesc@entry=0x7f9a3da599a8, values=values@entry=0x1852f40, isnull=isnull@entry=0x184f920 , newRelHasOids=1 '\001', rwstate=rwstate@entry=0x1849e98) at cluster.c:1670 -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Jan24, 2014, at 22:29 , Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:47 PM, Heikki Linnakangas wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that "we'll never do auto-degrade" is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. There seemed to be at least some support for having a way to manually degrade from sync rep to async rep via something like ALTER SYSTEM SET synchronous_commit='local'; Doing that seems unlikely to meet much resistance on grounds of principle, so it seems to me that working on that would be the best way forward for the submitter. I don't know how hard it would be to pull this off, though. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
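Concretely, the manual degrade-and-restore cycle Florian sketches would be (using 9.4's ALTER SYSTEM plus a reload; synchronous_commit takes effect on reload):

    -- Standby is gone; stop waiting for it:
    ALTER SYSTEM SET synchronous_commit = 'local';
    SELECT pg_reload_conf();

    -- Standby is back and caught up; restore synchronous behavior:
    ALTER SYSTEM SET synchronous_commit = 'on';
    SELECT pg_reload_conf();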
[HACKERS] pg_get_viewdefs() indentation considered harmful
We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. The net effect is that the size of the output is O(n^2) bytes just for the indentation spaces. If your union branches are about 5 lines long then you hit MaxAlloc after about 5,000 branches. And incidentally there's no CHECK_FOR_INTERRUPTS in that loop. I would actually suggest pg_dump should be able to pass a flag to disable indentation but then I noticed that all the code is already there. It's just that every single variation of pg_get_viewdef sets the indentation flag explicitly. I wonder if this was the intended behaviour. Shouldn't pg_get_viewdef(view, false) set indentation off? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
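To put numbers to the quadratic growth: branch i is indented about 8i spaces, so with branches of roughly 5 lines each the indentation alone costs about 40i bytes per branch. Summing over 5,000 branches gives 40 * (1 + 2 + ... + 5000) bytes, roughly 500 MB; with the actual query text on top, that plausibly lands at the 1 GB MaxAlloc ceiling Greg describes.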
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. I think that's because the unions are a nested binary tree so far as the parsetree representation goes. We could probably teach ruleutils to flatten the display in common cases, but it might be a bit tricky to be sure we don't create any lies about unusual nestings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Recovery inconsistencies, standby much larger than primary
Since the point release we've run into a number of databases that, when we restore from a base backup, end up being larger than the primary database was - sometimes by a large factor. The data below is from 9.1.11 (both primary and standby) but we've seen the same thing on 9.2.6. primary$ for i in 1261982 1364767 1366221 473158 ; do echo -n $i ; du -shc $i* | tail -1 ; done 1261982 29G total 1364767 23G total 1366221 12G total 473158 76G total standby$ for i in 1261982 1364767 1366221 473158 ; do echo -n $i ; du -shc $i* | tail -1 ; done 1261982 55G total 1364767 28G total 1366221 17G total 473158 139G total I've run the snaga xlogdump on the WAL records played before reaching a consistent point (we confirmed the extra storage had already appeared by then) and grepped for the above relfilenodes, but the dumps are quite large. I believe they don't contain any sensitive data; when I've verified that I can upload one of them for inspection. $ ls -lh [14]* -rw-rw-r-- 1 heroku heroku 325M Jan 24 04:13 1261982 -rw-r--r-- 1 root root 352M Jan 25 00:04 1364767 -rw-r--r-- 1 root root 123M Jan 25 00:04 1366221 -rw-r--r-- 1 root root 357M Jan 25 00:04 473158 The first three are btrees and the fourth is a heap, btw. We're also seeing log entries about "WAL contains references to invalid pages", but these errors seem only vaguely correlated. Sometimes we get the errors but the tables don't grow noticeably, and sometimes we don't get the errors and the tables are much larger. Much of the added space is uninitialized pages, as you might expect, but what I don't understand is how the database can start up without running into the reference-to-invalid-pages panic consistently. We check both that there are no references after consistency is reached *and* that any references before consistency are resolved by a truncate or unlink before consistency. The primary was never this large btw, so it's not just a case of leftover files from drops or truncates that might have failed on the standby. I'm assuming this is somehow related to the multixact or transaction wraparound problems, but I don't really understand how they could be hitting when both the primary and standby are post-upgrade to the most recent point releases, which have the fixes -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
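When chasing this sort of divergence, it also helps to map the relfilenodes back to relations and compare sizes from SQL on both nodes (run the same query on primary and standby and diff the output):

    SELECT oid::regclass AS rel, relkind,
           pg_relation_size(oid) AS main_fork_bytes
    FROM pg_class
    WHERE relfilenode IN (1261982, 1364767, 1366221, 473158);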
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: pg_get_viewdefs indents each branch of the union by 8 spaces more than the previous branch. I think that's because the unions are a nested binary tree so far as the parsetree representation goes. We could probably teach ruleutils to flatten the display in common cases, but it might be a bit tricky to be sure we don't create any lies about unusual nestings. That may be a worthwhile thing to do. But it strikes me that pg_dump, at least when not doing an SQL dump, really has no reason to ask for indentation at all. It's already asking for non-prettyprinted output, why not make non-prettyprinted also mean non-indented? Also, it also seems like a reasonable safeguard that if the underlying rule is larger than, say, 32kB or maybe 128kB you probably don't care about indentation. That would cover all the system views except a handful that seem to have peculiarly large pg_rewrite rules considering their SQL definition isn't especially large: daeqck898dvduj= select ev_class::regclass, length(ev_action) rewrite_len,length(pg_get_viewdef(ev_class,true)) prettyprint_len, length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from pg_rewrite order by 2 desc limit 20; ev_class | rewrite_len | prettyprint_len | non_prettyprint_len +-+-+- pg_seclabels | 138565 | 13528 | 13758 information_schema.columns | 126787 | 6401 |6642 information_schema.usage_privileges| 93863 | 11653 | 11939 information_schema.attributes | 83268 | 4264 |4417 information_schema.referential_constraints | 81762 | 2527 |2653 pg_statio_all_tables | 59617 | 1023 |1061 pg_stats | 59331 | 2803 |2899 information_schema.domains | 54611 | 3206 |3320 information_schema.element_types | 53049 | 5608 |5762 information_schema.routines| 52333 | 7852 |8089 information_schema.column_privileges | 49229 | 3883 |3954 pg_indexes | 46717 | 458 | 486 information_schema.check_constraints | 42132 | 1375 |1443 information_schema.tables | 37458 | 2122 |2212 pg_stat_all_indexes| 35426 | 508 | 528 pg_statio_all_indexes | 35412 | 490 | 512 information_schema.table_constraints | 31882 | 3098 |3231 information_schema.column_udt_usage| 31731 | 1034 |1090 information_schema.parameters | 30497 | 3640 |3750 pg_stat_all_tables | 27193 | 1367 |1387 (20 rows) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Argh. Attached is a plain text file with that query data. I'll be switching back to Gnus any day now.

daeqck898dvduj=> select ev_class::regclass, length(ev_action) rewrite_len, length(pg_get_viewdef(ev_class,true)) prettyprint_len, length(pg_get_viewdef(ev_class,false)) non_prettyprint_len from pg_rewrite order by 2 desc limit 20;

                  ev_class                  | rewrite_len | prettyprint_len | non_prettyprint_len
--------------------------------------------+-------------+-----------------+---------------------
 pg_seclabels                               |      138565 |           13528 |               13758
 information_schema.columns                 |      126787 |            6401 |                6642
 information_schema.usage_privileges        |       93863 |           11653 |               11939
 information_schema.attributes              |       83268 |            4264 |                4417
 information_schema.referential_constraints |       81762 |            2527 |                2653
 pg_statio_all_tables                       |       59617 |            1023 |                1061
 pg_stats                                   |       59331 |            2803 |                2899
 information_schema.domains                 |       54611 |            3206 |                3320
 information_schema.element_types           |       53049 |            5608 |                5762
 information_schema.routines                |       52333 |            7852 |                8089
 information_schema.column_privileges       |       49229 |            3883 |                3954
 pg_indexes                                 |       46717 |             458 |                 486
 information_schema.check_constraints       |       42132 |            1375 |                1443
 information_schema.tables                  |       37458 |            2122 |                2212
 pg_stat_all_indexes                        |       35426 |             508 |                 528
 pg_statio_all_indexes                      |       35412 |             490 |                 512
 information_schema.table_constraints       |       31882 |            3098 |                3231
 information_schema.column_udt_usage        |       31731 |            1034 |                1090
 information_schema.parameters              |       30497 |            3640 |                3750
 pg_stat_all_tables                         |       27193 |            1367 |                1387
(20 rows)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
Greg Stark st...@mit.edu writes: But it strikes me that pg_dump, at least when not doing an SQL dump, really has no reason to ask for indentation at all. It's already asking for non-prettyprinted output, why not make non-prettyprinted also mean non-indented? We do go to some effort to make pg_dump's output legible, so I'm not especially on board with this idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [review] libpq: Support TLSv1.1+ (was: fe-secure.c and SSL/TLS)
[restored Cc: list] On Thu, Jan 09, 2014 at 10:12:28PM -0800, Wim Lewis wrote: I applied both libpq.tls11plus.diff and the related psql.conninfo.tlsver.diff patch to postgresql git head. psql.conninfo.tlsver.diff is not so essential for debugging purposes since commit 4cba1f6bbf7c8f956c95e72c43e517a56b97665b, but it still seems like a perfectly reasonable addition. Source review: The source changes are pretty tiny. Although I think the change from TLSv1_method to SSLv23_method is correct, the comment is not quite correct:

 * SSLv23_method() is only method that negotiates
 * higher protocol versions. Rest of the methods
 * allow only one specific TLS version.

As I understand it (backed up by a quick glance through the openssl source), the TLSv1_method, TLSv1_1_method, and TLSv1_2_method will all advertise the corresponding protocol version to the peer, meaning that in practice they will negotiate *up to* that TLS version, but will still negotiate down to SSLv3. So, using TLSv1_2_method would give the right behavior when compiled against a recent openssl. However, someday when TLSv1.3 (or 2.0) appears, presumably the SSLv23_method will be extended to include it but TLSv1_2_method would have to be changed to TLSv1_3_method. Therefore using SSLv23_method and disabling older protocol versions with SSL_CTX_set_options() should have the desired behavior even in future versions. (And it doesn't require autoconf to probe the openssl version.) With OpenSSL 1.0.1f, I see results consistent with the comment. Using TLSv1_1_method() gets a TLSv1.1 connection against a server capable of TLSv1.2, and it fails against a server capable of no more than TLSv1. The patch's use of SSLv23_method() gets TLSv1.2 from the first server and TLSv1 from the second server. Testing: I built the patched postgresql against a handful of openssl versions: 1.0.1 (netbsd, x86-64, supports TLSv1.1); Git head aka 1.0.1f++ (osx, x86-32, supports TLSv1.2), and 0.9.8y (osx, x86-32, supports TLSv1.0). They all built cleanly and passed 'make check'. I also built 'contrib' and installed the sslinfo extension. I connected between each pair of versions (with psql) and saw that the connection negotiated the highest protocol version supported by both ends and a corresponding ciphersuite. /conninfo and the sslinfo extension agreed on the protocol version and ciphersuite in use. Things I didn't test: Client certificates, restricted sets of ciphersuites, MITM protocol-downgrade attacks, non-x86 architectures, or 1.0.0* versions of openssl. The level of testing you did made for an excellent review. Thank you. I've committed both patches. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
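[Editorial note: to make the negotiation strategy concrete, here is a minimal sketch (not code from the patch) of creating a context with SSLv23_method() and then switching off the legacy protocols, against the OpenSSL 1.0.x API.]

#include <openssl/ssl.h>

/*
 * Sketch: let OpenSSL negotiate the highest protocol version both ends
 * support, while refusing SSLv2/SSLv3.  Because SSLv23_method()
 * advertises everything the library knows, a future TLS version is
 * picked up automatically, with no configure-time probing of the
 * OpenSSL version.
 */
static SSL_CTX *
make_tls_context(void)
{
	SSL_CTX    *ctx = SSL_CTX_new(SSLv23_method());

	if (ctx == NULL)
		return NULL;

	/* never negotiate down to the legacy SSL protocols */
	SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);

	return ctx;
}

With this approach, the first test server above negotiates TLSv1.2 and the second TLSv1, matching the behavior observed in the review; pinning TLSv1_1_method() instead would both cap the maximum version and fail against the TLSv1-only server.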
Re: [HACKERS] Changeset Extraction v7.1
On Fri, Jan 24, 2014 at 12:49 PM, Andres Freund and...@2ndquadrant.com wrote: But this code is riddled with places where you track a catalog xmin and a data xmin separately. The only point of doing it that way is to support a division that hasn't been made yet. If you think it will make stuff more manageable I can try separating all lines dealing with catalog_xmin into another patch as long as data_xmin doesn't have to be renamed. That said, I don't really think it's a big problem that the division hasn't been made, essentially the meaning is different, even if we don't take advantage of it yet. data_xmin is there for streaming replication where we need to prevent all removal, catalog_xmin is there for changeset extraction. I spent some more time studying the 0001 and 0002 patches this afternoon, with a side dish of 0004. I'm leaning toward thinking we should go ahead and make that division. I'm also wondering about whether we've got the right naming here. AFAICT, it's not the case that we're going to use the catalog xmin for catalogs and the data xmin for non-catalogs. Rather, the catalog xmin is going to always be included in globalxmin calculations, so IOW it should just be called xmin. The data xmin is going to be included only for non-catalog tables. I guess data is a reasonable antonym for catalog, but I'm slightly tempted to propose RecentGlobalNonCatalogXmin and similar. Maybe that's too ugly to live, but I can see someone failing to guess what the exact distinction is between xmin and data xmin, and I bet they'd be a lot less likely to misguess if we wrote non catalog. It's interesting (though not immediately relevant) to speculate about how we might extend this to fine-grained xmin tracking more generally. The design sketch that comes to mind (and I think parts of this have been proposed before) is to have a means by which backends can promise not to lock any more tables except under a new snapshot. At the read committed isolation level, or in any single-statement transaction, backends can so promise whenever (a) all tables mentioned in the query have been locked and (b) all functions to be invoked during the query via the fmgr interface promise (e.g. via function labeling) that they won't directly or indirectly do such a thing. If they break their promise, we detect it and ereport(ERROR). Backends that have made such a guarantee can be ignored for global-xmin calculations that don't involve the tables they have locked. One idea is to keep a hash table keyed by dboid, reloid with some limited number of entries in shared memory; it caches the table-specific xmin, a usage counter, and a flag indicating whether the cached xmin might be stale. In order to promise not to lock any new tables, backends must make or update entries for all the tables they already have locked in this hash table; if there aren't enough entries, they're not allowed to promise. Thus, backends wishing to prune can use the cached xmin value if it's present (optionally updating it if it's stale) and the minimum of the xmins of the backends that haven't made a promise if it isn't. This is a bit hairy though; access to the shared hash table had better be *really* fast, and we'd better not need to recompute the cached value too often. Anyway, whether we end up pursuing something like that or not, I think I'm persuaded that this particular optimization won't really be a problem for hypothetical future work in this area; and also that it's a good idea to do this much now specifically for logical decoding. 
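[Editorial note: to give the speculative per-table xmin cache above a concrete shape, here is one possible layout for a shared-memory entry; every name in this sketch is invented for illustration and none of it is from the patch.]

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;			/* as in PostgreSQL */
typedef uint32_t TransactionId;	/* as in PostgreSQL */

/* Hash key: one entry per (database, relation) pair. */
typedef struct RelXminKey
{
	Oid			dboid;
	Oid			reloid;
} RelXminKey;

/*
 * One shared-memory cache entry.  Backends that promise not to lock
 * further tables publish a per-table xmin here; pruning on a table can
 * then combine the cached xmin with the xmins of only those backends
 * that made no promise.
 */
typedef struct RelXminEntry
{
	RelXminKey	key;
	TransactionId cached_xmin;	/* oldest xmin still needed for this table */
	uint32_t	usage_count;	/* eviction heuristic when the table is full */
	bool		stale;			/* must be recomputed before trusting? */
} RelXminEntry;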
I have zero confidence that it's OK to treat fsync() as an operation that won't fail. Linux documents EIO as a plausible error return, for example. (And really, how could it not?) But quite fundamentally, having the most basic persistency building block fail is something you can't really handle safely. Note that issue_xlog_fsync() has always treated that as a PANIC (and, I wager, always will). I don't recall many complaints about that for WAL. All of the ones I found in a quick search were like "oh, the fs invalidated my fd because of corruption", and they were few. Well, you have a point. And certainly this version looks much better than the previous version in terms of the likelihood of PANIC occurring in practice. But I wonder if we couldn't cut it down even further without too much effort. Suppose we define a slot to exist if, and only if, the state file exists. A directory without a state file is subject to arbitrary removal. Then we can proceed as follows:

- mkdir() the directory.
- open() state.tmp
- write() state.tmp
- close() state.tmp
- fsync() parent directory, directory and state.tmp
- rename() state.tmp to state
- fsync() state

If any step except the last one fails, no problem. The next startup can nuke the leftovers; any future attempt to create a slot with the same name can ignore an EEXIST failure from mkdir() and proceed just as above. Only a failure of the very last fsync is really problematic.
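[Editorial note: a minimal sketch of that sequence in portable C; helper names like fsync_path() and create_slot() are invented here for illustration and are not from the patch.]

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a file or directory and fsync it; -1 on any failure. */
static int
fsync_path(const char *path)
{
	int		fd = open(path, O_RDONLY);
	int		rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Durably create "<dir>/state" from "<dir>/state.tmp".  Any failure
 * before the final fsync leaves only removable leftovers, because the
 * slot does not "exist" until the state file does.
 */
static int
create_slot(const char *parent, const char *dir,
			const char *tmppath, const char *statepath,
			const void *buf, size_t len)
{
	int		fd;

	/* EEXIST just means a removable leftover directory: reuse it */
	if (mkdir(dir, S_IRWXU) < 0 && errno != EEXIST)
		return -1;

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || close(fd) < 0)
		return -1;

	/* flush the parent directory, the slot directory, and state.tmp */
	if (fsync_path(parent) < 0 || fsync_path(dir) < 0 ||
		fsync_path(tmppath) < 0)
		return -1;

	/* rename() is the commit point: the slot now exists */
	if (rename(tmppath, statepath) < 0)
		return -1;

	/* only a failure here is truly hard to recover from */
	return fsync_path(statepath);
}

Everything up to rename() can fail with no consequence beyond retrying; only the final flush leaves the slot's durability in doubt, which is the one case where a PANIC still seems justified.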
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. Indeed, even aside from the performance questions, once you're indented 5-10 times the indentation stops being useful at all. The query would probably be even more readable if we just made the indentation modulo 40, so that once you get too far indented it wraps around, which is not unlike how humans actually indent things in this case. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Fri, Jan 24, 2014 at 12:58 PM, Claudio Freire klaussfre...@gmail.com wrote: What's the status? I believe I have more than a use for minmax indexes, and wouldn't mind lending a hand if it's within my grasp. I'm also interested in looking at this. Mostly because I have ideas for other summary functions that would be interesting and could use the same infrastructure otherwise. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On Fri, Jan 24, 2014 at 04:52:55PM -0500, Jaime Casanova wrote: On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian br...@momjian.us wrote: Is everyone else OK with this approach? Updated patch attached. Hi, I started to look at this patch and I found that it fails an assertion as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are on unrelated relations. For example, in an assert-enabled build with the regression database, run:

VACUUM customer;
[... insert here whatever commands you like or nothing at all ...]
VACUUM FULL customer;

Wow, thanks for the testing. You are right that I had two bugs, both in visibilitymap_set(). First, I made setting heapBuf optional, but forgot to remove the Assert check now that it was optional. Second, I passed InvalidBlockNumber instead of InvalidBuffer when calling visibilitymap_set(). Both are fixed in the attached patch. Thanks for the report. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
new file mode 100644
index c34ab98..ff275fa
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 107,112 ****
--- 107,114 ----
  #include "access/rewriteheap.h"
  #include "access/transam.h"
  #include "access/tuptoaster.h"
+ #include "access/visibilitymap.h"
+ #include "commands/vacuum.h"
  #include "storage/bufmgr.h"
  #include "storage/smgr.h"
  #include "utils/memutils.h"
*************** typedef OldToNewMappingData *OldToNewMap
*** 172,177 ****
--- 174,180 ----
  /* prototypes for internal functions */
  static void raw_heap_insert(RewriteState state, HeapTuple tup);
+ static void update_page_vm(Relation relation, Page page, BlockNumber blkno);
  /*
*************** end_heap_rewrite(RewriteState state)
*** 281,286 ****
--- 284,290 ----
  				   true);
  	RelationOpenSmgr(state->rs_new_rel);
+ 	update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
  	PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
  	smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
*************** raw_heap_insert(RewriteState state, Heap
*** 633,638 ****
--- 637,643 ----
  	 */
  	RelationOpenSmgr(state->rs_new_rel);
+ 	update_page_vm(state->rs_new_rel, page, state->rs_blockno);
  	PageSetChecksumInplace(page, state->rs_blockno);
  	smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
*************** raw_heap_insert(RewriteState state, Heap
*** 678,680 ****
--- 683,705 ----
  	if (heaptup != tup)
  		heap_freetuple(heaptup);
  }
+
+ static void
+ update_page_vm(Relation relation, Page page, BlockNumber blkno)
+ {
+ 	Buffer		vmbuffer = InvalidBuffer;
+ 	TransactionId visibility_cutoff_xid;
+
+ 	visibilitymap_pin(relation, blkno, &vmbuffer);
+ 	Assert(BufferIsValid(vmbuffer));
+
+ 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
+ 		heap_page_is_all_visible(relation, InvalidBuffer, page,
+ 								 &visibility_cutoff_xid))
+ 	{
+ 		PageSetAllVisible(page);
+ 		visibilitymap_set(relation, blkno, InvalidBuffer,
+ 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
+ 	}
+ 	ReleaseBuffer(vmbuffer);
+ }
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
new file mode 100644
index 899ffac..3e7546b
*** a/src/backend/access/heap/visibilitymap.c
--- b/src/backend/access/heap/visibilitymap.c
*************** visibilitymap_set(Relation rel, BlockNum
*** 257,263 ****
  #endif
  	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- 	Assert(InRecovery || BufferIsValid(heapBuf));
  	/* Check that we have the right heap page pinned, if present */
  	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
--- 257,262 ----
*************** visibilitymap_set(Relation rel, BlockNum
*** 278,284 ****
  		map[mapByte] |= (1 << mapBit);
  		MarkBufferDirty(vmBuf);
! 		if (RelationNeedsWAL(rel))
  		{
  			if (XLogRecPtrIsInvalid(recptr))
  			{
--- 277,283 ----
  		map[mapByte] |= (1 << mapBit);
  		MarkBufferDirty(vmBuf);
! 		if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
  		{
  			if (XLogRecPtrIsInvalid(recptr))
  			{
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index 75e5f15..8d84bef
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** static void lazy_record_dead_tuple(LVRel
*** 152,159 ****
  					ItemPointer itemptr);
  static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
  static int	vac_cmp_itemptr(const void *left, const void *right);
- static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- 					   TransactionId *visibility_cutoff_xid);
  /*
--- 152,157 ----
*************** lazy_vacuum_page(Relation onerel, BlockN
*** 1232,1238 ****
  	 * check if the page has become
Re: [HACKERS] extension_control_path
On Fri, Jan 24, 2014 at 6:57 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Would be nice if we can use git apply command... git apply seems to have raised pedantry to an art form. Not only won't it apply patches in any format other than the one it likes, it'll fail to apply any part of the patch if there are any failing hunks; I don't think it tolerates fuzz, either. You can override some of these behaviors but not all of them. It seems like somebody designed this tool more with the idea of preventing people from applying patches than actually doing it. patch, on the other hand, makes the very reasonable assumption that if you didn't want to apply the patch, you wouldn't have run the patch command in the first place. It does its best to make sense of whatever you feed it, and if it can't apply the whole thing, it still applies as much as it can. I find this much more desirable behavior. It may be the policy of other projects to reject patches for trivial formatting mistakes or minor fuzz, but it's not the policy here, and I think that's a good thing. We typically bounce things for rebasing if there are actual rejects, but not otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark st...@mit.edu wrote: On Fri, Jan 24, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 24, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: We're finding it more and more common for people to define partitioned table views with hundreds or thousands of union branches. Really? Given how poorly the system performs with that many inheritance children, I've got a hard time believing either that this is common or that ruleutils is your worst problem with it. Doesn't make it a bad idea to fix it. You may need hundreds or thousands of union branches for this to totally break the world, but you only need about five for it to be annoying. Indeed even aside from the performance questions, once you're indented 5-10 times the indention stops being useful at all. The query would probably be even more readable if we just made indentation modulo 40 so once you get too far indented it wraps around which is not unlike how humans actually indent things in this case. Ha! That seems a little crazy, but *capping* the indentation at some reasonable value might not be dumb. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
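[Editorial note: a sketch of what such a cap could look like; pretty_indent_width() and PRETTYINDENT_MAX are invented for illustration, while PRETTYINDENT_STD is the per-level step ruleutils.c really uses.]

#define PRETTYINDENT_STD	8	/* existing per-level step in ruleutils.c */
#define PRETTYINDENT_MAX	40	/* hypothetical cap on total indentation */

/*
 * Sketch: convert nesting depth into a printed indentation width,
 * clamped so that deeply nested UNION branches stop marching off the
 * right margin of the screen.
 */
static int
pretty_indent_width(int depth)
{
	int			width = depth * PRETTYINDENT_STD;

	return (width > PRETTYINDENT_MAX) ? PRETTYINDENT_MAX : width;
}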
Re: [HACKERS] [Review] inherit support for foreign tables
Hi Fujita-san, Thanks for the review. 2014/1/23 Etsuro Fujita fujita.ets...@lab.ntt.co.jp: Shigeru Hanada wrote: Though this is debatable, in the current implementation constraints defined on a foreign table (currently only NOT NULL and CHECK are supported) are not enforced during INSERT or UPDATE executed against foreign tables. This means that retrieved data might violate the constraints defined on the local side. This is a debatable issue because data integrity is important for a DBMS, but in this first cut it is just documented as a note. I agree with you, but shouldn't we introduce a new keyword such as ASSERTIVE or something in 9.4, in preparation for the enforcement of constraints on the local side in a future release? What I'm concerned about is that, otherwise, users will have to rewrite those DDL queries at that point. No? In the original thread, whether the new syntax is necessary (though maybe ASSERTIVE will not be the keyword used) is under discussion. Anyway, your point, that users should have to rewrite as little DDL as possible, is important. Could you post your review results and/or opinion as a reply to the original thread, so that other people interested in this feature can share the discussion? Because I don't see a practical case for having a foreign table as a parent, and because it avoids a problem with recursive ALTER TABLE operations, foreign tables can't be parents. I agree with you on that point. For other recursively processed commands such as ANALYZE, foreign tables at the leaves of the inheritance tree are ignored. I'm not sure that in the processing of the ANALYZE command we should ignore child foreign tables. It seems to me that the recursive processing is not that hard. Rather, what I'm concerned about is that if the recursive processing is allowed, then autovacuum will probably have to access foreign tables on the far side in order to acquire those sample rows. It might be undesirable from the viewpoint of security or from the viewpoint of efficiency. As you say, it's not difficult to do recursive ANALYZE including foreign tables. The reason ANALYZE ignores descendant foreign tables is consistency of behavior. At the moment, ANALYZE ignores top-level (non-child) foreign tables when no table name is given, and if we want to ANALYZE foreign tables we need to specify them explicitly. I think it's not so bad to show a WARNING or INFO message about foreign tables being ignored, including ones that are not children.

--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -22,6 +22,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
   <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ] [, ... ] ] )
+[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
   SERVER <replaceable class="parameter">server_name</replaceable>
 [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ]
@@ -159,6 +160,17 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
  </varlistentry>

  <varlistentry>
+  <term><replaceable class="PARAMETER">parent_table</replaceable></term>
+  <listitem>
+   <para>
+    The name of an existing table or foreign table from which the new foreign
+    table automatically inherits all columns, see
+    <xref linkend="ddl-inherit"> for details of table inheritance.
+   </para>
+  </listitem>
+ </varlistentry>

Correct? I think we should not allow a foreign table to be a parent. Oops, good catch. -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Change authentication error message (patch)
On Fri, Jan 24, 2014 at 10:10:00AM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Thu, Jan 23, 2014 at 10:39:34PM -0500, Tom Lane wrote: I'm not convinced that this improves anything. The problem might not in fact be either of the things you mention, in which case the new message is outright misleading. Also, what of the policy stated in the header comment for the function you're hacking, ie we intentionally don't reveal the precise cause of the failure to the client? Well, the only solution then would be to add some weasel words like perhaps expired password, but that seems so rare that I doubt it would apply very often and seems like an odd suggestion. We could go with: password authentication failed for user "%s": perhaps invalid or expired password We did have two threads on this issue in the past 12 months so I figured we should try to do something. I agree with doing *something*, but this particular thing seems to violate our very long-standing policy on how to deal with authentication failures, as well as being too vague to be really useful. What would be well within that policy is to log additional information into the postmaster log. I see that md5_crypt_verify knows perfectly well whether the problem is no password set, wrong password, or expired password. I don't see anything wrong with having it emit a log entry --- maybe not in the second case for fear of log-spam complaints, but certainly the third case and maybe the first one. Or possibly cleaner, have it return additional status so that auth_failed() can include the info in the main ereport using errdetail_log(). I was afraid that PGOPTIONS='-c client_min_messages=log' would allow clients to see the log messages, but in testing I found we don't show them during authentication, and I found this C comment:

 * client_min_messages is honored only after we complete the
 * authentication handshake.  This is required both for security
 * reasons and because many clients can't handle NOTICE messages
 * during authentication.

I like the 'LOG' idea very much, and liked your patch too. :-) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
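[Editorial note: for illustration, a sketch of how auth_failed() might pass the extra information to the server log only; report_auth_failure() and the logdetail plumbing are hypothetical, but errdetail_log() is the real log-only detail variant.]

#include "postgres.h"
#include "utils/elog.h"

/*
 * Sketch: report an authentication failure to the client with the
 * usual vague message, while recording the specific cause (no password
 * set, wrong password, expired password) in the server log only.
 */
static void
report_auth_failure(const char *user_name, const char *logdetail)
{
	ereport(FATAL,
			(errcode(ERRCODE_INVALID_PASSWORD),
			 errmsg("password authentication failed for user \"%s\"",
					user_name),
			 /* errdetail_log() text never reaches the client */
			 logdetail ? errdetail_log("%s", logdetail) : 0));
}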
Re: [HACKERS] Why do we let autovacuum give up?
On Thu, Jan 23, 2014 at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-23 19:29:23 -0500, Tom Lane wrote: I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. Yes, I think adjusting the default makes sense, most setups that have enough activity that costing plays a role have to greatly increase the values. I'd rather increase the cost limit than reduce cost delay so drastically though, but that's admittedly just gut feeling. Well, I didn't experiment with intermediate values, I was just trying to test the theory that autovac could keep up given less-extreme throttling. I'm not taking any position on just where we need to set the values, only that what we've got is probably too extreme. So, Greg Smith proposed what I think is a very useful methodology for assessing settings in this area: figure out what it works out to in MB/s. If we assume we're going to read and dirty every page we vacuum, and that this will take negligible time of itself so that the work is dominated by the sleeps, the default settings work out to 200/(10 + 20) pages every 20ms, or 2.67MB/s. Obviously, the rate will be 3x higher if the pages don't need to be dirtied, and higher still if they're all in cache, but considering the way the visibility map works, it seems like a good bet that we WILL need to dirty most of the pages that we look at - either they've got dead tuples and need clean-up, or they don't and need to be marked all-visible. A corollary of this is that if you're dirtying heap pages faster than a few megabytes per second, autovacuum, at least with default settings, is not going to keep up. And if you assume that each write transaction dirties at least one heap page, any volume of write transactions in excess of a few hundred per second will meet that criterion. Which is really not that much; a single core can do over 1000 tps with synchronous_commit=off, or if there's a BBWC that can absorb it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
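[Editorial note: spelling that arithmetic out, with the defaults vacuum_cost_limit = 200, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, autovacuum_vacuum_cost_delay = 20 ms, and 8 kB pages:]

\[
\frac{200}{10 + 20} \approx 6.67\ \text{pages per 20 ms} = 333\ \text{pages/s},
\qquad
333\ \text{pages/s} \times 8\ \text{kB/page} \approx 2.67\ \text{MB/s}.
\]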
Re: [HACKERS] CREATE FOREIGN TABLE ( ... LIKE ... )
On Sun, Nov 24, 2013 at 02:03:18AM +0100, Vik Fearing wrote: On 10/15/2013 07:50 AM, David Fetter wrote: On Mon, Oct 07, 2013 at 11:16:56PM -0700, David Fetter wrote: Folks, Please find attached a patch implementing and documenting, to some extent, $subject. I did this in aid of being able to import SQL standard catalogs and other entities where a known example could provide a template for a foreign table. Should there be errhint()s, too? Should we pile up all such errors and mention them at the end rather than simply bailing on the first one? TBD: regression tests. Now included: regression tests for disallowed LIKE options. I like this patch, but I don't like its implementation at all. First of all, the documentation doesn't compile:

openjade:ref/create_foreign_table.sgml:124:17:E: end tag for "LISTITEM" omitted, but OMITTAG NO was specified
openjade:ref/create_foreign_table.sgml:119:4: start tag was here

Fixed. I fixed that, and then noticed that like_option is not explained like it is in CREATE TABLE. Also fixed. Then I got down to the description of the LIKE clause in both pages, and I noticed the last line of CREATE TABLE, which is "Inapplicable options (e.g., INCLUDING INDEXES from a view) are ignored." This is inconsistent with the behavior of this patch, which throws errors for inapplicable options. Fixed. Please find attached the next rev :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/doc/src/sgml/ref/create_foreign_table.sgml b/doc/src/sgml/ref/create_foreign_table.sgml
index 1ef4b5e..375bd1a 100644
--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -20,6 +20,7 @@
 <synopsis>
 CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
   { <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
+| LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
 [, ... ] ] )
   SERVER <replaceable class="parameter">server_name</replaceable>
@@ -31,6 +32,8 @@
 { NOT NULL | NULL | DEFAULT <replaceable>default_expr</replaceable> }
+
+<phrase>and <replaceable class="PARAMETER">like_option</replaceable> is the same as for <xref linkend="SQL-CREATETABLE">.</phrase>
 </synopsis>
@@ -114,6 +117,19 @@
  </varlistentry>

  <varlistentry>
+  <term><literal>LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ]</literal></term>
+  <listitem>
+   <para>
+    The <literal>LIKE</literal> clause specifies a table from which
+    the new foreign table automatically copies all column names and their data types.
+   </para>
+   <para>
+    Inapplicable options like <literal>INCLUDING STORAGE</literal> are ignored.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
  <term><literal>NOT NULL</literal></term>
  <listitem>
   <para>
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index eb07ca3..82c77eb 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -649,7 +649,7 @@ transformTableConstraint(CreateStmtContext *cxt, Constraint *constraint)
 /*
  * transformTableLikeClause
  *
- * Change the LIKE <srctable> portion of a CREATE TABLE statement into
+ * Change the LIKE <srctable> portion of a CREATE [FOREIGN] TABLE statement into
  * column definitions which recreate the user defined column portions of
  * <srctable>.
  */
@@ -668,12 +668,6 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 	setup_parser_errposition_callback(&pcbstate, cxt->pstate,
 									  table_like_clause->relation->location);
-	/* we could support LIKE in many cases, but worry about it another day */
-	if (cxt->isforeign)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("LIKE is not supported for creating foreign tables")));
-
 	relation = relation_openrv(table_like_clause->relation, AccessShareLock);
 	if (relation->rd_rel->relkind != RELKIND_RELATION
@@ -689,6 +683,12 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks:

=# select pg_stat_reset_shared('popo');
ERROR: 22023: unrecognized reset target: "popo"
HINT: Target must be "bgwriter" or "archiver".

A new version is attached containing those fixes. I also played with the patch and pgbench, simulating some archive failures and successes while running pgbench, and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for Committer. I refactored the patch further:

* Remove ArchiverStats global variable to simplify pgarch.c.
* Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required.

Thanks, pgstat_send_archiver is cleaner now. I have some review comments:

+ s.archived_wals,
+ s.last_archived_wal,
+ s.last_archived_wal_time,
+ s.failed_attempts,
+ s.last_failed_wal,
+ s.last_failed_wal_time,

The column names of pg_stat_archiver don't look consistent, at least to me. What about the following?

archive_count
last_archived_wal
last_archived_time
fail_count
last_failed_wal
last_failed_time

And what about archived_count and failed_count instead of archive_count and fail_count, respectively? The rest of the names are better now, indeed. I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as a separate patch, though. Yep, this is definitely a different patch.

+ char last_archived_wal[MAXFNAMELEN];		/* last WAL file archived */
+ TimestampTz last_archived_wal_timestamp;	/* last archival success */
+ PgStat_Counter failed_attempts;
+ char last_failed_wal[MAXFNAMELEN];		/* last WAL file involved in failure */

Some hackers don't like the increase in the size of the stats file. In order to reduce the size as much as possible, shouldn't we use the exact length of a WAL file name here instead of MAXFNAMELEN? The first versions of the patch used a more limited length, more in line with what you say:

+#define MAX_XFN_CHARS 40

But this is inconsistent with xlog_internal.h. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
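[Editorial note: for context, a rough sketch of how the stats struct under discussion might look after the proposed renaming; the field names follow the review above, the type stubs stand in for PostgreSQL's own headers, and stat_reset_timestamp is an assumption based on the pg_stat_reset_shared() discussion, not the committed definition.]

#include <stdint.h>

typedef int64_t PgStat_Counter;	/* as in pgstat.h */
typedef int64_t TimestampTz;	/* as in utils/timestamp.h */
#define MAXFNAMELEN 64			/* as in xlog_internal.h */

/* Sketch of the shared archiver statistics, with the renamed fields. */
typedef struct PgStat_ArchiverStats
{
	PgStat_Counter archived_count;	/* archival successes */
	char		last_archived_wal[MAXFNAMELEN];	/* last WAL file archived */
	TimestampTz last_archived_timestamp;	/* time of last success */
	PgStat_Counter failed_count;	/* failed archival attempts */
	char		last_failed_wal[MAXFNAMELEN];	/* WAL file of last failure */
	TimestampTz last_failed_timestamp;	/* time of last failure */
	TimestampTz stat_reset_timestamp;	/* when these stats were last reset */
} PgStat_ArchiverStats;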
Re: [HACKERS] Recovery to backup point
On Sat, Jan 25, 2014 at 5:10 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks, committed! It seems that this patch has not been pushed :) Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers