Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 08:19 AM, Michael Paquier wrote:
> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote:
>> On 10/07/2015 10:29 AM, Michael Paquier wrote:
>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote:
>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>>>> like a copy-paste gone bad. Pardon if I'm missing something.
>>>
>>> Perhaps. Do you have a suggestion regarding that? It seems to me that
>>> this is more useful in TestLib.pm as-is.
>>>
>>
>> My mistake, the patch only shows some internal function being deleted
>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
>> a better place for it.
> 
> OK. Here is a new patch version. I have removed the restriction
> preventing make_master from being called multiple times in the same
> script (one may actually want to test some stuff related to logical
> decoding or FDW for example, who knows...), forcing PGHOST to always
> use the same value after it has been initialized. I have added a
> sanity check though: it is not possible to create a node based on a
> base backup if no master is defined. This looks like cheap
> insurance... I also refactored the code a bit, using the new
> init_node_info to fill in the fields of a newly-initialized node, and
> I removed get_free_port, init_node, init_node_from_backup,
> enable_restoring and enable_streaming from the list of routines
> exposed to users; their functionality is available through
> make_master, make_warm_standby and make_hot_standby. We could expose
> them again if need be, as somebody may want to get a free port or set
> up a node without those generic routines; it just does not seem
> necessary now.
> Regards,
> 

If you'd like, I can write up some tests for cascading replication which
are currently missing.

Someone mentioned a daisy-chain setup, which sounds fun. Anything else in
particular? Also, it would be nice to have some canned way to measure
end-to-end replication latency for a variable number of nodes.
What about going back through the commit log and writing some regression
tests for the real stinkers, if someone cares to volunteer some candidate
bugs?

Amir







Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Michael Paquier
On Thu, Oct 8, 2015 at 3:59 PM, Amir Rohan  wrote:
> On 10/08/2015 08:19 AM, Michael Paquier wrote:
>> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote:
>>> On 10/07/2015 10:29 AM, Michael Paquier wrote:
>>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote:
>>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>>>>> like a copy-paste gone bad. Pardon if I'm missing something.
>>>>
>>>> Perhaps. Do you have a suggestion regarding that? It seems to me that
>>>> this is more useful in TestLib.pm as-is.
>>>>
>>>
>>> My mistake, the patch only shows some internal function being deleted
>>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
>>> a better place for it.
>>
>> OK. Here is a new patch version. I have removed the restriction
>> preventing make_master from being called multiple times in the same
>> script (one may actually want to test some stuff related to logical
>> decoding or FDW for example, who knows...), forcing PGHOST to always
>> use the same value after it has been initialized. I have added a
>> sanity check though: it is not possible to create a node based on a
>> base backup if no master is defined. This looks like cheap
>> insurance... I also refactored the code a bit, using the new
>> init_node_info to fill in the fields of a newly-initialized node, and
>> I removed get_free_port, init_node, init_node_from_backup,
>> enable_restoring and enable_streaming from the list of routines
>> exposed to users; their functionality is available through
>> make_master, make_warm_standby and make_hot_standby. We could expose
>> them again if need be, as somebody may want to get a free port or set
>> up a node without those generic routines; it just does not seem
>> necessary now.
>> Regards,
>>
>
> If you'd like, I can write up some tests for cascading replication which
> are currently missing.

001 is testing cascading, as in node1 -> node2 -> node3.

> Someone mentioned a daisy-chain setup, which sounds fun. Anything else in
> particular? Also, it would be nice to have some canned way to measure
> end-to-end replication latency for a variable number of nodes.

Hm. Do you mean comparing the LSN position between two nodes even if
both nodes are not connected to each other? What would you use it for?

> What about going back through the commit log and writing some regression
> tests for the real stinkers, if someone cares to volunteer some candidate
> bugs?

I have drafted a list with a couple of items upthread:
http://www.postgresql.org/message-id/cab7npqsgffsphocrhfoasdanipvn6wsh2nykf1kayrm+9_m...@mail.gmail.com
So based on the existing patch the list becomes as follows:
- wal_retrieve_retry_interval with a high value, say 2-3s, looping every
second to check that the changes have been received and applied by the
standby.
- recovery_target_action
- archive_cleanup_command
- recovery_end_command
- pg_xlog_replay_pause and pg_xlog_replay_resume
In the list of things that could have a test, I recall that we should
also test 2PC with the recovery delay; look at a1105c3d. This could
be included in 005.
The advantage of implementing that now is that we could see whether the
existing routines are solid enough. Still, looking at what the
patch has now, I think that we had better get a committer to look at it;
if the core portion gets integrated we could already use it for
the patch implementing quorum synchronous replication and for doing
more advanced tests with pg_rewind regarding timeline handling
(both patches of this CF). I don't mind adding more now, though I
think that the set of sample tests included in this version is enough
as a base implementation of the facility and shows what it can do.
Regards,
-- 
Michael




Re: [HACKERS] [Proposal] Table partition + join pushdown

2015-10-08 Thread Taiki Kondo
Hello, Horiguchi-san.

Thank you for your comment.

> I got some warnings on compilation about unused variables and a wrong
> argument type.

OK, I'll fix it.

> I failed to come up with a query that this patch works on. Could you
> give me a specific example for this patch?

Please find attached.
Also, make sure that work_mem is set to '64kB' (not 64MB).

If work_mem is large enough to build a hash table for the appended
relation, that plan's cost may beat the pushed-down plan's cost, and the
planner will not choose the pushed-down plan this patch creates.
So, for this patch to kick in, work_mem must be smaller than the hash
table size for the appended relation.

> This patch needs more comments. Please add comments about not only
> what it does but also the reasoning and other background behind it.

OK, I'll put more comments in the code.
But it may take a long time...


> -- about namings
> 
> Names for functions and variables need to be more appropriate; in
> other words, they need to properly convey what they are. The following
> are examples of such names.

Thank you for your suggestion.

I also think these names are not very good.
I'll try to come up with better names, but it may take a while...
Of course, I will use your suggestions as a reference.

> "added_restrictlist"'s widely distributed as many function
> arguemnts and JoinPathExtraData makes me feel
> dizzy..

"added_restrictinfo" will be deleted from almost functions
other than try_join_pushdown() in next (v2) patch because
the place of filtering using this info will be changed
from Join node to Scan node and not have to place it
into other than try_join_pushdown().


> In make_restrictinfos_from_check_constr, the function returns
> modified constraint predicates corresponding to Vars under
> hashjoinable join clauses. I don't think expression_tree_mutator
> is suitable for doing that, since it could allow unwanted results when
> constraint predicates or join clauses are not simple OpExprs.

Do you have any example of this situation?
I am trying to find the unwanted results you mentioned, but I haven't
produced any so far. I do have a hunch that it can allow unwanted
results, because I have only thought about very simple situations for
this function.


> Otherwise, could you give me a clear explanation of what it does?

This function converts a CHECK() constraint into a filter expression by the
following procedure.
(1) Get the outer table's CHECK() constraint using get_relation_constraints().
(2) Walk through the expression tree obtained in (1) using
expression_tree_mutator() with check_constraint_mutator(), replacing only the
outer relation's Var nodes with the inner's according to the join clause.

For example, when the CHECK() constraint of table A is "num % 4 = 0" and
the join clause between tables A and B is "A.num = B.data",
then we can get "B.data % 4 = 0" for filtering purposes.

This also accepts a more complex join clause like "A.num = B.data * 2",
from which we can get "(B.data * 2) % 4 = 0".

In step (2), to decide whether each join clause can be used for
replacing Var nodes, check_constraint_mutator() judges whether the
join clause is hash-joinable.

Actually, I want to judge whether the OpExpr at the top of the join
clause's expression tree means "=", but I can't find out how to do it.

If you know how to do it, please let me know.



Best regards,
--
Taiki Kondo

NEC Solution Innovators, Ltd.






-Original Message-
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp] 
Sent: Tuesday, October 06, 2015 8:35 PM
To: tai-ko...@yk.jp.nec.com
Cc: kai...@ak.jp.nec.com; aki-iwa...@vt.jp.nec.com; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [Proposal] Table partition + join pushdown

Hello.

I tried to read this and had some random comments on this.

-- general

I got some warnings on compilation about unused variables and a wrong argument type.

I failed to come up with a query that this patch works on. Could you give me
a specific example for this patch?

This patch needs more comments. Please add comments about not only what it
does but also the reasoning and other background behind it.


-- about namings

Names for functions and variables need to be more appropriate; in other
words, they need to properly convey what they are. The following are
examples of such names.

"added_restrictlist" being widely distributed as many function arguments and
in JoinPathExtraData makes me feel dizzy... create_mergejoin_path takes it as
"filtering_clauses", which looks far better.

try_join_pushdown() is also a name with a much wider meaning. This patch
tries to move hashjoins on an inheritance parent to below the append paths.
It could generically be called 'pushdown',
but it would be better named something like 'transform appended hashjoin' or
'hashjoin distribution'. The latter seems better.
(The function name would be try_distribute_hashjoin in that
case.)

The name make_restrictinfos_from_check_constr() also tells me the wrong thing. For

Re: [HACKERS] pg_ctl/pg_rewind tests vs. slow AIX buildfarm members

2015-10-08 Thread Michael Paquier
On Wed, Oct 7, 2015 at 11:52 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Sat, Sep 26, 2015 at 9:12 AM, Tom Lane wrote:
>>> So the attached modified patch adjusts the PID-match logic and some
>>> comments, but is otherwise what I posted before.  I believe that this
>>> might actually work on Windows, but I have no way to test it.  Someone
>>> please try that?  (Don't forget to test the service-start path, too.)
>
>> If "pg_ctl start" is invoked directly from a command prompt, it works.
>> Now, if I run "pg_ctl start" within a script (in an msysgit
>> environment for example), pg_ctl fails, complaining that
>> postmaster.pid already exists. The TAP tests get broken as well,
>
> Reading that again, I think you mean that "pg_ctl start" works but you
> didn't try "pg_ctl start -w" manually?  Because it's "-w" that's at
> issue here, and the failing test cases are using that.

Yes, sorry. I fat-fingered the command at the prompt and forgot the
-w switch. test_postmaster_connection just behaves incorrectly, and
exits immediately with PQPING_NO_RESPONSE, aka "Stopped waiting,
could not start server, blah".

> I think there is still room to salvage something without fully rewriting
> the postmaster invocation logic to avoid using CMD, because it's still
> true that checking whether the CMD process is still there should be as
> good as checking the postmaster proper.  We just can't use kill() for it.
> I'd be inclined to get rid of the use of kill() on Unix as well; as Noah
> mentioned upthread, if we're using fork/exec we might as well pay
> attention to waitpid's results instead.  On Windows, I imagine that the
> thing to do is have start_postmaster() save aside the handle for the CMD
> process, and then in test_postmaster_connection(), check the handle state
> to see if the process is still running (I assume there's a way to do
> that).

That would be WaitForSingleObject(handle, ms_timeout) ==
WAIT_OBJECT_0, no? The handle should be picked up from
CreateRestrictedProcess, and I think that CloseHandle should not be
called on pi.hProcess if we are going to wait for it afterwards.
Reference:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032%28v=vs.85%29.aspx
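Something like the following, perhaps (a minimal sketch; cmd_handle is a
made-up name for the saved CMD process handle):

/* timeout of 0: just poll the process state without blocking */
if (WaitForSingleObject(cmd_handle, 0) == WAIT_TIMEOUT)
{
    /* CMD (and thus the postmaster) is still there; keep waiting */
}
else
{
    /* the process exited, or the wait failed; stop retrying */
}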

> I can take care of the Unix side of that, but as before, somebody else
> would need to code and test the Windows side.  It's probably just a few
> lines of code to add ... any volunteers?

I could give it a shot; do you still want to rework the Unix portion
of the patch? There are not many folks around able to test the patch
either way.
Regards,
-- 
Michael




Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 10:39 AM, Michael Paquier wrote:
> On Thu, Oct 8, 2015 at 3:59 PM, Amir Rohan  wrote:
>> On 10/08/2015 08:19 AM, Michael Paquier wrote:
>>> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote:
>>>> On 10/07/2015 10:29 AM, Michael Paquier wrote:
>>>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote:
>>>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>>>>>> like a copy-paste gone bad. Pardon if I'm missing something.
>>>>>
>>>>> Perhaps. Do you have a suggestion regarding that? It seems to me that
>>>>> this is more useful in TestLib.pm as-is.
>>>>>
>>>>
>>>> My mistake, the patch only shows some internal function being deleted
>>>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
>>>> a better place for it.
>>>
>>> OK. Here is a new patch version. I have removed the restriction
>>> preventing make_master from being called multiple times in the same
>>> script (one may actually want to test some stuff related to logical
>>> decoding or FDW for example, who knows...), forcing PGHOST to always
>>> use the same value after it has been initialized. I have added a
>>> sanity check though: it is not possible to create a node based on a
>>> base backup if no master is defined. This looks like cheap
>>> insurance... I also refactored the code a bit, using the new
>>> init_node_info to fill in the fields of a newly-initialized node, and
>>> I removed get_free_port, init_node, init_node_from_backup,
>>> enable_restoring and enable_streaming from the list of routines
>>> exposed to users; their functionality is available through
>>> make_master, make_warm_standby and make_hot_standby. We could expose
>>> them again if need be, as somebody may want to get a free port or set
>>> up a node without those generic routines; it just does not seem
>>> necessary now.
>>> Regards,
>>>
>>
>> If you'd like, I can write up some tests for cascading replication which
>> are currently missing.
> 
> 001 is testing cascading, as in node1 -> node2 -> node3.
> 
>> Someone mentioned a daisy-chain setup, which sounds fun. Anything else in
>> particular? Also, it would be nice to have some canned way to measure
>> end-to-end replication latency for a variable number of nodes.
> 
> Hm. Do you mean comparing the LSN position between two nodes even if
> both nodes are not connected to each other? What would you use it for?
> 

In a cascading replication setup, the typical _time_ it takes for a
COMMIT on the master to reach the slave (assuming a constant WAL generation
rate) is an important operational metric.

It would be useful to catch future regressions in that metric,
which may happen even when a patch doesn't outright break cascading
replication. Just automating the measurement could be useful if
there's no pg facility that tracks performance over time in
a regimented fashion. I've seen multiple projects which consider
a "benchmark suite" to be part of their testing strategy.


As for the "daisy chain" thing, it was (IIRC) mentioned in a josh berkus
talk I caught on youtube. It's possible to setup cascading replication,
take down the master, and then reinsert it as replicating slave, so that
you end up with *all* servers replicating from the
ancestor in the chain, and no master. I think it was more
a fun hack then anything, but also an interesting corner case to
investigate.

>> What about going back through the commit log and writing some regression
>> tests for the real stinkers, if someone cares to volunteer some candidate
>> bugs?
> 
> I have drafted a list with a couple of items upthread:
> http://www.postgresql.org/message-id/cab7npqsgffsphocrhfoasdanipvn6wsh2nykf1kayrm+9_m...@mail.gmail.com
> So based on the existing patch the list becomes as follows:
> - wal_retrieve_retry_interval with a high value, say 2-3s, looping every
> second to check that the changes have been received and applied by the
> standby.
> - recovery_target_action
> - archive_cleanup_command
> - recovery_end_command
> - pg_xlog_replay_pause and pg_xlog_replay_resume
> In the list of things that could have a test, I recall that we should
> also test 2PC with the recovery delay; look at a1105c3d. This could
> be included in 005.

a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137.  Andres Freund
4f1b890 Mar 15 Merge the various forms of transaction commit & abort
records.  Andres Freund

Is that the right commit?

> The advantage of implementing that now is that we could see whether the
> existing routines are solid enough.

I can do this if you point me at a self-contained thread/#issue.

> Still, looking at what the
> patch has now, I think that we had better get a committer to look at it;
> if the core portion gets integrated we could already use it for
> the patch implementing quorum synchronous replication and for doing
> more advanced tests with pg_rewind regarding timeline handling
> (both patches of this CF).
>
> I don't mind adding more now, though I
> 

Re: [HACKERS] Freeze avoidance of very large table.

2015-10-08 Thread Fujii Masao
On Mon, Oct 5, 2015 at 7:31 PM, Masahiko Sawada  wrote:
> On Sat, Oct 3, 2015 at 3:41 AM, Robert Haas  wrote:
>> On Fri, Oct 2, 2015 at 11:23 AM, Alvaro Herrera
>>  wrote:
>>>> + /* all-frozen information is also cleared at the same time */
>>>>   PageClearAllVisible(page);
>>>> + PageClearAllFrozen(page);
>>>
>>> I wonder if it makes sense to have a macro to clear both in unison,
>>> which seems a very common pattern.
>>
>> I think PageClearAllVisible should clear both, and there should be no
>> other macro.  There is no event that causes a page to cease being
>> all-visible that does not also cause it to cease being all-frozen.
>> You might think that deleting or locking a tuple would fall into that
>> category - but nope, XMAX needs to be cleared or the tuple pruned, or
>> there will be problems after wraparound.
>>
>
> Thank you for your advice.
> I understood.
>
> I changed the patch so that PageClearAllVisible clears both bits, and
> removed ClearAllFrozen.
> Attached is the latest v16 patch, which contains a draft documentation
> patch.

Thanks for updating the patch! Here are some more review comments.

+ereport(elevel,
+        (errmsg("skipped %d frozen pages acoording to visibility map",
+                vacrelstats->vmskipped_frozen_pages)));

Typo: acoording should be according.

When vmskipped_frozen_pages is 1, "1 frozen pages" in log message
sounds incorrect in terms of grammar. So probably errmsg_plural()
should be used here.
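Something like this, perhaps (a sketch only, reusing the counter from the
patch excerpt above):

ereport(elevel,
        (errmsg_plural("skipped %d frozen page according to visibility map",
                       "skipped %d frozen pages according to visibility map",
                       vacrelstats->vmskipped_frozen_pages,
                       vacrelstats->vmskipped_frozen_pages)));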

+relallvisible = visibilitymap_count(rel, VISIBILITYMAP_ALL_VISIBLE);
+relallfrozen = visibilitymap_count(rel, VISIBILITYMAP_ALL_FROZEN);

We can refactor visibilitymap_count() so that it counts the numbers of
both all-visible and all-frozen tuples at the same time, in order to
avoid reading through the visibility map twice.
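For instance, a hypothetical signature along these lines (just to sketch
the idea, not the patch's actual code):

/* one pass over the visibility map fills both counters */
extern void visibilitymap_count(Relation rel,
                                BlockNumber *all_visible,
                                BlockNumber *all_frozen);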

heap_page_is_all_visible() can set all_frozen to TRUE even when
it returns FALSE. This is odd because the page must not be all frozen
when it's not all visible. Shouldn't heap_page_is_all_visible() set
all_frozen to FALSE whenever all_visible is set to FALSE?
Probably it's better to forcibly set all_frozen to FALSE at the end of
the function whenever all_visible is FALSE.

+if (PageIsAllVisible(page))
 {
-Assert(BufferIsValid(*vmbuffer));

Why did you remove this assertion?

+if (all_frozen)
+{
+PageSetAllFrozen(page);
+flags |= VISIBILITYMAP_ALL_FROZEN;
+}

Why didn't you call visibilitymap_test() for the all-frozen case here?

In visibilitymap_set(), the argument flag must be either
(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN) or
VISIBILITYMAP_ALL_VISIBLE. So I think that it's better to add
Assert() which checks whether the specified flag is valid or not.

+ * caller is expected to set PD_ALL_VISIBLE or
+ * PD_ALL_FROZEN first.
+ */
+Assert(PageIsAllVisible(heapPage) || PageIsAllFrozen(heapPage));

This should be the following?

  Assert(((flag & VISIBILITYMAP_ALL_VISIBLE) && PageIsAllVisible(heapPage)) ||
         ((flag & VISIBILITYMAP_ALL_FROZEN) && PageIsAllFrozen(heapPage)));

Regards,

-- 
Fujii Masao




[HACKERS] Process pg_hba.conf keywords as case-insensitive

2015-10-08 Thread Андрей Асякин
Hi!
This is my first post to this list.
(Sorry for my bad English.)

I decided to start small, but since it is in the TODO, why not.

In TODO:
> Process pg_hba.conf keywords as case-insensitive
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00432.php

It seems reasonable to me to throw an error saying 'ALL' is not a valid
value and *not* reload the pg_hba.conf file.

parse_hba_line in src/backend/libpq/hba.c seems a good place for this,
using strcasecmp for the checks.

The attached patch is very simple:
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 23c8b5d..72ae6f3 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -62,6 +62,8 @@ typedef struct check_network_data
 
 
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
+#define token_is_fake_keyword(t, k)	(strcasecmp(t->string, k) == 0 \
+									&& strcmp(t->string, k) != 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
 /*
@@ -931,6 +933,20 @@ parse_hba_line(List *line, int line_num, char *raw_line)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		token = lfirst(tokencell);
+		if (!token->quoted && (
+			token_is_fake_keyword(token, "all") ||
+			token_is_fake_keyword(token, "sameuser") ||
+			token_is_fake_keyword(token, "samerole") ||
+			token_is_fake_keyword(token, "replication")))
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIG_FILE_ERROR),
+					 errmsg("invalid database keyword name \"%s\"", token->string),
+					 errcontext("line %d of configuration file \"%s\"",
+								line_num, HbaFileName)));
+			return NULL;
+		}
 		parsedline->databases = lappend(parsedline->databases,
 										copy_hba_token(lfirst(tokencell)));
 	}
@@ -950,6 +966,17 @@ parse_hba_line(List *line, int line_num, char *raw_line)
 	tokens = lfirst(field);
 	foreach(tokencell, tokens)
 	{
+		token = lfirst(tokencell);
+		if (!token->quoted && token_is_fake_keyword(token, "all"))
+		{
+			ereport(LOG,
+					(errcode(ERRCODE_CONFIG_FILE_ERROR),
+					 errmsg("invalid role keyword name \"%s\"", token->string),
+					 errcontext("line %d of configuration file \"%s\"",
+								line_num, HbaFileName)));
+			return NULL;
+		}
+
 		parsedline->roles = lappend(parsedline->roles,
 									copy_hba_token(lfirst(tokencell)));
 	}



Re: [HACKERS] [Proposal] Table partition + join pushdown

2015-10-08 Thread Kyotaro HORIGUCHI
Hello, thank you for the example.

I could see this patch working with it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates corresponding to Vars under
> > hashjoinable join clauses. I don't think expression_tree_mutator
> > is suitable for doing that, since it could allow unwanted results when
> > constraint predicates or join clauses are not simple OpExprs.
> 
> Do you have any example of this situation?

As a rather simple case, on the test environment created by the
provided script, the following query

explain analyze
select data_x, data_y, num from check_test_div join inner_t on 
check_test_div.id + 1 = inner_t.id;

makes the mutation fail and results in an assertion failure.

| TRAP: FailedAssertion("!(list_length(check_constr) == list_length(result))", 
File: "joinpath.c", Line: 1608)

This is because neither 'check_test_div.id + 1' nor inner_t.id
matches the var side of the constraints.

I don't clearly see what to do about this situation yet, but this
is one of the most important functions for this feature and it
should be cleanly designed.


At Thu, 8 Oct 2015 08:28:04 +, Taiki Kondo  wrote 
in <12a9442fbae80d4e8953883e0b84e0885f9...@bpxm01gp.gisp.nec.co.jp>
> Hello, Horiguchi-san.
> 
> Thank you for your comment.
> 
> > I got some warnings on compilation about unused variables and a wrong
> > argument type.
> 
> OK, I'll fix it.
> 
> > I failed to come up with a query that this patch works on. Could you
> > give me a specific example for this patch?
> 
> Please find attached.
> Also, make sure that work_mem is set to '64kB' (not 64MB).
> 
> If work_mem is large enough to build a hash table for the appended
> relation, that plan's cost may beat the pushed-down plan's cost, and the
> planner will not choose the pushed-down plan this patch creates.
> So, for this patch to kick in, work_mem must be smaller than the hash
> table size for the appended relation.
> 
> > This patch needs more comments. Please add comments about not only
> > what it does but also the reasoning and other background behind it.
> 
> OK, I'll put more comments in the code.
> But it may take a long time...

Sure.

> > -- about namings
> > 
> > Names for functions and variables need to be more appropriate; in
> > other words, they need to properly convey what they are. The following
> > are examples of such names.
> 
> Thank you for your suggestion.
> 
> I also think these names are not very good.
> I'll try to come up with better names, but it may take a while...
> Of course, I will use your suggestions as a reference.

Thanks.

> > "added_restrictlist"'s widely distributed as many function
> > arguemnts and JoinPathExtraData makes me feel
> > dizzy..
> 
> "added_restrictinfo" will be deleted from almost functions
> other than try_join_pushdown() in next (v2) patch because
> the place of filtering using this info will be changed
> from Join node to Scan node and not have to place it
> into other than try_join_pushdown().

I'm looking forward to seeing it.

> > In make_restrictinfos_from_check_constr, the function returns
> > modified constraint predicates corresponding to Vars under
> > hashjoinable join clauses. I don't think expression_tree_mutator
> > is suitable for doing that, since it could allow unwanted results when
> > constraint predicates or join clauses are not simple OpExprs.
> 
> Do you have any example of this situation?
> I am trying to find the unwanted results you mentioned, but I haven't
> produced any so far. I do have a hunch that it can allow unwanted
> results, because I have only thought about very simple situations for
> this function.

As mentioned above.

> > Otherwise, could you give me a clear explanation of what it does?
> 
> This function converts a CHECK() constraint into a filter expression by the
> following procedure.
> (1) Get the outer table's CHECK() constraint using get_relation_constraints().
> (2) Walk through the expression tree obtained in (1) using
> expression_tree_mutator() with check_constraint_mutator(), replacing only
> the outer relation's Var nodes with the inner's according to the join
> clause.
> 
> For example, when the CHECK() constraint of table A is "num % 4 = 0" and
> the join clause between tables A and B is "A.num = B.data",
> then we can get "B.data % 4 = 0" for filtering purposes.
> 
> This also accepts a more complex join clause like "A.num = B.data * 2",
> from which we can get "(B.data * 2) % 4 = 0".
> 
> In step (2), to decide whether each join clause can be used for
> replacing Var nodes, check_constraint_mutator() judges whether the
> join clause is hash-joinable.

Thanks for the explanation. I think that the function has been
designed considering only rather plain cases. We should put more
thought into making the logic clearer, so that we can define the
desired/possible capabilities and limitations clearly.

> Actually, I want to judge whether the OpExpr at the top of the join
> clause's expression tree means "=" or

[HACKERS] proposal: PL/Pythonu - function ereport

2015-10-08 Thread Pavel Stehule
Hi

We cannot currently raise a PostgreSQL exception with all possible fields
set. I propose a new function

plpy.ereport(level, [ message [, detail [, hint [, sqlstate, ... )

The implementation will be based on keyword parameters, so only the
required parameters need to be passed.

Examples:

plpy.ereport(plpy.NOTICE, 'some message', 'some detail')
plpy.ereport(plpy.ERROR, 'some message', sqlstate = 'zx243');

Comments, notices, objections?

Regards

Pavel


Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-08 Thread Tom Lane
Amir Rohan  writes:
> Comments?

ISTM that all of the "functional" parts of this are superseded by
pg_file_settings; or at least, if they aren't, you need to provide a
rationale that doesn't consist only of pointing to pre-9.5 discussions.
The "advice" parts of it maybe are still useful, but a tool that's just
meant for that would look quite a bit different.  Maybe what you're really
after is to update pgtune.

Also, as a general comment, the sketch you provided seems to require
porting everything the server knows about GUC file syntax, file locations,
variable names and values into some other representation that apparently
is not C, nor Perl either.  I think that's likely to be impossible to keep
accurate or up to date.  Just as a thought experiment, ask yourself how
you'd validate the TimeZone setting in external code, bearing in mind that
anytime you don't give exactly the same answer the server would, your tool
has failed to be helpful.

regards, tom lane




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-10-08 Thread Etsuro Fujita

On 2015/10/07 15:39, Etsuro Fujita wrote:

On 2015/10/07 15:06, Kyotaro HORIGUCHI wrote:

At Wed, 7 Oct 2015 00:24:57 -0400, Robert Haas 
wrote

I think it rather requires *replacing* two resjunk columns by one new
one.  The whole-row references for the individual foreign tables are
only there to support EvalPlanQual; if we instead have a column to
populate the foreign scan's slot directly, then we can use that column
for that purpose directly and there's no remaining use for the
whole-row vars on the baserels.



It is what I had in mind.



OK, I'll investigate this further.


I noticed that the approach using a column to populate the foreign 
scan's slot directly wouldn't work well in some cases.  For example, 
consider:


SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x = 
bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;


The best plan is presumably something like this as you said before:

LockRows
-> Nested Loop
   -> Seq Scan on verysmall v
   -> Foreign Scan on bigft1 and bigft2
Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x = 
bigft2.x AND bigft1.q = $1 AND bigft2.r = $2


Consider the EvalPlanQual testing to see if the updated version of a
tuple in v satisfies the query. If we used that column in the testing, we
would get wrong results in some cases.


Best regards,
Etsuro Fujita





[HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-08 Thread Amir Rohan
Hi,

Related SO question from '13:

https://stackoverflow.com/questions/1619/how-to-syntax-check-postgresql-config-files

Peter Eisentraut answered:

> There is no way to do this that is similar to apache2ctl. If you reload
> the configuration files and there is a syntax error, the PostgreSQL
> server will complain in the log and refuse to load the new file.
> <...>
> (Of course, this won't guard you against writing semantically wrong
> things, but apache2ctl won't do that either.) 

So, at least one person in the history of the universe (besides me)
has wanted a tool that does this.

In addition to a simple syntax check, there's a bunch of "config wisdom"
tidbits I've encountered, scattered through talks, commit
messages, mailing list discussions, and documentation notes
(chapter 17, paragraph 12). These could be collected into a tool that:

- Checks your configuration's syntax
- Checks for semantically legal values ('read committed', not
'read_committed')
- Warns of unrecognized keys ("'hot_standby' is not a valid GUC in v9.1")
- Is version-aware
- Serves as an "executable" form of past experience
- Describes the config problem in a structured way (as an expression,
for example)
- Includes a friendly, semi-verbose description of the problem
- Describes ways to fix the problem, *right there*
- Is independent of the server (but reuses the same parser), particularly
of any life-cycle commands such as restart

Users who adopt the tool also seem more likely to contribute back
coverage for any new pitfalls they fall into, which can yield feedback
for developers and drive improvements in documentation.

Those inclined can use the tool in their ops repo's CI.

Two talk videos have been particularly valuable sources for examples
of configuration that can bite you:

"...Lag - What's Wrong with My Slave?" (Samantha Billington, 2015)
https://youtu.be/If22EwtCFKc

"PostgreSQL Replication Tutorial" (Josh Berkus, 2015)
https://youtu.be/GobQw9LMEaw

What I've collected so far:

- Quoting rules for recovery.conf and postgresql.conf are different
- 'primary_conninfo' is visible to unprivileged users, so including a
password in it is a security risk.
- streaming replication + read slave should consider turning on
"hot_standby_feedback".
- replication_timeout < wal_receiver_status_interval is bad.
- setting max_wal_senders too low, so that streaming replication or
backups with pg_basebackup are blocked
- max_wal_senders without an appropriate wal_level
- recently on "weekly news":
  Fujii Masao pushed:

  Document that max_worker_processes must be high enough in standby.
  The setting values of some parameters including max_worker_processes
  must be equal to or higher than the values on the master. However,
  previously max_worker_processes was not listed as such parameter in
  the document. So this commit adds it to that list.  Back-patch to
  9.4 where max_worker_processes was added.

http://git.postgresql.org/pg/commitdiff/1ea5ce5c5f204918b8a9fa6eaa8f3f1374aa8aec

- turning on replication with the default max_wal_senders = 0
- FATAL:  WAL archival (archive_mode=on) requires wal_level "archive",
"hot_standby", or "logical"

There must be more; please mention any other checks you feel should
be included.

Note: The server _will_ explicitly complain about the last two, but a
bad "wal_level", for example, is only caught once the server is
already down:

"LOG:  parameter "wal_level" cannot be changed without restarting the
server"

Implementation: without getting too far ahead, I feel rules should be
stored as independent files in a drop-in directory, in some lightweight,
structured format (JSON, YAML, a custom textual format), with some
mandatory fields (version, severity, description, solution(s)).
This makes it easy for people to add new checks without making any
oblique language demands (Perl isn't as popular as it used
to be).
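For example, a single check might look something like this (a sketch only;
the field names beyond the mandatory ones above are illustrative):

{
  "version": ">= 9.0",
  "severity": "warning",
  "problem": "replication_timeout < wal_receiver_status_interval",
  "description": "The standby reports status less often than the master's timeout allows, so healthy connections may be dropped.",
  "solution": "Set wal_receiver_status_interval well below replication_timeout."
}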

Comments?

Amir





Re: [HACKERS] Small documentation fix in src/interfaces/ecpg/preproc/po/pt_BR.po

2015-10-08 Thread Andreas 'ads' Scherbaum


Hello,

On 10/07/2015 05:18 PM, Euler Taveira wrote:

On 06-10-2015 19:49, Andreas 'ads' Scherbaum wrote:

When working on a script, I stumbled over a mistake in the pt_BR.po
translation for ecpg. Patch attached.


I've already fixed it in the translation git. It'll be available only in
the next set of releases.


Thank you.

Care to commit the other one for zh_CN.po as well?


Thanks,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff --git a/src/bin/initdb/po/zh_CN.po b/src/bin/initdb/po/zh_CN.po
index 3986eab..1dd6dcd 100644
--- a/src/bin/initdb/po/zh_CN.po
+++ b/src/bin/initdb/po/zh_CN.po
@@ -727,7 +727,7 @@ msgid ""
 "Report bugs to .\n"
 msgstr ""
 "\n"
-"报告错误至 .\n"
+"报告错误至 .\n"
 
 #: initdb.c:2907
 msgid ""



Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-10-08 Thread Tomas Vondra



On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote:

Hello,


The attached patch applies on top of the latest v5 patch and will
address the above issues. (And modifies expected files, which are the
manifestation of this improvement.)


As you see, it is quite a bad choice: ugly, unreadable, and
fragile.


I suppose you mean placing the list into IndexClauseSet?



The cause of this seeming mismatch would be the place to hold
indexrinfos. It is determined only by baserestrictinfo and
indpred. Any other components are not involved. So IndexClauseSet
is found not to be the best place after all, I suppose.

Instead, I came to think that the better place is
IndexOptInfo. Partial indexes are examined in check_partial_index
and it seems to be the most proper place to check this so far.


AFAIK there's only one IndexOptInfo instance per index, so I'm not sure
how that would work with queries that use the index in multiple places.
Imagine, for example, a table joined to itself, where both sides use the
index with different conditions.
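Something along these lines, say (a hypothetical sketch; the table and
partial index are made up):

CREATE TABLE t (a int, b int);
CREATE INDEX t_partial_idx ON t (a) WHERE b > 0;

-- both sides may want the partial index, under different conditions
SELECT *
  FROM t t1 JOIN t t2 ON t1.a = t2.a
 WHERE t1.b > 0 AND t2.b > 10;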


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson
Amir Rohan wrote:
> But implementing a mechanism that can be used by other features in
> the future seems the deciding factor here, rather than the brevity of a
> bespoke mini-language.

One decision to be made is which of JSON or the mini-language is better for
the SR setting.
The mini-language can fit into a single postgresql.conf line.

For JSON, currently a different file is used. But as said earlier, if
composite types are required in the future for other parameters, then having
multiple .conf files does not make sense. To avoid this we can:
-   support multi-line GUCs, which would be helpful for other comma-separated
conf values along with s_s_names (this can make the mini-language more
readable as well);
-   allow JSON support in postgresql.conf, so that other parameters in future
can use JSON as well within postgresql.conf.

What are the chances of future data requiring JSON? I think they are slim.

> > And I'm really not much in favor of a separate file; if we go that way
> > then we're going to have to reinvent a huge amount of infrastructure
> > that already exists for GUCs.
>
> Adding support for JSON objects (or some other kind of composite data
> type) to the .conf parser would negate the need for one, and would
> also solve the problem being discussed for future cases.

With the current pg_syncinfo file, the only code added was to find the
pg_syncinfo file at the specified path and read its entire content into a
variable used for further parsing; this could have been avoided with a
multi-line GUC.




-
Beena Emerson





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Michael Paquier
On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote:
> On 10/08/2015 10:39 AM, Michael Paquier wrote:
>>> Someone mentioned a daisy-chain setup, which sounds fun. Anything else in
>>> particular? Also, it would be nice to have some canned way to measure
>>> end-to-end replication latency for a variable number of nodes.
>>
>> Hm. Do you mean comparing the LSN position between two nodes even if
>> both nodes are not connected to each other? What would you use it for?
>>
>
> In a cascading replication setup, the typical _time_ it takes for a
> COMMIT on the master to reach the slave (assuming a constant WAL generation
> rate) is an important operational metric.

Hm. You mean the exact amount of time it takes to be sure that a given
WAL position has been flushed on a cascading standby, be it across
multiple layers. Er, that's a bit tough without patching the backend,
where I guess we would need to keep track of when an LSN position has
been flushed. And calls to gettimeofday are expensive, so that does
not sound like a plausible alternative here to me...

> It would be useful to catch future regressions in that metric,
> which may happen even when a patch doesn't outright break cascading
> replication. Just automating the measurement could be useful if
> there's no pg facility that tracks performance over time in
> a regimented fashion. I've seen multiple projects which consider
> a "benchmark suite" to be part of their testing strategy.

Ah, OK. I see. That's a bit out of scope of this patch, and that's
really OS-dependent, but as long as the comparisons can be done on the
same OS it would make sense.

> As for the "daisy chain" thing, it was (IIRC) mentioned in a josh berkus
> talk I caught on youtube. It's possible to setup cascading replication,
> take down the master, and then reinsert it as replicating slave, so that
> you end up with *all* servers replicating from the
> ancestor in the chain, and no master. I think it was more
> a fun hack then anything, but also an interesting corner case to
> investigate.

Ah, yes. I recall this one. I am sure it made the audience smile. All
the nodes link to each other in a closed circle.

>>> What about going back through the commit log and writing some regression
>>> tests for the real stinkers, if someone cares to volunteer some candidate
>>> bugs?
>>
>> I have drafted a list with a couple of items upthread:
>> http://www.postgresql.org/message-id/cab7npqsgffsphocrhfoasdanipvn6wsh2nykf1kayrm+9_m...@mail.gmail.com
>> So based on the existing patch the list becomes as follows:
>> - wal_retrieve_retry_interval with a high value, say 2-3s, looping every
>> second to check that the changes have been received and applied by the
>> standby.
>> - recovery_target_action
>> - archive_cleanup_command
>> - recovery_end_command
>> - pg_xlog_replay_pause and pg_xlog_replay_resume
>> In the list of things that could have a test, I recall that we should
>> also test 2PC with the recovery delay; look at a1105c3d. This could
>> be included in 005.
>
> a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137.  Andres Freund
> 4f1b890 Mar 15 Merge the various forms of transaction commit & abort
> records.  Andres Freund
>
> Is that the right commit?

That's this one. a1105c3 was actually rather tricky... The idea is to
simply check the WAL replay delay with COMMIT PREPARED.
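Roughly along these lines (a sketch; the table and gid names are made up):

-- on the master
BEGIN;
INSERT INTO t VALUES (1);
PREPARE TRANSACTION 'delay_test';
COMMIT PREPARED 'delay_test';

-- on a standby with recovery_min_apply_delay set, poll until the row
-- appears and check that it took at least the configured delay:
SELECT count(*) FROM t;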

>> The advantage of implementing that now is that we could see whether the
>> existing routines are solid enough.
>
> I can do this if you point me at a self-contained thread/#issue.

Hm. This patch is already 900 lines; perhaps it would be wiser not to
make it more complicated for now...
-- 
Michael




Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 04:47 PM, Michael Paquier wrote:
> On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote:
>> On 10/08/2015 10:39 AM, Michael Paquier wrote:
>>>> Someone mentioned a daisy-chain setup, which sounds fun. Anything else in
>>>> particular? Also, it would be nice to have some canned way to measure
>>>> end-to-end replication latency for a variable number of nodes.
>>>
>>> Hm. Do you mean comparing the LSN position between two nodes even if
>>> both nodes are not connected to each other? What would you use it for?
>>>
>>
>> In a cascading replication setup, the typical _time_ it takes for a
>> COMMIT on the master to reach the slave (assuming a constant WAL generation
>> rate) is an important operational metric.
> 
> Hm. You mean the exact amount of time it takes to be sure that a given
> WAL position has been flushed on a cascading standby, be it across
> multiple layers. Er, that's a bit tough without patching the backend,
> where I guess we would need to keep track of when an LSN position has
> been flushed. And calls to gettimeofday are expensive, so that does
> not sound like a plausible alternative here to me...
> 

Wouldn't this work?

1) start a timer
2) grab pg_stat_replication.sent_location from the master
3) pg_switch_xlog() # I /think/ we want this, could be wrong
4) poll the slave's pg_last_xlog_replay_location() until the LSN shows up
5) stop the timer
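In SQL terms, roughly (the LSN literal is only illustrative):

-- on the master (steps 2-3)
SELECT sent_location FROM pg_stat_replication;  -- e.g. '0/3000140'
SELECT pg_switch_xlog();

-- on the slave (step 4): repeat until this returns true
SELECT pg_last_xlog_replay_location() >= '0/3000140'::pg_lsn;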

Amir







Re: [HACKERS] check fails on Fedora 23

2015-10-08 Thread Pavel Raiskup
On Tuesday 06 of October 2015 17:59:23 Andrew Dunstan wrote:
> 
> On 10/06/2015 05:45 PM, Thomas Munro wrote:
> > On Wed, Oct 7, 2015 at 9:49 AM, Robert Haas  wrote:
> >> On Sun, Oct 4, 2015 at 11:52 AM, Andrew Dunstan  
> >> wrote:
> >>> Isn't this arguably a Fedora regression? What did they change in F23 to 
> >>> make
> >>> it fail? I note that F23 is still in Beta.
> >> Maybe, but it's pretty unfriendly for us to complain about a library
> >> issue, if it is one, by failing an Assert().  People with
> >> non-assert-enabled builds will just get wrong answers.  Yuck.
> >>
> >> Thinking about how this could happen, I believe that one possibility
> >> is that there are two strings A and B and a locale L such that
> >> strcoll_l(A, B, L) and memcmp(strxfrm(A, L), strxfrm(B, L)) disagree
> >> (that is, the results are of different sign, or one is zero and the
> >> other is not).
> > I wonder if Glibc bug 18589 is relevant.  Bug 18934 says "Note that
> > these unittests pass with glibc-2.21 but fail with 2.22 and current
> > git due to bug 18589 which points to a broken change in the collate
> > algorithm that needs to be reverted first."  Hungarian is mentioned.
> > Doesn't Fedora 23 include glibc-2.22?  Is it possible that that bug
> > affects strcoll but not strxfrm?
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=18589
> > https://sourceware.org/bugzilla/show_bug.cgi?id=18934
> >
> 
> 
> Yes, it's 2.22:
> 
> [vagrant@localhost ~ ]$ rpm -q -a | grep glibc
> glibc-2.22-3.fc23.x86_64
> glibc-devel-2.22-3.fc23.x86_64
> glibc-common-2.22-3.fc23.x86_64
> glibc-headers-2.22-3.fc23.x86_64
> 
> cheers
> 
> andrew

Yup, broken glibc:
https://bugzilla.redhat.com/show_bug.cgi?id=1269895

Pavel





Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-08 Thread Amir Rohan
On 10/08/2015 02:38 PM, Tom Lane wrote:
> Amir Rohan  writes:
>> Comments?
> 
> ISTM that all of the "functional" parts of this are superseded by
> pg_file_settings; 

To use the new pg_file_settings view you need:
1) 9.5+
2) a running server

The feature is described as follows in a97e0c3354ace5d74:

> The information returned includes the configuration file name, line
> number in that file, sequence number indicating when the parameter is
> loaded (useful to see if it is later masked by another definition of
> the same parameter), parameter name, and what it is set to at that
> point. This information is updated on reload of the server.

None of which has much overlap in purpose with what I was suggesting, so
I don't see why you think it actually makes it redundant.
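For reference, the kind of check that view does cover looks roughly like
this (a quick sketch, run against a live 9.5 server):

SELECT name, setting, sourcefile, sourceline
  FROM pg_file_settings;

That reports what was parsed and from where, not whether the values make
operational sense together.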

> or at least, if they aren't, you need to provide a
> rationale that doesn't consist only of pointing to pre-9.5 discussions.

The original rationale already does AFAICT.

> The "advice" parts of it maybe are still useful, but a tool that's just
> meant for that would look quite a bit different.  Maybe what you're really
> after is to update pgtune.
> 

In the sense that pgtune processes .conf files and helps the user choose
better settings, yes there's commonality.
But in that it looks at 5 GUCs or so and applies canned formulas to
silently write out a new .conf without explanation, it's not the
same tool.

> Also, as a general comment, the sketch you provided seems to require
> porting everything the server knows about 

> GUC file syntax,

yes, a parser, possibly sharing code with the server.

> file locations,

specified via command line option, yes.

> variable names and values 

misc/guc.c is eminently parseable in that sense.
For some types, validation is trivial.
For others, we do the best we can and improve incrementally.

> into some other representation that apparently
> is not C, nor Perl either.  

Well, it's data, it probably should have that anyway.

You raise an interesting issue that having the schema for the
configuration described in a more congenial format than C code would
have made implementing this easier, and may actually be worthwhile
in its own right.

An aside on parsing: when starting a brand new configuration file was
suggested in
caog9aphycpmtypaawfd3_v7svokbnecfivmrc1axhb40zbs...@mail.gmail.com/
only so JSON could be used for configuring that feature, I wrote, as a
toy exercise, a parser that extends postgresql.conf to accept JSON
objects as values. It's not rocket science.

So, none of the above seems like a major hurdle to me.

> I think that's likely to be impossible to keep accurate or up to date.  

If significant rot sets in a few releases down the line, there may be
more false positives. But then again, if users found it useful enough that
developers cared about keeping it up to date, it wouldn't get that way.
Again, see the previous note about DRY and the lack of an explicit schema
for GUCs except in code form.

Also, this depends crucially on the GUC churn rate, which admittedly you
are far better qualified to pronounce on than me.

> Just as a thought experiment, ask yourself how you'd validate the
> TimeZone setting in external code, bearing in mind that anytime you
> don't give exactly the same answer the server would, your tool has
> failed to be helpful.

A tool may be extremely useful for a hundred checks, and poor
in a few others. That would make it imperfect, not useless.

If TimeZone turns out to be an extremely challenging GUC to validate,
I'd unceremoniously skip the semantic check for it.

Kind Regards,
Amir






Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson
Hello,

The JSON method was used in the patch because it seemed to be the group
consensus.

Requirements:
   - Grouping: specify a list of node names with the required number of
ACKs for the group. We could have priority or quorum groups. Quorum treats
all the standbys in the group as equals, and ACKs from any k of them count.
In priority behavior, ACKs must be received from the specified k
lowest-priority servers for a successful transaction.
   - Group names, to enable easier status reporting per group. The
topmost group may not be named; it will be assigned a default name. All the
sub-groups must be named.
   - Not more than 3 groups with 1 level of nesting expected

Behavior in the submitted patch:
   -  The topmost group is named "Default Group". All the
other standby names or groups have to be listed within it.
   -  When more than 1 connected standby has the same name, the
highest LSN among them is chosen. Example: 2 priority in (X,Y,Z). If there
are 2 nodes named X connected, even though both X's have returned ACKs, the
server will wait for an ACK from Y.
   -  There are no "potential" standbys. In quorum behavior, there are
no fixed standbys which are to be in sync; all members are equal. ACKs from
any specified n nodes of a set count as success.

Further work:
   - improvements to pg_stat_replication to give the node tree and
status?
   - manipulating/editing the conf setting using functions
   - regression tests

Mini-language:
[] - to specify priority
() - to specify quorum
Format -  :  []
Not specifying a count defaults to 1.
Ex: s_s_names = '2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)'

JSON:
It would contain 2 main keys: "sync_info" and "groups".
The "sync_info" would consist of "quorum"/"priority" with the count and
"nodes"/"group" with the group name or node list.
The optional "groups" key would list out all the groups mentioned within
"sync_info" along with their node lists.
 Ex: {
  "sync_info":
  {
    "quorum": 2,
    "nodes":
    [
      {"quorum": 1, "group": "cluster1"},
      {"priority": 2, "group": "cluster2"},
      "U"
    ]
  },
  "groups":
  {
    "cluster1": ["A", "B"],
    "cluster2": ["X", "Y", "Z"]
  }
}

JSON and mini-language:
   - JSON is more verbose.
   - With JSON you can define a group once and use it multiple times in the
sync settings, but since not many levels of nesting are expected, I am not
sure how useful this will be.
   - Though a JSON parser is built in, additional code is required to check
for the required JSON format. For the mini-language, a new parser will have
to be written.

Despite all this, I feel the mini-language is better, mainly for its brevity.
Also, it will not require additional GUC parser support (multi-line).




-
Beena Emerson





Re: [HACKERS] RLS bug in expanding security quals

2015-10-08 Thread Dean Rasheed
On 8 October 2015 at 05:45, Haribabu Kommi  wrote:
> On Thu, Oct 8, 2015 at 2:54 PM, Stephen Frost  wrote:
>> It's quite late here, but I'll take a look at this in more depth
>> tomorrow.
>>
>> Based on what the Assert's testing, I took an educated guess and tried
>> running without the UNION ALL, which appeared to work correctly.
>
> Yes, it works fine without UNION ALL.
>

I took a look at this and it appears to be a bug in the UNION ALL
handling of queries with security quals. An even simpler test case
that triggers it is:

create role test_user1;
drop table if exists foo cascade;
create table foo(a int);
grant all on foo to test_user1;

alter table foo enable row level security;
create policy foo_policy on foo for select using (a > 0);

set role test_user1;
explain (verbose, costs off)
SELECT a FROM foo
UNION ALL
SELECT 1;

What's happening is that flatten_simple_union_all() and
pull_up_union_leaf_queries() is building translated_vars lists on
appendrels, prior to the security quals being expanded, and those
security quals are adjusted to point to the parent rel. Then the code
to expand the security quals on the child RTEs no longer sees any Vars
pointing to those RTEs, so the resulting subquery RTEs end up with
empty targetlists.

This appears to be a simple oversight in expand_security_qual() -- it
needs to look at and update the Vars in the translated_vars lists of
the appendrels to work properly. I wasn't expecting things
outside the Query to be pointing into its guts in this way.

Attached is a simple patch that appears to work, but it needs more
testing (and some regression tests).

Regards,
Dean
diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
new file mode 100644
index c4b61df..c0d8a35
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -263,7 +263,8 @@ expand_security_qual(PlannerInfo *root,
 			 * Replace any variables in the outer query that refer to the
 			 * original relation RTE with references to columns that we will
 			 * expose in the new subquery, building the subquery's targetlist
-			 * as we go.
+			 * as we go.  Also replace any references in the translated_vars
+			 * lists of any appendrels.
 			 */
 			context.rt_index = rt_index;
 			context.sublevels_up = 0;
@@ -274,6 +275,8 @@ expand_security_qual(PlannerInfo *root,
 
 			security_barrier_replace_vars((Node *) parse, &context);
 			security_barrier_replace_vars((Node *) tlist, &context);
+			security_barrier_replace_vars((Node *) root->append_rel_list,
+										  &context);
 
 			heap_close(context.rel, NoLock);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Beena Emerson
Sawada Masahiko wrote:
>
> I agree with adding support for multi-line GUC parameters.
> But I though it is:
> param = 'param1,
> param2,
> param3'
>
> This reads as 'value1,value2,value3'.

Use of '\' ensures that omitting the closing quote does not break the
entire file.
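
For example (hypothetical syntax -- the backslash continuation is the
proposed behavior, not something the GUC parser accepts today):

synchronous_standby_names = '2(cluster1: 1(A,B), \
                               cluster2: 2[X,Y,Z], U)'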





-
Beena Emerson





[HACKERS] Windows PostgreSQL and PgAdmin3 Open Source Installers

2015-10-08 Thread Alexey Slaykovsky

Hello, hackers!
We haven't found out how EnterpriseDB's PostgreSQL is built for Windows, so
we chose to build PostgreSQL and PgAdmin3 on our own and publish the
results for the community.
Sources and patches live here:
https://github.com/postgrespro/pgwininstall
PostgreSQL is built with nls (libintl), libxml2, libxslt, Python, Perl,
uuid, zlib and iconv. All dependencies except Perl and Python are
built from source code.
The build currently has no additional extensions such as PostGIS or
pg_repack, but there are plans to add them. In addition, we have not yet
handled the upgrade scenario from the EnterpriseDB build of PostgreSQL.
We hope that interested Windows users in the community will join in
developing these scripts on GitHub.
For example, we have a fresh 9.4.5 built with these scripts, which you can
try here: https://yadi.sk/d/T0PCWguBja8o5





Re: [HACKERS] Tsvector editing functions

2015-10-08 Thread Teodor Sigaev

There is patch that adds some editing routines for tsvector type.

...

When submitting a patch, it's a good idea to explain why someone would
want the feature you are adding.  Maybe that's obvious to you, but it
isn't clear to me why we'd want this.



Some examples:
tsvector delete(tsvector, text)
    removes a wrongly indexed word (maybe a stop word)
text[] to_array(tsvector)
    In my practice, I needed it to work with the smlar module.
tsvector to_tsvector(text[])
    Converts a list of tags to a tsvector, because searching a tsvector is
    more flexible and fast than the array equivalents.
set unnest(tsvector)
    For computing some complicated statistics.

These functions are mostly needed in utility processing rather than in the
regular workflow.
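
For illustration, a hypothetical usage sketch of the proposed routines
(the signatures are those listed above; the commented result is an
assumption):

-- remove a wrongly indexed word (perhaps a stop word)
SELECT delete('cat:3 fat:2,4 rat:5'::tsvector, 'fat');
-- assumed result: 'cat':3 'rat':5

-- convert a list of tags to a tsvector
SELECT to_tsvector(ARRAY['big','fat','cat']);

-- expand a tsvector into rows, e.g. to compute statistics
SELECT * FROM unnest('cat:3 fat:2,4 rat:5'::tsvector);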


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS]WIP: Covering + unique indexes.

2015-10-08 Thread Anastasia Lubennikova


Hi hackers,

I'm working on a patch that allows combining covering and unique
functionality for btree indexes.


_Previous discussion was here:_
1) Proposal thread 

2) Message with proposal clarification 



In a nutshell, the feature allows creating an index with "key" columns and
"included" columns.
"key" columns can be used as scan keys. The unique constraint relates only
to "key" columns.

"included" columns may be used as scan keys if they have a suitable opclass.
Both "key" and "included" columns can be returned from the index by an
IndexOnlyScan.


Btree is the default index type and it's used everywhere, so it requires
proper testing. Volunteers are welcome)


_Use case:_
- We have a table (c1, c2, c3, c4);
- We need to have an unique index on (c1, c2).
- We would like to have a covering index on all columns to avoid reading 
of heap pages.


Old way:
CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);

What's wrong?
Two indexes contain repeated data. Overhead to data manipulation 
operations and database size.


New way:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached.
In 'test.sql' you can find a test with detailed comments on each step, 
and comparison of old and new indexes.


The new feature has the following syntax:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
The keyword INCLUDING defines the "included" columns of the index. These
columns are not part of the unique constraint.
Also, they are not stored in index inner pages, which allows decreasing
the index size.


_Results:_
1) Additional covering index is not required anymore.
2) New index can use IndexOnlyScan on queries, where old index can't.

For example,
explain analyze select c1, c2 from newt where c1<1 and c3<20;

*more examples in 'test.sql'

_Future work:_
Make opclasses for "included" columns optional.

CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000;
SELECT * FROM tbl WHERE c1=1000;

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index dc588d7..83f24c3 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -19,6 +19,7 @@
 #include "access/heapam.h"
 #include "access/itup.h"
 #include "access/tuptoaster.h"
+#include "utils/rel.h"
 
 
 /* 
@@ -441,3 +442,30 @@ CopyIndexTuple(IndexTuple source)
 	memcpy(result, source, size);
 	return result;
 }
+
+/*
+ * Reform index tuple. Truncate nonkey (INCLUDED) attributes.
+ */
+IndexTuple
+index_reform_tuple(Relation idxrel, IndexTuple olditup, int natts, int nkeyatts)
+{
+	TupleDesc   itupdesc = RelationGetDescr(idxrel);
+	Datum   values[INDEX_MAX_KEYS];
+	boolisnull[INDEX_MAX_KEYS];
+	IndexTuple	newitup;
+
+	Assert(natts <= INDEX_MAX_KEYS);
+	Assert(nkeyatts > 0);
+	Assert(nkeyatts <= natts);
+
+	index_deform_tuple(olditup, itupdesc, values, isnull);
+
+	/* form new tuple that will contain only key attributes */
+	itupdesc->natts = nkeyatts;
+	newitup = index_form_tuple(itupdesc, values, isnull);
+	newitup->t_tid = olditup->t_tid;
+
+	itupdesc->natts = natts;
+
+	return newitup;
+}
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..d14df12 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -108,18 +108,23 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 			 IndexUniqueCheck checkUnique, Relation heapRel)
 {
 	bool		is_unique = false;
-	int			natts = rel->rd_rel->relnatts;
+	int			nkeyatts = rel->rd_rel->relnatts;
 	ScanKey		itup_scankey;
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
 
+	Assert (rel->rd_index != NULL);
+	Assert(rel->rd_index->indnatts != 0);
+	Assert(rel->rd_index->indnkeyatts != 0);
+	nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, nkeyatts, itup_scankey, false, &buf, BT_WRITE);
 
 	offset = InvalidOffsetNumber;
 
@@ -134,7 +139,7 @@ top:
 	 * move right in the tree.  See Lehman and Yao for an excruciatingly
 	 * precise description.
 	 */
-	buf = _bt_moveright(rel, buf, natts, itup_scankey, false,

[HACKERS] removing set_latch_on_sigusr1

2015-10-08 Thread Robert Haas
As part of the dynamic background worker mechanism, I introduced a
flag set_latch_on_sigusr1.  When set, a process sets its own latch any
time it receives SIGUSR1.  The original motivation for this was that
the postmaster sends SIGUSR1 to a process to indicate the death of a
background worker, but cannot directly set the process latch since we
want to minimize the postmaster's contact surface with the main shared
memory segment.  The reason I introduced a new flag instead of just
*always* setting the latch was because I thought somebody might
complain about the cost of setting the latch every time.  But now I
think that's exactly what we should do: forget the flag, always set
the latch.  If you get a SIGUSR1, somebody is trying to get your
attention.  At worst, setting the latch will cause whatever latch-wait
loop got interrupted to re-execute without making forward progress.
Most of the time, though, you'll do something like
CHECK_FOR_INTERRUPTS() in that loop, and you'll probably find that
you've got one.
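
For reference, here is a minimal sketch of the canonical latch-wait loop
being described (work_is_available() is a placeholder for whatever
condition the real loop checks; a spuriously set latch just sends the
loop around once more):

	int			rc;

	for (;;)
	{
		CHECK_FOR_INTERRUPTS();

		if (work_is_available())	/* placeholder condition */
			break;

		rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
		if (rc & WL_POSTMASTER_DEATH)
			proc_exit(1);

		ResetLatch(MyLatch);
	}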

The code we're talking about here is in procsignal_sigusr1_handler.
That function can potentially call HandleCatchupInterrupt,
HandleNotifyInterrupt, HandleParallelMessageInterrupt, or
RecoveryConflictInterrupt.   And all of those functions set the
process latch.  So the only thing we're gaining by having
set_latch_on_sigusr1 is the possibility that we might avoid setting
the latch in response to either (1) a postmaster signal related to a
background worker that we happen not to care about at the moment or
(2) a totally spurious SIGUSR1.  But neither of those events should
happen very often, so what's the point?  The cost of setting the latch
when we don't need to is typically small, but NOT setting it when we
should have done can lead to an indefinite hang.

As it happens, the TupleQueueFunnelNext function I recently committed
has such a hazard, which I failed to spot during review and testing.
If people don't like this, I can instead cause that function to set
the flag.  But every place that sets the flag has to use a
PG_TRY()/PG_CATCH() block to make sure the old value of the flag gets
restored.  I'm pretty sure that's going to burn more cycles than the
flag can ever hope to save, not to mention the risk of bugs due to
people forgetting to add necessary volatile qualifiers.  We've already
got four PG_TRY() blocks in the code to cater to this stupid flag, and
if we keep it around I'm sure we'll accumulate at least a few more.

Patch attached.  Objections?  Suggestions?  Comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 68c9505..c38d486 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -954,45 +954,31 @@ WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *pidp)
 {
 	BgwHandleStatus status;
 	int			rc;
-	bool		save_set_latch_on_sigusr1;
 
-	save_set_latch_on_sigusr1 = set_latch_on_sigusr1;
-	set_latch_on_sigusr1 = true;
-
-	PG_TRY();
+	for (;;)
 	{
-		for (;;)
-		{
-			pid_t		pid;
+		pid_t		pid;
 
-			CHECK_FOR_INTERRUPTS();
+		CHECK_FOR_INTERRUPTS();
 
-			status = GetBackgroundWorkerPid(handle, &pid);
-			if (status == BGWH_STARTED)
-*pidp = pid;
-			if (status != BGWH_NOT_YET_STARTED)
-break;
-
-			rc = WaitLatch(MyLatch,
-		   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
+		status = GetBackgroundWorkerPid(handle, &pid);
+		if (status == BGWH_STARTED)
+			*pidp = pid;
+		if (status != BGWH_NOT_YET_STARTED)
+			break;
 
-			if (rc & WL_POSTMASTER_DEATH)
-			{
-status = BGWH_POSTMASTER_DIED;
-break;
-			}
+		rc = WaitLatch(MyLatch,
+	   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
 
-			ResetLatch(MyLatch);
+		if (rc & WL_POSTMASTER_DEATH)
+		{
+			status = BGWH_POSTMASTER_DIED;
+			break;
 		}
+
+		ResetLatch(MyLatch);
 	}
-	PG_CATCH();
-	{
-		set_latch_on_sigusr1 = save_set_latch_on_sigusr1;
-		PG_RE_THROW();
-	}
-	PG_END_TRY();
 
-	set_latch_on_sigusr1 = save_set_latch_on_sigusr1;
 	return status;
 }
 
@@ -1009,40 +995,26 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
 {
 	BgwHandleStatus status;
 	int			rc;
-	bool		save_set_latch_on_sigusr1;
-
-	save_set_latch_on_sigusr1 = set_latch_on_sigusr1;
-	set_latch_on_sigusr1 = true;
 
-	PG_TRY();
+	for (;;)
 	{
-		for (;;)
-		{
-			pid_t		pid;
+		pid_t		pid;
 
-			CHECK_FOR_INTERRUPTS();
+		CHECK_FOR_INTERRUPTS();
 
-			status = GetBackgroundWorkerPid(handle, &pid);
-			if (status == BGWH_STOPPED)
-return status;
+		status = GetBackgroundWorkerPid(handle, &pid);
+		if (status == BGWH_STOPPED)
+			return status;
 
-			rc = WaitLatch(&MyProc->procLatch,
-		   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
+		rc = WaitLatch(&MyProc->procLatch,
+	   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
 
-			if (rc & WL_POSTMASTER_DEATH)
-return BGWH_POSTMASTER_DIED;
+		if (rc & WL_POSTMASTER_DEATH)
+			return BGWH_POSTMASTER_DIED;
 
-			ResetLatch(&MyProc->procLatch);
-		

Re: [HACKERS] removing set_latch_on_sigusr1

2015-10-08 Thread Andres Freund
On October 8, 2015 5:41:37 PM GMT+02:00, Robert Haas  
wrote:
>Patch attached.  Objections?  Suggestions?  Comments?

I've not reviewed the patch, but +1 for the plan.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Robert Haas
On Tue, Oct 6, 2015 at 6:46 AM, Ashutosh Bapat
 wrote:
> standard_qp_callback() sets root->query_pathkeys to pathkeys on which the
> result of query_planner are expected be sorted upon (see the function for
> more details). The patch checks if any prefix of these pathkeys can be
> entirely evaluated using the foreign relation and at the foreign server
> under consideration. If yes, it gets estimates of costs involved and adds
> paths with those pathkeys. There can be multiple pathkeyless paths added for
> a given base relation. For every such path one path with pathkeys is added.
> If there is an index matching on the foreign server, getting the data sorted
> from foreign server improves execution time as seen from the results. The
> patch adds this functionality entirely in postgres_fdw.

In the interest of full disclosure, I asked Ashutosh to work on this
patch and have discussed the design with him several times.  I believe
that this is a good direction for PostgreSQL to be going.  It's
trivially easy right now to write a query against an FDW that performs
needlessly slowly, because a join, or a sort, or an aggregate is
performed on the local server rather than the remote one.   I, and
EnterpriseDB, want that to get fixed.  However, we also want it to get
fixed in the best possible way, and not to do anything unless there is
consensus on it.  So, if anyone has opinions on this topic, please
jump in.

That having been said, here are some review comments on this patch:

- You consider pushing down ORDER BY if any prefix of the query
pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me.
If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a),
ordering the result set by a doesn't help us much.  We've talked a few
times about an incremental sort capability that would take a stream of
tuples already ordered by one or more columns and sort each group by
additional columns, but I don't think we have that currently.  Without
that capability, I don't think there's much benefit in sorting by a
prefix of the pathkeys.  I suspect that if we can't get them all, it's
not worth doing.

- Right now, you have this code below the point where we bail out if
use_remote_estimate is not set.  If we keep it like that, the comment
needs updating.  But I suggest that we consider an ordered path even
if we are not using remote estimates.  Estimate the sorted path to
cost somewhat more than the unsorted path, so that we only choose that
path if the sort actually benefits us.  I don't know exactly how to
come up with a principled estimate given that we don't actually know
whether the remote side will need an extra sort or not, but maybe a
dumb estimate is still better than not trying a sorted path.

- In the long run, we should probably either add some configuration so
that the local side can make better estimates even without
use_remote_estimate, or maybe have a way for the FDW to keep a cache
of data in the system catalogs that is updated by ANALYZE.  Then,
analyzing a foreign table could store information locally about
indexes and so forth, instead of starting each query planning cycle
with no knowledge about the remote side.  That's not a matter for this
patch, I don't think, but it seems like something we should do.
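
As a concrete (if crude) example of the kind of dumb estimate mentioned
above, something like this sketch could work; the multiplier value is an
assumption, not a measured figure:

/*
 * Hypothetical: charge a flat 20% premium for the remote sort, so the
 * sorted path wins only when its pathkeys actually save work locally.
 */
#define FDW_SORT_MULTIPLIER	1.2

sorted_total_cost = unsorted_total_cost * FDW_SORT_MULTIPLIER;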

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

2015-10-08 Thread Robert Haas
On Tue, Oct 6, 2015 at 9:15 AM, David Christensen  wrote:
> Fixes a build issue I ran into while adding some columns to system tables:
>
> Throws a build error if we encounter a different number of fields in a
> DATA() line than we expect for the catalog in question.
>
> Previously, it was possible to silently ignore any mismatches at build
> time which could result in symbol undefined errors at link time.  Now
> we stop and identify the infringing line as soon as we encounter it,
> which greatly speeds up the debugging process.

I think this is a GREAT idea, but this line made me laugh[1]:

+warn "No Natts defined yet, silently skipping check...\n";

I suggest that we make that a fatal error.  Like "Could not find
definition Natts_pg_proc before start of DATA".

Secondly, I don't think we should check this at this point in the
code, because then it blindly affects everybody who uses Catalog.pm.
Let's pick one specific place to do this check.  I suggest genbki.pl,
inside the loop with this comment: "# Ordinary catalog with DATA
line(s)"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Because if you're producing a warning, it's not silent!




Re: [HACKERS] [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

2015-10-08 Thread David Christensen

> On Oct 8, 2015, at 11:23 AM, Robert Haas  wrote:
> 
> On Tue, Oct 6, 2015 at 9:15 AM, David Christensen  wrote:
>> Fixes a build issue I ran into while adding some columns to system tables:
>> 
>>Throws a build error if we encounter a different number of fields in a
>>DATA() line than we expect for the catalog in question.
>> 
>>Previously, it was possible to silently ignore any mismatches at build
>>time which could result in symbol undefined errors at link time.  Now
>>we stop and identify the infringing line as soon as we encounter it,
>>which greatly speeds up the debugging process.
> 
> I think this is a GREAT idea, but this line made me laugh[1]:
> 
> +warn "No Natts defined yet, silently skipping check...\n";
> 
> I suggest that we make that a fatal error.  Like "Could not find
> definition Natts_pg_proc before start of DATA".

That’s fine with me; my main consideration was to make sure nothing broke in 
the status quo, including dependencies I was unaware of.

> Secondly, I don't think we should check this at this point in the
> code, because then it blindly affects everybody who uses Catalog.pm.
> Let's pick one specific place to do this check.  I suggest genbki.pl,
> inside the loop with this comment: "# Ordinary catalog with DATA
> line(s)"

I'm happy to move it around, but if everything is in order, how will this
affect things at all?  If we're in a good state, this condition should never
trigger.
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171









Re: [HACKERS]WIP: Covering + unique indexes.

2015-10-08 Thread Thom Brown
On 8 October 2015 at 16:18, Anastasia Lubennikova
 wrote:
>
> Hi hackers,
>
> I'm working on a patch that allows combining covering and unique
> functionality for btree indexes.
>
> Previous discussion was here:
> 1) Proposal thread
> 2) Message with proposal clarification
>
> In a nutshell, the feature allows creating an index with "key" columns and
> "included" columns.
> "key" columns can be used as scan keys. The unique constraint relates only to
> "key" columns.
> "included" columns may be used as scan keys if they have a suitable opclass.
> Both "key" and "included" columns can be returned from index by
> IndexOnlyScan.
>
> Btree is the default index type and it's used everywhere, so it requires
> proper testing.  Volunteers are welcome)
>
> Use case:
> - We have a table (c1, c2, c3, c4);
> - We need to have an unique index on (c1, c2).
> - We would like to have a covering index on all columns to avoid reading of
> heap pages.
>
> Old way:
> CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
> CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);
>
> What's wrong?
> Two indexes contain repeated data. Overhead to data manipulation operations
> and database size.
>
> New way:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
>
> The patch is attached.
> In 'test.sql' you can find a test with detailed comments on each step, and
> comparison of old and new indexes.
>
> New feature has following syntax:
> CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
> The keyword INCLUDING defines the "included" columns of the index. These
> columns are not part of the unique constraint.
> Also, they are not stored in index inner pages, which allows decreasing the
> index size.
>
> Results:
> 1) Additional covering index is not required anymore.
> 2) New index can use IndexOnlyScan on queries, where old index can't.
>
> For example,
> explain analyze select c1, c2 from newt where c1<1 and c3<20;
>
> *more examples in 'test.sql'
>
> Future work:
> Make opclasses for "included" columns optional.
>
> CREATE TABLE tbl (c1 int, c4 box);
> CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);
>
> If we don't need c4 as an index scankey, we don't need any btree opclass on
> it.
> But we still want to have it in covering index for queries like
>
> SELECT c4 FROM tbl WHERE c1=1000;
> SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING":

# SELECT 
pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
 pg_get_indexdef
--
 CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people
USING btree (first_name, last_name) INCLUDING(email)
(1 row)


There is also no collation output:

# CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
last_name) INCLUDING (email COLLATE "pl_PL");
CREATE INDEX

# SELECT pg_get_indexdef('test_idx'::regclass::oid);
   pg_get_indexdef
-
 CREATE UNIQUE INDEX test_idx ON people USING btree (first_name
COLLATE "en_GB", last_name) INCLUDING(email)
(1 row)


As for functioning, it works as described:

# EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
('Paul','Freeman');
QUERY PLAN
--
 Index Only Scan using people_first_name_last_name_email_idx on people
 (cost=0.28..1.40 rows=1 width=21)
   Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text))
(2 rows)


Typo:

"included columns must not intersects with key columns"

should be:

"included columns must not intersect with key columns"


One thing I've noticed you can do with your patch, which you haven't
mentioned, is have a non-unique covering index:

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
CREATE INDEX

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
   QUERY PLAN
-
 Index Only Scan using covering_idx on people  (cost=0.28..1.44 rows=4 width=13)
   Index Cond: (first_name = 'Paul'::text)
(2 rows)

But this appears to behave as if it were a regular multi-column index,
in that it will use the index for ordering rather than sort after
fetching from the index.  So is this really stored the same as a
multi-column index?  The index sizes aren't identical, so something is
different.

Thom



Re: [HACKERS] [PATCH] Comment fixes

2015-10-08 Thread Robert Haas
On Tue, Oct 6, 2015 at 9:17 AM, David Christensen  wrote:
> Use the correct name “pgindent” in comments.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] minor doc patch

2015-10-08 Thread Robert Haas
On Tue, Oct 6, 2015 at 12:07 PM, Nikolay Shaplov
 wrote:
> This patch changes, in
>
> http://www.postgresql.org/docs/9.5/static/sql-createtype.html
>
> "variable length" into "variable-length", since in other places there it is
> written with "-" in the middle, not " ".

Makes sense to me.  I'm not sure we want to do this everywhere, but it
surely makes sense here, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS bug in expanding security quals

2015-10-08 Thread Dean Rasheed
On 8 October 2015 at 15:05, Dean Rasheed  wrote:
> Attached is a simple patch that appears to work, but it needs more
> testing (and some regression tests).
>

Here's an updated patch with an extra regression test case that
triggers the issue.

I've also updated the function comment for expand_security_quals() to
better explain the situations where it actually has work to do --
tables with RLS and updates to auto-updatable security barrier views,
but not SELECTs from security berrier views. This explains why this
bug doesn't affect security barrier views (UNION ALL views aren't
auto-updatable), so only 9.5 and HEAD need to be patched.

Regards,
Dean
diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
new file mode 100644
index c4b61df..ee1e1e4
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -56,6 +56,12 @@ static bool security_barrier_replace_var
  * the others, providing protection against malicious user-defined security
  * barriers.  The first security barrier qual in the list will be used in the
  * innermost subquery.
+ *
+ * In practice, the only RTEs that will have security barrier quals are those
+ * that refer to tables with row-level security, or which are the target
+ * relation of an update to an auto-updatable security barrier view.  RTEs
+ * that read from a security barrier view will have already been expanded by
+ * the rewriter.
  */
 void
 expand_security_quals(PlannerInfo *root, List *tlist)
@@ -263,7 +269,8 @@ expand_security_qual(PlannerInfo *root,
 			 * Replace any variables in the outer query that refer to the
 			 * original relation RTE with references to columns that we will
 			 * expose in the new subquery, building the subquery's targetlist
-			 * as we go.
+			 * as we go.  Also replace any references in the translated_vars
+			 * lists of any appendrels.
 			 */
 			context.rt_index = rt_index;
 			context.sublevels_up = 0;
@@ -274,6 +281,8 @@ expand_security_qual(PlannerInfo *root,
 
 			security_barrier_replace_vars((Node *) parse, &context);
 			security_barrier_replace_vars((Node *) tlist, &context);
+			security_barrier_replace_vars((Node *) root->append_rel_list,
+										  &context);
 
 			heap_close(context.rel, NoLock);
 
diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out
new file mode 100644
index 6becf59..c844499
--- a/src/test/regress/expected/rowsecurity.out
+++ b/src/test/regress/expected/rowsecurity.out
@@ -640,6 +640,26 @@ EXPLAIN (COSTS OFF) SELECT * FROM t1 WHE
  Filter: ((a % 2) = 0)
 (12 rows)
 
+-- union all query
+SELECT a, b, oid FROM t2 UNION ALL SELECT a, b, oid FROM t3;
+ a |  b  | oid 
+---+-+-
+ 1 | abc | 201
+ 3 | cde | 203
+ 1 | xxx | 301
+ 2 | yyy | 302
+ 3 | zzz | 303
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT a, b, oid FROM t2 UNION ALL SELECT a, b, oid FROM t3;
+  QUERY PLAN   
+---
+ Append
+   ->  Seq Scan on t2
+ Filter: ((a % 2) = 1)
+   ->  Seq Scan on t3
+(4 rows)
+
 -- superuser is allowed to bypass RLS checks
 RESET SESSION AUTHORIZATION;
 SET row_security TO OFF;
diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql
new file mode 100644
index 662f520..b230a0f
--- a/src/test/regress/sql/rowsecurity.sql
+++ b/src/test/regress/sql/rowsecurity.sql
@@ -255,6 +255,10 @@ EXPLAIN (COSTS OFF) SELECT * FROM t1 FOR
 SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
 EXPLAIN (COSTS OFF) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
 
+-- union all query
+SELECT a, b, oid FROM t2 UNION ALL SELECT a, b, oid FROM t3;
+EXPLAIN (COSTS OFF) SELECT a, b, oid FROM t2 UNION ALL SELECT a, b, oid FROM t3;
+
 -- superuser is allowed to bypass RLS checks
 RESET SESSION AUTHORIZATION;
 SET row_security TO OFF;



Re: [HACKERS] bugs and bug tracking

2015-10-08 Thread Nathan Wagner
On Wed, Oct 07, 2015 at 03:06:50PM -0400, Stephen Frost wrote:
> * Nathan Wagner (nw...@hydaspes.if.org) wrote:
> > I have added full text searching to my tracker.  I only index the first
> > 50 KB of each message.  There's apparently a one MB limit on that
> > anyway, which a few messages exceed.  I figure anything important is
> > probably in the first 50KB.  I could be wrong.  I could re-index fairly
> > easily.  It seems to work pretty well.
> 
> Note that we have FTS for the -bugs, and all the other, mailing lists..

True, but that finds emails.  The search I have finds bugs (well, bug reports
anyway).  Specifically, I have the following function:

create or replace function bugvector(bugid bigint)
returns tsvector language 'sql' as $$
select tsvagg(
setweight(to_tsvector(substr(body(msg), 1, 50*1024)), 'D')
||
setweight(to_tsvector(header_value(msg, 'Subject')), 'C')
)
from emails
where bug = $1
$$ strict;

which, as you can see, collects into one tsvector all the emails associated
with that particular bug.  So a search hit is for the whole bug.  There's
probably some search artifacts here.  I suspect a bug with a long email thread
will be ranked higher than one with a short thread.  Perhaps that's ok
though.
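
For example, a whole-bug search might look like this (assuming my bugs
table is keyed by a bigint id; the query shape is a sketch):

select id, ts_rank(bugvector(id), q) as rank
  from bugs, to_tsquery('visibility & map') q
 where bugvector(id) @@ q
 order by rank desc;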

-- 
nw




Re: [HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-10-08 Thread Robert Haas
On Wed, Oct 7, 2015 at 8:09 PM, Peter Geoghegan  wrote:
> On Tue, Oct 6, 2015 at 1:16 PM, Robert Haas  wrote:
>> If you would care to revise the patch accordingly, I will commit it
>> (barring objections from others, of course).
>
> Here is a revision of 0001-*, with both BSWAP32() and BSWAP64() in a
> new header, src/port/pg_bswap.h.
>
> No revisions were required to any other patch in the patch series to
> make this work, and so I only include a revised 0001-*.

Great.  I've committed that, minus the sortsupport.h changes which I
think should be part of 0002, and which in any case I'd like to
discuss a bit more.  It seems to me that (1) ABBREV_STRING_UINT isn't
a great name for this and (2) the comment is awfully long for the
thing to which it refers.  I suggest that we instead call it
DatumToBigEndian(), put it pg_bswap.h, and change the comments to
something like this:

/*
 * Rearrange the bytes of a Datum into big-endian order.
 *
 * One possible application of this macro is to make comparisons cheaper.
 * An integer comparison of the new Datums will return the same result as
 * a memcmp() on the original Datums, but the integer comparison should be
 * much cheaper.
 */

The specific way that this is used by various sortsupport routines can
be adequately explained in the comments for those routines.
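
For concreteness, the definition might look something like this (a sketch,
assuming a 64-bit Datum; a 32-bit build would use BSWAP32() instead):

#ifdef WORDS_BIGENDIAN
#define DatumToBigEndian(datum)	(datum)
#else
#define DatumToBigEndian(datum)	BSWAP64(datum)
#endif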

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multi-tenancy with RLS

2015-10-08 Thread Robert Haas
On Tue, Oct 6, 2015 at 7:29 AM, Stephen Frost  wrote:
> * Haribabu Kommi (kommi.harib...@gmail.com) wrote:
>> On Tue, Oct 6, 2015 at 10:56 AM, Haribabu Kommi
>>  wrote:
>> > Here I attached an updated version of the patch with the following changes.
>>
>> I found some problems related to providing multi-tenancy on a system
>> catalog view.
>> This is because, system catalog view uses the owner that is created
>> the user instead
>> of the current user by storing the user information in "checkAsUser"
>> field in RangeTblEntry
>> structure.
>
> Right, when querying through a view to tables underneath, we use the
> permissions of the view owner.  View creators should be generally aware
> of this already.
>
> I agree that it adds complications to the multi-tenancy idea since the
> system views, today, allow viewing of all objects.  There are two ways
> to address that:
>
> Modify the system catalog views to include the same constraints that the
> policies on the tables do
>
> or
>
> Allow RLS policies against views and then create the necessary policies
> on the views in the catalog.
>
> My inclination is to work towards the latter as that's a capability we'd
> like to have anyway.

We've got one reloption for views already - security_barrier.  Maybe
we could have another one that effectively changes a particular view
from "security definer" as it is today to "security invoker".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Documentation bug in 9.5/ master - pg_replication_origin_session_setup

2015-10-08 Thread Robert Haas
On Wed, Oct 7, 2015 at 2:38 AM, Pallavi Sontakke
 wrote:
> Hi All,
>
> There appears to be a typo error in documentation of this function. Actual
> function is 'pg_replication_origin_session_setup' while documentation has it
> as 'pg_replication_origin_setup_session'.
>
> Please find patch for 9.5 and master attached.

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2015-10-08 Thread Andres Freund
On October 8, 2015 7:35:24 PM GMT+02:00, Simon Riggs  
wrote:

>The problem is we won't be able to tell the two formats apart, since
>they
>both are just lots of bits. So we won't be able to tell if the file is
>old
>format or new format, which could lead to loss of information that
>relates
>to visibility. 

I don't see the problem? I mean catversion will reliably tell you which format 
the vm is in?

We could additionally use the opportunity to add a metapage, but that seems
like an independent thing.


Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-10-08 Thread Peter Geoghegan
On Thu, Oct 8, 2015 at 10:13 AM, Robert Haas  wrote:
> It seems to me that (1) ABBREV_STRING_UINT isn't
> a great name for this and (2) the comment is awfully long for the
> thing to which it refers.  I suggest that we instead call it
> DatumToBigEndian(), put it pg_bswap.h, and change the comments to
> something like this:
>
> /*
>  * Rearrange the bytes of a Datum into big-endian order.
>  *
>  * One possible application of this macro is to make comparisons
> cheaper.  An integer
>  * comparison of the new Datums will return the same result as a memcmp() on 
> the
>  * original Datums, but the integer comparison should be much cheaper.
>  */
>
> The specific way that this is used by various sortsupport routines can
> be adequately explained in the comments for those routines.

This is pretty clearly something specific to SortSupport. I'm not
opposed to changing the name and making the comments more terse along
those lines, but I think it should live in sortsupport.h. The macro
byteswaps datums on little-endian platforms only, which seems very
specific.

I think that we're going to have SortSupport with abbreviation for
UUIDs and bytea at some point, and maybe character(n). Centralizing
information about this to sortsupport.h makes sense to me.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bugs and bug tracking

2015-10-08 Thread Oleg Bartunov
On Thu, Oct 8, 2015 at 8:11 PM, Nathan Wagner  wrote:

> On Wed, Oct 07, 2015 at 03:06:50PM -0400, Stephen Frost wrote:
> > * Nathan Wagner (nw...@hydaspes.if.org) wrote:
> > > I have added full text searching to my tracker.  I only index the first
> > > 50 KB of each message.  There's apparently a one MB limit on that
> > > anyway, which a few messages exceed.  I figure anything important is
> > > probably in the first 50KB.  I could be wrong.  I could re-index fairly
> > > easily.  It seems to work pretty well.
>

We have a patch that eliminates the 1MB limit; it will be published soon.


> >
> > Note that we have FTS for the -bugs, and all the other, mailing lists..
>
> True, but that finds emails.  The search I have finds bugs (well, bug
> reports
> anyway).  Specifically, I have the following function:
>
> create or replace function bugvector(bugid bigint)
> returns tsvector language 'sql' as $$
> select tsvagg(
> setweight(to_tsvector(substr(body(msg), 1, 50*1024)), 'D')
> ||
> setweight(to_tsvector(header_value(msg, 'Subject')), 'C')
> )
> from emails
> where bug = $1
> $$ strict;
>
> which, as you can see, collects into one tsvector all the emails associated
> with that particular bug.  So a search hit is for the whole bug.  There's
> probably some search artifacts here.  I suspect a bug with a long email
> thread
> will be ranked higher than a one with a short thread.  Perhaps that's ok
> though.
>
>
It's possible to write a bug-specific parser for FTS. Also, results could be
ordered by date submitted, so we always have the originating message first.





Re: [HACKERS] Freeze avoidance of very large table.

2015-10-08 Thread Simon Riggs
On 1 October 2015 at 23:30, Josh Berkus  wrote:

> On 10/01/2015 07:43 AM, Robert Haas wrote:
> > On Thu, Oct 1, 2015 at 9:44 AM, Fujii Masao 
> wrote:
> >> I wonder how much it's worth renaming only the file extension while
> >> there are many places where "visibility map" and "vm" are used,
> >> for example, log messages, function names, variables, etc.
> >
> > I'd be inclined to keep calling it the visibility map (vm) even if it
> > also contains freeze information.
> >
>
> -1 to rename.  Visibility Map is a perfectly good name.


The name can stay the same, but specifically the file extension should
change.

This patch changes the layout of existing information:
* _vm stores one bit per page
* _$new stores two bits per page

The problem is we won't be able to tell the two formats apart, since they
both are just lots of bits. So we won't be able to tell if the file is old
format or new format, which could lead to loss of information that relates
to visibility. If we think something is all-visible when it is not, this is
effectively data corruption.

In light of lessons learned from multixactids, I think its important that
we are able to tell the difference between an old format and a new format
visibility map.

My suggestion to do so was to call it "vfm", so we indicate that it is now
a Visibility & Freeze Map

I don't care if we change the name, but I do care if we can't tell the
difference between a failed upgrade, a normal upgrade and a server that has
been upgraded multiple times. Alternate suggestions welcome.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Freeze avoidance of very large table.

2015-10-08 Thread Masahiko Sawada
On Thu, Oct 8, 2015 at 7:03 PM, Fujii Masao  wrote:
> On Mon, Oct 5, 2015 at 7:31 PM, Masahiko Sawada  wrote:
>> On Sat, Oct 3, 2015 at 3:41 AM, Robert Haas  wrote:
>>> On Fri, Oct 2, 2015 at 11:23 AM, Alvaro Herrera
>>>  wrote:
> + /* all-frozen information is also cleared at the same time 
> */
>   PageClearAllVisible(page);
> + PageClearAllFrozen(page);

 I wonder if it makes sense to have a macro to clear both in unison,
 which seems a very common pattern.
>>>
>>> I think PageClearAllVisible should clear both, and there should be no
>>> other macro.  There is no event that causes a page to cease being
>>> all-visible that does not also cause it to cease being all-frozen.
>>> You might think that deleting or locking a tuple would fall into that
>>> category - but nope, XMAX needs to be cleared or the tuple pruned, or
>>> there will be problems after wraparound.
>>>
>>
>> Thank you for your advice.
>> I understood.
>>
>> I changed the patch so that PageClearAllVisible clear both bits, and
>> removed ClearAllFrozen.
>> Attached the latest v16 patch which contains draft version documentation 
>> patch.
>
> Thanks for updating the patch! Here are another review comments.
>

Thank you for reviewing!
Attached is the latest patch.

> +ereport(elevel,
> +(errmsg("skipped %d frozen pages acoording to visibility map",
> +vacrelstats->vmskipped_frozen_pages)));
>
> Typo: acoording should be according.
>
> When vmskipped_frozen_pages is 1, "1 frozen pages" in log message
> sounds incorrect in terms of grammar. So probably errmsg_plural()
> should be used here.

Thank you for your advice.
Fixed.

> +relallvisible = visibilitymap_count(rel,
> VISIBILITYMAP_ALL_VISIBLE);
> +relallfrozen = visibilitymap_count(rel, 
> VISIBILITYMAP_ALL_FROZEN);
>
> We can refactor visibilitymap_count() so that it counts the numbers of
> both all-visible and all-frozen tuples at the same time, in order to
> avoid reading through visibility map twice.

I agree.
I've changed it accordingly.

> heap_page_is_all_visible() can set all_frozen to TRUE even when
> it returns FALSE. This is odd because the page must not be all frozen
> when it's not all visible. heap_page_is_all_visible() should set
> all_frozen to FALSE whenever all_visible is set to FALSE?
> Probably it's better to forcibly set all_frozen to FALSE at the end of
> the function whenever all_visible is FALSE.

Fixed.

> +if (PageIsAllVisible(page))
>  {
> -Assert(BufferIsValid(*vmbuffer));
>
> Why did you remove this assertion?

It's my mistake.
Fixed.

> +if (all_frozen)
> +{
> +PageSetAllFrozen(page);
> +flags |= VISIBILITYMAP_ALL_FROZEN;
> +}
>
> Why didn't you call visibilitymap_test() for all frozen case here?

Same as above.
Fixed.

> In visibilitymap_set(), the argument flag must be either
> (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN) or
> VISIBILITYMAP_ALL_VISIBLE. So I think that it's better to add
> Assert() which checks whether the specified flag is valid or not.

I agree.
I added an Assert() at the beginning of the visibilitymap_set() function.

> + * caller is expected to set PD_ALL_VISIBLE or
> + * PD_ALL_FROZEN first.
> + */
> +Assert(PageIsAllVisible(heapPage) ||
> PageIsAllFrozen(heapPage));
>
> This should be the following?
>
>   Assert(((flag & VISIBILITYMAP_ALL_VISIBLE) && PageIsAllVisible(heapPage)) ||
>   ((flag & VISIBILITYMAP_ALL_FROZEN) && 
> PageIsAllFrozen(heapPage)));

I agree.
Fixed.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..f8aa18b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1841,6 +1841,18 @@
  
 
  
+   relallfrozen
+   int4
+   
+   
+Number of pages that are marked all-frozen in the table's
+visibility map. It is updated by VACUUM,
+ANALYZE, and a few DDL commands such as
+CREATE INDEX.
+   
+ 
+
+ 
   reltoastrelid
   oid
   pg_class.oid
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5081da0..6bd4d57 100644

Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Jeremy Harris
On 08/10/15 17:09, Robert Haas wrote:
> - You consider pushing down ORDER BY if any prefix of the query
> pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me.
> If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a),
> ordering the result set by a doesn't help us much.  We've talked a few
> times about an incremental sort capability that would take a stream of
> tuples already ordered by one or more columns and sort each group by
> additional columns, but I don't think we have that currently.  Without
> that capability, I don't think there's much benefit in sorting by a
> prefix of the pathkeys.  I suspect that if we can't get them all, it's
> not worth doing.

That depends on how often the additional columns affect the sorted
order, if the sort method takes advantage of sorted input.
-- 
Jeremy





[HACKERS] Compiler warning

2015-10-08 Thread Peter Geoghegan
Commit 7e2a18a9 must have caused this compiler warning which I now see
on the master branch with my standard release build settings:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -O2 -g3 -gdwarf-4 -Werror -I.
-I../../../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE
-I/usr/include/libxml2   -c -o logicalfuncs.o logicalfuncs.c -MMD -MP
-MF .deps/logicalfuncs.Po
postmaster.c: In function ‘ServerLoop’:
postmaster.c:1777:9: error: ‘now’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
(now - AbortStartTime) >= SIGKILL_CHILDREN_AFTER_SECS)
 ^
cc1: all warnings being treated as errors
make[3]: *** [postmaster.o] Error 1


-- 
Peter Geoghegan




Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Michael Paquier
On Thu, Oct 8, 2015 at 11:28 PM, Amir Rohan wrote:
> Wouldn't this work?
> 1) start timer
> 2) Grab pg_stat_replication.sent_location from master
> 3) pg_switch_xlog() # I /think/ we want this, could be wrong

For a warm standby, you would want that, but this depends on two factors:
- The moment master completes archiving of this segment
- The moment standby restores it.
On slow machines, those two things become by far the bottleneck;
imagine a Pi restricted on I/O with a low-end SD card in the worst
case (I maintain one, with a good card, and still the I/O is a
bottleneck).

> 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up
> 5) stop timer

That's not really solid: there is an interval of time between the
moment the LSN position is taken from the master and the standby. An
accurate method is to log/store on the master when a given WAL position
has been flushed to disk, and do the same on the slave at replay for this
LSN position. In any case this is going to badly flood the logs of
both nodes, and as the backend cares about the performance of
operations in this code path we won't want to do that anyway.

To make it short, it seems to me that simply waiting until the LSN a
test is waiting for has been replayed is just fine for this set of
tests to ensure their run consistency; let's not forget that this is
the goal here.
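
Concretely, a test can capture pg_current_xlog_location() on the master
after its last write, then poll the standby with something like this (the
LSN here is only an example value):

SELECT pg_last_xlog_replay_location() >= '0/3000060'::pg_lsn;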
Regards,
-- 
Michael




Re: [HACKERS] Compiler warning

2015-10-08 Thread Tom Lane
Peter Geoghegan  writes:
> Commit 7e2a18a9 must have caused this compiler warning which I now see
> on the master branch with my standard release build settings:

[ scratches head... ]  Dunno why my compiler didn't complain about that.
Will fix it in a bit.

regards, tom lane




Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-10-08 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 08 Oct 2015 15:24:35 +0200, Tomas Vondra  
wrote in <56166e93.8000...@2ndquadrant.com>
> 
> 
> On 10/08/2015 07:30 AM, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> >> The attached patch applies on the latest v5 patch and will
> >> address above issues. (And modifies expected files, which are the
> >> manifestation of this improovement).
> >
> > As you see, it is a quite bad choice. Ugly and unreadable and
> > fragile.
> 
> I suppose you mean placing the list into IndexClauseSet?

No, it is a reasonable choice; it's not bad if it has valid
clauses only for partial indexes.  What is ugly is my additional
patch :(, which lets IndexClauseSet have valid clauses.

> > The cause of this seeming mismatch would be the place to hold
> > indexrinfos. It is determined only by baserestrictinfo and
> > indpred. Any other components are not involved. So IndexClauseSet
> > is found not to be the best place after all, I suppose.
> >
> > Instead, I came to think that the better place is
> > IndexOptInfo. Partial indexes are examined in check_partial_index
> > and it seems to be the most proper place to check this so far.
> 
> AFAIK there's only one IndexOptInfo instance per index, so I'm not
> sure how would that work with queries that use the index in multiple
> places?

Regardless of whether the index is used in multiple places, indexrinfos is
determined only by the baserestrictinfo of the owning relation and the
index's own indpred, which are invariant through the following steps.

One arguable point about the place is that check_partial_indexes
could be called again with longer restrictions (by eclass
clauses), but the comment suggests that the last call will be
valid in the following steps, so creating indexrinfos in every
call should be valid.

Though possibly a bit inefficient, we can choose to use the
first-created indexrinfos, which would be longer than what could
be re-created.

> Imagine for example table joined to itself, where both sides
> use the index with different conditions.

Such a table appears as multiple baserels, each possibly having a different
baserestrictinfo, so there is no problem in that case.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] RLS bug in expanding security quals

2015-10-08 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 8 October 2015 at 15:05, Dean Rasheed  wrote:
> > Attached is a simple patch that appears to work, but it needs more
> > testing (and some regression tests).
> 
> Here's an updated patch with an extra regression test case that
> triggers the issue.

Thanks!

> I've also updated the function comment for expand_security_quals() to
> better explain the situations where it actually has work to do --
> tables with RLS and updates to auto-updatable security barrier views,
> but not SELECTs from security berrier views. This explains why this
> bug doesn't affect security barrier views (UNION ALL views aren't
> auto-updatable), so only 9.5 and HEAD need to be patched.

Excellent, I definitely like the additional comments.

I plan to do a bit more testing tomorrow morning, but barring any
issues found or concerns raised, I'll push this sometime tomorrow.

Thanks again!

Stephen




[HACKERS] Re: [BUGS] BUG #13611: test_postmaster_connection failed (Windows, listen_addresses = '0.0.0.0' or '::')

2015-10-08 Thread Noah Misch
On Mon, Sep 14, 2015 at 12:36:14AM -0400, Tom Lane wrote:
> Tatsuo Ishii  writes:
> >> When pg_ctl tries to connect to postmaster, it uses "0.0.0.0" as the
> >> target ip address. Unfortunately "0.0.0.0" is not a valid address on
> >> Windows and it fails. Shouldn't pg_ctl translate "0.0.0.0" to
> >> "127.0.0.1" in this case?
> 
> > I think this is definitely a bug. I privately heard from the reporter
> > that if postmaster is started by not using pg_ctl, it happily starts
> > with "listen_addresses = '0.0.0.0'. That means, postmaster itself
> > works as advertised, but pg_ctl does not.
> 
> I looked at this before, and could not see anything in either the
> postmaster or pg_ctl that would invent the address 0.0.0.0 out of
> thin air.  I think this report most likely depends on some
> misconfiguration of the OP's system.  I doubt it should be our business
> to work around such misconfiguration.

Use of "0.0.0.0" or "::" as a socket destination address is not portable.  The
Windows connect() documentation says, "If the address member of the structure
specified by the name parameter is filled with zeros, connect will return the
error WSAEADDRNOTAVAIL."  OpenBSD 5.0 behaves the same way.  NetBSD 6.0 does
not accept ::, but it accepts 0.0.0.0.  (For this to affect pg_ctl on
non-Windows platforms, you would need to empty unix_socket_directories.)

> In particular, magically
> substituting 127.0.0.1 for 0.0.0.0 seems utterly without principle.

Binding a listening socket to "0.0.0.0" listens on every local IPv4 address,
and 127.0.0.1 is one of those addresses.  That's the principle.  It's
inelegant, but I expect it to work everywhere.
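
In pg_ctl terms the substitution would be as simple as something like
this (a sketch with illustrative variable names, not actual pg_ctl
code):

	if (strcmp(host, "0.0.0.0") == 0)
		host = "127.0.0.1";		/* wildcard implies loopback is listening */
	else if (strcmp(host, "::") == 0)
		host = "::1";			/* same principle for IPv6 */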

nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-tenancy with RLS

2015-10-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> We've got one reloption for views already - security_barrier.  Maybe
> we could have another one that effectively changes a particular view
> from "security definer" as it is today to "security invoker".

As I recall, there was a previous suggestion (honestly, I thought it was
your idea) to have a reloption which made views "fully" security
definer, in that functions in the view definition would run as the view
owner instead of the view invoker.

I liked that idea, though we would need to have a function to say "who
is the 'outer' user?" (CURRENT_USER always being the owner with the
above described reloption).

I'm less sure about the idea of having a view which runs entirely as the
view invoker, but I'm not against it either.

I do think both of those are independent of supporting policies for
views and foreign tables though, which we'd want even if we had
reloptions for the above ideas.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-10-08 Thread Peter Geoghegan
On Thu, Oct 8, 2015 at 11:37 AM, Robert Haas  wrote:
> I'm not convinced.  Doesn't this exact same concept get used for
> over-the-wire communication between BE and LE machines?  There, this
> operation is spelled htonl/ntohl.  Some systems even have htonll, but
> I'm sure there are still a bunch that don't.

I continue to disagree with that. The spelling of the macro that you
propose suggests that this process occurs at a relatively high level
of abstraction, which is misleading. Datums that have abbreviated key
bytes packed into them are in general kind of special. All the same,
here is a revision of the patch series along those lines. I'll also
have to update the UUID patch to independently note the same issues.

I should point out that I did not call the macro DatumToBigEndian(),
because it's actually the other way around. I called it
DatumToLittleEndian(), since the unsigned integer comparator would
work correctly on big-endian systems without calling any new macro
(which is of course why the new macro does nothing on big-endian
systems). We start off with a big endian Datum/unsigned integer on all
platforms, and then we byteswap it to make it a little-endian unsigned
integer if and when that's required (i.e. only on little-endian
systems).
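
A minimal sketch of the macro as described, assuming the BSWAP32/BSWAP64
macros from port/pg_bswap.h and the usual WORDS_BIGENDIAN/SIZEOF_DATUM
configure symbols (the version in the patch may differ in detail):

#ifdef WORDS_BIGENDIAN
/* big-endian: unsigned comparison already matches memcmp() order */
#define DatumToLittleEndian(x)	(x)
#else
#if SIZEOF_DATUM == 8
#define DatumToLittleEndian(x)	((Datum) BSWAP64((uint64) (x)))
#else
#define DatumToLittleEndian(x)	((Datum) BSWAP32((uint32) (x)))
#endif
#endif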

-- 
Peter Geoghegan
From ddc2bce56f8d375a22d7a635dd402ad1e507b85c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Sat, 3 Oct 2015 16:11:02 -0700
Subject: [PATCH 2/3] Add two text sort caching optimizations

Firstly, cache strxfrm() blobs across calls made to the text SortSupport
abbreviation routine.  Secondly, cache strcoll() results across calls to
the text comparison routine (SortSupport authoritative comparator only).

The cost of doing this should be immeasurably small in cases where the
optimization does not help, while the benefits in cases where the
optimization is effective should be quite significant.
---
 src/backend/utils/adt/varlena.c | 75 ++---
 1 file changed, 71 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index fadd827..d814502 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -62,6 +62,9 @@ typedef struct
 	char	   *buf2;			/* 2nd string, or abbreviation strxfrm() buf */
 	int			buflen1;
 	int			buflen2;
+	int			last_len1;		/* Length of last buf1 string/strxfrm() blob */
+	int			last_len2;		/* Length of last buf2 string/strxfrm() blob */
+	int			last_returned;	/* Last comparison result (cache) */
 	bool		collate_c;
 	hyperLogLogState abbr_card; /* Abbreviated key cardinality state */
 	hyperLogLogState full_card; /* Full key cardinality state */
@@ -1832,9 +1835,20 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
 		tss->buflen1 = TEXTBUFLEN;
 		tss->buf2 = palloc(TEXTBUFLEN);
 		tss->buflen2 = TEXTBUFLEN;
+		/* Start with invalid values */
+		tss->last_len1 = -1;
+		tss->last_len2 = -1;
 #ifdef HAVE_LOCALE_T
 		tss->locale = locale;
 #endif
+		/*
+		 * To avoid somehow confusing a strxfrm() blob and an original string
+		 * within bttextfastcmp_locale(), initialize last returned cache to a
+		 * sentinel value.  A platform-specific actual strcoll() return value
+		 * of INT_MIN seems unlikely, but if that occurs it will not cause
+		 * wrong answers.
+		 */
+		tss->last_returned = INT_MIN;
 		tss->collate_c = collate_c;
 		ssup->ssup_extra = tss;
 
@@ -1897,6 +1911,7 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 {
 	text	   *arg1 = DatumGetTextPP(x);
 	text	   *arg2 = DatumGetTextPP(y);
+	bool		arg1_match;
 	TextSortSupport *tss = (TextSortSupport *) ssup->ssup_extra;
 
 	/* working state */
@@ -1915,6 +1930,11 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 	/* Fast pre-check for equality, as discussed in varstr_cmp() */
 	if (len1 == len2 && memcmp(a1p, a2p, len1) == 0)
 	{
+		/*
+		 * No change in buf1 or buf2 contents, so avoid changing last_len1 or
+		 * last_len2.  Existing contents of buffers might still be used by next
+		 * call.
+		 */
 		result = 0;
 		goto done;
 	}
@@ -1932,10 +1952,43 @@ bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup)
 		tss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, tss->buflen2);
 	}
 
-	memcpy(tss->buf1, a1p, len1);
-	tss->buf1[len1] = '\0';
-	memcpy(tss->buf2, a2p, len2);
-	tss->buf2[len2] = '\0';
+	/*
+	 * We're likely to be asked to compare the same strings repeatedly, and
+	 * memcmp() is so much cheaper than strcoll() that it pays to try to cache
+	 * comparisons, even though in general there is no reason to think that
+	 * that will work out (every text datum may be unique).  Caching does not
+	 * slow things down measurably when it doesn't work out, and can speed
+	 * things up by rather a lot when it does.  In part, this is because the
+	 * memcmp() compares data from cachelines that are needed in L1 cache even
+	 * when the last comparison's result cannot be reused.
+	 

Re: [HACKERS] RLS bug in expanding security quals

2015-10-08 Thread Haribabu Kommi
On Fri, Oct 9, 2015 at 3:50 AM, Dean Rasheed  wrote:
> On 8 October 2015 at 15:05, Dean Rasheed  wrote:
>> Attached is a simple patch that appears to work, but it needs more
>> testing (and some regression tests).
>>
>
> Here's an updated patch with an extra regression test case that
> triggers the issue.
>
> I've also updated the function comment for expand_security_quals() to
> better explain the situations where it actually has work to do --
> tables with RLS and updates to auto-updatable security barrier views,
but not SELECTs from security barrier views. This explains why this
> bug doesn't affect security barrier views (UNION ALL views aren't
> auto-updatable), so only 9.5 and HEAD need to be patched.

Thanks for the patch. I didn't find any problems in my testing with it.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SortSupport for UUID type

2015-10-08 Thread Peter Geoghegan
On Tue, Oct 6, 2015 at 1:15 PM, Robert Haas  wrote:
> +tmp = ((uint32) res ^ (uint32) ((uint64) res >> 32));
>
> The outer set of parentheses here seems pretty worthless.  Perhaps one
> of the parentheses at the end of the statement should be moved to just
> after "res".  That seems like it would add considerably clarity.

This is more or less lifted from numeric_abbrev_convert_var(). Perhaps
you should change it there too. The extra set of parentheses is
removed in the attached patch. The patch also mechanically updates
things to be consistent with the text changes on the text thread [1]
-- I had to rebase.

>> This patch is not all that exciting -- the techniques used here are
>> very simple, and it's all familiar territory for us. I expect that
>> this patch will not be at all contentious when someone eventually gets
>> around to reviewing it.
>
> There is no doubt that we need more reviewers.

I'm trying to do review following my burst of productivity on sorting
(especially external sorting) -- I should manage to do a bunch of
patch review when I return from vacation in about a month (I leave in
a couple of weeks). I'm currently privately helping a new contributor
with a large project in its early stages, so that is something.
Perhaps you'll hear more about that before too long.

[1] 
http://www.postgresql.org/message-id/CAM3SWZTaVFBwtHF87OpNGN2r2_he-wsmN53HmqyWYPM=k51...@mail.gmail.com
-- 
Peter Geoghegan
From 1427af6b0b6ab404898e9de1c74e72e14ac31ae3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Tue, 21 Jul 2015 16:18:25 -0700
Subject: [PATCH 3/3] Add SortSupport routine for UUID data type

This introduces a simple encoding scheme to produce abbreviated keys:
pack as many bytes of each UUID as will fit into a Datum.  On
little-endian machines, a byteswap is also performed; the abbreviated
comparator can therefore just consist of a simple 3-way unsigned integer
comparison.  This comports with regular UUID comparisons, because they
use memcmp() to compare authoritative UUID datums.

Note:  This patch depends on the text unsigned integer comparison patch,
which was independently submitted as infrastructure for another feature
patch that speeds up text sorts.
---
 src/backend/utils/adt/uuid.c| 187 
 src/include/catalog/pg_amproc.h |   1 +
 src/include/catalog/pg_proc.h   |   2 +
 src/include/utils/builtins.h|   1 +
 4 files changed, 191 insertions(+)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index bb76b8d..c86395b 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -14,8 +14,12 @@
 #include "postgres.h"
 
 #include "access/hash.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
+#include "utils/sortsupport.h"
 #include "utils/uuid.h"
 
 /* uuid size in bytes */
@@ -27,8 +31,21 @@ struct pg_uuid_t
 	unsigned char data[UUID_LEN];
 };
 
+/* sortsupport for uuid */
+typedef struct
+{
+	int64		input_count;	/* number of non-null values seen */
+	bool		estimating;		/* true if estimating cardinality */
+
+	hyperLogLogState abbr_card; /* cardinality estimator */
+} uuid_sortsupport_state;
+
 static void string_to_uuid(const char *source, pg_uuid_t *uuid);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
+static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
+static int	uuid_cmp_abbrev(Datum x, Datum y, SortSupport ssup);
+static bool	uuid_abbrev_abort(int memtupcount, SortSupport ssup);
+static Datum	uuid_abbrev_convert(Datum original, SortSupport ssup);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -222,6 +239,176 @@ uuid_cmp(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(uuid_internal_cmp(arg1, arg2));
 }
 
+/*
+ * Sort support strategy routine
+ */
+Datum
+uuid_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport	ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	ssup->comparator = uuid_fast_cmp;
+	ssup->ssup_extra = NULL;
+
+	if (ssup->abbreviate)
+	{
+		uuid_sortsupport_state	   *uss;
+		MemoryContext	oldcontext;
+
+		oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
+
+		uss = palloc(sizeof(uuid_sortsupport_state));
+		uss->input_count = 0;
+		uss->estimating = true;
+		initHyperLogLog(&uss->abbr_card, 10);
+
+		ssup->ssup_extra = uss;
+
+		ssup->comparator = uuid_cmp_abbrev;
+		ssup->abbrev_converter = uuid_abbrev_convert;
+		ssup->abbrev_abort = uuid_abbrev_abort;
+		ssup->abbrev_full_comparator = uuid_fast_cmp;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * SortSupport comparison func
+ */
+static int
+uuid_fast_cmp(Datum x, Datum y, SortSupport ssup)
+{
+	pg_uuid_t  *arg1 = DatumGetUUIDP(x);
+	pg_uuid_t  *arg2 = DatumGetUUIDP(y);
+
+	return uuid_internal_cmp(arg1, arg2);
+}
+
+/*
+ * Abbreviated key comparison func
+ */
+static int
+uuid_cmp_abbrev(Datum x, Datum y, SortSupport ssup)
+{
+	if (x > y)

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-10-08 Thread Etsuro Fujita

On 2015/10/08 19:55, Etsuro Fujita wrote:

I noticed that the approach using a column to populate the foreign
scan's slot directly wouldn't work well in some cases.  For example,
consider:

SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;


Oops, I should have written "JOIN", not "LEFT JOIN".


The best plan is presumably something like this as you said before:

LockRows
-> Nested Loop
-> Seq Scan on verysmall v
-> Foreign Scan on bigft1 and bigft2
 Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
bigft2.x AND bigft1.q = $1 AND bigft2.r = $2

Consider the EvalPlanQual testing to see if the updated version of a
tuple in v satisfies the query.  If we use the column in the testing, we
would get the wrong results in some cases.


More precisely, we would get the wrong result when the value of v.q or 
v.r in the updated version has changed.


I don't have a good idea for this, so would an approach using a local 
join execution plan be the way to go?


Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] removing set_latch_on_sigusr1

2015-10-08 Thread Haribabu Kommi
On Fri, Oct 9, 2015 at 2:41 AM, Robert Haas  wrote:
>
> As it happens, the TupleQueueFunnelNext function I recently committed
> has such a hazard, which I failed to spot during review and testing.
> If people don't like this, I can instead cause that function to set
> the flag.  But every place that sets the flag has to use a
> PG_TRY()/PG_CATCH() block to make sure the old value of the flag gets
> restored.  I'm pretty sure that's going to burn more cycles than the
> flag can ever hope to save, not to mention the risk of bugs due to
> people forgetting to add necessary volatile qualifiers.  We've already
> got four PG_TRY() blocks in the code to cater to this stupid flag, and
> if we keep it around I'm sure we'll accumulate at least a few more.
>
> Patch attached.  Objections?  Suggestions?  Comments?
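
For reference, the save/restore dance being objected to looks roughly
like this (a sketch of the pattern, not code from any particular call
site):

	bool		save_flag = set_latch_on_sigusr1;

	set_latch_on_sigusr1 = true;
	PG_TRY();
	{
		/* ... wait on the process latch; this may throw an error ... */
	}
	PG_CATCH();
	{
		/* restore the old value before propagating the error */
		set_latch_on_sigusr1 = save_flag;
		PG_RE_THROW();
	}
	PG_END_TRY();
	set_latch_on_sigusr1 = save_flag;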

I also once faced a problem with the set_latch_on_sigusr1 flag in our development.

+1 for removal.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multi-tenancy with RLS

2015-10-08 Thread Haribabu Kommi
On Fri, Oct 9, 2015 at 2:04 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> We've got one reloption for views already - security_barrier.  Maybe
>> we could have another one that effectively changes a particular view
>> from "security definer" as it is today to "security invoker".
>
> As I recall, there was a previous suggestion (honestly, I thought it was
> your idea) to have a reloption which made views "fully" security
> definer, in that functions in the view definition would run as the view
> owner instead of the view invoker.
>
> I liked that idea, though we would need to have a function to say "who
> is the 'outer' user?" (CURRENT_USER always being the owner with the
> above described reloption).
>
> I'm less sure about the idea of having a view which runs entirely as the
> view invoker, but I'm not against it either.

I changed the check_enable_rls function to use the invoker id instead of
the owner id for all system objects; with that, the catalog table
policies are applied, and it has been working fine so far in my
multi-tenancy testing.

Currently I am writing tests to validate it against all user objects as
well. If this change works for all user objects too, then we may not
need the security invoker reloption.
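
The tweak, reconstructed from the description above (a hypothetical
sketch, not the actual diff), would be along these lines inside
check_enable_rls(), with classForm being the relation's pg_class row
fetched earlier in that function:

	if (IsSystemClass(relid, classForm))
		user_id = GetUserId();	/* always the invoker for system objects */
	else
		user_id = checkAsUser ? checkAsUser : GetUserId();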

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-08 Thread Amit Kapila
On Thu, Oct 8, 2015 at 7:31 PM, Beena Emerson wrote:
>
>
> Mini-lang:
> [] - to specify priority
> () - to specify quorum
> Format - <group name>: <count>(<node list>) or <count>[<node list>]
> Not specifying count defaults to 1.
> Ex: s_s_names = '2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)'
>
> JSON
> It would contain 2 main keys: "sync_info" and  "groups"
> The "sync_info" would consist of "quorum"/"priority" with the count and
> "nodes"/"group" with the group name or node list.
> The optional "groups" key would list out all the "group" mentioned within
> "sync_info" along with the node list.
>  Ex: {
>  "sync_info":
>  {
>   "quorum":2,
>   "nodes":
>   [
>{"quorum":1,"group":"cluster1"},
>{"prioirty":2,"group": "cluster2"},
>"U"
>   ]
>  },
>  "groups":
>  {
>   "cluster1":["A","B"],
>   "cluster2":["X","Y","z"]
>  }
> }
>
> JSON and mini-language:
>    - JSON is more verbose.
>    - You can define a group and use it multiple times in the sync
> settings, but since not many levels of nesting are expected, I am not
> sure how useful this will be.
>    - Though a JSON parser is built in, additional code is required to
> check for the required format of the JSON. For the mini-language, a new
> parser will have to be written.
>

Sounds like both approaches have some pros and cons, and some people
prefer the mini-language while others prefer JSON.  I think one thing
that might help is to check how other databases support this feature,
or something similar to it (mainly with respect to the user interface),
as that can tell us what users are already familiar with.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_ctl/pg_rewind tests vs. slow AIX buildfarm members

2015-10-08 Thread Michael Paquier
On Thu, Oct 8, 2015 at 3:09 PM, Michael Paquier wrote:
> On Wed, Oct 7, 2015 at 11:52 PM, Tom Lane  wrote:
>> Michael Paquier  writes:
>>> On Sat, Sep 26, 2015 at 9:12 AM, Tom Lane wrote:
 So the attached modified patch adjusts the PID-match logic and some
 comments, but is otherwise what I posted before.  I believe that this
 might actually work on Windows, but I have no way to test it.  Someone
 please try that?  (Don't forget to test the service-start path, too.)
>>
>>> If "pg_ctl start" is invoked directly from a command prompt, it works.
>>> Now, if I run "pg_ctl start" within a script (in an msysgit
>>> environment for example), pg_ctl fails, complaining that
>>> postmaster.pid already exists. The TAP tests get broken as well,
>>
>> Reading that again, I think you mean that "pg_ctl start" works but you
>> didn't try "pg_ctl start -w" manually?  Because it's "-w" that's at
>> issue here, and the failing test cases are using that.
>
> Yes, sorry. I fat-fingered the command from the prompt and forgot the
> -w switch. test_postmaster_connection just behaves incorrectly, and
> exits immediately with PQPING_NO_RESPONSE, aka "Stopped waiting,
> could not start server, blah".
>
>> I think there is still room to salvage something without fully rewriting
>> the postmaster invocation logic to avoid using CMD, because it's still
>> true that checking whether the CMD process is still there should be as
>> good as checking the postmaster proper.  We just can't use kill() for it.
>> I'd be inclined to get rid of the use of kill() on Unix as well; as Noah
>> mentioned upthread, if we're using fork/exec we might as well pay
>> attention to waitpid's results instead.  On Windows, I imagine that the
>> thing to do is have start_postmaster() save aside the handle for the CMD
>> process, and then in test_postmaster_connection(), check the handle state
>> to see if the process is still running (I assume there's a way to do
>> that).
>
> That would be WaitForSingleObject(handle, ms_timeout) ==
> WAIT_OBJECT_0, no? The handle should be picked up from
> CreateRestrictedProcess, and I think that CloseHandle should not be
> called on pi.hProcess if we are going to wait for it afterwards.
> Reference:
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms687032%28v=vs.85%29.aspx

I had a look at that, and attached is an updated patch showing the
concept of using a HANDLE that pg_ctl waits for. I agree that saving
a HANDLE the way this patch does, in a static variable, is a bit
ugly, especially since service registration uses
test_postmaster_connection as well, directly with a postmaster. The
good thing is that tapcheck runs smoothly, with one single failure
though: a second call to "pg_ctl start -w" may succeed instead of
failing if kicked off within 3 seconds of the first one, based on
tests on my Windows VM. My guess is that this is caused by the fact
that we monitor the shell process and not the postmaster itself.
Thoughts?
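
The status check itself is tiny; a minimal sketch, assuming <windows.h>
and a saved process HANDLE:

#include <windows.h>
#include <stdbool.h>

static bool
process_is_running(HANDLE proc)
{
	/* a zero timeout turns the wait into a pure status check */
	return WaitForSingleObject(proc, 0) == WAIT_TIMEOUT;
}
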
Regards,
-- 
Michael
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index ba189e7..81e0631 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #ifdef HAVE_SYS_RESOURCE_H
@@ -110,6 +111,7 @@ static SERVICE_STATUS status;
 static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
 static HANDLE shutdownHandles[2];
 static pid_t postmasterPID = -1;
+static HANDLE commandProcess = INVALID_HANDLE_VALUE;
 
 #define shutdownEvent	  shutdownHandles[0]
 #define postmasterProcess shutdownHandles[1]
@@ -153,10 +155,10 @@ static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo,
 static pgpid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path);
 static void free_readfile(char **optlines);
-static int	start_postmaster(void);
+static pgpid_t start_postmaster(void);
 static void read_post_opts(void);
 
-static PGPing test_postmaster_connection(bool);
+static PGPing test_postmaster_connection(pgpid_t pm_pid, bool do_checkpoint);
 static bool postmaster_is_alive(pid_t pid);
 
 #if defined(HAVE_GETRLIMIT) && defined(RLIMIT_CORE)
@@ -419,36 +421,70 @@ free_readfile(char **optlines)
  * start/test/stop routines
  */
 
-static int
+/*
+ * Start the postmaster and return its PID.
+ *
+ * Currently, on Windows what we return is the PID of the shell process
+ * that launched the postmaster (and, we trust, is waiting for it to exit).
+ * So the PID is usable for "is the postmaster still running" checks,
+ * but cannot be compared directly to postmaster.pid.
+ */
+static pgpid_t
 start_postmaster(void)
 {
 	char		cmd[MAXPGPATH];
 
 #ifndef WIN32
+	pgpid_t		pm_pid;
+
+	/* Flush stdio channels just before fork, to avoid double-output problems */
+	fflush(stdout);
+	fflush(stderr);
+
+	pm_pid = fork();
+	if (pm_pid < 0)
+	{
+		/* fork failed */
+		

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-10-08 Thread Pavel Stehule
2015-10-08 12:11 GMT+02:00 Pavel Stehule :

> Hi
>
> We cannot raise a PostgreSQL exception with all of the possible fields
> set. I propose a new function:
>
> plpy.ereport(level, [ message [, detail [, hint [, sqlstate, ... ]]]])
>
> The implementation will be based on keyword parameters, so only the
> required parameters need to be passed.
>
> Examples:
>
> plpy.ereport(plpy.NOTICE, 'some message', 'some detail')
> plpy.ereport(plpy.ERROR, 'some message', sqlstate = 'zx243')
>

patch attached

regards

Pavel


>
> Comments, notices, objections?
>
> Regards
>
> Pavel
>
commit 2593493f583f06a784b5d0b56f5efa04e62ad07a
Author: Pavel Stehule 
Date:   Thu Oct 8 20:51:17 2015 +0200

initial

diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 1f52af7..76d8b2c 100644
--- a/src/pl/plpython/expected/plpython_error.out
+++ b/src/pl/plpython/expected/plpython_error.out
@@ -422,3 +422,59 @@ EXCEPTION WHEN SQLSTATE 'SILLY' THEN
 	-- NOOP
 END
 $$ LANGUAGE plpgsql;
+CREATE FUNCTION nested_error_ereport() RETURNS text
+	AS
+'def fun1():
+	raise plpy.ereport(plpy.ERROR, "boom")
+
+def fun2():
+	fun1()
+
+def fun3():
+	fun2()
+
+fun3()
+return "not reached"
+'
+	LANGUAGE plpythonu;
+SELECT nested_error_ereport();
+ERROR:  plpy.SPIError: boom
+CONTEXT:  Traceback (most recent call last):
+  PL/Python function "nested_error_ereport", line 10, in <module>
+fun3()
+  PL/Python function "nested_error_ereport", line 8, in fun3
+fun2()
+  PL/Python function "nested_error_ereport", line 5, in fun2
+fun1()
+  PL/Python function "nested_error_ereport", line 2, in fun1
+raise plpy.ereport(plpy.ERROR, "boom")
+PL/Python function "nested_error_ereport"
+CREATE FUNCTION ereport_test()
+RETURNS void AS $$
+  plpy.ereport(plpy.NOTICE, "some notice")
+$$ LANGUAGE plpythonu;
+SELECT ereport_test();
+NOTICE:  some notice
+ ereport_test 
+--
+ 
+(1 row)
+
+\set VERBOSITY verbose
+CREATE OR REPLACE FUNCTION ereport_test()
+RETURNS void AS $$
+  plpy.ereport(plpy.NOTICE, "some notice", "some detail", "any hint", schema='schema_yy', table='table_xx', column='column_zz')
+$$ LANGUAGE plpythonu;
+SELECT ereport_test();
+NOTICE:  0: some notice
+DETAIL:  some detail
+HINT:  any hint
+SCHEMA NAME:  schema_yy
+TABLE NAME:  table_xx
+COLUMN NAME:  column_zz
+LOCATION:  PLy_ereport, plpy_plpymodule.c:428
+ ereport_test 
+--
+ 
+(1 row)
+
diff --git a/src/pl/plpython/expected/plpython_test.out b/src/pl/plpython/expected/plpython_test.out
index 7b76faf..cceb5b8 100644
--- a/src/pl/plpython/expected/plpython_test.out
+++ b/src/pl/plpython/expected/plpython_test.out
@@ -43,9 +43,9 @@ contents.sort()
 return ", ".join(contents)
 $$ LANGUAGE plpythonu;
 select module_contents();
-   module_contents
---
- Error, Fatal, SPIError, cursor, debug, error, execute, fatal, info, log, notice, prepare, quote_ident, quote_literal, quote_nullable, spiexceptions, subtransaction, warning
+ module_contents 
+-
+ DEBUG, ERROR, Error, Fatal, INFO, LOG, NOTICE, SPIError, WARNING, cursor, debug, ereport, error, execute, fatal, info, log, notice, prepare, quote_ident, quote_literal, quote_nullable, spiexceptions, subtransaction, warning
 (1 row)
 
 CREATE FUNCTION elog_test() RETURNS void
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
index 15406d6..6825732 100644
--- a/src/pl/plpython/plpy_elog.c
+++ b/src/pl/plpython/plpy_elog.c
@@ -21,9 +21,10 @@ PyObject   *PLy_exc_fatal = NULL;
 PyObject   *PLy_exc_spi_error = NULL;
 
 
-static void PLy_traceback(char **xmsg, char **tbmsg, int *tb_depth);
 static void PLy_get_spi_error_data(PyObject *exc, int *sqlerrcode, char **detail,
-	   char **hint, char **query, int *position);
+	   char **hint, char **query, int *position,
+	   char **schema_name, char **table_name, char **column_name,
+	   char **datatype_name, char **constraint_name);
 static char *get_source_line(const char *src, int lineno);
 
 
@@ -51,12 +52,19 @@ PLy_elog(int elevel, const char *fmt,...)
 	char	   *hint = NULL;
 	char	   *query = NULL;
 	int			position = 0;
+	char	   *schema_name = NULL;
+	char	   *table_name = NULL;
+	char	   

Re: [HACKERS] More work on SortSupport for text - strcoll() and strxfrm() caching

2015-10-08 Thread Robert Haas
On Thu, Oct 8, 2015 at 2:07 PM, Peter Geoghegan  wrote:
> On Thu, Oct 8, 2015 at 10:13 AM, Robert Haas  wrote:
>> It seems to me that (1) ABBREV_STRING_UINT isn't
>> a great name for this and (2) the comment is awfully long for the
>> thing to which it refers.  I suggest that we instead call it
>> DatumToBigEndian(), put it pg_bswap.h, and change the comments to
>> something like this:
>>
>> /*
>>  * Rearrange the bytes of a Datum into big-endian order.
>>  *
>>  * One possible application of this macro is to make comparisons
>>  * cheaper.  An integer comparison of the new Datums will return the
>>  * same result as a memcmp() on the original Datums, but the integer
>>  * comparison should be much cheaper.
>>  */
>>
>> The specific way that this is used by various sortsupport routines can
>> be adequately explained in the comments for those routines.
>
> This is pretty clearly something specific to SortSupport. I'm not
> opposed to changing the name and making the comments more terse along
> those lines, but I think it should live in sortsupport.h. The macro
> byteswaps datums on little-endian platforms only, which seems very
> specific.
>
> I think that we're going to have SortSupport with abbreviation for
> UUIDs and bytea at some point, and maybe character(n). Centralizing
> information about this to sortsupport.h makes sense to me.

I'm not convinced.  Doesn't this exact same concept get used for
over-the-wire communication between BE and LE machines?  There, this
operation is spelled htonl/ntohl.  Some systems even have htonll, but
I'm sure there are still a bunch that don't.
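
For systems that lack it, the 64-bit variant is easy to build from
htonl() itself; my_htonll below is a hypothetical helper, not a libc
function:

#include <arpa/inet.h>
#include <stdint.h>

static uint64_t
my_htonll(uint64_t v)
{
	/* htonl(1) == 1 exactly on big-endian hosts: already network order */
	if (htonl(1) == 1)
		return v;
	/* otherwise swap each 32-bit half and exchange their positions */
	return ((uint64_t) htonl((uint32_t) (v & 0xFFFFFFFFu)) << 32) |
		htonl((uint32_t) (v >> 32));
}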

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Robert Haas
On Thu, Oct 8, 2015 at 3:04 PM, Jeremy Harris  wrote:
> That depends on how often the additional columns affect the sorted
> order, if the sort method takes advantage of sorted input.

What I'm saying is - ours doesn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers