Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-29 Thread Sawada Masahiko
On Tue, Jul 21, 2015 at 3:50 PM, Michael Paquier
 wrote:
> On Mon, Jul 20, 2015 at 9:59 PM, Beena Emerson  
> wrote:
>> Simon Riggs wrote:
>>
>>> The choice between formats is not
>>> solely predicated on whether we have multi-line support.
>>
>>> I still think writing down some actual use cases would help bring the
>>> discussion to a conclusion. Inventing a general facility is hard without
>>> some clear goals about what we need to support.
>>
>> We need to at least support the following:
>> - Grouping: specify a set of standbys along with the minimum number of
>> commits required from the group.
>> - Group Type: a group can be either a priority or a quorum group.
>
> As far as I understood, at the lowest level a group is just an alias
> for a list of nodes; quorum and priority are properties that can be
> applied to a group of nodes when that group is used in the expression
> defining what synchronous commit means.
>
>> - Group names: to simplify status reporting
>> - Nesting: At least 2 levels of nesting
>
> If I am following correctly, at the first level there is the
> definition of the top level objects, like groups and sync expression.
>

Grouping seems similar to using the same application_name for several
different servers. How would grouping interact with multiple servers that
share the same application_name?

>> Using JSON, the sync rep parameter for replicating to 2 different clusters
>> could be written as:
>>
>>{"remotes":
>> {"quorum": 2,
>>  "servers": [{"london":
>> {"priority": 2,
>>  "servers": ["lndn1", "lndn2", "lndn3"]
>> }}
>> ,
>>   {"nyc":
>> {"priority": 1,
>>  "servers": ["ny1", "ny2"]
>> }}
>>   ]
>> }
>> }
>> The same parameter in the new language (as suggested above) could be written
>> as:
>>  'remotes: 2(london: 1[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'
>
> OK, there is a typo. That's actually 2(london: 2[lndn1, lndn2, lndn3],
> nyc: 1[ny1, ny2]) in your grammar. Honestly, if we want group aliases,
> I think that JSON makes the most sense. One of the advantages of a
> group is that you can use it in several places in the blob and set
> different properties on it, hence we should be able to define a
> group outside the sync expression.
> Hence I would think that something like that makes more sense:
> {
> "sync_standby_names":
> {
> "quorum":2,
> "nodes":
> [
> {"priority":1,"group":"cluster1"},
> {"quorum":2,"nodes":["node1","node2","node3"]}
> ]
> },
> "groups":
> {
> "cluster1":["node11","node12","node13"],
> "cluster2":["node21","node22","node23"]
> }
> }
>
>> Also, I was thinking the name of the main group could be optional.
>> Internally, it can be given the name 'default group' or 'main group' for
>> status reporting.
>>
>> The above could also be written as:
>>  '2(london: 2[lndn1, lndn2, lndn3], nyc: 1[ny1, ny2])'
>>
>> backward compatible:
>> In JSON, while validating we may have to check if it starts with '{' to go
>
> Something worth noting: application_name can begin with "{".
>
>> for JSON parsing else proceed with the current method.
>
>> A,B,C => 1[A,B,C]. This can be added in the new parser code.
>
> This makes sense. We could do the same for the JSON-based format by
> reusing the in-memory structure used to deparse the blob when the
> former grammar is used.
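As a concrete illustration of how such an expression might be evaluated at
commit time, here is a minimal C sketch. Everything here is hypothetical —
the type names, the in-memory representation, and the simplification that a
priority group is treated as "at least N members acknowledged" without
ranking its members:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of a parsed sync expression. */
typedef enum SyncKind { SYNC_NODE, SYNC_QUORUM, SYNC_PRIORITY } SyncKind;

typedef struct SyncSpec
{
    SyncKind    kind;
    const char *name;           /* node name when kind == SYNC_NODE */
    int         count;          /* required members for group kinds */
    struct SyncSpec *members;   /* children for group kinds */
    int         nmembers;
} SyncSpec;

/* Has this (sub)expression been satisfied by the acknowledged nodes? */
static bool
sync_satisfied(const SyncSpec *spec, const char **acked, int nacked)
{
    int         matched = 0;

    if (spec->kind == SYNC_NODE)
    {
        for (int i = 0; i < nacked; i++)
            if (strcmp(spec->name, acked[i]) == 0)
                return true;
        return false;
    }

    /* For this sketch, both group kinds require `count` satisfied members;
     * a real priority group would also rank its members. */
    for (int i = 0; i < spec->nmembers; i++)
        if (sync_satisfied(&spec->members[i], acked, nacked))
            matched++;
    return matched >= spec->count;
}
```

Built as such a tree, 'remotes: 2(london: 2[lndn1, lndn2, lndn3], nyc:
1[ny1, ny2])' is satisfied by acknowledgements from lndn1, lndn2, and ny1,
but not by lndn1 alone.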

If I validate the s_s_name JSON syntax, I will definitely use JSONB
rather than JSON, because JSONB has some useful operator functions for
adding and deleting nodes in s_s_name today.
But the downside of using JSONB for s_s_name is that it reorders the
key names (and removes duplicate keys).
For example, with the syntax Michael suggested:


* JSON (just casting to JSON)
  json

 { +
 "sync_standby_names": +
 { +
 "quorum":2,   +
 "nodes":  +
 [ +
 {"priority":1,"group":"cluster1"},+
 {"quorum":2,"nodes":["node1","node2","node3"]}+
 ] +
 },+
 "groups": +
 { +
 "cluster1":["node11","node12","node13"],  +
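The reordering described above comes from jsonb's storage format: object
keys are kept sorted — shorter keys first, ties broken byte-wise — and only
the last of any duplicate keys survives. A toy C comparator reproducing just
the sort rule (the dedup step is omitted, and this is an illustration, not
the server's code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy version of jsonb's object-key ordering: compare by length first,
 * then byte-wise.  This shows why the keys of the example above come
 * back in a different order after a jsonb round trip.
 */
static int
jsonb_key_cmp(const void *a, const void *b)
{
    const char *ka = *(const char *const *) a;
    const char *kb = *(const char *const *) b;
    size_t      la = strlen(ka);
    size_t      lb = strlen(kb);

    if (la != lb)
        return (la < lb) ? -1 : 1;
    return memcmp(ka, kb, la);
}
```

Sorting the keys of the example above with this rule yields nodes, groups,
quorum, sync_standby_names — not the order in which they were written.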

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-16 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 3:07 AM, Sawada Masahiko  wrote:
> On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs  wrote:
>> On 10 July 2015 at 15:11, Sawada Masahiko  wrote:
>>>
>>>
>>> Oops, I had forgotten to add new file heapfuncs.c.
>>> Latest patch is attached.
>>
>>
>> I think we've established the approach is desirable and defined the way
>> forwards for this, so this is looking good.
>
> If we want to move stuff like pg_stattuple, pg_freespacemap into core,
> we could move them into heapfuncs.c.
>
>> Some of my requests haven't been actioned yet, so I personally would not
>> commit this yet. I am happy to continue as reviewer/committer unless others
>> wish to take over.
>> The main missing item is pg_upgrade support, which won't happen by end of
>> CF1, so I am marking this as Returned With Feedback. Hopefully we can review
>> this again before CF2.
>
> I appreciate your reviewing.
> Yeah, the pg_upgrade support and regression test for the VFM patch are
> almost done now; I will submit the patch this week after testing it.

The attached patch is the latest, v9.

I added:
- regression test for visibility map (visibilitymap.sql and
visibilitymap.out files)
- pg_upgrade support (rewriting vm file to vfm file)
- regression test for pg_upgrade

Please review it.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..806ce27 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,7 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o \
+	heapfuncs.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..796b76f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2131,8 +2131,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2147,7 +2148,11 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllFrozen(BufferGetPage(buffer));
+
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
 			vmbuffer);
@@ -2448,7 +2453,11 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		if (PageIsAllVisible(page))
 		{
 			all_visible_cleared = true;
+
+			/* all-frozen information is also cleared at the same time */
 			PageClearAllVisible(page);
+			PageClearAllFrozen(page);
+
 			visibilitymap_clear(relation,
 BufferGetBlockNumber(buffer),
 vmbuffer);
@@ -2731,9 +2740,9 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	/*
 	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, we'll have to unlock and
-	 * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
-	 * unfortunate, but hopefully shouldn't happen often.
+	 * visible or all frozen while we were busy locking the buffer, we'll
+	 * have to unlock and re-lock, to avoid holding the buffer lock across an
+	 * I/O.  That's a bit unfortunate, but hopefully shouldn't happen often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -2925,10 +2934,15 @@ l1:
 	 */
 	PageSetPrunable(page, xid);
 
+

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-14 Thread Sawada Masahiko
On Wed, Jul 15, 2015 at 12:55 AM, Simon Riggs  wrote:
> On 10 July 2015 at 15:11, Sawada Masahiko  wrote:
>>
>>
>> Oops, I had forgotten to add new file heapfuncs.c.
>> Latest patch is attached.
>
>
> I think we've established the approach is desirable and defined the way
> forwards for this, so this is looking good.

If we want to move stuff like pg_stattuple, pg_freespacemap into core,
we could move them into heapfuncs.c.

> Some of my requests haven't been actioned yet, so I personally would not
> commit this yet. I am happy to continue as reviewer/committer unless others
> wish to take over.
> The main missing item is pg_upgrade support, which won't happen by end of
> CF1, so I am marking this as Returned With Feedback. Hopefully we can review
> this again before CF2.

I appreciate your reviewing.
Yeah, the pg_upgrade support and regression test for the VFM patch are
almost done now; I will submit the patch this week after testing it.

Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund  wrote:
> On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
>> Even if we implement a vm rewriting tool in pg_upgrade, it will take
>> as much time as re-vacuuming, because it needs to scan the whole table.
>
> Why would it? Sure, you can only set allvisible and not the frozen bit,
> but that's fine. That way the cost for freezing can be paid over time.
>
> If we require terabytes of data to be scanned, including possibly
> rewriting large portions due to freezing, before index only scans work
> and most vacuums act in a partial manner the migration to 9.6 will be a
> major pain for our users.

Ah, if we set all bits as not-all-frozen, we don't need to scan the
whole table, only the vm.
I agree with this.

But please imagine the case where the old cluster has a table that is
very large, read-only, and already frozen by VACUUM FREEZE.
In this case, the all-frozen bits of such a table will not be set in
the new cluster unless we run VACUUM FREEZE again: the table's
all-frozen information is lost.
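The cheap rewrite being discussed — copying each old all-visible bit into
the new two-bit map and leaving every all-frozen bit clear — only has to
read the map files, never the heap. A hypothetical sketch of the per-byte
expansion; the actual bit layout of the vfm is an assumption here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of a pg_upgrade-time rewrite: the old vm has one all-visible
 * bit per heap page; the new vfm has two bits per page (all-visible
 * and all-frozen).  Each old byte (8 pages) expands into two bytes,
 * copying the all-visible bit and leaving all-frozen clear, so only
 * the map file is scanned, never the heap.
 */
static void
expand_vm_byte(uint8_t old, uint8_t out[2])
{
    out[0] = out[1] = 0;
    for (int page = 0; page < 8; page++)
    {
        if (old & (1 << page))
            out[page / 4] |= 1 << (2 * (page % 4));  /* all-visible */
        /* the paired all-frozen bit, 1 << (2 * (page % 4) + 1), stays 0 */
    }
}
```

For example, an input byte 0x03 (pages 0 and 1 all-visible) expands to the
bytes 0x05, 0x00.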

Regards,

--
Masahiko Sawada




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson  wrote:
> Hello,
>
> Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
>> pro-JSON:
>>
>> * standard syntax which is recognizable to sysadmins and devops.
>> * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
>> additions/deletions from the synch rep config.
>> * can add group labels (see below)
>
> Adding group labels does have a lot of value, but as Amit has pointed out,
> with a little modification they can be included in the GUC as well. It will
> not make it any more complex.
>
> On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:
>
>> Something like pg_syncinfo/ coupled with a LW lock, we already do
>> something similar for replication slots with pg_replslot/.
>
> I was trying to figure out how the JSON metadata can be used.
> It would have to be set using a given set of functions. Right?
> I am sorry this question is very basic.
>
> The functions could be something like:
> 1. pg_add_synch_set(set_name NAME, quorum INT, is_priority bool, set_members
> VARIADIC)
>
> This will be used to add a sync set. The set_members can be individual
> elements or the name of another set. The parameter is_priority decides
> whether the set is a priority set (true) or a quorum set (false). This
> function call will create a folder pg_syncinfo/groups/$NAME and store the
> json blob?
>
> The root group would be automatically set by finding the group which is not
> included in any other group? Or could it be set by another function?
>
> 2. pg_modify_sync_set(set_name NAME, quorum INT, is_priority bool,
> set_members VARIADIC)
>
> This will update the pg_syncinfo/groups/$NAME to store the new values.
>
> 3. pg_drop_synch_set(set_name NAME)
>
> This will remove the pg_syncinfo/groups/$NAME folder. Also, all the groups
> which included this one would be updated?
>
> 4. pg_show_synch_set()
>
> This will display the current sync settings in json format.
>
> Am I missing something?
>
> Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
> format already known to users?
>
> In a real-life scenario, at most how many groups and levels of nesting
> would be expected?
>

I might be missing something, but will these functions generate WAL?
If they do, we will face the situation Fujii-san pointed out, where we
need to wait forever.


Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila  wrote:
> On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko 
> wrote:
>>
>> On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
>> > On 6 July 2015 at 17:28, Simon Riggs  wrote:
>> >
>> >> I think we need something for pg_upgrade to rewrite existing VMs.
>> >> Otherwise a large read only database would suddenly require a massive
>> >> revacuum after upgrade, which seems bad. That can wait for now until we
>> >> all
>> >> agree this patch is sound.
>> >
>> >
>> > Since we need to rewrite the "vm" map, I think we should call the new
>> > map
>> > "vfm"
>> >
>> > That way we will be able to easily check whether the rewrite has been
>> > conducted on all relations.
>> >
>> > Since the maps are just bits there is no other way to tell that a map
>> > has
>> > been rewritten
>>
>> To avoid re-vacuuming after upgrade, you mean that we need to rewrite
>> each vm bit to the corresponding vfm bits when upgrading from a version
>> without vfm support (i.e., 9.5 or earlier), right?
>> If so, we would need to scan the whole table, which is expensive as well.
>> Clearing the vm and re-vacuuming would be better than doing it during
>> the upgrade, I think.
>>
>
> How will you ensure to have revacuum for all the tables after
> upgrading?

We can use the script files generated by pg_upgrade.

>  Till the time Vacuum is done on the tables that
> have vm before upgrade, any queries on those tables can
> become slower.

Even if we implement a vm rewriting tool in pg_upgrade, it will take
as much time as re-vacuuming, because it needs to scan the whole table.
I meant that we rewrite the vm using an existing facility (i.e., VACUUM
(FREEZE)) instead of implementing a new rewriting tool for the vm.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs  wrote:
> On 6 July 2015 at 17:28, Simon Riggs  wrote:
>
>> I think we need something for pg_upgrade to rewrite existing VMs.
>> Otherwise a large read only database would suddenly require a massive
>> revacuum after upgrade, which seems bad. That can wait for now until we all
>> agree this patch is sound.
>
>
> Since we need to rewrite the "vm" map, I think we should call the new map
> "vfm"
>
> That way we will be able to easily check whether the rewrite has been
> conducted on all relations.
>
> Since the maps are just bits there is no other way to tell that a map has
> been rewritten

To avoid re-vacuuming after upgrade, you mean that we need to rewrite
each vm bit to the corresponding vfm bits when upgrading from a version
without vfm support (i.e., 9.5 or earlier), right?
If so, we would need to scan the whole table, which is expensive as well.
Clearing the vm and re-vacuuming would be better than doing it during
the upgrade, I think.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-10 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:43 PM, Fujii Masao  wrote:
> On Fri, Jul 10, 2015 at 2:41 PM, Sawada Masahiko  
> wrote:
>> On Fri, Jul 10, 2015 at 3:05 AM, Sawada Masahiko  
>> wrote:
>>>
>>> Also, something for pg_upgrade is not done yet.
>>>
>>> TODO
>>> - Test case for this feature
>>> - pg_upgrade support.
>>>
>>
>> I had forgotten to change the fork name of visibility map to "vfm".
>> Attached latest v7 patch.
>> Please review it.
>
> The compilation failed on my machine...
>
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -g -O0 -I../../../../src/include -D_GNU_SOURCE   -c -o
> visibilitymap.o visibilitymap.c
> make[4]: *** No rule to make target `heapfuncs.o', needed by
> `objfiles.txt'.  Stop.
> make[4]: *** Waiting for unfinished jobs
> ( echo src/backend/access/index/genam.o
> src/backend/access/index/indexam.o ) >objfiles.txt
> make[4]: Leaving directory `/home/postgres/pgsql/git/src/backend/access/index'
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -g -O0 -I../../../src/include -D_GNU_SOURCE   -c -o
> tablespace.o tablespace.c
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -g -O0 -I../../../src/include -D_GNU_SOURCE   -c -o
> instrument.o instrument.c
> make[4]: Leaving directory `/home/postgres/pgsql/git/src/backend/access/heap'
> make[3]: *** [heap-recursive] Error 2
> make[3]: Leaving directory `/home/postgres/pgsql/git/src/backend/access'
> make[2]: *** [access-recursive] Error 2
> make[2]: *** Waiting for unfinished jobs
>

Oops, I had forgotten to add the new file heapfuncs.c.
The latest patch is attached.

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v8.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-07-10 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 3:42 AM, Jeff Janes  wrote:
> On Wed, Jul 8, 2015 at 10:10 PM, Sawada Masahiko 
> wrote:
>>
>> On Thu, Jul 9, 2015 at 4:31 AM, Jeff Janes  wrote:
>> > On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko 
>> > wrote:
>> >>
>> It's impossible to have the VM bits set to frozen but not visible.
>> These bits are controlled independently, but eventually, when the
>> all-frozen bit is set, the all-visible bit is also set.
>> >
>> >
>> > If that combination is currently impossible, could it be used indicate
>> > that
>> > the page is all empty?
>>
>> Yeah, the state of VM bits set to frozen but not visible is
>> impossible, so we could use that state to represent some other status
>> of the page.
>>
>> > Having a crash-proof bitmap of all-empty pages would make vacuum
>> > truncation
>> > scans much more efficient.
>>
>> Empty pages are always marked all-visible by vacuum today; isn't that
>> enough?
>
>
> The "current" vacuum can just remember that they were empty as well as
> all-visible.
>
> But the next vacuum that occurs on the table won't know that they are empty,
> just that they are all-visible, so it can't truncate them away without
> having to read each one first.

Yeah, it would make vacuum truncation of empty pages more efficient.

>
> It is a minor thing, but if there is no other use for this fourth
> "bit-space", it seems a shame to waste it when there is some use for it.  I
> haven't looked at the code around this area to know how hard it would be to
> implement the setting and clearing of the bit.

I think so too; we would be able to put the unused fourth bit state
to good use.
Should I include this improvement in this patch?
I think this topic should be discussed on another thread after this
feature is committed.
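For what it's worth, the idea can be pictured as giving meaning to the one
two-bit combination that is otherwise impossible. A sketch — the names and
encoding here are hypothetical, not what any patch implements:

```c
#include <assert.h>

/* The two VM bits per page yield four states; "frozen but not visible"
 * cannot occur today, so it could be repurposed for all-empty. */
#define VM_ALL_VISIBLE  0x01
#define VM_ALL_FROZEN   0x02

typedef enum VmPageState
{
    PAGE_NORMAL,        /* neither bit set */
    PAGE_ALL_VISIBLE,   /* all-visible only */
    PAGE_ALL_FROZEN,    /* both bits: frozen implies visible */
    PAGE_ALL_EMPTY      /* the otherwise-impossible combination */
} VmPageState;

static VmPageState
vm_decode(unsigned bits)
{
    if (bits == (VM_ALL_VISIBLE | VM_ALL_FROZEN))
        return PAGE_ALL_FROZEN;
    if (bits == VM_ALL_VISIBLE)
        return PAGE_ALL_VISIBLE;
    if (bits == VM_ALL_FROZEN)
        return PAGE_ALL_EMPTY;  /* repurposed spare state */
    return PAGE_NORMAL;
}
```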

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-09 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 3:05 AM, Sawada Masahiko  wrote:
>
> Also, something for pg_upgrade is not done yet.
>
> TODO
> - Test case for this feature
> - pg_upgrade support.
>

I had forgotten to change the visibility map fork name to "vfm".
The latest v7 patch is attached.
Please review it.

Regards,

--
Sawada Masahiko
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 22c5f7a..b1b6a06 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..806ce27 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,7 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o \
+	heapfuncs.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 86a2e6b..ac74100 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,7 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+			bool all_visible_cleared, bool new_all_visible_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2131,8 +2131,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2147,10 +2148,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	if (PageIsAllVisible(BufferGetPage(buffer)))
 	{
 		all_visible_cleared = true;
+
+		/* all-frozen information is also cleared at the same time */
 		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllFrozen(BufferGetPage(buffer));
+
 		visibilitymap_clear(relation,
-			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			vmbuffer);
+			ItemPointerGetBlockNumber(&(heaptup->t_self)), vmbuffer);
 	}
 
 	/*
@@ -2448,10 +2452,12 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 		if (PageIsAllVisible(page))
 		{
 			all_visible_cleared = true;
+
+			/* all-frozen information is also cleared at the same time */
 			PageClearAllVisible(page);
-			visibilitymap_clear(relation,
-BufferGetBlockNumber(buffer),
-vmbuffer);
+			PageClearAllFrozen(page);
+
+			visibilitymap_clear(relation, BufferGetBlockNumber(buffer), vmbuffer);
 		}
 
 		/*
@@ -2495,7 +2501,9 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			/* the rest of the scratch space is used for tuple data */
 			tupledata = scratchptr;
 
-			xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+			if (all_visible_cleared)
+xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
 			xlrec->ntuples = nthispage;
 
 			/*
@@ -2731,9 +2739,9 @@ heap_delete(Relation relation, ItemPointer tid,
 
 	/*
 	 * If we didn't pin the visibility map page and the page has become all
-	 * visible while we were busy locking the buffer, we'll have to unlock and
-	 * re-lock, to avoid holding the buffer lock across an I/O.  That's a bit
-	 * unfortunate, but hopefully shouldn't happen often.
+	 * visible or all frozen while we were busy locking the buffer, we'll
+	 * have to unlock and re-lock, to avoid holding the buffer lock across an
+	 * I/O.  That's a bit unfortunate, but hopefully shouldn't happen often.
 	 */
 	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
 	{
@@ -2925,12 +2933,16 @@ l1:
 	 */
 	PageSetPrunable(page, xid);
 
+	/* clear PD_ALL_VISIBLE and PD_ALL_FORZEN f

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-09 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 8:49 PM, Amit Kapila  wrote:
> On Thu, Jul 2, 2015 at 9:00 PM, Sawada Masahiko 
> wrote:
>>
>>
>> Thank you for bug report, and comments.
>>
>> Fixed version is attached, and source code comment is also updated.
>> Please review it.
>>
>
> I am looking into this patch and would like to share my findings with
> you:

Thank you for the comments.
I appreciate you taking the time to review this patch.

>
> 1.
> @@ -2131,8 +2133,9 @@ heap_insert(Relation relation, HeapTuple tup,
> CommandId cid,
>
> CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
>
>   /*
> - * Find buffer to insert this
> tuple into.  If the page is all visible,
> - * this will also pin the requisite visibility map page.
> +
>  * Find buffer to insert this tuple into.  If the page is all visible
> + * of all frozen, this will also pin
> the requisite visibility map and
> + * frozen map page.
>   */
>
> typo in comments.
>
> /of all frozen/or all frozen

Fixed.

> 2.
> visibilitymap.c
> + * The visibility map is a bitmap with two bits (all-visible and all-frozen
> + * per heap page.
>
> /and all-frozen/and all-frozen)
> closing round bracket is missing.

Fixed.

> 3.
> visibilitymap.c
> -/*#define TRACE_VISIBILITYMAP */
> +#define TRACE_VISIBILITYMAP
>
> why is this hash define opened?

Fixed.

> 4.
> -visibilitymap_count(Relation rel)
> +visibilitymap_count(Relation rel, bool for_visible)
>
> This API needs to count set bits for either visibility info, frozen info
> or both (if required), it seems better to have second parameter as
> uint8 flags rather than bool. Also, if it is required to be called at most
> places for both visibility and frozen bits count, why not get them
> in one call?

Fixed.
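The flags-based API suggested in item 4 could look roughly like this — a
single pass over the map returning both counts. This is a simplified sketch
operating on a raw bit array rather than a Relation; the names and the
two-bits-per-page layout are assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02

/*
 * Count both kinds of bits in one pass, rather than making two calls
 * distinguished by a bool parameter.  Assumes two bits per page, four
 * pages per byte.
 */
static void
vm_count(const uint8_t *map, int npages, int *all_visible, int *all_frozen)
{
    *all_visible = *all_frozen = 0;
    for (int i = 0; i < npages; i++)
    {
        uint8_t     bits = (map[i / 4] >> (2 * (i % 4))) & 0x03;

        if (bits & VISIBILITYMAP_ALL_VISIBLE)
            (*all_visible)++;
        if (bits & VISIBILITYMAP_ALL_FROZEN)
            (*all_frozen)++;
    }
}
```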

> 5.
> Clearing visibility and frozen bit separately for the dml
> operations would lead locking/unlocking the corresponding buffer
> twice, can we do it as a one operation.  I think this is suggested
> by Simon as well.

The latest patch clears both bits in one operation, and sets the
all-frozen bit together with all-visible in one operation.
We can judge whether a page is all-frozen in two places: when first
scanning the page (lazy_scan_heap), and after cleaning up garbage
(lazy_vacuum_page).

> 6.
> - * Before locking the buffer, pin the visibility map page if it appears to
> - * be necessary.
> Since we haven't got the lock yet, someone else might be
> + * Before locking the buffer, pin the
> visibility map if it appears to be
> + * necessary.  Since we haven't got the lock yet, someone else might
> be
>
> Why you have deleted 'page' in above comment?

Fixed.

> 7.
> @@ -3490,21 +3532,23 @@ l2:
>   UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
>
> if (vmbuffer != InvalidBuffer)
>   ReleaseBuffer(vmbuffer);
> +
>   bms_free
> (hot_attrs);
>
> Seems unnecessary change.

Fixed.

> 8.
> @@ -1919,11 +1919,18 @@ index_update_stats(Relation rel,
>   {
>   BlockNumber relpages =
> RelationGetNumberOfBlocks(rel);
>   BlockNumber relallvisible;
> + BlockNumber
> relallfrozen;
>
>   if (rd_rel->relkind != RELKIND_INDEX)
> - relallvisible =
> visibilitymap_count(rel);
> + {
> + relallvisible = visibilitymap_count(rel,
> true);
> + relallfrozen = visibilitymap_count(rel, false);
> + }
>   else
> /* don't bother for indexes */
> + {
>   relallvisible = 0;
> +
> relallfrozen = 0;
> + }
>
> I think in this function, you have forgotten to update the
> relallfrozen value in pg_class.

Fixed.

> 9.
> vacuumlazy.c
>
> @@ -253,14 +258,16 @@ lazy_vacuum_rel(Relation onerel, int options,
> VacuumParams *params,
>   * NB: We
> need to check this before truncating the relation, because that
>   * will change ->rel_pages.
>   */
> -
> if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
> + if ((vacrelstats->scanned_pages +
> vacrelstats->vmskipped_pages)
> + < vacrelstats->rel_pages)
>   {
> - Assert(!scan_all);
>
> Why you have removed this Assert, won't the count of
> vacrelstats->scanned_pages + vacrelstats->vmskipped_pages be
> equal to vacrelstats->rel_pages when scall_all = true.

Fixed.

> 10.
> vacuumlazy.c
> lazy_vacuum_rel()
> ..
> + scanned_all |= scan_all;
> +
>
> Why this new assignment is added, please add a comment to
> explain it.

It's not necessary, removed.

> 11.
> lazy_scan_heap()
> ..
> + * Also, skipping even a single page accorind to all-visible bit of
> + * visibility map means that we can't update relfrozenxid, so we only want
> + * to do it if we can skip a goodly number. On the other hand, we count
> + * both how

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-08 Thread Sawada Masahiko
On Thu, Jul 9, 2015 at 4:31 AM, Jeff Janes  wrote:
> On Fri, Jul 3, 2015 at 1:25 AM, Sawada Masahiko 
> wrote:
>>
>> On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs  wrote:
>> > On 2 July 2015 at 16:30, Sawada Masahiko  wrote:
>> >
>> >>
>> >> Also, the flags of each heap page header might be set PD_ALL_FROZEN,
>> >> as well as all-visible
>> >
>> >
>> > Is it possible to have VM bits set to frozen but not visible?
>> >
>> > The description makes those two states sound independent of each other.
>> >
>> > Are they? Or not? Do we test for an impossible state?
>> >
>>
>> It's impossible to have the VM bits set to frozen but not visible.
>> These bits are controlled independently, but eventually, when the
>> all-frozen bit is set, the all-visible bit is also set.
>
>
> If that combination is currently impossible, could it be used indicate that
> the page is all empty?

Yeah, the state of VM bits set to frozen but not visible is
impossible, so we could use that state to represent some other status
of the page.

> Having a crash-proof bitmap of all-empty pages would make vacuum truncation
> scans much more efficient.

Empty pages are always marked all-visible by vacuum today; isn't that enough?

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More logging for autovacuum

2015-07-07 Thread Sawada Masahiko
On Sat, Jul 4, 2015 at 2:30 PM, Amit Kapila  wrote:
> On Thu, Jul 2, 2015 at 4:41 AM, Gurjeet Singh  wrote:
>>
>> On Fri, Aug 17, 2007 at 3:14 PM, Alvaro Herrera
>>  wrote:
>>>
>>> Gregory Stark wrote:
>>> >
>>> > I'm having trouble following what's going on with autovacuum and I'm
>>> > finding
>>> > the existing logging insufficient. In particular that it's only logging
>>> > vacuum
>>> > runs *after* the vacuum finishes makes it hard to see what vacuums are
>>> > running
>>> > at any given time. Also, I want to see what is making autovacuum decide
>>> > to
>>> > forgo vacuuming a table and the log with that information is at DEBUG2.
>>>
>>> So did this idea go anywhere?
>>
>>
>> Assuming the thread stopped here, I'd like to rekindle the proposal.
>>
>> log_min_messages acts as a single gate for everything headed for the
>> server logs; controls for per-background process logging do not exist. If
>> one wants to see DEBUG/INFO messages for just the Autovacuum (or
>> checkpointer, bgwriter, etc.), they have to set log_min_messages to that
>> level, but the result would be a lot of clutter from other processes to
>> grovel through, to see the messages of interest.
>>
>
> I think that will be quite helpful.  During the patch development of
> parallel sequential scan, it was quite helpful to see the LOG messages
> of bgworkers, however one of the recent commits (91118f1a) have
> changed those to DEBUG1, now if I have to enable all DEBUG1
> messages, then what I need will be difficult to find in all the log
> messages.
> Having control of separate logging for background tasks will serve such
> a purpose.
>
>> The facilities don't yet exist, but it'd be nice if such parameters when
>> unset (ie NULL) pick up the value of log_min_messages. So by default, the
>> users will get the same behaviour as today, but can choose to tweak per
>> background-process logging when needed.
>>
>
> Will this proposal allow user to see all the messages by all the background
> workers or will it be per background task.  Example in some cases user
> might want to see the messages by all bgworkers and in some cases one
> might want to see just messages by AutoVacuum and it's workers.
>
> I think here designing User Interface is an important work if others also
> agree
> that such an option is useful, could you please elaborate your proposal?

+1

I sometimes want to set log_min_messages per process, especially when
logging below the DEBUG level is needed.
Today it's not easy to set the log level for a particular process
immediately after it launches.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-07 Thread Sawada Masahiko
On Wed, Jul 8, 2015 at 12:37 AM, Andres Freund  wrote:
> On 2015-07-07 16:25:13 +0100, Simon Riggs wrote:
>> I don't think pg_freespacemap is the right place.
>
> I agree that pg_freespacemap sounds like an odd location.
>
>> I'd prefer to add that as a single function into core, so we can write
>> formal tests.
>
> With the advent of src/test/modules it's not really a prerequisite for
> things to be builtin to be testable. I think there's fair arguments for
> moving stuff like pg_stattuple, pg_freespacemap, pg_buffercache into
> core at some point, but that's probably a separate discussion.
>

Understood.
So I will place a bunch of tests under something like
src/test/modules/visibilitymap_test, which will contain some tests for
this feature, and gather them into one patch.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-07 Thread Sawada Masahiko
> So I don't understand why you have two separate calls to visibilitymap_clear()
> Surely the logic should be to clear both bits at the same time?
Yes, you're right. The all-frozen bit should be cleared at the same time
as the all-visible bit.

> 1. Both bits unset   ~(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN)
> which can be changed to state 2 only
> 2. VISIBILITYMAP_ALL_VISIBLE only
> which can be changed state 1 or state 3
> 3. VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN
> which can be changed to state 1 only
> If that is the case please simplify the logic for setting and unsetting the 
> bits so they are set together efficiently.
> At the same time please also put in Asserts to ensure that the state logic is 
> maintained when it is set and when it is tested.
>
> In patch, during Vacuum first the frozen bit is set and then the visibility
> will be set in a later operation, now if the crash happens between those
> 2 operations, then isn't it possible that the frozen bit is set and visible
> bit is not set?

In the current patch, the frozen bit is set first in lazy_scan_heap(), so
it's possible to have the frozen bit set but not the all-visible bit, as
Amit pointed out.
To fix it, I'm simplifying the patch so that both bits are set at the
same time efficiently.
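As a hedged illustration of the three-state model quoted above, the allowed transitions can be sketched as follows (all names are invented; this is not the patch's actual code):

```c
#include <stdbool.h>

/* Illustrative only: the three VM states from the quoted review comment.
 * All-frozen is never set without all-visible; the two bits of state 3
 * are set and cleared together. */
typedef enum
{
    VM_STATE_NONE,           /* neither bit set */
    VM_STATE_ALL_VISIBLE,    /* all-visible only */
    VM_STATE_VISIBLE_FROZEN  /* all-visible and all-frozen */
} VmState;

static bool
vm_transition_allowed(VmState from, VmState to)
{
    switch (from)
    {
        case VM_STATE_NONE:
            return to == VM_STATE_ALL_VISIBLE;
        case VM_STATE_ALL_VISIBLE:
            return to == VM_STATE_NONE || to == VM_STATE_VISIBLE_FROZEN;
        case VM_STATE_VISIBLE_FROZEN:
            return to == VM_STATE_NONE; /* any modification clears both bits */
    }
    return false;
}
```

An Assert built on a predicate like this would catch the crash window Amit describes, where only the frozen bit makes it to disk.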

> I would also like to see the visibilitymap_test function exposed in SQL,
> so we can write code to examine the map contents for particular ctids.
> By doing that we can then write a formal test that shows the evolution of 
> tuples from insertion,
> vacuuming and freezing, testing the map has been set correctly at each stage.
> I guess that needs to be done as an isolationtest so we have an observer that 
> contrains the xmin in various ways.
> In light of multixact bugs, any code that changes the on-disk tuple metadata 
> needs formal tests.

The attached patch adds a few functions to contrib/pg_freespacemap to
inspect the contents of the visibility map, which I used for my tests.
I hope it helps with testing this feature.

> I think we need something for pg_upgrade to rewrite existing VMs.
> Otherwise a large read only database would suddenly require a massive
> revacuum after upgrade, which seems bad. That can wait for now until we all
> agree this patch is sound.

Yeah, I will address them.

Regards,

--
Sawada Masahiko


001_visibilitymap_test_function.patch
Description: Binary data



Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-06 Thread Sawada Masahiko
On Thu, Jul 2, 2015 at 9:31 PM, Fujii Masao  wrote:
> On Thu, Jul 2, 2015 at 5:44 PM, Beena Emerson  wrote:
>> Hello,
>> There has been a lot of discussion. It has become a bit confusing.
>> I am summarizing my understanding of the discussion till now.
>> Kindly let me know if I missed anything important.
>>
>> Backward compatibility:
>> We have to provide support for the current format and behavior for
>> synchronous replication (The first running standby from list s_s_names)
>> In case the new format does not include GUC, then a special value to be
>> specified for s_s_names to indicate that.
>>
>> Priority and quorum:
>> Quorum treats all the standby with same priority while in priority behavior,
>> each one has a different priority and ACK must be received from the
>> specified k lowest priority servers.
>> I am not sure how combining both will work out.
>> Mostly we would like to have some standbys from each data center to be in
>> sync. Can it not be achieved by quorum only?
>
> So you're wondering if there is the use case where both quorum and priority 
> are
> used together?
>
> For example, please imagine the case where you have two standby servers
> (say A and B) in local site, and one standby server (say C) in remote disaster
> recovery site. You want to set up sync replication so that the master waits 
> for
> ACK from either A or B, i.e., the setting of 1(A, B). Also only when either A
> or B crashes, you want to make the master wait for ACK from either the
> remaining local standby or C. On the other hand, you don't want to use the
> setting like 1(A, B, C). Because in this setting, C can be sync standby when
> the master craches, and both A and B might be very behind of C. In this case,
> you need to promote the remote standby server C to new master,,, this is what
> you'd like to avoid.
>
> The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed 
> grammer.
>

If we set the remote disaster recovery site up as a sync replica, we
would see large latencies even with quorum commit.
So I think the configuration Fujii-san suggested is a good one, and
many users would want to use it.
I tend to agree with combining quorum and prioritization into one GUC
parameter while keeping backward compatibility.
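As a concrete (and heavily simplified) sketch of how a nested definition such as 1(1[A, C], 1[B, C]) could be evaluated, the following treats every group as a plain quorum, glossing over the priority ordering the bracketed syntax implies; all names are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A node is either a leaf naming one standby, or an inner node that is
 * satisfied once at least `need` of its children are satisfied.
 * 1(1[A, C], 1[B, C]) then becomes a root needing 1 of two subgroups. */
typedef struct SyncNode
{
    const char *standby;                /* non-NULL for a leaf */
    int         need;                   /* requirement for inner nodes */
    const struct SyncNode **children;
    int         nchildren;
} SyncNode;

static bool
acked(const char *name, const char *const *acks, int nacks)
{
    for (int i = 0; i < nacks; i++)
        if (strcmp(acks[i], name) == 0)
            return true;
    return false;
}

/* Has the commit collected enough acknowledgements to be "in sync"? */
static bool
satisfied(const SyncNode *node, const char *const *acks, int nacks)
{
    if (node->standby != NULL)
        return acked(node->standby, acks, nacks);

    int ok = 0;
    for (int i = 0; i < node->nchildren; i++)
        if (satisfied(node->children[i], acks, nacks))
            ok++;
    return ok >= node->need;
}
```

With this reading, an ACK from C alone satisfies both subgroups, while an ACK from A satisfies only the first; either way the root's quorum of 1 is met.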

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-06 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 5:25 PM, Sawada Masahiko  wrote:
> On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs  wrote:
>> On 2 July 2015 at 16:30, Sawada Masahiko  wrote:
>>
>>>
>>> Also, the flags of each heap page header might be set PD_ALL_FROZEN,
>>> as well as all-visible
>>
>>
>> Is it possible to have VM bits set to frozen but not visible?
>>
>> The description makes those two states sound independent of each other.
>>
>> Are they? Or not? Do we test for an impossible state?
>>
>
> It's impossible to have VM bits set to frozen but not visible.
> These bit are controlled independently. But eventually, when
> all-frozen bit is set, all-visible is also set.

Attached is the latest version, including some bug fixes.
Please review it.

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v5.patch
Description: Binary data



Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 6:23 PM, Fujii Masao  wrote:
> On Fri, Jul 3, 2015 at 5:59 PM, Sawada Masahiko  wrote:
>> Yeah, quorum commit is helpful for minimizing data loss in comparison
>> with today replication.
>> But in this your case, how can we know which server we should use as
>> the next master server, after local data center got down?
>> If we choose a wrong one, we would get the data loss.
>
> Check the progress of each server, e.g., by using
> pg_last_xlog_replay_location(),
> and choose the server which is ahead of as new master.
>

Thanks. So we can choose the next master server by checking the
progress of each server, if hot standby is enabled.
And such a procedure is needed even with today's replication.

I think the #2 problem that Josh pointed out seems to be solved:
1. I need to ensure that data is replicated to X places.
2. I need to *know* which places data was synchronously replicated
to when the master goes down.
And we can address the #1 problem using quorum commit.

Thoughts?

Regards,

--
Sawada Masahiko




Re: [HACKERS] Synch failover WAS: Support for N synchronous standby servers - take 2

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 12:18 PM, Fujii Masao  wrote:
> On Fri, Jul 3, 2015 at 6:54 AM, Josh Berkus  wrote:
>> On 07/02/2015 12:44 PM, Andres Freund wrote:
>>> On 2015-07-02 11:50:44 -0700, Josh Berkus wrote:
>>>> So there's two parts to this:
>>>>
>>>> 1. I need to ensure that data is replicated to X places.
>>>>
>>>> 2. I need to *know* which places data was synchronously replicated to
>>>> when the master goes down.
>>>>
>>>> My entire point is that (1) alone is useless unless you also have (2).
>>>
>>> I think there's a good set of usecases where that's really not the case.
>>
>> Please share!  My plea for usecases was sincere.  I can't think of any.
>>
>>>> And do note that I'm talking about information on the replica, not on
>>>> the master, since in any failure situation we don't have the old
>>>> master around to check.
>>>
>>> How would you, even theoretically, synchronize that knowledge to all the
>>> replicas? Even when they're temporarily disconnected?
>>
>> You can't, which is why what we need to know is when the replica thinks
>> it was last synced from the replica side.  That is, a sync timestamp and
>> lsn from the last time the replica ack'd a sync commit back to the
>> master successfully.  Based on that information, I can make an informed
>> decision, even if I'm down to one replica.
>>
>>>> ... because we would know definitively which servers were in sync.  So
>>>> maybe that's the use case we should be supporting?
>>>
>>> If you want automated failover you need a leader election amongst the
>>> surviving nodes. The replay position is all they need to elect the node
>>> that's furthest ahead, and that information exists today.
>>
>> I can do that already.  If quorum synch commit doesn't help us minimize
>> data loss any better than async replication or the current 1-redundant,
>> why would we want it?  If it does help us minimize data loss, how?
>
> In your example of "2" : { "local_replica", "london_server", "nyc_server" },
> if there is not something like quorum commit, only local_replica is synch
> and the other two are async. In this case, if the local data center gets
> destroyed, you need to promote either london_server or nyc_server. But
> since they are async, they might not have the data which have been already
> committed in the master. So data loss! Of course, as I said yesterday,
> they might have all the data and no data loss happens at the promotion.
> But the point is that there is no guarantee that no data loss happens.
> OTOH, if we use quorum commit, we can guarantee that either london_server
> or nyc_server has all the data which have been committed in the master.
>
> So I think that quorum commit is helpful for minimizing the data loss.
>

Yeah, quorum commit is helpful for minimizing data loss in comparison
with today's replication.
But in your case, how can we know which server we should use as
the next master after the local data center goes down?
If we choose the wrong one, we would get data loss.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-03 Thread Sawada Masahiko
On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs  wrote:
> On 2 July 2015 at 16:30, Sawada Masahiko  wrote:
>
>>
>> Also, the flags of each heap page header might be set PD_ALL_FROZEN,
>> as well as all-visible
>
>
> Is it possible to have VM bits set to frozen but not visible?
>
> The description makes those two states sound independent of each other.
>
> Are they? Or not? Do we test for an impossible state?
>

It's impossible to have the VM bits set to frozen but not all-visible.
These bits are controlled independently, but eventually, when the
all-frozen bit is set, the all-visible bit is also set.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-02 Thread Sawada Masahiko
On Thu, Jul 2, 2015 at 1:06 AM, Fujii Masao  wrote:
> On Thu, Jul 2, 2015 at 12:13 AM, Sawada Masahiko  
> wrote:
>> On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko  
>> wrote:
>>> On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko  
>>> wrote:
>>>> On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko  
>>>> wrote:
>>>>> On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  
>>>>> wrote:
>>>>>> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>>>>>>
>>>>>>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>>>>>>> Agreed, no extra file, and the same write volume as currently.  It 
>>>>>>>> would
>>>>>>>> also match pg_clog, which uses two bits per transaction --- maybe we 
>>>>>>>> can
>>>>>>>> reuse some of that code.
>>>>>>>>
>>>>>>>
>>>>>>> Yeah, this approach seems promising. We probably can't reuse code from
>>>>>>> clog because the usage pattern is different (key for clog is xid, while
>>>>>>> for visibility/freeze map ctid is used). But visibility map storage
>>>>>>> layer is pretty simple so it should be easy to extend it for this use.
>>>>>>
>>>>>>
>>>>>> Actually, there may be some bit manipulation functions we could reuse;
>>>>>> things like efficiently counting how many things in a byte are set. 
>>>>>> Probably
>>>>>> doesn't make sense to fully refactor it, but at least CLOG is a good 
>>>>>> source
>>>>>> for cut/paste/whack.
>>>>>>
>>>>>
>>>>> I agree with adding a bit that indicates corresponding page is
>>>>> all-frozen into VM, just like CLOG.
>>>>> I'll change the patch as second version patch.
>>>>>
>>>>
>>>> The second patch is attached.
>>>>
>>>> In second patch, I added a bit that indicates all tuples in page are
>>>> completely frozen into visibility map.
>>>> The visibility map became a bitmap with two bit per heap page:
>>>> all-visible and all-frozen.
>>>> The logics around vacuum, insert/update/delete heap are almost same as
>>>> previous version.
>>>>
>>>> This patch lack some point: documentation, comment in source code,
>>>> etc, so it's WIP patch yet,
>>>> but I think that it's enough to discuss about this.
>>>>
>>>
>>> The previous patch is no longer applied cleanly to HEAD.
>>> The attached v2 patch is latest version.
>>>
>>> Please review it.
>>
>> Attached new rebased version patch.
>> Please give me comments!
>
> Now we should review your design and approach rather than code,
> but since I got an assertion error while trying the patch, I report it.
>
> "initdb -D test -k" caused the following assertion failure.
>
> vacuuming database template1 ... TRAP:
> FailedAssertion("!(((PageHeader) (heapPage))->pd_flags & 0x0004)",
> File: "visibilitymap.c", Line: 328)
> sh: line 1: 83785 Abort trap: 6
> "/dav/000_add_frozen_bit_into_visibilitymap_v3/bin/postgres" --single
> -F -O -c search_path=pg_catalog -c exit_on_error=true template1 >
> /dev/null
> child process exited with exit code 134
> initdb: removing data directory "test"

Thank you for the bug report and comments.

A fixed version is attached, and the source code comments are also updated.
Please review it.

Let me explain again what this patch does and its current design.

- An additional bit in the visibility map
I added an additional bit, the all-frozen bit, which indicates whether
all tuples of the corresponding heap page are frozen, to the visibility map.
This structure is similar to CLOG, so the size of the VM grows to twice
its current size.
Also, the header flags of each heap page may be set to PD_ALL_FROZEN,
as well as PD_ALL_VISIBLE.

- Setting and clearing the all-frozen bit
Update, delete, and insert (multi-insert) operations clear the bit for
that page, and clear the page header flags at the same time.
Only a vacuum operation can set the bit, when all tuples of a page are frozen.
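To make the layout concrete, here is a rough sketch of the two-bits-per-heap-page encoding described above (the helper names are invented for illustration and do not match visibilitymap.c):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: with two bits per heap page, one visibility-map byte covers
 * four heap pages, so the map is twice the size of the one-bit version. */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)    /* 4 */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

static uint8_t
vm_get_status(const uint8_t *map, uint32_t heapBlk)
{
    uint32_t byteNo = heapBlk / HEAPBLOCKS_PER_BYTE;
    int      shift = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

    return (uint8_t) ((map[byteNo] >> shift) & 0x03);
}

static void
vm_set_status(uint8_t *map, uint32_t heapBlk, uint8_t status)
{
    uint32_t byteNo = heapBlk / HEAPBLOCKS_PER_BYTE;
    int      shift = (heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;

    /* all-frozen must never be set without all-visible */
    assert(!((status & VM_ALL_FROZEN) && !(status & VM_ALL_VISIBLE)));
    map[byteNo] = (uint8_t) ((map[byteNo] & ~(0x03 << shift)) |
                             (status << shift));
}
```

A clearing operation simply stores 0, which drops both bits at once, matching the rule above that any modification also clears the all-frozen bit.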

- Anti-wraparound vacuum
Today we have to scan the whole table for XID anti-wraparound vacuum,
which is really quite expensive because of disk I/O.
The main benefit of this proposal is to reduce and avoid such expensive
scans.

Re: [HACKERS] Freeze avoidance of very large table.

2015-07-01 Thread Sawada Masahiko
On Thu, May 28, 2015 at 11:34 AM, Sawada Masahiko  wrote:
> On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko  
> wrote:
>> On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko  
>> wrote:
>>> On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  wrote:
>>>> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>>>>
>>>>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>>>>> Agreed, no extra file, and the same write volume as currently.  It would
>>>>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>>>>> reuse some of that code.
>>>>>>
>>>>>
>>>>> Yeah, this approach seems promising. We probably can't reuse code from
>>>>> clog because the usage pattern is different (key for clog is xid, while
>>>>> for visibility/freeze map ctid is used). But visibility map storage
>>>>> layer is pretty simple so it should be easy to extend it for this use.
>>>>
>>>>
>>>> Actually, there may be some bit manipulation functions we could reuse;
>>>> things like efficiently counting how many things in a byte are set. 
>>>> Probably
>>>> doesn't make sense to fully refactor it, but at least CLOG is a good source
>>>> for cut/paste/whack.
>>>>
>>>
>>> I agree with adding a bit that indicates corresponding page is
>>> all-frozen into VM, just like CLOG.
>>> I'll change the patch as second version patch.
>>>
>>
>> The second patch is attached.
>>
>> In second patch, I added a bit that indicates all tuples in page are
>> completely frozen into visibility map.
>> The visibility map became a bitmap with two bit per heap page:
>> all-visible and all-frozen.
>> The logics around vacuum, insert/update/delete heap are almost same as
>> previous version.
>>
>> This patch lack some point: documentation, comment in source code,
>> etc, so it's WIP patch yet,
>> but I think that it's enough to discuss about this.
>>
>
> The previous patch is no longer applied cleanly to HEAD.
> The attached v2 patch is latest version.
>
> Please review it.

Attached is a newly rebased version of the patch.
Please give me comments!

Regards,

--
Sawada Masahiko


000_add_frozen_bit_into_visibilitymap_v3.patch
Description: Binary data



Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-01 Thread Sawada Masahiko
On Tue, Jun 30, 2015 at 2:40 AM, Josh Berkus  wrote:
> On 06/29/2015 01:01 AM, Michael Paquier wrote:
>
> You're confusing two separate things.  The primary manageability problem
> has nothing to do with altering the parameter.  The main problem is: if
> there is more than one synch candidate, how do we determine *after the
> master dies* which candidate replica was in synch at the time of
> failure?  Currently there is no way to do that.  This proposal plans to,
> effectively, add more synch candidate configurations without addressing
> that core design failure *at all*.  That's why I say that this patch
> decreases overall reliability of the system instead of increasing it.
>
> When I set up synch rep today, I never use more than two candidate synch
> servers because of that very problem.  And even with two I have to check
> replay point because I have no way to tell which replica was in-sync at
> the time of failure.  Even in the current limited feature, this
> significantly reduces the utility of synch rep.  In your proposal, where
> I could have multiple synch rep groups in multiple geos, how on Earth
> could I figure out what to do when the master datacenter dies?

We can have servers with the same application_name today; that works
like a group. So there are two problems regarding failover:
1. How can we know which group (set) we should use? (group means
application_name here)
2. And how can we decide which server of that group we should
promote to the next master?

Regarding #1, it's one of the big problems, I think.
I haven't come up with a correct solution yet, but we would need to know
which server (group) is the best candidate for promotion
without the old master server running.
For example, by improving the pg_stat_replication view, or by having a
mediation process that always checks the progress of each standby.

Regarding #2, I guess the best solution is that the DBA can promote any
server of the group.
That is, the DBA can always promote a server without considering the
state of the other servers of that group.
It's not difficult: always use the lowest LSN of a group as the group LSN.
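The "lowest LSN as group LSN" rule might be sketched like this (with XLogRecPtr simplified to a plain 64-bit integer; illustrative only, not actual PostgreSQL code):

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* simplified stand-in for PostgreSQL's type */

/* Report a group's progress conservatively: the group as a whole has
 * replayed only up to its furthest-behind member, so use the minimum. */
static XLogRecPtr
group_lsn(const XLogRecPtr *members, int n)
{
    XLogRecPtr min = members[0];

    for (int i = 1; i < n; i++)
        if (members[i] < min)
            min = members[i];
    return min;
}
```

Promoting any member is then safe with respect to the group LSN, since every member is at least that far along.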

>
> BTW, ALTER SYSTEM is a strong reason to use JSON for the synch rep GUC
> (assuming it's one parameter) instead of some custom syntax.  If it's
> JSON, we can validate it in psql, whereas if it's some custom syntax we
> have to wait for the db to reload and fail to figure out that we forgot
> a comma.  Using JSON would also permit us to use jsonb_set and
> jsonb_delete to incrementally change the configuration.

That sounds convenient and flexible. I agree with this JSON-format
parameter only if we don't combine quorum and prioritization, because
of backward compatibility.
I tend to prefer a JSON-format value in a new, separate GUC parameter.
Anyway, if we use JSON, I'm imagining parameter values like below:
{
    "group1" : {
        "quorum" : 1,
        "standbys" : [
            {
                "a" : {
                    "quorum" : 2,
                    "standbys" : [ "c", "d" ]
                }
            },
            "b"
        ]
    }
}


> Question: what happens *today* if we have two different synch rep
> strings in two different *.conf files?  I wouldn't assume that anyone
> has tested this ...

We use the last defined parameter even if sync rep strings appear in
several files, right?

Regards,

--
Sawada Masahiko




Re: [HACKERS] pg_file_settings view vs. Windows

2015-06-29 Thread Sawada Masahiko
On Mon, Jun 29, 2015 at 11:42 PM, Tom Lane  wrote:
> Sawada Masahiko  writes:
>> On Mon, Jun 29, 2015 at 12:20 AM, Tom Lane  wrote:
>>> So let me make a radical proposal that both gets rid of the portability
>>> problem and, arguably, makes the view more useful than it is today.
>>> I think we should define the view as showing, not what was in the config
>>> file(s) at the last SIGHUP, but what is in the files *now*.  That means
>>> you could use it to validate edits before you attempt to apply them,
>>> rather than having to pull the trigger and then ask if it worked.  And yet
>>> it would remain just about as useful as it is now for post-mortem checks
>>> when a SIGHUP didn't do what you expected.
>
>> You meant that we parse each GUC parameter in configration file, and
>> then pass changeVal=false to set_config_option whenever
>> pg_file_settings is refered.
>> So this view would be used for checking whether currently
>> configuration file is applied successfully or not, right?
>
> Well, it would check whether the current contents of the file could be
> applied successfully.
>
>> The such a feature would be also good, but the main purpose of
>> pg_file_settings was to resolve where each GUC parameter came from,
>> especially in case of using ALTER SYSTEM command.
>> I'm not sure that it would be adequate for our originally purpose.
>
> I'm not following.  Surely pg_settings itself is enough to tell you
> where the currently-applied GUC value came from.
>

Ah yes, that would be enough to accomplish the original purpose.
In order to see the current contents of each configuration file, we need
to read them whenever pg_file_settings is referenced, right?

Regards,

--
Sawada Masahiko




Re: [HACKERS] pg_file_settings view vs. Windows

2015-06-28 Thread Sawada Masahiko
On Mon, Jun 29, 2015 at 12:20 AM, Tom Lane  wrote:
> I wrote:
>> I noticed that in EXEC_BACKEND builds (ie, Windows) the pg_file_settings
>> view doesn't act as its author presumably intended.  Specifically, it
>> reads as empty until/unless the current session processes a SIGHUP event.
>> ...
>> More or less bad alternative answers include:
>> ...
>> 3. Force a SIGHUP processing cycle but don't actually apply any of the
>> values.  This would be pretty messy though, especially if you wanted it
>> to produce results exactly like the normal case so far as detection of
>> incorrect values goes.  I don't think this is a good idea; it's going
>> to be too prone to corner-case bugs.
>
> I had second thoughts about how difficult this might be.  I had forgotten
> that set_config_option already has a changeVal argument that does more or
> less exactly what we need here: if false, it makes all the checks but
> doesn't actually apply the value.
>
> So let me make a radical proposal that both gets rid of the portability
> problem and, arguably, makes the view more useful than it is today.
> I think we should define the view as showing, not what was in the config
> file(s) at the last SIGHUP, but what is in the files *now*.  That means
> you could use it to validate edits before you attempt to apply them,
> rather than having to pull the trigger and then ask if it worked.  And yet
> it would remain just about as useful as it is now for post-mortem checks
> when a SIGHUP didn't do what you expected.
>
> This could be implemented by doing essentially a SIGHUP cycle but passing
> changeVal = false to set_config_option.  Other details would remain mostly
> as in my WIP patch of yesterday.  The temporary context for doing SIGHUP
> work still seems like a good idea, but we could flush it at the end of
> the view's execution rather than needing to hang onto it indefinitely.
>
> The main bug risk here is that there might be GUCs whose apply_hook is
> making validity checks that should have been in a check_hook.  However,
> any such coding is wrong already; and the symptom here would only be that
> the view might fail to point out values that can't be applied, which would
> be annoying but it's hardly catastrophic.
>
> Comments?
>

You mean that we parse each GUC parameter in the configuration file, and
then pass changeVal=false to set_config_option whenever
pg_file_settings is referred to.
So this view would be used for checking whether the current
configuration file can be applied successfully, right?

Such a feature would also be good, but the main purpose of
pg_file_settings was to resolve where each GUC parameter came from,
especially when using the ALTER SYSTEM command.
I'm not sure that it would be adequate for our original purpose.
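Tom's "check but don't apply" idea can be illustrated with a small sketch
(Python, with hypothetical names -- this is not PostgreSQL's actual C API):
run every validity check for a setting while leaving the live value
untouched, which is what passing changeVal = false to set_config_option
would do.

```python
def set_config_option(settings, name, value, validators, change_val=True):
    """Validate (and optionally apply) one GUC-style setting.

    Returns an error string, or None on success. When change_val is False,
    the value is fully checked but the live settings dict is not modified.
    """
    check = validators.get(name)
    if check is None:
        return 'unrecognized configuration parameter "%s"' % name
    error = check(value)
    if error is not None:
        return error
    if change_val:
        settings[name] = value
    return None

def check_work_mem(value):
    # A stand-in for a GUC check_hook: reject non-numeric or non-positive values.
    try:
        kb = int(value)
    except ValueError:
        return 'invalid value for parameter "work_mem": "%s"' % value
    return None if kb > 0 else '"work_mem" must be positive'

live = {"work_mem": "4096"}
validators = {"work_mem": check_work_mem}

# Dry run (changeVal = false): the error is reported, the live value is untouched.
err = set_config_option(live, "work_mem", "bogus", validators, change_val=False)
```

This is the property the proposed view relies on: validation errors surface
without any setting actually changing.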

Regards,

--
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Semantics of pg_file_settings view

2015-06-28 Thread Sawada Masahiko
On Mon, Jun 29, 2015 at 12:01 AM, Tom Lane  wrote:
> Sawada Masahiko  writes:
>> On Sun, Jun 28, 2015 at 12:47 AM, Tom Lane  wrote:
>>> However there's a further tweak to the view that I'd like to think about
>>> making.  Once this is in and we make the originally-discussed change to
>>> suppress application of duplicated GUC entries, it would be possible to
>>> mark the ignored entries in the view, which would save people the effort
>>> of figuring out manually which ones were ignored.  The question is exactly
>>> how to mark the ignored entries.  A simple tweak would be to put something
>>> in the "error" column, say "ignored because of later duplicate entry".
>>> However, this would break the property that an entry in the error column
>>> means there's something you'd better fix, which I think would be a useful
>>> rule for nagios-like monitoring tools.
>>>
>>> Another issue with the API as it stands here is that you can't easily
>>> distinguish errors that caused the entire config file to be ignored from
>>> those that only prevented application of one setting.
>>>
>>> The best idea I have at the moment is to also add a boolean "applied"
>>> column which is true if the entry was successfully applied.  Errors that
>>> result in the whole file being ignored would mean that *all* the entries
>>> show applied = false.  Otherwise, applied = false with nothing in the
>>> error column would imply that the entry was ignored due to a later
>>> duplicate.  There are multiple other ways it could be done of course;
>>> anyone want to lobby for some other design?
>
>> I think that we can find applied value by arranging
>> pg_file_settings.seqno ascending order.
>> The value which has highest seqno is the currently applied value, and
>> it's default value if pg_file_settings does not have that entry.
>
> Yeah, I realize that it's *possible* to get this information out of the
> view as it stands.  But it's not especially *convenient*.  And since this
> seems like the main if not only use-case for the view (at least prior to
> the addition of error information), I don't see why we insist on making it
> difficult for users to identify ignored entries.

Yep, I think the view will provide enough information once the
additional information from the WIP patch is added.

>> (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
>>  errmsg("parameter \"%s\"
>> cannot be changed without restarting the server",
>> +   item->errmsg = pstrdup("parameter cannot be
>> changed without restarting the server");
>> error = true;
>> continue;
>
>> Also, the above hunk is wrong, because the item variable is wobbly and
>> the correspond line is not exists here.
>
> er ... what?
>

Sorry for the confusion.
I meant that I got a SEGV after applying the WIP patch, and the cause
is the above change.
The reproduction steps are:
1. Add "port = 6543" to postgresql.conf and restart the server.
2. Remove the added line from postgresql.conf.
3. Reload; the server gets a SEGV.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-28 Thread Sawada Masahiko
On Sat, Jun 27, 2015 at 3:53 AM, Josh Berkus  wrote:
> On 06/26/2015 11:32 AM, Robert Haas wrote:
>> I think your proposal is worth considering, but you would need to fill
>> in a lot more details and explain how it works in detail, rather than
>> just via a set of example function calls.  The GUC-based syntax
>> proposal covers cases like multi-level rules and, now, prioritization,
>> and it's not clear how those would be reflected in what you propose.
>
> So what I'm seeing from the current proposal is:
>
> 1. we have several defined synchronous sets
> 2. each set requires a quorum of k  (defined per set)
> 3. within each set, replicas are arranged in priority order.
>
> One thing which the proposal does not implement is *names* for
> synchronous sets.  I would also suggest that if I lose this battle and
> we decide to go with a single stringy GUC, that we at least use JSON
> instead of defining our out, proprietary, syntax?

JSON would be more flexible for defining synchronous sets, but it would
require changing how the configuration file is parsed so that a value
can contain newlines.
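For reference, the JSON form proposed earlier in the thread does parse with a
stock JSON parser; the sketch below (Python, illustrative only) just shows the
structure -- the open problem is purely that the GUC file parser would need to
accept a value spanning multiple lines.

```python
import json

# The sync-rep configuration proposed earlier in the thread, as a JSON document.
sync_config = json.loads("""
{"remotes": {"quorum": 2,
             "servers": [{"london": {"priority": 2,
                                     "servers": ["lndn1", "lndn2", "lndn3"]}},
                         {"nyc":    {"priority": 1,
                                     "servers": ["ny1", "ny2"]}}]}}
""")
```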

> Point 3. also seems kind of vaguely defined.  Are we still relying on
> the idea that multiple servers have the same application_name to make
> them equal, and that anything else is a proritization?  That is, if we have:

Yep, I guess that servers with the same application name have the same
priority, and the servers in the same set have the same priority.
(Here a "set" means a group of application names in the GUC.)

> replica1: appname=group1
> replica2: appname=group2
> replica3: appname=group1
> replica4: appname=group2
> replica5: appname=group1
> replica6: appname=group2
>
> And the definition:
>
> synchset: A
> quorum: 2
> members: [ group1, group2 ]
>
> Then the desired behavior would be: we must get acks from at least 2
> servers in group1, but if group1 isn't responding, then from group2?

In this case, if we want to use quorum commit (i.e., all replicas have
the same priority),
I guess that we must get an ack from 2 *elements* of the list (both
group1 and group2).
If quorum = 1, we must get an ack from either group1 or group2.

> What if *one* server in group1 responds?  What do we do?  Do we fail the
> whole group and try for 2 out of 3 in group2?  Or do we only need one in
> group2?  In which case, what prioritization is there?  Who could
> possibly use anything so complex?

If several servers have the same application name, the master will
receive a different ack (write and flush LSN) from each of them.
For safety, we can use the lowest of those LSNs to release backend waiters.
But if only one server in group1 returns an ack to the master while the
other two are not working,
I guess the master can use that ack, because the other servers are invalid.
That is, we must get at least one ack from each of group1 and group2.
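The behaviour described above can be sketched as follows (an illustrative
model, not PostgreSQL code): a group counts as acknowledged once any of its
members has replied, using the lowest member LSN for safety, and the commit
is releasable when at least quorum-many groups have acknowledged.

```python
def group_ack_lsn(acks, members):
    """Lowest acknowledged LSN among a group's members, or None if no ack yet."""
    lsns = [acks[m] for m in members if m in acks]
    return min(lsns) if lsns else None

def commit_ok(acks, groups, quorum, commit_lsn):
    """True when at least `quorum` groups have acked up to commit_lsn."""
    acked = 0
    for members in groups.values():
        lsn = group_ack_lsn(acks, members)
        if lsn is not None and lsn >= commit_lsn:
            acked += 1
    return acked >= quorum

# Josh's example: six replicas split by application_name into two groups.
groups = {"group1": ["replica1", "replica3", "replica5"],
          "group2": ["replica2", "replica4", "replica6"]}

# One ack from each group satisfies quorum = 2 over the two elements.
acks = {"replica1": 120, "replica4": 130}
```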

> I'm personally not convinced that quorum and prioritization are
> compatible.  I suggest instead that quorum and prioritization should be
> exclusive alternatives, that is that a synch set should be either a
> quorum set (with all members as equals) or a prioritization set (if rep1
> fails, try rep2).  I can imagine use cases for either mode, but not one
> which would involve doing both together.
>

Yep, separating the GUC parameter between prioritization and quorum
could also be a good idea.

Also, I think we must make it possible to decide which server should be
promoted when the master server goes down.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-28 Thread Sawada Masahiko
On Fri, Jun 26, 2015 at 2:46 PM, Michael Paquier
 wrote:
> On Thu, Jun 25, 2015 at 8:32 PM, Simon Riggs  wrote:
>> Let's start with a complex, fully described use case then work out how to
>> specify what we want.
>
> Well, one of the most simple cases where quorum commit and this
> feature would be useful for is that, with 2 data centers:
> - on center 1, master A and standby B
> - on center 2, standby C and standby D
> With the current synchronous_standby_names, what we can do now is
> ensuring that one node has acknowledged the commit of master. For
> example synchronous_standby_names = 'B,C,D'. But you know that :)
> What this feature would allow use to do is for example being able to
> ensure that a node on the data center 2 has acknowledged the commit of
> master, meaning that even if data center 1 completely lost for a
> reason or another we have at least one node on center 2 that has lost
> no data at transaction commit.
>
> Now, regarding the way to express that, we need to use a concept of
> node group for each element of synchronous_standby_names. A group
> contains a set of elements, each element being a group or a single
> node. And for each group we need to know three things when a commit
> needs to be acknowledged:
> - Does my group need to acknowledge the commit?
> - If yes, how many elements in my group need to acknowledge it?
> - Does the order of my elements matter?
>
> That's where the micro-language idea makes sense to use. For example,
> we can define a group using separators and like (elt1,...eltN) or
> [elt1,elt2,eltN]. Appending a number in front of a group is essential
> as well for quorum commits. Hence for example, assuming that '()' is
> used for a group whose element order does not matter, if we use that:
> - k(elt1,elt2,eltN) means that we need for the k elements in the set
> to return true (aka commit confirmation).
> - k[elt1,elt2,eltN] means that we need for the first k elements in the
> set to return true.
>
> When k is not defined for a group, k = 1. Using only elements
> separated by commas for the upper group means that we wait for the
> first element in the set (for backward compatibility), hence:
> 1(elt1,elt2,eltN) <=> elt1,elt2,eltN
>

I think that you meant "1[elt1,elt2,eltN] <=> elt1,elt2,eltN" in this
case (for backward compatibility), right?
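Michael's proposed semantics can be modelled like this (an illustrative
evaluator only; parsing of the string syntax is omitted, and the
representation is my own): k(...) waits for ANY k elements to acknowledge,
while k[...] waits for the FIRST k *live* elements in listed order, which is
what makes 1[AAA,BBB,CCC] equivalent to today's 'AAA,BBB,CCC'.

```python
def satisfied(node, acked, alive):
    """node is a standby name, or a tuple (k, 'any'|'ordered', [subnodes])."""
    if isinstance(node, str):
        return node in acked
    k, mode, elems = node
    if mode == "any":                       # k(elt1, ..., eltN): any k suffice
        return sum(satisfied(e, acked, alive) for e in elems) >= k
    # k[elt1, ..., eltN]: only the first k elements that are alive count,
    # so a dead standby is skipped and the next one becomes synchronous.
    chosen = [e for e in elems if e in alive][:k]
    return len(chosen) == k and all(e in acked for e in chosen)

# Backward-compatible form: 1[AAA,BBB,CCC] <=> 'AAA,BBB,CCC'.
old_style = (1, "ordered", ["AAA", "BBB", "CCC"])
```

With only BBB acked and AAA alive, the ordered form still waits for AAA,
while 1(AAA,BBB,CCC) would already be satisfied -- exactly Fujii-san's point.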

Regards,

--
Sawada Masahiko




Re: [HACKERS] Semantics of pg_file_settings view

2015-06-27 Thread Sawada Masahiko
> [beginning of quoted pg_file_settings sample output truncated]
> default_text_search_config | pg_catalog.english | /home/postgres/testversion/data/conf.d/foo.conf |   1 | 12 |
> work_mem                   | bogus              | /home/postgres/testversion/data/postgresql.conf | 628 | 13 |
> bad                        | worse              |                                                 |     |    | unrecognized configuration parameter
> (13 rows)
>
> ISTM that this represents a quantum jump in the usefulness of the
> pg_file_settings view for its ostensible purpose of diagnosing config-file
> problems, so I would like to push forward with getting it done even though
> we're on the verge of 9.5alpha1.
>
> However there's a further tweak to the view that I'd like to think about
> making.  Once this is in and we make the originally-discussed change to
> suppress application of duplicated GUC entries, it would be possible to
> mark the ignored entries in the view, which would save people the effort
> of figuring out manually which ones were ignored.  The question is exactly
> how to mark the ignored entries.  A simple tweak would be to put something
> in the "error" column, say "ignored because of later duplicate entry".
> However, this would break the property that an entry in the error column
> means there's something you'd better fix, which I think would be a useful
> rule for nagios-like monitoring tools.
>
> Another issue with the API as it stands here is that you can't easily
> distinguish errors that caused the entire config file to be ignored from
> those that only prevented application of one setting.
>
> The best idea I have at the moment is to also add a boolean "applied"
> column which is true if the entry was successfully applied.  Errors that
> result in the whole file being ignored would mean that *all* the entries
> show applied = false.  Otherwise, applied = false with nothing in the
> error column would imply that the entry was ignored due to a later
> duplicate.  There are multiple other ways it could be done of course;
> anyone want to lobby for some other design?
>
> There is more that could be done with this basic idea; in particular,
> it would be useful if set_config failures would be broken down in more
> detail than "setting could not be applied".  That would require somewhat
> invasive refactoring though, and it would only be an incremental
> improvement in usability, so I'm disinclined to tackle the point under
> time pressure.
>
> Comments, better ideas?  Barring strong objections I'd like to get this
> finished over the weekend.
>

I think that we can find the applied value by sorting
pg_file_settings.seqno in ascending order.
The value with the highest seqno is the currently applied value, and
the default value applies if pg_file_settings has no entry for it.
For that reason, I think it's enough to mark which entries were not
applied because of an error in the config file, rather than marking
which entries were applied.
For example, an 'ignored' column would be true for a parameter ignored
because of an error in the config file, but false for a parameter
ignored because of a later duplicate.
Thoughts?
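The seqno rule above can be sketched as follows (illustrative Python, not the
server code): among duplicate entries for the same parameter, the entry with
the highest seqno is the one actually applied, and earlier duplicates are
ignored.

```python
def applied_entries(rows):
    """rows: iterable of (name, seqno, value). Returns {name: value}
    containing only the winning (highest-seqno) entry per parameter."""
    best_seqno = {}
    winners = {}
    for name, seqno, value in rows:
        if name not in best_seqno or seqno > best_seqno[name]:
            best_seqno[name] = seqno
            winners[name] = value
    return winners

# A later duplicate of work_mem shadows the earlier entry.
rows = [("work_mem", 3, "4MB"),
        ("shared_buffers", 5, "128MB"),
        ("work_mem", 12, "64MB")]
```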


@@ -289,6 +321,7 @@ ProcessConfigFile(GucContext context)

(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
 errmsg("parameter \"%s\"
cannot be changed without restarting the server",
+   item->errmsg = pstrdup("parameter cannot be
changed without restarting the server");
error = true;
continue;

Also, the above hunk is wrong: the item variable is invalid here, and
the corresponding line no longer exists.
If we remove a line (e.g., port) so that the default value is used and
then reload, it leads to a SEGV.

Regards,

--
Sawada Masahiko




Re: [HACKERS] pg_file_settings view vs. Windows

2015-06-27 Thread Sawada Masahiko
On Sun, Jun 28, 2015 at 10:40 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Sun, Jun 28, 2015 at 8:20 AM, Tom Lane  wrote:
>>> Yeah, exactly.  Unfortunately I see no way to add a useful test, at least
>>> not one that will work in installcheck mode.  There's no way to predict
>>> what will be in the view in that case.  Even for "make check", the output
>>> would be pretty darn environment-dependent.
>
>> And also because this patch had no review input regarding Windows and
>> EXEC_BACKEND. I would suggest pinging the author (just did so),
>> waiting for a fix a bit, and move on with 4. if nothing happens.
>
> Well, mumble.  After playing with this for a bit, I'm fairly convinced
> that it offers useful functionality, especially with the error-reporting
> additions I've proposed.  Right now, there is no easy way to tell whether
> a SIGHUP has worked, or why not if not, unless you have access to the
> postmaster log.  So I think there's definite usefulness here for
> remote-administration scenarios.
>
> So I kinda think that alternative 1 (document the Windows deficiency)
> is better than having no such functionality at all.  Obviously a proper
> fix would be better yet, but that's something that could be rolled in
> later.
>
>> We usually require that a patch includes support for Windows as a
>> requirement (see for example discussions about why pg_fincore in not a
>> contrib module even if it overlaps a bit with pg_prewarm), why would
>> this patch have a different treatment?
>
> Agreed, but it was evidently not obvious to anyone that there was a
> portability issue in this code, else we'd have resolved the issue
> before it got committed.  As a thought experiment, what would happen
> if we'd not noticed this issue till post-release, which is certainly
> not implausible?
>
> Also, there are multiple pre-existing minor bugs (the leakage problem
> I mentioned earlier, and some other things I've found while hacking
> on the view patch) that we would have to deal with in some other
> way if we revert now.  I'd just as soon not detangle that.
>

Thank you for the bug report.

I have not come up with a portable idea yet, but I will deal with this
problem immediately.
If I can't come up with a better solution, I'd like to propose idea #1.
But reverting would be unavoidable if there is any hard requirement for
Windows support.

Regards,

--
Sawada Masahiko




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-25 Thread Sawada Masahiko
On Thu, Jun 25, 2015 at 7:32 AM, Simon Riggs  wrote:
> On 25 June 2015 at 05:01, Michael Paquier  wrote:
>>
>> On Thu, Jun 25, 2015 at 12:57 PM, Fujii Masao wrote:
>> > On Thu, Jun 25, 2015 at 12:15 PM, Michael Paquier wrote:
>> >> and that's actually equivalent to that in
>> >> the grammar: 1(AAA,BBB,CCC).
>> >
>> > I don't think that they are the same. In the case of 1(AAA,BBB,CCC),
>> > while
>> > two servers AAA and BBB are running, the master server may return a
>> > success
>> > of the transaction to the client just after it receives the ACK from
>> > BBB.
>> > OTOH, in the case of AAA,BBB, that never happens. The master must wait
>> > for
>> > the ACK from AAA to arrive before completing the transaction. And then,
>> > if AAA goes down, BBB should become synchronous standby.
>>
>> Ah. Right. I missed your point, that's a bad day... We could have
>> multiple separators to define group types then:
>> - "()" where the order of acknowledgement does not matter
>> - "[]" where it does not.
>> You would find the old grammar with:
>> 1[AAA,BBB,CCC]
>
> Let's start with a complex, fully described use case then work out how to
> specify what we want.
>
> I'm nervous of "it would be good ifs" because we do a ton of work only to
> find a design flaw.
>

I'm not sure about the specific implementation yet, but I came up with
a solution for this case.

For example:
- s_s_names = '1(a, b), c, d'
The priority of both 'a' and 'b' is 1, 'c' is 2, and 'd' is 3;
i.e., 'b' and 'c' are potential sync nodes, and quorum commit is
enabled only between 'a' and 'b'.

- s_s_names = 'a, 1(b,c), d'
The priority of 'a' is 1, 'b' and 'c' are 2, and 'd' is 3.
So quorum commit with 'b' and 'c' will be enabled after 'a' goes down.

With this idea, I think we could keep the conventional syntax.
Thoughts?
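A rough model of this priority assignment (the parsing details below are my
own illustration, not part of the proposal): members of a parenthesised group
share one priority level, and each following element gets the next level.

```python
import re

def priorities(s_s_names):
    """Assign priority levels to standbys in a string like '1(a, b), c, d'.

    Each top-level element (a group or a single name) gets the next level;
    all members of a group share that level. The quorum count before '('
    is ignored here -- only the grouping matters for priorities.
    """
    levels = {}
    prio = 0
    for group, single in re.findall(r'\d*\(([^)]*)\)|([^,()\s]+)', s_s_names):
        prio += 1
        names = [n.strip() for n in group.split(',')] if group else [single]
        for name in names:
            levels[name] = prio
    return levels
```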

Regards,

--
Sawada Masahiko




Re: [HACKERS] GIN function of pageinspect has inconsistency data type.

2015-06-19 Thread Sawada Masahiko
On Wed, Jun 17, 2015 at 4:11 PM, Jim Nasby  wrote:
> On 6/16/15 8:26 AM, Sawada Masahiko wrote:
>>
>> I noticed while using gin function of pageinspect that there are some
>> inconsistency data types.
>> For example, data type of GinMetaPageData.head, and tail  is
>> BlockNumber, i.g, uint32.
>> But in ginfunc.c, we get that value as int64.
>
>
> You can't put a uint32 into a signed int32 (which is what the SQL int is).
> That's why it's returning a bigint.

I understood why it's returning a bigint.
Thanks!

Regards,

--
Sawada Masahiko




[HACKERS] GIN function of pageinspect has inconsistency data type.

2015-06-16 Thread Sawada Masahiko
Hi all,

While using the GIN functions of pageinspect, I noticed some
inconsistent data types.
For example, the data type of GinMetaPageData.head and tail is
BlockNumber, i.e., uint32.
But in ginfunc.c, we return that value as int64.

So I think the output is a little odd. Is it intentional?

- Before
postgres(1)=# select pending_head, pending_tail from
gin_metapage_info(get_raw_page('trgm_idx'::text, 0));
 pending_head | pending_tail
--------------+--------------
   4294967295 |   4294967295
(1 row)

- After attaching patch
postgres(1)=# select pending_head, pending_tail from
gin_metapage_info(get_raw_page('trgm_idx'::text, 0));
 pending_head | pending_tail
--------------+--------------
           -1 |           -1
(1 row)

There are some similar cases besides the above, so a patch fixing them
around ginfunc.c is attached.
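The values in the "before" output make sense once you know that
InvalidBlockNumber is 0xFFFFFFFF: read as an unsigned 32-bit number it prints
4294967295, and the same bit pattern reinterpreted as signed 32-bit is -1.
A quick sketch of the reinterpretation (Python, illustrative):

```python
import struct

def as_signed32(u):
    """Reinterpret an unsigned 32-bit value as signed (two's complement)."""
    return struct.unpack('<i', struct.pack('<I', u))[0]

# PostgreSQL's InvalidBlockNumber: all bits set in a 32-bit BlockNumber.
INVALID_BLOCK_NUMBER = 0xFFFFFFFF
```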

Regards,

---
Sawada Masahiko


fix_pageinspect_inconsistency_data_type.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-05-27 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 8:07 PM, Sawada Masahiko  wrote:
> On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko  
> wrote:
>> On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  wrote:
>>> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>>>
>>>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>>>
>>>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>>>> Agreed, no extra file, and the same write volume as currently.  It would
>>>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>>>> reuse some of that code.
>>>>>
>>>>
>>>> Yeah, this approach seems promising. We probably can't reuse code from
>>>> clog because the usage pattern is different (key for clog is xid, while
>>>> for visibility/freeze map ctid is used). But visibility map storage
>>>> layer is pretty simple so it should be easy to extend it for this use.
>>>
>>>
>>> Actually, there may be some bit manipulation functions we could reuse;
>>> things like efficiently counting how many things in a byte are set. Probably
>>> doesn't make sense to fully refactor it, but at least CLOG is a good source
>>> for cut/paste/whack.
>>>
>>
>> I agree with adding a bit that indicates corresponding page is
>> all-frozen into VM, just like CLOG.
>> I'll change the patch as second version patch.
>>
>
> The second patch is attached.
>
> In second patch, I added a bit that indicates all tuples in page are
> completely frozen into visibility map.
> The visibility map became a bitmap with two bit per heap page:
> all-visible and all-frozen.
> The logics around vacuum, insert/update/delete heap are almost same as
> previous version.
>
> This patch lack some point: documentation, comment in source code,
> etc, so it's WIP patch yet,
> but I think that it's enough to discuss about this.
>

The previous patch no longer applies cleanly to HEAD.
The attached v2 patch is the latest version.

Please review it.
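The two-bit layout the patch describes can be modelled roughly as follows
(illustrative only -- one flags byte per page here for simplicity, whereas
the real visibility map packs two bits per heap page on disk): each heap
page carries an all-visible flag and an all-frozen flag, and heap
modifications clear them.

```python
# Flag bits per heap page, mirroring the patch's VISIBILITYMAP_* symbols.
ALL_VISIBLE = 0x01
ALL_FROZEN  = 0x02

class VisibilityMap:
    """Toy per-page flag store; not the on-disk bitmap format."""

    def __init__(self):
        self.flags = bytearray()

    def _ensure(self, page):
        while len(self.flags) <= page:
            self.flags.append(0)

    def set(self, page, bits):
        self._ensure(page)
        self.flags[page] |= bits

    def clear(self, page, bits):
        self._ensure(page)
        self.flags[page] &= ~bits

    def is_set(self, page, bits):
        self._ensure(page)
        return (self.flags[page] & bits) == bits

vm = VisibilityMap()
vm.set(0, ALL_VISIBLE | ALL_FROZEN)   # vacuum marks the page
vm.clear(0, ALL_FROZEN)               # e.g. an insert clears only the frozen bit
```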

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index caacc10..fcbf06a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+bool all_visible_cleared, bool new_all_visible_cleared,
+bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2107,7 +2108,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		all_visible_cleared = false;
+	bool		all_visible_cleared = false,
+all_frozen_cleared = false;
 
 	/*
 	 * Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2131,8 +2133,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2150,7 +2153,16 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			vmbuffer);
+			vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
+	}
+
+	if (PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		visibilitymap_clear(relation,
+			ItemPointerGetBlockNumber(&(heaptup->t_self)),
+			vmbuffer, VISIBILITYMAP_ALL_FROZEN);
 	}
 
 	/*
@@ -2199,6 +2211,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xlrec.flags = 0;
 		if (all_visible_cleared)
 			xlrec.flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+		if (all_frozen_cleared)
+			xlrec.flags |= XLH_INSERT_ALL_FROZEN_CLEARED;
 		if (options & HEAP_INSERT_SPECULATIVE)
 			xlrec.flags |= XLH_INSERT_IS_SPECULATIVE;
 		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
@@ -2406,7 +2420,

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-05-16 Thread Sawada Masahiko
On Fri, May 15, 2015 at 9:18 PM, Michael Paquier
 wrote:
> On Fri, May 15, 2015 at 8:55 PM, Beena Emerson  
> wrote:
>> There was a discussion on support for N synchronous standby servers started
>> by Michael. Refer
>> http://archives.postgresql.org/message-id/cab7npqr9c84ig0zuvhmqamq53vqsd4rc82vyci4dr27pvof...@mail.gmail.com
>> . The use of hooks and dedicated language was suggested, however, it seemed
>> to be an overkill for the scenario and there was no consensus on this.
>> Exploring GUC-land was preferred.
>
> Cool.
>
>> Please find attached a patch,  built on Michael's patch from above mentioned
>> thread, which supports choosing different number of nodes from each set i.e.
>> k nodes from set 1, l nodes from set 2, so on.
>> The format of synchronous_standby_names has been updated to standby name
>> followed by the required count separated by hyphen. Ex: 'aa-1, bb-3'.  The
>> transaction waits for all the specified number of standby in each group. Any
>> extra nodes with the same name will be considered potential. The special
>> entry * for the standby name is also supported.
>
> I don't think that this is going in the good direction, what was
> suggested mainly by Robert was to use a micro-language that would
> allow far more extensibility that what you are proposing. See for
> example ca+tgmobpwoenmmepfx0jwrvqufxvbqrv26ezq_xhk21gxrx...@mail.gmail.com
> for some ideas. IMO, before writing any patch in this area we should
> find a clear consensus on what we want to do. Also, unrelated to this
> patch, we should really get first the patch implementing the... Hum...
> infrastructure for regression tests regarding replication and
> archiving to be able to have actual tests for this feature (working on
> it for next CF).

A dedicated language for multiple synchronous replication would be more
extensible, as you said, but I think not many users want to, or should,
use it.
IMHO such a dedicated, extensible feature could be an extension module,
e.g. in contrib, and we could implement a simpler feature in PostgreSQL
core with some restrictions.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-14 Thread Sawada Masahiko
On Thu, May 14, 2015 at 9:58 AM, Robert Haas  wrote:
> On Wed, May 13, 2015 at 8:25 PM, Sawada Masahiko  
> wrote:
>> The v15 patch emits a line for each table when reindexing multiple
>> tables, and emits a line for each index when reindexing single table.
>> But v14 patch emits a line for each index, regardless of reindex target.
>> Should I change back to v14 patch?
>
> Uh, maybe.  What made you change it?
>

I thought that users who reindex multiple tables are interested in the
time that reindexing each whole table takes.
But it seems sensible to emit a line for each index even when
reindexing multiple tables.
The attached v16 patch is based on v14 with a few modifications.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..703b760 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
 
  
 
@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index(indOid, false, persistence);
+	reindex_index(indOid, false, persistence, options);
 
 	return indOid;
 }
@@ -1775,7 +1775,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, int options)
 {
 	Oid			heapOid;
 
@@ -1785,7 +1785,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  REINDEX_REL_PROCESS_TOAST |
-		  REINDEX_REL_CHECK_CONSTRAINTS))
+		  REINDEX_REL_CHECK_CONSTRAINTS,
+		  options))
 		ereport(NOTICE,
 (errmsg("table \"%s\" has no indexes",
 		relation->relname)));
@@ -1802,7 +1803,8 @@ ReindexTable(RangeVar *relation)
  * That means this must not be called within a user transaction block!
  */
 void
-ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
+ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
+	  int options)
 {
 	Oid			objectOid;
 	Relation	relationRelation;
@@ -1938,11 +1940,14 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 		PushActiveSnapshot(GetTransactionSnapshot());
 		if (reindex_relation(relid,
 			 REINDEX_REL_PROCESS_TOAST |
-			 REINDEX_REL_CHECK_CONSTRAINTS))
-			ereport(DEBUG1,
-	(errmsg("table \"%s.%s\" was reindexed",
-			get_namespace_name(get_rel_namespace(relid)),
-			get_rel_name(relid;
+			 REINDEX_REL_CHECK_CONSTRAINTS,
+			 options))
+
+			if (options & REINDEXOPT_VERBOSE)
+ereport(INFO,
+		(errmsg("table \"%s.%s\" was reindexed",
+get_namespace_name(get_rel_namespace(relid)),
+get_rel_name(relid;
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0a6b069..33ea387 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1234,7 +1234,7 @@ ExecuteTruncate(TruncateStmt *stmt)
 			/*
 			 * Reconstruct the indexes to match, and we're done.
 			 */
-			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST);
+			reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST, 0);
 		}
 
 		pgstat_count_truncate(rel);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 76b63af..25839ee 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3856,6 +3856,7 @@ _copyReindexStmt(const ReindexStmt *from)
 	COPY_SCALAR_FIELD(kind);
 	COPY_NODE_FIELD(relation);
 	COPY_STRING_FIELD(name);
+	COPY_SCALAR_FIELD(options);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index e032142..c4b3615 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1935,6 +1935,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
 	COMPARE_SCALAR_FIELD(kind);
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_STRING_FIELD(name);
+	COMPARE_SCALAR_FIELD(options);
 
 	return true;
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e71d926..2dce878 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -463,6 +463,10 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <node>	explain_option_arg
 %type <defelt>	explain_option_elem
 %type <list>	explain_option_list
+
+%type <ival>	reindex_target_type reindex_target_multitable
+%type <ival>	reindex_option_list reindex_option_elem
+
 %type <node>	copy_generic_opt_arg copy_generic_opt_arg_list_item
 %type <defelt>	copy_generic_opt_elem
 %type <list>	copy_generic_opt_list copy_gener

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 4:53 AM, Robert Haas  wrote:
> On Wed, May 13, 2015 at 2:10 PM, Alvaro Herrera
>  wrote:
>> Uh, are we really using INFO to log this?  I thought that was a
>> discouraged level to use anymore.  Why not NOTICE?
>
> Well, as Masahiko-san points out, VACUUM uses INFO.  I can't see any
> good reason to make this different.
>
>> Also, when multiple tables are reindexed, do we emit lines for each
>> index, or only for each table?  If we're going to emit a line for each
>> index in the single-table mode, it seems more sensible to do the same
>> for the multi-table forms also.
>
> Hmm, yeah, I agree with that.  I thought the patch worked that way,
> but I see now that it doesn't.  I think that should be changed.
>

The v15 patch emits a line for each table when reindexing multiple
tables, and a line for each index when reindexing a single table.
The v14 patch, however, emits a line for each index regardless of the
reindex target. Should I change back to the v14 behavior?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 3:10 AM, Alvaro Herrera
 wrote:
> Uh, are we really using INFO to log this?  I thought that was a
> discouraged level to use anymore.  Why not NOTICE?
>

I think it should be the INFO level because this is information about
the REINDEX command itself, such as its progress and CPU usage. It
would be overkill to emit these logs at the NOTICE level, and the
verbose output of other maintenance commands is already emitted at the
INFO level.

> Also, when multiple tables are reindexed, do we emit lines for each
> index, or only for each table?  If we're going to emit a line for each
> index in the single-table mode, it seems more sensible to do the same
> for the multi-table forms also.
>

I agree that we should emit a line for each table when reindexing
multiple tables.
The latest patch is attached.
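As a sketch of what the v15 behavior looks like from a session, using the errmsg() text from the patch (the schema and table names below are invented for illustration, and this of course requires a running server):

```sql
-- Hypothetical psql session; object names are invented.
REINDEX (VERBOSE) SCHEMA public;
-- INFO:  table "public.orders" was reindexed
-- INFO:  table "public.customers" was reindexed
```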


Regards,

---
Sawada Masahiko


000_reindex_verbose_v15.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-13 Thread Sawada Masahiko
On Thu, May 14, 2015 at 12:28 AM, Robert Haas  wrote:
> On Thu, May 7, 2015 at 6:55 PM, Sawada Masahiko  wrote:
>> Sorry, I forgot attach files.
>
> Review comments:
>
> - Customarily we use int, rather than uint8, for flags variables.  I
> think we should do that here also.
>
> - reindex_index() emits a log message either way, but at DEBUG2 level
> without VERBOSE and at the INFO level with it.  I think we should skip
> it altogether without VERBOSE.  i.e. if (options & REINDEXOPT_VERBOSE)
> ereport(...)
>
> - The errmsg() in that function should not end with a period.
>
> - The 000 patch makes a pointless whitespace change to tab-complete.c.
>
> - Instead of an enumerated type (ReindexOption) just use #define to
> define the flag value.
>
> Apart from those fairly minor issues, I think this looks pretty solid.
>

Thank you for reviewing.
All fixed.

Regards,

---
Sawada Masahiko


000_reindex_verbose_v14.patch
Description: Binary data



[HACKERS] Typo in reindexdb documentation

2015-05-09 Thread Sawada Masahiko
Hi all,

The attached patch fixes a typo in the reindexdb documentation
regarding REINDEX SCHEMA.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index b5b449c..a745f6c 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -30,7 +30,7 @@ PostgreSQL documentation
   --schema
   -S
  
- table
+ schema
 

 



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-09 Thread Sawada Masahiko
On Sun, May 10, 2015 at 2:23 AM, Sawada Masahiko  wrote:
> On Sat, May 9, 2015 at 4:26 AM, Fabrízio de Royes Mello
>  wrote:
>>
>>
>> On Fri, May 8, 2015 at 4:23 PM, Fabrízio de Royes Mello
>>  wrote:
>>>
>>>
>>> On Thu, May 7, 2015 at 7:55 PM, Sawada Masahiko 
>>> wrote:
>>> >
>>> > On 5/7/15, Sawada Masahiko  wrote:
>>> > > On Wed, May 6, 2015 at 5:42 AM, Robert Haas >> > > > wrote:
>>> > >> On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko
>>> > >> >> > > > wrote:
>>> > >>> On Fri, May 1, 2015 at 9:04 PM, Robert Haas >> > > > wrote:
>>> > >>>> On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko
>>> > >>>> >> > > > wrote:
>>> > >>>>> VACUUM has both syntax: with parentheses and without parentheses.
>>> > >>>>> I think we should have both syntax for REINDEX like VACUUM does
>>> > >>>>> because it would be pain to put parentheses whenever we want to do
>>> > >>>>> REINDEX.
>>> > >>>>> Are the parentheses optional in REINDEX command?
>>> > >>>>
>>> > >>>> No.  The unparenthesized VACUUM syntax was added back before we
>>> > >>>> realized that that kind of syntax is a terrible idea.  It requires
>>> > >>>> every option to be a keyword, and those keywords have to be in a
>>> > >>>> fixed
>>> > >>>> order.  I believe the intention is to keep the old VACUUM syntax
>>> > >>>> around for backward-compatibility, but not to extend it.  Same for
>>> > >>>> EXPLAIN and COPY.
>>> > >>>
>>> > >>> REINDEX will have only one option VERBOSE for now.
>>> > >>> Even we're in a situation like that it's not clear to be added newly
>>> > >>> additional option to REINDEX now, we should need to put parenthesis?
>>> > >>
>>> > >> In my opinion, yes.  The whole point of a flexible options syntax is
>>> > >> that we can add new options without changing the grammar.  That
>>> > >> involves some compromise on the syntax, which doesn't bother me a
>>> > >> bit.
>>> > >> Our previous experiments with this for EXPLAIN and COPY and VACUUM
>>> > >> have worked out quite well, and I see no reason for pessimism here.
>>> > >
>>> > > I agree that flexible option syntax does not need to change grammar
>>> > > whenever we add new options.
>>> > > Attached patch is changed based on your suggestion.
>>> > > And the patch for reindexdb is also attached.
>>> > > Please feedbacks.
>>> > >
>>> > >>> Also I'm not sure that both implementation and documentation
>>> > >>> regarding
>>> > >>> VERBOSE option should be optional.
>>> > >>
>>> > >> I don't know what this means.
>>> > >>
>>> > >
>>> > > Sorry for confusing you.
>>> > > Please ignore this.
>>> > >
>>> >
>>> > Sorry, I forgot attach files.
>>> >
>>>
>>> I applied the two patches to master and I got some errors when compile:
>>>
>>> tab-complete.c: In function ‘psql_completion’:
>>> tab-complete.c:3338:12: warning: left-hand operand of comma expression has
>>> no effect [-Wunused-value]
>>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>> ^
>>> tab-complete.c:3338:21: warning: left-hand operand of comma expression has
>>> no effect [-Wunused-value]
>>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>>  ^
>>> tab-complete.c:3338:31: warning: left-hand operand of comma expression has
>>> no effect [-Wunused-value]
>>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>>^
>>> tab-complete.c:3338:41: warning: left-hand operand of comma expression has
>>> no effect [-Wunused-value]
>>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-09 Thread Sawada Masahiko
On Sat, May 9, 2015 at 4:26 AM, Fabrízio de Royes Mello
 wrote:
>
>
> On Fri, May 8, 2015 at 4:23 PM, Fabrízio de Royes Mello
>  wrote:
>>
>>
>> On Thu, May 7, 2015 at 7:55 PM, Sawada Masahiko 
>> wrote:
>> >
>> > On 5/7/15, Sawada Masahiko  wrote:
>> > > On Wed, May 6, 2015 at 5:42 AM, Robert Haas > > > > wrote:
>> > >> On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko
>> > >> > > > > wrote:
>> > >>> On Fri, May 1, 2015 at 9:04 PM, Robert Haas > > > > wrote:
>> > >>>> On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko
>> > >>>> > > > > wrote:
>> > >>>>> VACUUM has both syntax: with parentheses and without parentheses.
>> > >>>>> I think we should have both syntax for REINDEX like VACUUM does
>> > >>>>> because it would be pain to put parentheses whenever we want to do
>> > >>>>> REINDEX.
>> > >>>>> Are the parentheses optional in REINDEX command?
>> > >>>>
>> > >>>> No.  The unparenthesized VACUUM syntax was added back before we
>> > >>>> realized that that kind of syntax is a terrible idea.  It requires
>> > >>>> every option to be a keyword, and those keywords have to be in a
>> > >>>> fixed
>> > >>>> order.  I believe the intention is to keep the old VACUUM syntax
>> > >>>> around for backward-compatibility, but not to extend it.  Same for
>> > >>>> EXPLAIN and COPY.
>> > >>>
>> > >>> REINDEX will have only one option VERBOSE for now.
>> > >>> Even we're in a situation like that it's not clear to be added newly
>> > >>> additional option to REINDEX now, we should need to put parenthesis?
>> > >>
>> > >> In my opinion, yes.  The whole point of a flexible options syntax is
>> > >> that we can add new options without changing the grammar.  That
>> > >> involves some compromise on the syntax, which doesn't bother me a
>> > >> bit.
>> > >> Our previous experiments with this for EXPLAIN and COPY and VACUUM
>> > >> have worked out quite well, and I see no reason for pessimism here.
>> > >
>> > > I agree that flexible option syntax does not need to change grammar
>> > > whenever we add new options.
>> > > Attached patch is changed based on your suggestion.
>> > > And the patch for reindexdb is also attached.
>> > > Please feedbacks.
>> > >
>> > >>> Also I'm not sure that both implementation and documentation
>> > >>> regarding
>> > >>> VERBOSE option should be optional.
>> > >>
>> > >> I don't know what this means.
>> > >>
>> > >
>> > > Sorry for confusing you.
>> > > Please ignore this.
>> > >
>> >
>> > Sorry, I forgot attach files.
>> >
>>
>> I applied the two patches to master and I got some errors when compile:
>>
>> tab-complete.c: In function ‘psql_completion’:
>> tab-complete.c:3338:12: warning: left-hand operand of comma expression has
>> no effect [-Wunused-value]
>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>> ^
>> tab-complete.c:3338:21: warning: left-hand operand of comma expression has
>> no effect [-Wunused-value]
>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>  ^
>> tab-complete.c:3338:31: warning: left-hand operand of comma expression has
>> no effect [-Wunused-value]
>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>^
>> tab-complete.c:3338:41: warning: left-hand operand of comma expression has
>> no effect [-Wunused-value]
>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>  ^
>> tab-complete.c:3338:53: warning: left-hand operand of comma expression has
>> no effect [-Wunused-value]
>> {"TABLE", "INDEX", "SYSTEM", "SCHEMA", "DATABASE", NULL};
>>

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-07 Thread Sawada Masahiko
On 5/7/15, Sawada Masahiko  wrote:
> On Wed, May 6, 2015 at 5:42 AM, Robert Haas  > wrote:
>> On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko  > wrote:
>>> On Fri, May 1, 2015 at 9:04 PM, Robert Haas  > wrote:
>>>> On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko
>>>>  > wrote:
>>>>> VACUUM has both syntax: with parentheses and without parentheses.
>>>>> I think we should have both syntax for REINDEX like VACUUM does
>>>>> because it would be pain to put parentheses whenever we want to do
>>>>> REINDEX.
>>>>> Are the parentheses optional in REINDEX command?
>>>>
>>>> No.  The unparenthesized VACUUM syntax was added back before we
>>>> realized that that kind of syntax is a terrible idea.  It requires
>>>> every option to be a keyword, and those keywords have to be in a fixed
>>>> order.  I believe the intention is to keep the old VACUUM syntax
>>>> around for backward-compatibility, but not to extend it.  Same for
>>>> EXPLAIN and COPY.
>>>
>>> REINDEX will have only one option VERBOSE for now.
>>> Even we're in a situation like that it's not clear to be added newly
>>> additional option to REINDEX now, we should need to put parenthesis?
>>
>> In my opinion, yes.  The whole point of a flexible options syntax is
>> that we can add new options without changing the grammar.  That
>> involves some compromise on the syntax, which doesn't bother me a bit.
>> Our previous experiments with this for EXPLAIN and COPY and VACUUM
>> have worked out quite well, and I see no reason for pessimism here.
>
> I agree that flexible option syntax does not need to change grammar
> whenever we add new options.
> Attached patch is changed based on your suggestion.
> And the patch for reindexdb is also attached.
> Please feedbacks.
>
>>> Also I'm not sure that both implementation and documentation regarding
>>> VERBOSE option should be optional.
>>
>> I don't know what this means.
>>
>
> Sorry for confusing you.
> Please ignore this.
>

Sorry, I forgot to attach the files.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..703b760 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX [ ( { VERBOSE } [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
 
  
 
@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index(indOid, false, persistence);
+	reindex_index(indOid, false, persistence, options);
 
 	return indOid;
 }
@@ -1775,7 +1775,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, uint8 options)
 {
 	Oid			heapOid;
 
@@ -1785,7 +1785,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  REINDEX_REL_PROCESS_TOAST |
-		  REINDEX_REL_CHECK_CONSTRAINTS))
+		  REINDEX_REL_CHECK_CONSTRAINTS,
+		  options))
 		ereport(NOTICE,
 (errmsg("table \"%s\" has no indexes",
 		relation->relname)));
@@ -1802,7 +1803,8 @@ ReindexTable(RangeVar *relation)
  * That means this must not be called within a user transaction block!
  */
 void
-ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
+ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
+	  uint8 options)
 {
 	Oid			objectOid;
 	Relation	relationRelation;
@@ -1814,12 +1816,17 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 	List	   *relids = NIL;
 	ListCell   *l;
 	int			num_keys;
+	int			elevel = DEBUG2;
 
 	AssertArg(objectName);
 	Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
 		   objectKind == REINDEX_OBJECT_SYSTEM ||
 		   objectKind == REINDEX_OBJECT_DATABASE);
 
+	/* Set option(s) */
+	if (options & REINDEXOPT_VERBOSE)
+		elevel = INFO;
+
 	/*
 	 * Get OID of object to reindex, being the database currently being used
 	 * by session for a database or for system catalogs, or the schema defined
@@ -1938,9 +1945,10 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 		PushActiveSnapshot(GetTransactionSnapshot());
 		if (reindex_relation(relid,
 			 REINDEX_REL_PROCESS_TOAST |
-			 REINDEX_REL_CHECK_CONSTRAINTS))
-			ereport(DEBUG1,
-	(errmsg("table \"%s.%s\" was reindexed",
+			 REINDEX_REL_CHECK_CONSTRAINTS,
+

[HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-07 Thread Sawada Masahiko
On Wed, May 6, 2015 at 5:42 AM, Robert Haas > wrote:
> On Tue, May 5, 2015 at 11:10 AM, Sawada Masahiko > wrote:
>> On Fri, May 1, 2015 at 9:04 PM, Robert Haas > wrote:
>>> On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko > wrote:
>>>> VACUUM has both syntax: with parentheses and without parentheses.
>>>> I think we should have both syntax for REINDEX like VACUUM does
>>>> because it would be pain to put parentheses whenever we want to do
>>>> REINDEX.
>>>> Are the parentheses optional in REINDEX command?
>>>
>>> No.  The unparenthesized VACUUM syntax was added back before we
>>> realized that that kind of syntax is a terrible idea.  It requires
>>> every option to be a keyword, and those keywords have to be in a fixed
>>> order.  I believe the intention is to keep the old VACUUM syntax
>>> around for backward-compatibility, but not to extend it.  Same for
>>> EXPLAIN and COPY.
>>
>> REINDEX will have only one option VERBOSE for now.
>> Even we're in a situation like that it's not clear to be added newly
>> additional option to REINDEX now, we should need to put parenthesis?
>
> In my opinion, yes.  The whole point of a flexible options syntax is
> that we can add new options without changing the grammar.  That
> involves some compromise on the syntax, which doesn't bother me a bit.
> Our previous experiments with this for EXPLAIN and COPY and VACUUM
> have worked out quite well, and I see no reason for pessimism here.

I agree that with the flexible options syntax we do not need to change
the grammar whenever we add new options.
The attached patch is updated based on your suggestion.
The patch for reindexdb is also attached.
Please give feedback.
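To make the syntax concrete, the parenthesized form from the attached patch composes the way EXPLAIN/COPY/VACUUM options do (the table name and the second option below are hypothetical):

```sql
-- Parenthesized, extensible options list:
REINDEX (VERBOSE) TABLE my_table;

-- A future option could be added without grammar changes, e.g.:
-- REINDEX (VERBOSE, some_future_option) TABLE my_table;
```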

>> Also I'm not sure that both implementation and documentation regarding
>> VERBOSE option should be optional.
>
> I don't know what this means.
>

Sorry for confusing you.
Please ignore this.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-05-05 Thread Sawada Masahiko
On Fri, May 1, 2015 at 9:04 PM, Robert Haas  wrote:
> On Thu, Apr 30, 2015 at 11:05 PM, Sawada Masahiko  
> wrote:
>> VACUUM has both syntax: with parentheses and without parentheses.
>> I think we should have both syntax for REINDEX like VACUUM does
>> because it would be pain to put parentheses whenever we want to do
>> REINDEX.
>> Are the parentheses optional in REINDEX command?
>
> No.  The unparenthesized VACUUM syntax was added back before we
> realized that that kind of syntax is a terrible idea.  It requires
> every option to be a keyword, and those keywords have to be in a fixed
> order.  I believe the intention is to keep the old VACUUM syntax
> around for backward-compatibility, but not to extend it.  Same for
> EXPLAIN and COPY.

REINDEX will have only one option, VERBOSE, for now.
Even in a situation where it is not clear that additional options will
be added to REINDEX, do we still need to require the parentheses?
Also, I'm not sure whether both the implementation and the
documentation should treat the VERBOSE option as optional.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-05-01 Thread Sawada Masahiko
On Fri, May 1, 2015 at 6:24 AM, David Steele  wrote:
> On 4/30/15 6:05 AM, Fujii Masao wrote:
>> On Thu, Apr 30, 2015 at 12:57 PM, Sawada Masahiko  
>> wrote:
>>>
>>> I have changed the status this to "Ready for Committer".
>>
>> The specification of "session audit logging" seems to be still unclear to me.
>> For example, why doesn't "session audit logging" log queries accessing to
>
> The idea was that queries consisting of *only* catalog relations are
> essentially noise.  This makes the log much quieter when tools like psql
> or PgAdmin are in use.  Queries that contain a mix of catalog and user
> tables are logged.
>
> However, you make a good point, so in the spirit of cautious defaults
> I've changed the behavior to log in this case and allow admins to turn
> off the behavior if they choose with a new GUC, pg_audit.log_catalog.
>
>> a catalog like pg_class? Why doesn't it log any queries executed in aborted
>> transaction state? Since there is no such information in the document,
>
> The error that aborts a transaction can easily be logged via the
> standard logging facility.  All prior statements in the transaction will
> be logged with pg_audit.  This is acceptable from an audit logging
> perspective as long as it is documented behavior, which as you point out
> it currently is not.
>
> This has now been documented in the caveats sections which should make
> it clearer to users.
>
>> I'm afraid that users would easily get confused with it. Even if we document 
>> it,
>> I'm not sure if the current behavior is good for the audit purpose. For 
>> example,
>> some users may want to log even queries accessing to the catalogs.
>
> Agreed, and this is now the default.
>
>> I have no idea about when the current CommitFest will end. But probably
>> we don't have that much time left. So I'm thinking that maybe we should pick 
>> up
>> small, self-contained and useful part from the patch and focus on that.
>> If we try to commit every features that the patch provides, we might get
>> nothing at least in 9.5, I'm afraid.
>
> May 15th is the feature freeze, so that does give a little time.  It's
> not clear to me what a "self-contained" part of the patch would be.  If
> you have specific ideas on what could be broken out I'm interested to
> hear them.
>
> Patch v11 is attached with the changes discussed here plus some other
> improvements to the documentation suggested by Erik.
>

For now, the pg_audit patch has a pg_audit_ddl_command_end() function
that uses part of the API from the uncommitted "deparsing utility
commands" patch, so pg_audit depends entirely on that patch.
If the API names or interfaces change, pg_audit would be affected.
I'm not sure about the progress of the "deparsing utility commands"
patch, but do you have any idea what to do if it is not committed into
9.5 by May 15?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Fri, May 1, 2015 at 1:38 AM, Robert Haas  wrote:
> On Thu, Apr 30, 2015 at 9:15 AM, Sawada Masahiko  
> wrote:
>> On Thu, Apr 30, 2015 at 8:39 PM, Robert Haas  wrote:
>>> On Thu, Apr 30, 2015 at 6:37 AM, Sawada Masahiko  
>>> wrote:
>>>> Attached v10 patch is latest version patch.
>>>> The syntax is,
>>>> REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]
>>>>
>>>> That is, WITH clause is optional.
>>>
>>> I thought we agreed on moving this earlier in the command:
>>>
>>> http://www.postgresql.org/message-id/18569.1423159...@sss.pgh.pa.us
>>>
>>
>> Oh, I see.
>> Attached patch is modified syntax as
>> REINDEX [VERBOSE] { INDEX | ... } name
>>
>> Thought?
>
> I thought what we agreed on was:
>
> REINDEX (flexible options) { INDEX | ... } name
>
> The argument wasn't about whether to use flexible options, but where
> in the command to put them.
>

VACUUM has both syntaxes: with parentheses and without parentheses.
I think we should have both syntaxes for REINDEX, as VACUUM does,
because it would be a pain to type parentheses every time we want to
run REINDEX.
Are the parentheses optional in the REINDEX command?

And should CLUSTER get a similar syntax in the future?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 8:39 PM, Robert Haas  wrote:
> On Thu, Apr 30, 2015 at 6:37 AM, Sawada Masahiko  
> wrote:
>> Attached v10 patch is latest version patch.
>> The syntax is,
>> REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]
>>
>> That is, WITH clause is optional.
>
> I thought we agreed on moving this earlier in the command:
>
> http://www.postgresql.org/message-id/18569.1423159...@sss.pgh.pa.us
>

Oh, I see.
The attached patch changes the syntax to:
REINDEX [VERBOSE] { INDEX | ... } name

Thoughts?

Regards,

---
Sawada Masahiko


000_reindex_verbose_v11.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-04-30 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 11:21 AM, Sawada Masahiko  wrote:
> On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  wrote:
>> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>>
>>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>>
>>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>>> Agreed, no extra file, and the same write volume as currently.  It would
>>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>>> reuse some of that code.
>>>>
>>>
>>> Yeah, this approach seems promising. We probably can't reuse code from
>>> clog because the usage pattern is different (key for clog is xid, while
>>> for visibility/freeze map ctid is used). But visibility map storage
>>> layer is pretty simple so it should be easy to extend it for this use.
>>
>>
>> Actually, there may be some bit manipulation functions we could reuse;
>> things like efficiently counting how many things in a byte are set. Probably
>> doesn't make sense to fully refactor it, but at least CLOG is a good source
>> for cut/paste/whack.
>>
>
> I agree with adding a bit that indicates corresponding page is
> all-frozen into VM, just like CLOG.
> I'll change the patch as second version patch.
>

The second patch is attached.

In the second patch, I added to the visibility map a bit that indicates
that all tuples in a page are completely frozen.
The visibility map is now a bitmap with two bits per heap page:
all-visible and all-frozen.
The logic around vacuum and heap insert/update/delete is almost the
same as in the previous version.

This patch still lacks some things (documentation, comments in the
source code, etc.), so it is a WIP patch, but I think it is enough to
discuss the approach.

Please give feedback.

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b504ccd..a06e16d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -86,7 +86,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 Buffer newbuf, HeapTuple oldtup,
 HeapTuple newtup, HeapTuple old_key_tup,
-bool all_visible_cleared, bool new_all_visible_cleared);
+bool all_visible_cleared, bool new_all_visible_cleared,
+bool all_frozen_cleared, bool new_all_frozen_cleared);
 static void HeapSatisfiesHOTandKeyUpdate(Relation relation,
 			 Bitmapset *hot_attrs,
 			 Bitmapset *key_attrs, Bitmapset *id_attrs,
@@ -2068,7 +2069,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	HeapTuple	heaptup;
 	Buffer		buffer;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		all_visible_cleared = false;
+	bool		all_visible_cleared = false,
+all_frozen_cleared = false;
 
 	/*
 	 * Fill in tuple header fields, assign an OID, and toast the tuple if
@@ -2092,8 +2094,9 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);
 
 	/*
-	 * Find buffer to insert this tuple into.  If the page is all visible,
-	 * this will also pin the requisite visibility map page.
+	 * Find buffer to insert this tuple into.  If the page is all visible
+	 * or all frozen, this will also pin the requisite visibility map and
+	 * frozen map page.
 	 */
 	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
 	   InvalidBuffer, options, bistate,
@@ -2110,7 +2113,16 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		PageClearAllVisible(BufferGetPage(buffer));
 		visibilitymap_clear(relation,
 			ItemPointerGetBlockNumber(&(heaptup->t_self)),
-			vmbuffer);
+			vmbuffer, VISIBILITYMAP_ALL_VISIBLE);
+	}
+
+	if (PageIsAllFrozen(BufferGetPage(buffer)))
+	{
+		all_frozen_cleared = true;
+		PageClearAllFrozen(BufferGetPage(buffer));
+		visibilitymap_clear(relation,
+			ItemPointerGetBlockNumber(&(heaptup->t_self)),
+			vmbuffer, VISIBILITYMAP_ALL_FROZEN);
 	}
 
 	/*
@@ -2157,6 +2169,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 
 		xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);
 		xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
+		if (all_frozen_cleared)
+			xlrec.flags |= XLOG_HEAP_ALL_FROZEN_CLEARED;
 		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
 
 		/*
@@ -2350,7 +2364,8 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	{
 		Buffer		buffer;
 		Buffer		vmbuffer = InvalidBuffer;
-		bool		all_visible_cleared = false;
+		bool		all_visible_cleared = false,
+	all_frozen_cleared = false;
 		int			nthispage;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2395,7 +2410,16 @@ heap_multi_insert(Relation relation, HeapTuple *tuples

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-30 Thread Sawada Masahiko
On Fri, Apr 10, 2015 at 2:52 AM, Sawada Masahiko  wrote:
> On Thu, Apr 9, 2015 at 1:14 PM, Fujii Masao  wrote:
>> On Wed, Apr 8, 2015 at 10:53 PM, Sawada Masahiko  
>> wrote:
>>> On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao  wrote:
>>>> On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko  
>>>> wrote:
>>>>> On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
>>>>>  wrote:
>>>>>>
>>>>>> On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko 
>>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
>>>>>>>  wrote:
>>>>>>> >
>>>>>>> > On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
>>>>>>> > 
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> Thank you for your reviewing.
>>>>>>> >> I modified the patch and attached latest version patch(v7).
>>>>>>> >> Please have a look it.
>>>>>>> >>
>>>>>>> >
>>>>>>> > Looks good to me. Attached patch (v8) just fix a tab indentation in
>>>>>>> > gram.y.
>>>>>>> >
>>>>>>>
>>>>>>> I had forgotten fix a tab indentation, sorry.
>>>>>>> Thank you for reviewing!
>>>>>>> It looks good to me too.
>>>>>>> Can this patch be marked as "Ready for Committer"?
>>>>>>>
>>>>>>
>>>>>> +1
>>>>>>
>>>>>
>>>>> Changed status to "Ready for Committer".
>>>>
>>>> The patch adds new syntax like "REINDEX ... WITH VERBOSE", i.e., () is not
>>>> added after WITH clause. Did we reach the consensus about this syntax?
>>>> The last email from Robert just makes me think that () should be added
>>>> into the syntax.
>>>>
>>>
>>> Thank you for taking time for this patch!
>>
>> I removed the FORCE option from REINDEX, so you would need to update the 
>> patch.
>
> Thanks.
> I will change the patch with this change.
>
>>> This was a quite complicated issue since we already have many command
>>> styles currently.
>>> We can look at the style from various perspectives: the kind of DDL,
>>> new or old commands, maintenance commands. And each command has its
>>> own story.
>>> I believe we have reached consensus on this style at least once
>>> (please see the previous discussion), but we might need to discuss
>>> more...
>>
>> Okay, another question: is WITH required whenever the options are
>> specified? Or should it be omittable?
>
> In the previous discussion, the WITH clause was always required by the
> VERBOSE option. I don't think we should allow the WITH clause to be
> abbreviated for now.
> Also, now that the FORCE option is removed, we could bring back the
> idea of putting VERBOSE after the object name, as CLUSTER does. (INDEX,
> TABLE, etc.)
>

Attached is the latest version of the patch (v10).
The syntax is:
REINDEX { INDEX | ... } name [ WITH ] [ VERBOSE ]

That is, the WITH clause is optional.
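For reference, under the proposed grammar a verbose reindex would look like this (the table and index names are illustrative):

```sql
-- WITH is optional in the proposed syntax, so both forms are accepted
REINDEX TABLE my_table WITH VERBOSE;
REINDEX INDEX my_index VERBOSE;
```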

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 998340c..2e8db5a 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name [ WITH ] [ VERBOSE ]
 
  
 
@@ -150,6 +150,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index(indOid, false, persistence);
+	reindex_index(indOid, false, persistence, verbose);
 
 	return indOid;
 }
@@ -1775,7 +1775,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool verbose)
 {
 	Oid			heapOid;
 
@@ -1785,7 +1785,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  REINDEX_REL_PROCESS_TOAST |
-		  REINDEX_REL_CHECK_CONSTRAINTS))
+		  REINDEX_REL_CHECK_CONSTRAINTS,
+		  verbose))
 		ereport(NOTICE,
 (errmsg("table \"%s\" has no indexes",
 		relation->relname)));
@@ -1802,7 +1803,7 @@ ReindexTable(RangeVar *relation)
  * That mea

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-29 Thread Sawada Masahiko
On Wed, Apr 29, 2015 at 12:17 AM, David Steele  wrote:
> On 4/28/15 2:14 AM, Sawada Masahiko wrote:
>> On Fri, Apr 24, 2015 at 3:23 AM, David Steele  wrote:
>>> I've also added some checking to make sure that if anything looks funny
>>> on the stack an error will be generated.
>>>
>>> Thanks for the feedback!
>>>
>>
>> Thank you for updating the patch!
>> I ran the postgres regression tests on a database with pg_audit
>> enabled, and they work fine.
>> Looks good to me.
>>
>> If no one has further review comments or bug reports, I will mark this
>> as "Ready for Committer".
>
> Thank you!  I appreciate all your work reviewing this patch and of
> course everyone else who commented on, reviewed or tested the patch
> along the way.
>

I have changed the status of this patch to "Ready for Committer".

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-29 Thread Sawada Masahiko
On Thu, Apr 30, 2015 at 6:31 AM, David Steele  wrote:
> On 4/29/15 5:16 PM, Robert Haas wrote:
>> On Fri, Apr 24, 2015 at 2:40 PM, David Steele  wrote:
>>>   
>>>The view pg_file_settings provides access to
>>> run-time parameters
>>>that are defined in configuration files via SQL.  In contrast to
>>>pg_settings a row is provided for each
>>> occurrence
>>>of the parameter in a configuration file.  This is helpful for
>>> discovering why one value
>>>may have been used in preference to another when the parameters were
>>> loaded.
>>>   
>>
>> This seems to imply that this gives information about only a subset of
>> configuration files; specifically, those auto-generated based on SQL
>> commands - i.e. postgresql.conf.auto.  But I think it's supposed to
>> give information about all configuration files, regardless of how
>> generated.  Am I wrong?  If not, I'd suggest "run-time parameters that
>> are defined in configuration files via SQL" -> "run-time parameters
>> stored in configuration files".
>
> The view does give information about all configuration files regardless
> of how they were created.
>
> That's what I intended the text to say but I think your phrasing is clearer.
>

Thank you for reviewing.
I agree with this.
The attached patch is the updated version (v10).
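For reference, a query sketch against the proposed view; it lists every occurrence of each parameter across all configuration files, in load order:

```sql
SELECT sourcefile, sourceline, seqno, name, setting
FROM pg_file_settings
ORDER BY seqno;
```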

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 4b79958..adb8628 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7560,6 +7560,11 @@
  
 
  
+  pg_file_settings
+  parameter settings in configuration files
+ 
+
+ 
   pg_shadow
   database users
  
@@ -9173,6 +9178,74 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_file_settings
+
+  
+   pg_file_settings
+  
+
+  
+   The view pg_file_settings provides access to
+   run-time parameters stored in configuration files.
+   In contrast to pg_settings a row is provided for
+   each occurrence of the parameter in a configuration file. This is helpful
+   for discovering why one value may have been used in preference to another
+   when the parameters were loaded.
+  
+
+  
+   pg_file_settings Columns
+
+  
+   
+
+ Name
+ Type
+ Description
+
+   
+   
+
+ sourcefile
+ text
+ A path of the configuration file
+
+
+ sourceline
+ integer
+ 
+  Line number within the configuration file the current value was
+  set at (null for values set from sources other than configuration files,
+  or when examined by a non-superuser)
+ 
+
+
+ seqno
+ integer
+ Order in which the setting was loaded from the configuration
+
+
+ name
+ text
+ Run-time configuration parameter name
+
+
+ setting
+ text
+ Value of the parameter set in the configuration file
+
+   
+  
+ 
+
+
+
  
   pg_shadow
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2ad01f4..18921c4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -411,6 +411,12 @@ CREATE RULE pg_settings_n AS
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
 
+CREATE VIEW pg_file_settings AS
+   SELECT * FROM pg_show_all_file_settings() AS A;
+
+REVOKE ALL on pg_file_settings FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_show_all_file_settings() FROM PUBLIC;
+
 CREATE VIEW pg_timezone_abbrevs AS
 SELECT * FROM pg_timezone_abbrevs();
 
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index c5e0fac..873d950 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -119,6 +119,8 @@ ProcessConfigFile(GucContext context)
 	ConfigVariable *item,
 			   *head,
 			   *tail;
+	ConfigFileVariable *guc_array;
+	size_t			guc_array_size;
 	int			i;
 
 	/*
@@ -217,6 +219,9 @@ ProcessConfigFile(GucContext context)
 		gconf->status &= ~GUC_IS_IN_FILE;
 	}
 
+	/* Reset number of guc_file_variables */
+	num_guc_file_variables = 0;
+
 	/*
 	 * Check if all the supplied option names are valid, as an additional
 	 * quasi-syntactic check on the validity of the config file.  It is
@@ -255,6 +260,7 @@ ProcessConfigFile(GucContext context)
 			error = true;
 			ConfFileWithError = item->filename;
 		}
+		num_guc_file_variables++;
 	}
 
 	/*
@@ -342,6 +348,47 @@ ProcessConfigFile(GucContext context)
 	}
 
 	/*
+	 * Calculate the size of guc_array and allocate it. From the second
+	 * time onward, we should free the previously allocated memory before
+	 * allocating again.
+

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-29 Thread Sawada Masahiko
On Wed, Apr 29, 2015 at 12:20 AM, David Steele  wrote:
> On 4/27/15 10:37 PM, Sawada Masahiko wrote:
>>
>> Thank you for your reviewing.
>> Attached v8 patch is latest version.
>
> I posted a review through the CF app but it only went to the list
> instead of recipients of the latest message.  install-checkworld is
> failing but the fix is pretty trivial.
>
> See:
> http://www.postgresql.org/message-id/20150428145626.2632.75287.p...@coridan.postgresql.org
>

Thank you for reviewing!
I have fixed the regression test issue.
The latest patch is attached.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v9.patch
Description: Binary data



Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-27 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 3:23 AM, David Steele  wrote:
> On 4/23/15 5:49 AM, Sawada Masahiko wrote:
>>
>> I'm concerned that the behaviour of pg_audit has changed a few times,
>> as far as I remember. Did we reach consensus on this design?
>
> The original author Abhijit expressed support for the SESSION/OBJECT
> concept before I started working on the code and so has Stephen Frost.
> As far as I know all outstanding comments from the community have been
> addressed.
>
> Overall behavior has not changed very much since being submitted to the
> CF in February - mostly just tweaks and additional options.
>
>> And one question: OBJECT logging of a whole-table deletion (i.e.
>> DELETE FROM hoge) does not seem to work, as follows.
>>
>>
>> =# grant all on bar TO masahiko;
>>
>> (1) Delete all tuple
>> =# insert into bar values(1);
>> =# delete from bar ;
>> NOTICE:  AUDIT: SESSION,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;
>> DELETE 1
>>
>> (2) Delete specified tuple (but same result as (1))
>> =# insert into bar values(1);
>> =# delete from bar where col = 1;
>> NOTICE:  AUDIT: OBJECT,48,1,WRITE,DELETE,TABLE,public.bar,delete from
>> bar where col = 1;
>> NOTICE:  AUDIT: SESSION,48,1,WRITE,DELETE,TABLE,public.bar,delete from
>> bar where col = 1;
>> DELETE 1
>
> Definitely a bug.  Object logging works in the second case because the
> select privileges on the "col" column trigger logging.  I have fixed
> this and added a regression test.
>
> I also found a way to get the stack memory context under the query
> memory context.  Because of the order of execution it requires moving
> the memory context but I still think it's a much better solution.  I was
> able to remove most of the stack pops (except function logging) and the
> output remained stable.
>
> I've also added some checking to make sure that if anything looks funny
> on the stack an error will be generated.
>
> Thanks for the feedback!
>

Thank you for updating the patch!
I ran the postgres regression tests on a database with pg_audit
enabled, and they work fine.
Looks good to me.

If no one has further review comments or bug reports, I will mark this
as "Ready for Committer".

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Tue, Apr 28, 2015 at 3:22 AM, David Steele  wrote:
> On 4/27/15 10:31 AM, Sawada Masahiko wrote:
>> Thank you for your review comment!
>> The latest patch is attached.
>
> Looks good overall - a few more comments below:

Thank you for the review.
The attached v8 patch is the latest version.

> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
> +
> + seqno
> + integer
> + Sequence number of current view
>
> I would suggest:
>
> Order in which the setting was loaded from the configuration

Fix.

> +
> +
> + name
> + text
>
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> +/*
> + * show_all_file_settings
> + */
> +
> +#define NUM_PG_FILE_SETTINGS_ATTS 5
> +
> +Datum
> +show_all_file_settings(PG_FUNCTION_ARGS)
>
> It would be good to get more detail in the function comment, as well as
> more comments in the function itself.

Added some comments.

> A minor thing, but there are a number of whitespace errors when applying
> the patch:
>
> ../000_pg_file_settings_view_v6.patch:159: indent with spaces.
> free(guc_file_variables[i].name);
> ../000_pg_file_settings_view_v6.patch:160: indent with spaces.
> free(guc_file_variables[i].value);
> ../000_pg_file_settings_view_v6.patch:161: indent with spaces.
> free(guc_file_variables[i].filename);
> ../000_pg_file_settings_view_v6.patch:178: indent with spaces.
> guc_array->name = guc_strdup(FATAL, item->name);
> ../000_pg_file_settings_view_v6.patch:179: indent with spaces.
> guc_array->value = guc_strdup(FATAL, item->value);
> warning: squelched 2 whitespace errors
> warning: 7 lines add whitespace errors.
>
> I'm sure the committer would appreciate it if you'd clean those up.

I cleaned up the whitespace errors.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 898865e..50b93cf 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7437,6 +7437,11 @@
  
 
  
+  pg_file_settings
+  parameter settings in configuration files
+ 
+
+ 
   pg_shadow
   database users
  
@@ -9050,6 +9055,74 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_file_settings
+
+  
+   pg_file_settings
+  
+
+  
+   The view pg_file_settings provides access to
+   run-time parameters that are defined in configuration files via SQL.
+   In contrast to pg_settings a row is provided for
+   each occurrence of the parameter in a configuration file. This is helpful
+   for discovering why one value may have been used in preference to another
+   when the parameters were loaded.
+  
+
+  
+   pg_file_settings Columns
+
+  
+   
+
+ Name
+ Type
+ Description
+
+   
+   
+
+ sourcefile
+ text
+ A path of the configuration file
+
+
+ sourceline
+ integer
+ 
+  Line number within the configuration file the current value was
+  set at (null for values set from sources other than configuration files,
+  or when examined by a non-superuser)
+ 
+
+
+ seqno
+ integer
+ Order in which the setting was loaded from the configuration
+
+
+ name
+ text
+ 
+  Configuration file the current value was set in (null for values
+  set from sources other than configuration files, or when examined by a
+  non-superuser); helpful when using include directives in
+  configuration files
+ 
+
+
+ setting
+ text
+ Run-time configuration parameter name
+
+   
+  
+ 
+
+
+
  
   pg_shadow
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4c35ef4..31faab0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -411,6 +411,12 @@ CREATE RULE pg_settings_n AS
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
 
+CREATE VIEW pg_file_settings AS
+   SELECT * FROM pg_show_all_file_settings() AS A;
+
+REVOKE ALL on pg_file_settings FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_show_all_file_settings() FROM PUBLIC;
+
 CREATE VIEW pg_timezone_abbrevs AS
 SELECT * FROM pg_timezone_abbrevs();
 
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index c5e0fac..873d950 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -119,6 +119,8 @@ ProcessConfigFile(GucContext context)
 	ConfigVariable *item,
 			   *head,
 			   *tail;
+	ConfigFileVariable *guc_array;
+	size_t			guc_array_size;
 	int			i;
 
 	/*
@@ -217,6 +219,9 @@ ProcessConfigFile(GucContext context)
 		gconf->status &= ~GUC_IS_IN_FILE;
 	}
 
+	/* Reset number of guc_file_variables */
+	num_guc_file_variables = 0;
+
 	/*

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Mon, Apr 27, 2015 at 11:31 PM, Sawada Masahiko  wrote:
> On Sat, Apr 25, 2015 at 3:40 AM, David Steele  wrote:
>> On 4/4/15 9:21 AM, Sawada Masahiko wrote:
>>> I added documentation changes; the patch is attached.
>>> Also, I tried to use a memory context in TopMemoryContext for the
>>> allocation of guc_file_variable, but it caused an invalid access
>>> after receiving SIGHUP.
>> Below is my review of the v5 patch:
>
> Thank you for your review comment!
> The latest patch is attached.
>
>> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
>> + 
>> +  pg_file_settings
>> +
>> +  
>> +   pg_file_settings
>> +  
>> +
>> +  
>> +   The view pg_file_settings provides confirm
>> parameters via SQL,
>> +   which is written in configuration file. This view can be update by
>> reloading configration file.
>> +  
>>
>> Perhaps something like:
>>
>>   
>>The view pg_file_settings provides access to
>> run-time parameters
>>that are defined in configuration files via SQL.  In contrast to
>>pg_settings a row is provided for each
>> occurrence
>>of the parameter in a configuration file.  This is helpful for
>> discovering why one value
>>may have been used in preference to another when the parameters were
>> loaded.
>>   
>>
>> +  
>> +   pg_file_settings Columns
>> +
>> +  
>> +   
>> +
>> + Name
>> + Type
>> + Description
>> +
>> +   
>> +   
>> +
>> + sourcefile
>> + text
>> + A path of configration file
>>
>> From pg_settings:
>>
>>   Configuration file the current value was set in (null for
>>   values set from sources other than configuration files, or when
>>   examined by a non-superuser);
>>   helpful when using include directives in configuration
>> files
>>
>> +
>> +
>> + sourceline
>> + integer
>> + The line number in configuration file
>>
>> From pg_settings:
>>
>>   Line number within the configuration file the current value was
>>   set at (null for values set from sources other than configuration
>> files,
>>   or when examined by a non-superuser)
>>   
>>
>>
>> +
>> +
>> + name
>> + text
>> + Parameter name in configuration file
>>
>> From pg_settings:
>>
>>   Run-time configuration parameter name
>>
>> +
>> +
>> + setting
>> + text
>> + Value of the parameter in configuration file
>> +
>> +   
>> +  
>> + 
>> +
>> +
>
> The documentation is updated based on your suggestion.
>
>> diff --git a/src/backend/utils/misc/guc-file.l
>> b/src/backend/utils/misc/guc-file.l
>> @@ -342,6 +345,42 @@ ProcessConfigFile(GucContext context)
>> +guc_array = guc_file_variables;
>> +for (item = head; item; item = item->next, guc_array++)
>> +{
>> +free(guc_array->name);
>> +free(guc_array->value);
>> +free(guc_array->filename);
>>
>> There's an issue freeing these when calling pg_reload_conf():
>
> Fix.
>
>> postgres=# alter system set max_connections = 100;
>> ALTER SYSTEM
>> postgres=# select * from pg_reload_conf();
>> LOG:  received SIGHUP, reloading configuration files
>>  pg_reload_conf
>> 
>>  t
>> (1 row)
>>
>> postgres=# postgres(25078,0x7fff747b2300) malloc: *** error for object
>> 0x424d380044: pointer being freed was not allocated
>> *** set a breakpoint in malloc_error_break to debug
>>
>> Of course a config reload can't change max_connections, but I wouldn't
>> expect it to crash, either.
>>
>> +guc_array->sourceline = -1;
>>
>> I can't see the need for this since it is reassigned further down.
>
> Fix.
>
>> --
>>
>> Up-thread David J had recommended an ordering column (or seqno) that
>> would provide the order in which the settings were loaded.  I think this
>> would give valuable information that can't be gleaned from the line
>> numbers alone.
>>
>> Personally I think it would be fine to start from 1 and increment for
>> each setting found, rather than rank within a setting.  If the user
>> wants per setting ranking that's what SQL is for.  So the output would
>> look something like:
>>
>>sourcefile | sourceline | seqno |  name   |  setting
>> --++---+-+---
>>  /db/postgresql.conf  | 64 | 1 | max_connections | 100
>>  /db/postgresql.conf  |116 | 2 | shared_buffers  | 128MB
>>  /db/postgresql.conf  |446 | 3 | log_timezone|
>> US/Eastern
>>  /db/postgresql.auto.conf |  3 | 4 | max_connections | 200
>>
>
> Yep, I agree with you.
Added a seqno column to the pg_file_settings view.
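With seqno as a single global load counter, per-setting ranking is indeed left to SQL, as suggested above; a sketch to find the effective (last-loaded) occurrence of each parameter:

```sql
-- DISTINCT ON keeps one row per name; ordering by seqno DESC
-- picks the occurrence loaded last, which is the one that takes effect
SELECT DISTINCT ON (name) sourcefile, sourceline, seqno, name, setting
FROM pg_file_settings
ORDER BY name, seqno DESC;
```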
>

The attached patch is modified to apply to HEAD.
The v7 patch is the latest.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v7.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-27 Thread Sawada Masahiko
On Sat, Apr 25, 2015 at 3:40 AM, David Steele  wrote:
> On 4/4/15 9:21 AM, Sawada Masahiko wrote:
>> I added documentation changes; the patch is attached.
>> Also, I tried to use a memory context in TopMemoryContext for the
>> allocation of guc_file_variable, but it caused an invalid access
>> after receiving SIGHUP.
> Below is my review of the v5 patch:

Thank you for your review comment!
The latest patch is attached.

> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
> + 
> +  pg_file_settings
> +
> +  
> +   pg_file_settings
> +  
> +
> +  
> +   The view pg_file_settings provides confirm
> parameters via SQL,
> +   which is written in configuration file. This view can be update by
> reloading configration file.
> +  
>
> Perhaps something like:
>
>   
>The view pg_file_settings provides access to
> run-time parameters
>that are defined in configuration files via SQL.  In contrast to
>pg_settings a row is provided for each
> occurrence
>of the parameter in a configuration file.  This is helpful for
> discovering why one value
>may have been used in preference to another when the parameters were
> loaded.
>   
>
> +  
> +   pg_file_settings Columns
> +
> +  
> +   
> +
> + Name
> + Type
> + Description
> +
> +   
> +   
> +
> + sourcefile
> + text
> + A path of configration file
>
> From pg_settings:
>
>   Configuration file the current value was set in (null for
>   values set from sources other than configuration files, or when
>   examined by a non-superuser);
>   helpful when using include directives in configuration
> files
>
> +
> +
> + sourceline
> + integer
> + The line number in configuration file
>
> From pg_settings:
>
>   Line number within the configuration file the current value was
>   set at (null for values set from sources other than configuration
> files,
>   or when examined by a non-superuser)
>   
>
>
> +
> +
> + name
> + text
> + Parameter name in configuration file
>
> From pg_settings:
>
>   Run-time configuration parameter name
>
> +
> +
> + setting
> + text
> + Value of the parameter in configuration file
> +
> +   
> +  
> + 
> +
> +

The documentation is updated based on your suggestion.

> diff --git a/src/backend/utils/misc/guc-file.l
> b/src/backend/utils/misc/guc-file.l
> @@ -342,6 +345,42 @@ ProcessConfigFile(GucContext context)
> +guc_array = guc_file_variables;
> +for (item = head; item; item = item->next, guc_array++)
> +{
> +free(guc_array->name);
> +free(guc_array->value);
> +free(guc_array->filename);
>
> There's an issue freeing these when calling pg_reload_conf():

Fix.

> postgres=# alter system set max_connections = 100;
> ALTER SYSTEM
> postgres=# select * from pg_reload_conf();
> LOG:  received SIGHUP, reloading configuration files
>  pg_reload_conf
> 
>  t
> (1 row)
>
> postgres=# postgres(25078,0x7fff747b2300) malloc: *** error for object
> 0x424d380044: pointer being freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
>
> Of course a config reload can't change max_connections, but I wouldn't
> expect it to crash, either.
>
> +guc_array->sourceline = -1;
>
> I can't see the need for this since it is reassigned further down.

Fix.

> --
>
> Up-thread David J had recommended an ordering column (or seqno) that
> would provide the order in which the settings were loaded.  I think this
> would give valuable information that can't be gleaned from the line
> numbers alone.
>
> Personally I think it would be fine to start from 1 and increment for
> each setting found, rather than rank within a setting.  If the user
> wants per setting ranking that's what SQL is for.  So the output would
> look something like:
>
>sourcefile | sourceline | seqno |  name   |  setting
> ------++---+-+---
>  /db/postgresql.conf  | 64 | 1 | max_connections | 100
>  /db/postgresql.conf  |116 | 2 | shared_buffers  | 128MB
>  /db/postgresql.conf  |446 | 3 | log_timezone|
> US/Eastern
>  /db/postgresql.auto.conf |  3 | 4 | max_connections | 200
>

Yep, I agree with you.
Added a seqno column to the pg_file_settings view.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v6.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Fri, Apr 24, 2015 at 1:31 AM, Jim Nasby  wrote:
> On 4/23/15 11:06 AM, Petr Jelinek wrote:
>>
>> On 23/04/15 17:45, Bruce Momjian wrote:
>>>
>>> On Thu, Apr 23, 2015 at 09:45:38AM -0400, Robert Haas wrote:
>>> Agreed, no extra file, and the same write volume as currently.  It would
>>> also match pg_clog, which uses two bits per transaction --- maybe we can
>>> reuse some of that code.
>>>
>>
>> Yeah, this approach seems promising. We probably can't reuse code from
>> clog because the usage pattern is different (key for clog is xid, while
>> for visibility/freeze map ctid is used). But visibility map storage
>> layer is pretty simple so it should be easy to extend it for this use.
>
>
> Actually, there may be some bit manipulation functions we could reuse;
> things like efficiently counting how many things in a byte are set. Probably
> doesn't make sense to fully refactor it, but at least CLOG is a good source
> for cut/paste/whack.
>

I agree with adding a bit to the VM that indicates the corresponding
page is all-frozen, just like CLOG.
I'll make that change in the second version of the patch.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-23 Thread Sawada Masahiko
On Mon, Apr 20, 2015 at 10:17 PM, David Steele  wrote:
> On 4/20/15 4:40 AM, Sawada Masahiko wrote:
>>
>> Thank you for updating the patch.
>>
>> One question about regarding since v7 (or later) patch is;
>> What is the different between OBJECT logging and SESSION logging?
>
> In brief, session logging can log anything that happens in a user
> session while object logging only targets DML and SELECT on selected
> relations.
>
> The pg_audit.log_relation setting is intended to mimic the prior
> behavior of pg_audit before object logging was added.
>
> In general, I would not expect pg_audit.log = 'read, write' to be used
> with pg_audit.role.  In other words, session logging of DML and SELECT
> should probably not be turned on at the same time as object logging is
> in use.  Object logging is intended to be a fine-grained version of
> pg_audit.log = 'read, write' (one or both).

Thank you for your explanation!
I now understand how to use the two logging styles.
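As a sketch of the two styles discussed here, using the pg_audit GUCs mentioned in this thread (the role and table names are illustrative):

```sql
-- session logging: log every read/write statement in this session
SET pg_audit.log = 'read, write';

-- object logging: audit only the relations granted to the audit role
CREATE ROLE auditor;
SET pg_audit.role = 'auditor';
GRANT SELECT, DELETE ON bar TO auditor;
```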

>> I used v9 patch with "pg_audit.log_relation = on", and got quite
>> similar but different log as follows.
>>
>> =# select * from hoge, bar where hoge.col = bar.col;
>> NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.hoge,"select *
>> from hoge, bar where hoge.col = bar.col;"
>> NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.hoge,"select *
>> from hoge, bar where hoge.col = bar.col;"
>> NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.bar,"select * from
>> hoge, bar where hoge.col = bar.col;"
>> NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.bar,"select *
>> from hoge, bar where hoge.col = bar.col;"
>>
>> The behaviour of SESSION logging is similar to OBJECT logging now, and
>> SESSION logging per session (i.e., pg_audit.log_relation = off) is
>> also similar to 'log_statement = all' (enhancing log_statement might
>> be enough). So I couldn't understand the need for SESSION logging.
>
> Session logging is quite different from 'log_statement = all'.  See:
>
> http://www.postgresql.org/message-id/552323b2.8060...@pgmasters.net
>
> and/or the "Why pg_audit?" section of the pg_audit documentation.
>
> I agree that it may make sense in the future to merge session logging
> into log_statements, but for now it does provide important additional
> functionality for creating audit logs.
>

I'm concerned that the behaviour of pg_audit has changed a few times,
as far as I remember. Did we reach consensus on this design?

And one question: OBJECT logging of a whole-table deletion (i.e. DELETE
FROM hoge) does not seem to work, as follows.


=# grant all on bar TO masahiko;

(1) Delete all tuple
=# insert into bar values(1);
=# delete from bar ;
NOTICE:  AUDIT: SESSION,47,1,WRITE,DELETE,TABLE,public.bar,delete from bar ;
DELETE 1

(2) Delete specified tuple (but same result as (1))
=# insert into bar values(1);
=# delete from bar where col = 1;
NOTICE:  AUDIT: OBJECT,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
NOTICE:  AUDIT: SESSION,48,1,WRITE,DELETE,TABLE,public.bar,delete from
bar where col = 1;
DELETE 1

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-23 Thread Sawada Masahiko
On Thu, Apr 23, 2015 at 3:24 AM, Robert Haas  wrote:
> On Wed, Apr 22, 2015 at 12:39 PM, Heikki Linnakangas  wrote:
>> The thing that made me nervous about that approach is that it made the LSN
>> of each page critical information. If you somehow zeroed out the LSN, you
>> could no longer tell which pages are frozen and which are not. I'm sure it
>> could be made to work - and I got it working to some degree anyway - but
>> it's a bit scary. It's similar to the multixid changes in 9.3: multixids
>> also used to be data that you can just zap at restart, and when we changed
>> the rules so that you lose data if you lose multixids, we got trouble. Now,
>> LSNs are much simpler, and there wouldn't be anything like the
>> multioffset/member SLRUs that you'd have to keep around forever or vacuum,
>> but still..
>
> LSNs are already pretty critical.  If they're in the future, you can't
> flush those pages.  Ever.  And if they're wrong in either direction,
> crash recovery is broken.  But it's still worth thinking about ways
> that we could make this more robust.
>
> I keep coming back to the idea of treating any page that is marked as
> all-visible as frozen, and deferring freezing until the page is again
> modified.  The big downside of this is that if the page is set as
> all-visible and then immediately thereafter modified, it sucks to have
> to freeze when the XIDs in the page are still present in CLOG.  But if
> we could determine from the LSN that the XIDs in the page are new
> enough to still be considered valid, then we could skip freezing in
> those cases and only do it when the page is "old".  That way, if
> somebody zeroed out the LSN (why, oh why?) the worst that would happen
> is that we'd do some extra freezing when the page was next modified.

In your idea, if we have a WORM (write-once read-many) table, the
tuples in its pages would not be frozen at all unless we run VACUUM
FREEZE. Also, in this situation, from the second time onward VACUUM
FREEZE would only need to scan the pages added since the last freezing,
so we could reduce I/O, but we would still need to freeze explicitly
for anti-wraparound as in the past. A WORM table generally holds huge
data, and that data would increase rapidly, so it would also be
expensive.

>
>> I would feel safer if we added a completely new "epoch" counter to the page
>> header, instead of reusing LSNs. But as we all know, changing the page
>> format is a problem for in-place upgrade, and takes some space too.
>
> Yeah.  We have a serious need to reduce the size of our on-disk
> format.  On a TPC-C-like workload Jan Wieck recently tested, our data
> set was 34% larger than another database at the beginning of the test,
> and 80% larger by the end of the test.  And we did twice the disk
> writes.  See "The Elephants in the Room.pdf" at
> https://sites.google.com/site/robertmhaas/presentations
>

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-21 Thread Sawada Masahiko
On Wed, Apr 22, 2015 at 12:02 AM, Andres Freund  wrote:
> On 2015-04-21 23:59:45 +0900, Sawada Masahiko wrote:
>> The page as frozen could have the dead tuple for now, but I think to change
>> to that the frozen page guarantees that page is all frozen *and* all
>> visible.
>
> It shouldn't. That'd potentially cause corruption after a wraparound. A
> tuple's visibility might change due to that.

A page marked as frozen could still contain some dead tuples, right?
I think we should clear the corresponding FrozenMap bit (and the page
header flag) on a delete operation, and set the bit only from vacuum.
That way, a page marked as frozen would guarantee that it is all
frozen *and* all visible.
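The invariant under discussion (a bit set only by vacuum, cleared by any
tuple-invalidating operation) can be modeled with a tiny in-memory bitmap.
This is an illustrative sketch only, not the patch's actual frozenmap.c
API; the names fm_set/fm_clear/fm_test and the toy array are invented for
illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define FM_PAGES 64                      /* pages tracked by this toy map */

unsigned char fm[FM_PAGES / 8];          /* one bit per heap page, zero-initialized */

/* Only vacuum would call this, after verifying that every tuple on the
 * page is frozen (and, with the proposed change, visible). */
void fm_set(int page)   { fm[page / 8] |= (unsigned char) (1 << (page % 8)); }

/* Any DELETE (i.e. setting xmax) must clear the bit; otherwise the
 * "all frozen *and* all visible" guarantee would be violated. */
void fm_clear(int page) { fm[page / 8] &= (unsigned char) ~(1 << (page % 8)); }

bool fm_test(int page)  { return (fm[page / 8] >> (page % 8)) & 1; }
```

With these rules, fm_test() returning true is a safe license to skip the
page during freezing, because no operation since the last vacuum can have
introduced an unfrozen or invisible tuple without clearing the bit.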

Regards,

---
Sawada Masahiko




[HACKERS] Freeze avoidance of very large table.

2015-04-21 Thread Sawada Masahiko
On Tue, Apr 21, 2015 at 7:00 AM, Jim Nasby  wrote:
> On 4/20/15 2:45 AM, Sawada Masahiko wrote:
>>
>> Current patch adds new source file src/backend/access/heap/frozenmap.c
>> which is quite similar to visibilitymap.c. They have similar code but
>> are separated for now. I do refactoring these source code like adding
>> bitmap.c, if needed.
>

Thank you for having a look at this patch.

>
> My feeling is we'd definitely want this refactored; it looks to be a whole
> lot of duplicated code. But before working on that we should get consensus
> that a FrozenMap is a good idea.

Yes, we need to get consensus about the FrozenMap approach before
starting work on it.
In addition to the comments you pointed out, I noticed one problem I
should address: a FrozenMap bit needs to be cleared on deletion
(i.e. when xmax is set).
A page marked as frozen can currently contain dead tuples, but I plan
to change this so that a frozen page guarantees the page is all
frozen *and* all visible.

> Are there any meaningful differences between the two, besides the obvious
> name changes?

No, there aren't.

> I think there's also a bunch of XLOG stuff that could be refactored too...

I agree with you.

>> Also, when skipping pages via the visibility map, we only skip runs of
>> at least SKIP_PAGES_THRESHOLD consecutive pages, but such a mechanism
>> is not in the frozen map.
>
>
> That's probably something else that can be factored out, since it's
> basically the same logic. I suspect we just need to && some of the checks
> so we're looking at both FM and VM at the same time.

The FrozenMap is used to skip scanning only during anti-wraparound
vacuum or when freezing all tuples (i.e. when scan_all is true).
A normal vacuum uses only the VM; it does not use the FM for now.

> Other comments...
>
> It would be nice if we didn't need another page bit for FM; do you see any
> reasonable way that could happen?

We may be able to remove the FM bit from the page header, but I'm not
sure we can do that.

> +* If we didn't pin the visibility(and frozen) map page and the page has
> +* become all visible(and frozen) while we were busy locking the buffer,
> +* or during some subsequent window during which we had it unlocked,
> +* we'll have to unlock and re-lock, to avoid holding the buffer lock
> +* across an I/O.  That's a bit unfortunate, especially since we'll now
> +* have to recheck whether the tuple has been locked or updated under us,
> +* but hopefully it won't happen very often.
>  */
>
> s/(and frozen)/ or frozen/
>
>
> + * Reply XLOG_HEAP3_FROZENMAP record.
> s/Reply/Replay/

Understood.

>
> +   /*
> +* XLogReplayBufferExtended locked the buffer. But
> frozenmap_set
> +* will handle locking itself.
> +*/
> +   LockBuffer(fmbuffer, BUFFER_LOCK_UNLOCK);
>
> Doesn't this create a race condition?
>
>
> Are you sure the bit in finish_heap_swap() is safe? If so, we should add
the
> same the same for the visibility map too (it certainly better be all
visible
> if it's frozen...)

We cannot ensure a page is all visible even after VACUUM FULL, because
dead tuples can remain, e.g. when another process inserts and then
updates the same tuple within one transaction before VACUUM FULL runs.
I was thinking that the FrozenMap would be unaffected by delete
operations, but as I said at the top of this mail, a FrozenMap bit
needs to be cleared on deletion.
So I will remove the related code as you mentioned.

>
>
>
> +   /*
> +* Current block is all-visible.
> +* If frozen map represents that it's all frozen and this
> +* function is called for freezing tuples, we can skip to
> +* vacuum block.
> +*/
>
> I would state this as "Even if scan_all is true, we can skip blocks that
> are marked as frozen."
>
> +   if (frozenmap_test(onerel, blkno, &fmbuffer) && scan_all)
>
> I suspect it's faster to reverse those tests (scan_all &&
> frozenmap_test())... but why do we even need to look at scan_all? AFAICT
> if a block is frozen we can skip it unconditionally.

In the current patch, a tuple that is frozen but dead can remain on a
page that is marked all-frozen; i.e. a page can be marked frozen
without being all visible.
But I'm planning to change that.
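Jim's point about reversing the tests is plain short-circuit evaluation:
scan_all is a cheap local boolean, while frozenmap_test() has to probe
(and possibly pin) a map page, so the cheap condition should come first.
A hedged sketch of the control flow, with an invented stand-in for
frozenmap_test() that just counts its calls:

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the (relatively expensive) map probe runs. */
int fm_probes = 0;

/* Stand-in for the patch's frozenmap_test(); pretend every page is
 * marked all-frozen.  The real function reads a frozen-map buffer. */
bool frozenmap_test_stub(int blkno)
{
    fm_probes++;
    return true;
}

/* Returns true when vacuum may skip this block.  Testing scan_all first
 * means a plain VACUUM (scan_all == false) never touches the frozen map
 * at all, thanks to && short-circuiting. */
bool can_skip_block(bool scan_all, int blkno)
{
    return scan_all && frozenmap_test_stub(blkno);
}
```

The observable difference is only in how often the map is probed, not in
the result; that is exactly why reversing the operands is safe here.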

>
>
> +   /*
> +* If the un-frozen tuple is remaining in current
> page an

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-20 Thread Sawada Masahiko
On Thu, Apr 16, 2015 at 2:34 AM, David Steele  wrote:
> On 4/15/15 11:30 AM, Sawada Masahiko wrote:
>> On Wed, Apr 15, 2015 at 10:52 AM, Sawada Masahiko  
>> wrote:
>>> I tested v8 patch with CURSOR case I mentioned before, and got
>>> segmentation fault again.
>>> Here are log messages in my environment,
>>>
>>> =# select test();
>>>  LOG:  server process (PID 29730) was terminated by signal 11:
>>> Segmentation fault
>>> DETAIL:  Failed process was running: select test();
>>> LOG:  terminating any other active server processes
>>> WARNING:  terminating connection because of crash of another server process
>>> DETAIL:  The postmaster has commanded this server process to roll back
>>> the current transaction and exit, because another server process
>>> exited abnormally and possibly corrupted shared memory.
>>> HINT:  In a moment you should be able to reconnect to the database and
>>> repeat your command.
>>> FATAL:  the database system is in recovery mode
>>
>> I investigated this problem and inform you my result here.
>>
>> When we execute test() function I mentioned, the three stack items in
>> total are stored into auditEventStack.
>> The two of them are freed by stack_pop() -> stack_free() (i.g,
>> stack_free() is called by stack_pop()).
>> One of them is freed by PortalDrop() -> .. ->
>> MemoryContextDeleteChildren() -> ... -> stack_free().
>> And it is freed at the same time as deleting pg_audit memory context,
>> and stack will be completely empty.
>>
>> But after freeing all items, finish_xact_command() function could call
>> PortalDrop(), so ExecutorEnd() function could be called again.
>> pg_audit has ExecutorEnd_hook, so postgres tries to free that item.. SEGV.
>>
>> In my environment, the following change fixes it.
>>
>> --- pg_audit.c.org2015-04-15 14:21:07.541866525 +0900
>> +++ pg_audit.c2015-04-15 11:36:53.758877339 +0900
>> @@ -1291,7 +1291,7 @@
>>  standard_ExecutorEnd(queryDesc);
>>
>>  /* Pop the audit event off the stack */
>> -if (!internalStatement)
>> +if (!internalStatement && auditEventStack != NULL)
>>  {
>>  stack_pop(auditEventStack->stackId);
>>  }
>>
>> It might be good idea to add Assert() at before calling stack_pop().
>>
>> I'm not sure this change is exactly correct, but I hope this
>> information helps you.
>
> I appreciate you taking the time to look - this is the same conclusion I
> came to.  I also hardened another area that I thought might be vulnerable.
>
> I've seen this problem with explicit cursors before (and fixed it in
> another place a while ago).  The memory context that is current in
> ExecutorStart is freed before ExecutorEnd is called only in this case as
> far as I can tell.  I'm not sure this is very consistent behavior.
>
> I have attached patch v9 which fixes this issue as you suggested, but
> I'm not completely satisfied with it.  It seems like there could be an
> unintentional pop from the stack in a case of deeper nesting.  This
> might not be possible but it's hard to disprove.
>
> I'll think about it some more, but meanwhile this patch addresses the
> present issue.

Thank you for updating the patch.

One question regarding the v7 (or later) patch: what is the
difference between OBJECT logging and SESSION logging?

I used the v9 patch with "pg_audit.log_relation = on", and got quite
similar but different log entries, as follows.

=# select * from hoge, bar where hoge.col = bar.col;
NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.hoge,"select *
from hoge, bar where hoge.col = bar.col;"
NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.hoge,"select *
from hoge, bar where hoge.col = bar.col;"
NOTICE:  AUDIT: OBJECT,8,1,READ,SELECT,TABLE,public.bar,"select * from
hoge, bar where hoge.col = bar.col;"
NOTICE:  AUDIT: SESSION,8,1,READ,SELECT,TABLE,public.bar,"select *
from hoge, bar where hoge.col = bar.col;"

The behaviour of the SESSION log is now similar to the OBJECT log, and
the SESSION log per session (i.e. pg_audit.log_relation = off) is also
similar to 'log_statement = all' (enhancing log_statement might be
enough).
So I couldn't understand the need for the SESSION log.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Freeze avoidance of very large table.

2015-04-20 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko  wrote:
> On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby  wrote:
>> On 4/6/15 5:18 PM, Greg Stark wrote:
>>>
>>> Only I would suggest thinking of it in terms of two orthogonal boolean
>>> flags rather than three states. It's easier to reason about whether a
>>> table has a specific property than trying to control a state machine in
>>> a predefined pathway.
>>>
>>> So I would say the two flags are:
>>> READONLY: guarantees nothing can be dirtied
>>> ALLFROZEN: guarantees no unfrozen tuples are present
>>>
>>> In practice you can't have the latter without the former since vacuum
>>> can't know everything is frozen unless it knows nobody is inserting. But
>>> perhaps there will be cases in the future where that's not true.
>>
>>
>> I'm not so sure about that. There's a logical state progression here (see
>> below). ISTM it's easier to just enforce that in one place instead of a
>> bunch of places having to check multiple conditions. But, I'm not wed to a
>> single field.
>>
>>> Incidentally there are a number of other optimisations that I've had
>>> in mind that are only possible on frozen read-only tables:
>>>
>>> 1) Compression: compress the pages and pack them one after the other.
>>> Build a new fork with offsets for each page.
>>>
>>> 2) Automatic partition elimination where the statistics track the
>>> minimum and maximum value per partition (and number of tuples) and treat
>>> then as implicit constraints. In particular it would magically make read
>>> only empty parent partitions be excluded regardless of the where clause.
>>
>>
>> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
>> uncompact and re-compact for #1 when you actually freeze (which maybe isn't
>> worth it), but freezing isn't absolutely required. #2 would only require
>> that everything in the relation is visible; not frozen.
>>
>> I think there's value here to having an ALLVISIBLE state as well as
>> ALLFROZEN.
>>
>
> Based on may suggestions, I'm going to deal with FM at first as one
> patch. It would be simply mechanism and similar to VM, at first patch.
> - Each bit of FM represent single page
> - The bit is set only by vacuum
> - The bit is un-set by inserting and updating and deleting
>
> At second, I'll deal with simply read-only table and 2 states,
> Read/Write(default) and ReadOnly as one patch. ITSM the having the
> Frozen state needs to more discussion. read-only table just allow us
> to disable any updating table, and it's controlled by read-only flag
> pg_class has. And DDL command which changes these status is like ALTER
> TABLE SET READ ONLY, or READ WRITE.
> Also as Alvaro's suggested, the read-only table affect not only
> freezing table but also performance optimization. I'll consider
> including them when I deal with read-only table.
>

The attached WIP patch adds a Frozen Map, which enables us to avoid
vacuuming the whole table even when a full scan is required to prevent
XID wraparound failures.

The Frozen Map is a bitmap with one bit per heap page, quite similar
to the Visibility Map. A set bit means that all tuples on the heap
page are completely frozen, so we don't need to freeze that page.
A bit is set when vacuum (or autovacuum) determines that all tuples on
the corresponding heap page are completely frozen, and a bit is
cleared when an INSERT or UPDATE (onto a new heap page only) is
executed.

The current patch adds a new source file
src/backend/access/heap/frozenmap.c which is quite similar to
visibilitymap.c. They have similar code but are separate for now; I
will refactor them, e.g. into a common bitmap.c, if needed.
Also, when skipping pages via the visibility map, we only skip runs of
at least SKIP_PAGES_THRESHOLD consecutive pages, but such a mechanism
is not in the frozen map.

Please give me feedback.
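The SKIP_PAGES_THRESHOLD behaviour mentioned above can be sketched in
isolation: vacuum only skips a run of skippable pages when the run is
long enough that skipping does not defeat the kernel's sequential
read-ahead. The helper names and the callback shape below are invented
for illustration; only the threshold idea comes from vacuumlazy.c:

```c
#include <assert.h>
#include <stdbool.h>

/* vacuumlazy.c uses 32; the exact value is an implementation detail. */
#define SKIP_PAGES_THRESHOLD 32

/* Length of the run of consecutively skippable pages starting at
 * 'blkno', according to a map-probe callback (e.g. the VM or FM). */
int run_length(bool (*skippable)(int), int blkno, int nblocks)
{
    int n = 0;
    while (blkno + n < nblocks && skippable(blkno + n))
        n++;
    return n;
}

/* Vacuum only skips a run when it is at least SKIP_PAGES_THRESHOLD
 * pages long; shorter runs are read anyway so that sequential I/O
 * is not broken up into many small seeks. */
bool should_skip_run(bool (*skippable)(int), int blkno, int nblocks)
{
    return run_length(skippable, blkno, nblocks) >= SKIP_PAGES_THRESHOLD;
}

/* Trivial probes for demonstration. */
bool always_skippable(int blkno) { (void) blkno; return true; }
bool never_skippable(int blkno)  { (void) blkno; return false; }
```

Extending this to the frozen map would presumably mean &&-ing the FM and
VM probes inside the callback, as suggested upthread.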

Regards,

---
Sawada Masahiko
diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..53f07fd 100644
--- a/src/backend/access/heap/Makefile
+++ b/src/backend/access/heap/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/access/heap
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
+OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o frozenmap.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/frozenmap.c b/src/backend/access/heap/frozenmap.c
new file mode 100644
index 000..6e64cb8
--- /dev/null
+++ b

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-15 Thread Sawada Masahiko
On Wed, Apr 15, 2015 at 10:52 AM, Sawada Masahiko  wrote:
> On Wed, Apr 15, 2015 at 8:57 AM, David Steele  wrote:
>> On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
>>> This patch does not apply cleanly due to the moving of pgbench (patch
>>> to filelist.sgml failed).
>>
>> Thank you for pointing that out!
>>
>> Ironic that it was the commit directly after the one I was testing with
>> that broke the patch.  It appears the end of the last CF is a very bad
>> time to be behind HEAD.
>>
>> Fixed in attached v8 patch.
>
> Thank you for updating the patch!
>
> I applied the patch and complied them successfully without WARNING.
>
> I tested v8 patch with CURSOR case I mentioned before, and got
> segmentation fault again.
> Here are log messages in my environment,
>
> =# select test();
>  LOG:  server process (PID 29730) was terminated by signal 11:
> Segmentation fault
> DETAIL:  Failed process was running: select test();
> LOG:  terminating any other active server processes
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> FATAL:  the database system is in recovery mode
>
> I hope that these messages helps you to address this problem.
> I will also try to address this.
>
> Regards,
>
> ---
> Sawada Masahiko



> I will also try to address this.

I investigated this problem and report my findings here.

When we execute the test() function I mentioned, three stack items in
total are stored in auditEventStack.
Two of them are freed by stack_pop() -> stack_free() (i.e.
stack_free() is called by stack_pop()).
The remaining one is freed by PortalDrop() -> ... ->
MemoryContextDeleteChildren() -> ... -> stack_free().
It is freed at the same time the pg_audit memory context is deleted,
and the stack then becomes completely empty.

But after all items have been freed, finish_xact_command() can call
PortalDrop(), so ExecutorEnd() can be called again.
pg_audit has an ExecutorEnd_hook, so postgres tries to free that
already-freed item... SEGV.

In my environment, the following change fixes it.

--- pg_audit.c.org2015-04-15 14:21:07.541866525 +0900
+++ pg_audit.c2015-04-15 11:36:53.758877339 +0900
@@ -1291,7 +1291,7 @@
 standard_ExecutorEnd(queryDesc);

 /* Pop the audit event off the stack */
-if (!internalStatement)
+if (!internalStatement && auditEventStack != NULL)
 {
 stack_pop(auditEventStack->stackId);
 }

It might be a good idea to add an Assert() before calling stack_pop().

I'm not sure this change is exactly correct, but I hope this
information helps you.
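The failure mode and the proposed guard can be modeled outside of
PostgreSQL with a minimal linked-list stack. This is a simplified
sketch, not pg_audit's real code: the struct layout, stack_push(), and
executor_end_hook() are stand-ins, and only the NULL check mirrors the
diff above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct AuditEvent
{
    int stackId;
    struct AuditEvent *next;
} AuditEvent;

AuditEvent *auditEventStack = NULL;     /* stand-in for pg_audit's stack */

void stack_push(int id)
{
    AuditEvent *e = malloc(sizeof(AuditEvent));
    e->stackId = id;
    e->next = auditEventStack;
    auditEventStack = e;
}

/* Unconditionally dereferences the top item, like the pre-fix code:
 * calling this on an empty stack is the SEGV described above. */
void stack_pop(void)
{
    AuditEvent *e = auditEventStack;
    auditEventStack = e->next;
    free(e);
}

/* The fix: only pop when the stack is known to be non-empty, so a
 * second ExecutorEnd call after the memory context already emptied
 * the stack becomes a no-op.  An assert-enabled build could instead
 * Assert(auditEventStack != NULL) to catch the double call loudly. */
void executor_end_hook(bool internalStatement)
{
    if (!internalStatement && auditEventStack != NULL)
        stack_pop();
}
```

The remaining question from the thread, whether the guard could silently
swallow a pop in deeper nesting, is why the Assert() variant is attractive
during development.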

Regards,

---
Sawada Masahiko




Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-14 Thread Sawada Masahiko
On Wed, Apr 15, 2015 at 8:57 AM, David Steele  wrote:
> On 4/14/15 7:13 PM, Tatsuo Ishii wrote:
>> This patch does not apply cleanly due to the moving of pgbench (patch
>> to filelist.sgml failed).
>
> Thank you for pointing that out!
>
> Ironic that it was the commit directly after the one I was testing with
> that broke the patch.  It appears the end of the last CF is a very bad
> time to be behind HEAD.
>
> Fixed in attached v8 patch.

Thank you for updating the patch!

I applied the patch and compiled it successfully without warnings.

I tested the v8 patch with the CURSOR case I mentioned before, and got
a segmentation fault again.
Here are the log messages from my environment:

=# select test();
 LOG:  server process (PID 29730) was terminated by signal 11:
Segmentation fault
DETAIL:  Failed process was running: select test();
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
FATAL:  the database system is in recovery mode

I hope these messages help you to address this problem.
I will also try to address it.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-09 Thread Sawada Masahiko
On Thu, Apr 9, 2015 at 1:14 PM, Fujii Masao  wrote:
> On Wed, Apr 8, 2015 at 10:53 PM, Sawada Masahiko  
> wrote:
>> On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao  wrote:
>>> On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko  
>>> wrote:
>>>> On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
>>>>  wrote:
>>>>>
>>>>> On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko 
>>>>> wrote:
>>>>>>
>>>>>> On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
>>>>>>  wrote:
>>>>>> >
>>>>>> > On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Thank you for your reviewing.
>>>>>> >> I modified the patch and attached latest version patch(v7).
>>>>>> >> Please have a look it.
>>>>>> >>
>>>>>> >
>>>>>> > Looks good to me. Attached patch (v8) just fix a tab indentation in
>>>>>> > gram.y.
>>>>>> >
>>>>>>
>>>>>> I had forgotten fix a tab indentation, sorry.
>>>>>> Thank you for reviewing!
>>>>>> It looks good to me too.
>>>>>> Can this patch be marked as "Ready for Committer"?
>>>>>>
>>>>>
>>>>> +1
>>>>>
>>>>
>>>> Changed status to "Ready for Committer".
>>>
>>> The patch adds new syntax like "REINDEX ... WITH VERBOSE", i.e., () is not
>>> added after WITH clause. Did we reach the consensus about this syntax?
>>> The last email from Robert just makes me think that () should be added
>>> into the syntax.
>>>
>>
>> Thank you for taking time for this patch!
>
> I removed the FORCE option from REINDEX, so you would need to update the 
> patch.

Thanks.
I will update the patch to reflect that change.

>> This was quite complicated issue since we already have a lot of style
>> command currently.
>> We can think many of style from various perspective: kind of DDL, new
>> or old command, maintenance command. And each command has each its
>> story.
>> I believe we have reached the consensus with this style at least once
>> (please see previous discussion), but we might needs to discuss
>> more...
>
> Okay, another question is that; WITH must be required whenever the options
> are specified? Or should it be abbreviatable?

In the previous discussion, the WITH clause was always required for
the VERBOSE option; I don't plan to make WITH abbreviatable for now.
Also, now that the FORCE option has been removed, we could bring back
the idea of putting VERBOSE after the object name (INDEX, TABLE,
etc.), as CLUSTER does.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-08 Thread Sawada Masahiko
On Wed, Apr 8, 2015 at 1:09 PM, Fujii Masao  wrote:
> On Wed, Apr 8, 2015 at 1:57 AM, Sawada Masahiko  wrote:
>> On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
>>  wrote:
>>>
>>> On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko 
>>> wrote:
>>>>
>>>> On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
>>>>  wrote:
>>>> >
>>>> > On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
>>>> > wrote:
>>>> >>
>>>> >> Thank you for your reviewing.
>>>> >> I modified the patch and attached latest version patch(v7).
>>>> >> Please have a look it.
>>>> >>
>>>> >
>>>> > Looks good to me. Attached patch (v8) just fix a tab indentation in
>>>> > gram.y.
>>>> >
>>>>
>>>> I had forgotten fix a tab indentation, sorry.
>>>> Thank you for reviewing!
>>>> It looks good to me too.
>>>> Can this patch be marked as "Ready for Committer"?
>>>>
>>>
>>> +1
>>>
>>
>> Changed status to "Ready for Committer".
>
> The patch adds new syntax like "REINDEX ... WITH VERBOSE", i.e., () is not
> added after WITH clause. Did we reach the consensus about this syntax?
> The last email from Robert just makes me think that () should be added
> into the syntax.
>

Thank you for taking the time for this patch!

This was a quite complicated issue, since we already have a lot of
command styles.
We can look at the style from various perspectives: kind of DDL, new
or old command, maintenance command; and each command has its own
story.
I believe we reached consensus on this style at least once (please
see the previous discussion), but we might need to discuss it more...

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Wed, Apr 8, 2015 at 1:11 AM, Fabrízio de Royes Mello
 wrote:
>
> On Tue, Apr 7, 2015 at 1:04 PM, Sawada Masahiko 
> wrote:
>>
>> On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
>>  wrote:
>> >
>> > On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
>> > wrote:
>> >>
>> >> Thank you for your reviewing.
>> >> I modified the patch and attached latest version patch(v7).
>> >> Please have a look it.
>> >>
>> >
>> > Looks good to me. Attached patch (v8) just fix a tab indentation in
>> > gram.y.
>> >
>>
>> I had forgotten fix a tab indentation, sorry.
>> Thank you for reviewing!
>> It looks good to me too.
>> Can this patch be marked as "Ready for Committer"?
>>
>
> +1
>

Changed status to "Ready for Committer".
Thank you for the final review!

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 10:12 PM, Fabrízio de Royes Mello
 wrote:
>
> On Tue, Apr 7, 2015 at 7:22 AM, Sawada Masahiko 
> wrote:
>>
>> Thank you for your reviewing.
>> I modified the patch and attached latest version patch(v7).
>> Please have a look it.
>>
>
> Looks good to me. Attached patch (v8) just fix a tab indentation in gram.y.
>

I had forgotten to fix a tab indentation, sorry.
Thank you for reviewing!
It looks good to me too.
Can this patch be marked as "Ready for Committer"?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-07 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 9:32 AM, Fabrízio de Royes Mello
 wrote:
>
> On Mon, Apr 6, 2015 at 10:21 AM, Sawada Masahiko 
> wrote:
>>
>> On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello, I have some trivial comments about the latest patch.
>> >
>> > At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko
>> >  wrote in
>> > 
>> > sawada.mshk> On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby
>> >  wrote:
>> >> >>> >Are the parenthesis necessary? No other WITH option requires them,
>> >> >>> > other
>> >> >>> >than create table/matview (COPY doesn't actually require them).
>> >> >>> >
>> >> >>
>> >> >> I was imagining EXPLAIN syntax.
>> >> >> Is there some possibility of supporting multiple options for REINDEX
>> >> >> command in future?
>> >> >> If there is, syntax will be as follows, REINDEX { INDEX | ... } name
>> >> >> WITH VERBOSE, XXX, XXX;
>> >> >> I thought style with parenthesis is better than above style.
>> >> >
>> >> >
>> >> > The thing is, ()s are actually an odd-duck. Very little supports it,
>> >> > and
>> >> > while COPY allows it they're not required. EXPLAIN is a different
>> >> > story,
>> >> > because that's not WITH; we're actually using () *instead of* WITH.
>> >> >
>> >> > So because almost all commands that use WITH doen't even accept (), I
>> >> > don't
>> >> > think this should either. It certainly shouldn't require them,
>> >> > because
>> >> > unlike EXPLAIN, there's no need to require them.
>> >> >
>> >>
>> >> I understood what your point is.
>> >> Attached patch is changed syntax, it does not have parenthesis.
>> >
>> > As I looked into the code to find what the syntax would be, I
>> > found some points which would be better to be fixed.
>> >
>> > In gram.y the options is a list of cstring but it is not necesary
>> > to be a list because there's only one kind of option now.
>> >
>> > If you prefer it to be a list, I have a comment for the way to
>> > make string list in gram.y. You stored bare cstring in the
>> > options list but I think it is not the preferable form. I suppose
>> > the followings are preferable. Corresponding fixes are needed in
>> > ReindexTable, ReindexIndex, ReindexMultipleTables.
>> >
>> > $ = list_make1(makeString($1);
>> >  
>> > $ = lappend($1, list_make1(makeString($3));
>> >
>> >
>> > In equalfuncs.c, _equalReindexStmt forgets to compare the member
>> > options. _copyReindexStmt also forgets to copy it. The way you
>> > constructed the options list prevents them from doing their jobs
>> > using prepared methods. Comparing and copying the member "option"
>> > is needed even if it becomes a simple string.
>> >
>>
>> I revised patch, and changed gram.y as I don't use the list.
>> So this patch adds new syntax,
>> REINDEX { INDEX | ... } name WITH VERBOSE;
>>
>> Also documentation is updated.
>> Please give me feedbacks.
>>
>
> Some notes:
>
> 1) There are a trailing space in src/backend/parser/gram.y:
>
> -   | REINDEX DATABASE name opt_force
> +   | REINDEX reindex_target_multitable name WITH opt_verbose
> {
> ReindexStmt *n = makeNode(ReindexStmt);
> -   n->kind = REINDEX_OBJECT_DATABASE;
> +   n->kind = $2;
> n->name = $3;
> n->relation = NULL;
> +   n->verbose = $5;
> $$ = (Node *)n;
> }
> ;
>
>
> 2) The documentation was updated and is according the behaviour.
>
>
> 3) psql autocomplete is ok.
>
>
> 4) Lack of regression tests. I think you should add some regression like
> that:
>
> fabrizio=# \set VERBOSITY terse
> fabrizio=# create table reindex_verbose(id integer primary key);
> CREATE TABLE
> fabrizio=# reindex table reindex_verbose with verbose;
> INFO:  index "reindex_verbose_pkey" was reindexed.
> REINDEX
>
>
> 5) Code style and organization is ok
>
>
> 6) You should add the new field ReindexStmt->verbose to
> src/bac

Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby  wrote:
> On 4/6/15 5:18 PM, Greg Stark wrote:
>>
>> Only I would suggest thinking of it in terms of two orthogonal boolean
>> flags rather than three states. It's easier to reason about whether a
>> table has a specific property than trying to control a state machine in
>> a predefined pathway.
>>
>> So I would say the two flags are:
>> READONLY: guarantees nothing can be dirtied
>> ALLFROZEN: guarantees no unfrozen tuples are present
>>
>> In practice you can't have the latter without the former since vacuum
>> can't know everything is frozen unless it knows nobody is inserting. But
>> perhaps there will be cases in the future where that's not true.
>
>
> I'm not so sure about that. There's a logical state progression here (see
> below). ISTM it's easier to just enforce that in one place instead of a
> bunch of places having to check multiple conditions. But, I'm not wed to a
> single field.
>
>> Incidentally there are a number of other optimisations that I've had
>> in mind that are only possible on frozen read-only tables:
>>
>> 1) Compression: compress the pages and pack them one after the other.
>> Build a new fork with offsets for each page.
>>
>> 2) Automatic partition elimination where the statistics track the
>> minimum and maximum value per partition (and number of tuples) and treat
>> then as implicit constraints. In particular it would magically make read
>> only empty parent partitions be excluded regardless of the where clause.
>
>
> AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
> uncompact and re-compact for #1 when you actually freeze (which maybe isn't
> worth it), but freezing isn't absolutely required. #2 would only require
> that everything in the relation is visible; not frozen.
>
> I think there's value here to having an ALLVISIBLE state as well as
> ALLFROZEN.
>

Based on the many suggestions, I'm going to deal with the FM first, as one
patch. In that first patch it would be a simple mechanism, similar to the VM:
- Each bit of the FM represents a single page
- A bit is set only by vacuum
- A bit is cleared by inserts, updates and deletes

Second, I'll deal with a simple read-only table with two states,
Read/Write (default) and ReadOnly, as another patch. ISTM that having the
Frozen state needs more discussion. A read-only table just allows us
to disable any updates to the table, controlled by a read-only flag
in pg_class. The DDL commands that change this status would be like ALTER
TABLE SET READ ONLY, or READ WRITE.
Also, as Alvaro suggested, a read-only table affects not only
table freezing but also performance optimizations. I'll consider
including them when I deal with the read-only table.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-06 Thread Sawada Masahiko
On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby  wrote:
> On 4/6/15 1:46 AM, Sawada Masahiko wrote:
>>
>> On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes  wrote:
>>>
>>> On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby 
>>> wrote:
>>>>
>>>>
>>>> On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>>>>>
>>>>>
>>>>> +   case HEAPTUPLE_LIVE:
>>>>> +   case HEAPTUPLE_RECENTLY_DEAD:
>>>>> +   case HEAPTUPLE_INSERT_IN_PROGRESS:
>>>>> +   case HEAPTUPLE_DELETE_IN_PROGRESS:
>>>>> +   if
>>>>> (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
>>>>> +
>>>>> mxactcutoff, &frozen[nfrozen]))
>>>>> +
>>>>> frozen[nfrozen++].offset
>>>>> = offnum;
>>>>> +   break;
>>>>
>>>>
>>>>
>>>> This doesn't seem safe enough to me. Can't there be tuples that are
>>>> still
>>>> new enough that they can't be frozen, and are still live?
>>>
>>>
>>>
>>> Yep.  I've set a table to read only while it contained unfreezable
>>> tuples,
>>> and the tuples remain unfrozen yet the read-only action claims to have
>>> succeeded.
>>>
>>>
>>>>
>>>> Somewhat related... instead of forcing the freeze to happen
>>>> synchronously,
>>>> can't we set this up so a table is in one of three states? Read/Write,
>>>> Read
>>>> Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
>>>> the
>>>> appropriate state, and all the vacuum infrastructure would continue to
>>>> process those tables as it does today. lazy_vacuum_rel would become
>>>> responsible for tracking if there were any non-frozen tuples if it was
>>>> also
>>>> attempting a freeze. If it discovered there were none, AND the table was
>>>> marked as ReadOnly, then it would change the table state to Frozen and
>>>> set
>>>> relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
>>>> AT_SetReadWrite could change relfrozenxid to it's own Xid as an
>>>> optimization. Doing it that way leaves all the complicated vacuum code
>>>> in
>>>> one place, and would eliminate concerns about race conditions with still
>>>> running transactions, etc.
>>>
>>>
>>>
>>> +1 here as well.  I might want to set tables to read only for reasons
>>> other
>>> than to avoid repeated freezing.  (After all, the name of the command
>>> suggests it is a general purpose thing) and wouldn't want to
>>> automatically
>>> trigger a vacuum freeze in the process.
>>>
>>
>> Thank you for comments.
>>
>>> Somewhat related... instead of forcing the freeze to happen
>>> synchronously, can't we set this up so a table is in one of three states?
>>> Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would
>>> simply change to > the appropriate state, and all the vacuum infrastructure
>>> would continue to process those tables as it does today. lazy_vacuum_rel
>>> would become responsible for tracking if there were any non-frozen tuples if
>>> it was also attempting > a freeze. If it discovered there were none, AND the
>>> table was marked as ReadOnly, then it would change the table state to Frozen
>>> and set relfrozenxid = InvalidTransactionId and relminxid =
>>> InvalidMultiXactId. AT_SetReadWrite > could change relfrozenxid to it's own
>>> Xid as an optimization. Doing it that way leaves all the complicated vacuum
>>> code in one place, and would eliminate concerns about race conditions with
>>> still running transactions, etc.
>>
>>
>> I agree with the three states: Read/Write, ReadOnly and Frozen.
>> But I'm not sure when we should freeze the tuples, e.g., scan whole
>> tables.
>> I think that any changes to the table are completely
>> ignored/restricted if the table is marked as a ReadOnly table,
>> and that this is not accompanied by freezing tuples; we just mark it as ReadOnly.
>> A Frozen table ensures that all tuples of the table have been completely
>> frozen, so it also needs to scan the whole table as well.
>> I.e., we would need to scan the whole table twice, right?

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-04-06 Thread Sawada Masahiko
On Fri, Mar 13, 2015 at 5:10 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, I have some trivial comments about the latest patch.
>
> At Thu, 12 Mar 2015 21:15:14 +0900, Sawada Masahiko  
> wrote in 
> sawada.mshk> On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby 
>  wrote:
>> >>> >Are the parenthesis necessary? No other WITH option requires them, other
>> >>> >than create table/matview (COPY doesn't actually require them).
>> >>> >
>> >>
>> >> I was imagining EXPLAIN syntax.
>> >> Is there some possibility of supporting multiple options for REINDEX
>> >> command in future?
>> >> If there is, syntax will be as follows, REINDEX { INDEX | ... } name
>> >> WITH VERBOSE, XXX, XXX;
>> >> I thought style with parenthesis is better than above style.
>> >
>> >
>> > The thing is, ()s are actually an odd-duck. Very little supports it, and
>> > while COPY allows it they're not required. EXPLAIN is a different story,
>> > because that's not WITH; we're actually using () *instead of* WITH.
>> >
>> > So because almost all commands that use WITH doen't even accept (), I don't
>> > think this should either. It certainly shouldn't require them, because
>> > unlike EXPLAIN, there's no need to require them.
>> >
>>
>> I understood what your point is.
>> Attached patch is changed syntax, it does not have parenthesis.
>
> As I looked into the code to find what the syntax would be, I
> found some points which would be better to be fixed.
>
> In gram.y the options are a list of cstrings, but it is not necessary
> to be a list because there's only one kind of option now.
>
> If you prefer it to be a list, I have a comment for the way to
> make string list in gram.y. You stored bare cstring in the
> options list but I think it is not the preferable form. I suppose
> the followings are preferable. Corresponding fixes are needed in
> ReindexTable, ReindexIndex, ReindexMultipleTables.
>
> $$ = list_make1(makeString($1));
>  
> $$ = lappend($1, makeString($3));
>
>
> In equalfuncs.c, _equalReindexStmt forgets to compare the member
> options. _copyReindexStmt also forgets to copy it. The way you
> constructed the options list prevents them from doing their jobs
> using prepared methods. Comparing and copying the member "option"
> is needed even if it becomes a simple string.
>

I revised the patch and changed gram.y so that it doesn't use a list.
So this patch adds the new syntax:
REINDEX { INDEX | ... } name WITH VERBOSE;

The documentation is also updated.
Please give me feedback.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..27be1a4 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name WITH VERBOSE
 
  
 
@@ -159,6 +160,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } rd_rel->relpersistence;
 	index_close(irel, NoLock);
 
-	reindex_index(indOid, false, persistence);
+	reindex_index(indOid, false, persistence, verbose);
 
 	return indOid;
 }
@@ -1775,7 +1775,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool verbose)
 {
 	Oid			heapOid;
 
@@ -1785,7 +1785,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  REINDEX_REL_PROCESS_TOAST |
-		  REINDEX_REL_CHECK_CONSTRAINTS))
+		  REINDEX_REL_CHECK_CONSTRAINTS,
+		  verbose))
 		ereport(NOTICE,
 (errmsg("table \"%s\" has no indexes",
 		relation->relname)));
@@ -1802,7 +1803,7 @@ ReindexTable(RangeVar *relation)
  * That means this must not be called within a user transaction block!
  */
 void
-ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
+ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind, bool verbose)
 {
 	Oid			objectOid;
 	Relation	relationRelation;
@@ -1814,6 +1815,7 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 	List	   *relids = NIL;
 	ListCell   *l;
 	int			num_keys;
+	int			elevel = verbose ? INFO : DEBUG2;
 
 	AssertArg(objectName);
 	Assert(objectKind == REINDEX_OBJECT_SCHEMA ||
@@ -1938,9 +1940,10 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind)
 		PushActiveSnapshot(GetTransactionSnapshot());
 		if (reindex_relation(relid,
 			 REINDEX_REL_PROC

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele  wrote:
> On 4/3/15 3:59 AM, Sawada Masahiko wrote:
>> On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
>>> Let me know if you see any other issues.
>>>
>>
>> I pulled HEAD, and then tried to compile source code after applied
>> following "deparsing utility command patch" without #0001 and #0002.
>> (because these two patches are already pushed)
>> <http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org>
>>
>> But I could not use new pg_audit patch due to compile error (that
>> deparsing utility command patch might have).
>> I think I'm missing something, but I'm not sure.
>> Could you tell me which patch should I apply before compiling pg_audit?
>
> When Alvaro posted his last patch set he recommended applying them to
> b3196e65:
>
> http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org
>
> Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
> space error) then applying pg_audit-v6.patch works fine.
>

I tested the v6 patch, but I got a SEGV when I executed the test()
function I mentioned upthread.
Could you address this problem?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-06 Thread Sawada Masahiko
On Fri, Apr 3, 2015 at 10:01 PM, David Steele  wrote:
> On 4/3/15 3:59 AM, Sawada Masahiko wrote:
>> On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
>>> Let me know if you see any other issues.
>>>
>>
>> I pulled HEAD, and then tried to compile source code after applied
>> following "deparsing utility command patch" without #0001 and #0002.
>> (because these two patches are already pushed)
>> <http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org>
>>
>> But I could not use new pg_audit patch due to compile error (that
>> deparsing utility command patch might have).
>> I think I'm missing something, but I'm not sure.
>> Could you tell me which patch should I apply before compiling pg_audit?
>
> When Alvaro posted his last patch set he recommended applying them to
> b3196e65:
>
> http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org
>
> Applying the 3/25 deparse patches onto b3196e65 (you'll see one trailing
> space error) then applying pg_audit-v6.patch works fine.
>

Thank you for your advice!
I'm testing pg_audit, so I will send a report to you as soon as possible.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2015-04-05 Thread Sawada Masahiko
On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes  wrote:
> On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby  wrote:
>>
>> On 4/3/15 12:59 AM, Sawada Masahiko wrote:
>>>
>>> +   case HEAPTUPLE_LIVE:
>>> +   case HEAPTUPLE_RECENTLY_DEAD:
>>> +   case HEAPTUPLE_INSERT_IN_PROGRESS:
>>> +   case HEAPTUPLE_DELETE_IN_PROGRESS:
>>> +   if
>>> (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
>>> +
>>> mxactcutoff, &frozen[nfrozen]))
>>> +   frozen[nfrozen++].offset
>>> = offnum;
>>> +   break;
>>
>>
>> This doesn't seem safe enough to me. Can't there be tuples that are still
>> new enough that they can't be frozen, and are still live?
>
>
> Yep.  I've set a table to read only while it contained unfreezable tuples,
> and the tuples remain unfrozen yet the read-only action claims to have
> succeeded.
>
>
>>
>> Somewhat related... instead of forcing the freeze to happen synchronously,
>> can't we set this up so a table is in one of three states? Read/Write, Read
>> Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to the
>> appropriate state, and all the vacuum infrastructure would continue to
>> process those tables as it does today. lazy_vacuum_rel would become
>> responsible for tracking if there were any non-frozen tuples if it was also
>> attempting a freeze. If it discovered there were none, AND the table was
>> marked as ReadOnly, then it would change the table state to Frozen and set
>> relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
>> AT_SetReadWrite could change relfrozenxid to it's own Xid as an
>> optimization. Doing it that way leaves all the complicated vacuum code in
>> one place, and would eliminate concerns about race conditions with still
>> running transactions, etc.
>
>
> +1 here as well.  I might want to set tables to read only for reasons other
> than to avoid repeated freezing.  (After all, the name of the command
> suggests it is a general purpose thing) and wouldn't want to automatically
> trigger a vacuum freeze in the process.
>

Thank you for comments.

> Somewhat related... instead of forcing the freeze to happen synchronously, 
> can't we set this up so a table is in one of three states? Read/Write, Read 
> Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to > the 
> appropriate state, and all the vacuum infrastructure would continue to 
> process those tables as it does today. lazy_vacuum_rel would become 
> responsible for tracking if there were any non-frozen tuples if it was also 
> attempting > a freeze. If it discovered there were none, AND the table was 
> marked as ReadOnly, then it would change the table state to Frozen and set 
> relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId. 
> AT_SetReadWrite > could change relfrozenxid to it's own Xid as an 
> optimization. Doing it that way leaves all the complicated vacuum code in one 
> place, and would eliminate concerns about race conditions with still running 
> transactions, etc.

I agree with the three states: Read/Write, ReadOnly and Frozen.
But I'm not sure when we should freeze the tuples, e.g., scan whole tables.
I think that any changes to the table are completely
ignored/restricted if the table is marked as a ReadOnly table,
and that this is not accompanied by freezing tuples; we just mark it as ReadOnly.
A Frozen table ensures that all tuples of the table have been completely
frozen, so it also needs to scan the whole table as well.
I.e., we would need to scan the whole table twice, right?

> +1 here as well.  I might want to set tables to read only for reasons other 
> than to avoid repeated freezing.  (After all, the name of the command 
> suggests it is a general purpose thing) and wouldn't want to automatically 
> trigger a
> vacuum freeze in the process.
>
> There is another possibility here, too. We can completely divorce a ReadOnly 
> mode (which I think is useful for other things besides freezing) from the 
> question of whether we need to force-freeze a relation if we create a
> FrozenMap, similar to the visibility map. This has the added advantage of 
> helping freeze scans on relations that are not ReadOnly in the case of tables 
> that are insert-mostly or any other pattern where most pages stay all-frozen.
> Prior to the visibility map this would have been a rather daunting project, 
> but I believe this could piggyback on t

Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-04-04 Thread Sawada Masahiko
On Wed, Mar 11, 2015 at 11:46 PM, Sawada Masahiko  wrote:
> On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost  wrote:
>> Sawada,
>>
>> * Sawada Masahiko (sawada.m...@gmail.com) wrote:
>>> Thank you for your review!
>>> Attached file is the latest version (without document patch. I making it 
>>> now.)
>>> As per discussion, there is no change regarding of super user permission.
>>
>> Ok.  Here's another review.
>>
>
> Thank you for your review!
>
>>> diff --git a/src/backend/catalog/system_views.sql 
>>> b/src/backend/catalog/system_views.sql
>>> index 5e69e2b..94ee229 100644
>>> --- a/src/backend/catalog/system_views.sql
>>> +++ b/src/backend/catalog/system_views.sql
>>> @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS
>>>
>>>  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
>>>
>>> +CREATE VIEW pg_file_settings AS
>>> +   SELECT * FROM pg_show_all_file_settings() AS A;
>>> +
>>> +REVOKE ALL on pg_file_settings FROM public;
>>> +
>>
>> This suffers from a lack of pg_dump support, presuming that the
>> superuser is entitled to change the permissions on this function.  As it
>> turns out though, you're in luck in that regard since I'm working on
>> fixing that for a mostly unrelated patch.  Any feedback you have on what
>> I'm doing to pg_dump here:
>>
>> http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net
>>
>> Would certainly be appreciated.
>>
>
> Thank you for the info.
> I will read the discussion and send the feedbacks.
>
>>> diff --git a/src/backend/utils/misc/guc-file.l 
>>> b/src/backend/utils/misc/guc-file.l
>> [...]
>>> +  * Calculate size of guc_array and allocate it. From the second time
>>> we allocate memory,
>>> +  * we should free the previously allocated memory.
>>> +  */
>>> + guc_array_size = num_guc_file_variables * sizeof(struct 
>>> ConfigFileVariable);
>>> + if (!guc_file_variables)
>>> + {
>>> + /* For the first call */
>>> + guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, 
>>> guc_array_size);
>>> + }
>>> + else
>>> + {
>>> + guc_array = guc_file_variables;
>>> + for (item = head; item; item = item->next, guc_array++)
>>> + {
>>> + free(guc_array->name);
>>> + free(guc_array->value);
>>> + free(guc_array->filename);
>>
>> It's a minor nit-pick, as the below loop should handle anything we care
>> about, but I'd go ahead and reset guc_array->sourceline to be -1 or
>> something too, just to show that everything's being handled here with
>> regard to cleanup.  Or perhaps just add a comment saying that the
>> sourceline for all currently valid entries will be updated.
>
> Fixed.
>
>>
>>> + guc_file_variables = (ConfigFileVariable *) 
>>> guc_realloc(FATAL, guc_file_variables, guc_array_size);
>>> + }
>>
>> Nasby made a comment, I believe, that we might actually be able to use
>> memory contexts here, which would certainly be much nicer as then you'd
>> be able to just throw away the whole context and create a new one.  Have
>> you looked at that approach at all?  Offhand, I'm not sure if it'd work
>> or not (I have my doubts..) but it seems worthwhile to consider.
>>
>
> I successfully used palloc() and pfree() to allocate and free
> guc_file_variables,
> but these variables seem to be freed already by the time I read the
> pg_file_settings view via SQL.
>
>> Otherwise, it seems like this would address my previous concerns that
>> this would end up leaking memory, and even if it's a bit slower than one
>> might hope, it's not an operation we should be doing very frequently.
>>
>>> --- a/src/backend/utils/misc/guc.c
>>> +++ b/src/backend/utils/misc/guc.c
>> [...]
>>>  /*
>>> + * Return the total number of GUC File variables
>>> + */
>>> +int
>>> +GetNumConfigFileOptions(void)
>>> +{
>>> + return num_guc_file_variables;
>>> +}
>>
>> What's the point of this function..?  I'm not convinced it's the best
>> idea, but it certainly looks like the function and the variable are only
>> used from this file anyway?
>
> It's unnecessary.

Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-04-03 Thread Sawada Masahiko
On Thu, Apr 2, 2015 at 2:46 AM, David Steele  wrote:
> Hi Sawada,
>
> On 3/25/15 9:24 AM, David Steele wrote:
>> On 3/25/15 7:46 AM, Sawada Masahiko wrote:
>>> 2.
>>> I got ERROR when executing function uses cursor.
>>>
>>> 1) create empty table (hoge table)
>>> 2) create test function as follows.
>>>
>>> create function test() returns int as $$
>>> declare
>>> cur1 cursor for select * from hoge;
>>> tmp int;
>>> begin
>>> open cur1;
>>> fetch cur1 into tmp;
>>>return tmp;
>>> end$$
>>> language plpgsql ;
>>>
>>> 3) execute test function (got ERROR)
>>> =# select test();
>>> LOG:  AUDIT: SESSION,6,1,READ,SELECT,,,selecT test();
>>> LOG:  AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT 
>>> test();
>>> LOG:  AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge
>>> CONTEXT:  PL/pgSQL function test() line 6 at OPEN
>>> ERROR:  pg_audit stack is already empty
>>> STATEMENT:  selecT test();
>>>
>>> It seems that the item on the stack is already freed by deleting the
>>> pg_audit memory context (in MemoryContextDelete())
>>> before stack_pop is called when the top-level Portal is dropped.
>
> This has been fixed and I have attached a new patch.  I've seen this
> with cursors before where the parent MemoryContext is freed before
> control is returned to ProcessUtility.  I think that's strange behavior
> but there's not a lot I can do about it.
>

Thank you for updating the patch!

> The code I put in to deal with this situation was not quite robust
> enough so I had to harden it a bit more.
>
> Let me know if you see any other issues.
>

I pulled HEAD, and then tried to compile source code after applied
following "deparsing utility command patch" without #0001 and #0002.
(because these two patches are already pushed)
<http://www.postgresql.org/message-id/20150325175954.gl3...@alvh.no-ip.org>

But I could not use new pg_audit patch due to compile error (that
deparsing utility command patch might have).
I think I'm missing something, but I'm not sure.
Could you tell me which patch should I apply before compiling pg_audit?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-25 Thread Sawada Masahiko
On Wed, Mar 25, 2015 at 12:23 PM, David Steele  wrote:
>> On Wed, Mar 25, 2015 at 12:38 AM, David Steele  wrote:
>>>> 2. OBJECT auditing does not work before adding acl info to 
>>>> pg_class.rel_acl.
>>>> In following situation, pg_audit can not audit OBJECT log.
>>>> $ cat postgresql.conf | grep audit
>>>> shared_preload_libraries = 'pg_audit'
>>>> pg_audit.role = 'hoge_user'
>>>> pg_audit.log = 'read, write'
>>>> $ psql -d postgres -U hoge_user
>>>> =# create table hoge(col int);
>>>> =# select * from hoge;
>>>> LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;
>>>>
>>>> OBJECT audit log is not logged here since pg_class.rel_acl is empty
>>>> yet. (Only logged SESSION log)
>>>> So after creating another unrelated role and granting any privilege to that
>>>> user,
>>>> the OBJECT audit log is written successfully.
>>>
>>> Yes, object auditing does not work until some grants have been made to
>>> the audit role.
>>>
>>>> =# create role bar_user;
>>>> =# grant select on hoge to bar_user;
>>>> =# select * from hoge;
>>>> LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
>>>> LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;
>>>>
>>>> Both the OBJECT and SESSION logs are logged.
>>>
>>> Looks right to me.  If you don't want the session logging then disable
>>> pg_audit.log.
>>>
>>> Session and object logging are completely independent from each other:
>>> one or the other, or both, or neither can be enabled at any time.
>>
>> It means that an OBJECT log is not written just after creating a table, even
>> if that table is touched by its owner.
>> To write an OBJECT log, we need to grant a privilege to the audit role first, right?
>
> Exactly.  Privileges must be granted to the audit role in order for
> object auditing to work.
>

The table owner can always touch its own table.
So does that mean the table owner can read the table's data while
avoiding OBJECT logging?

Also I looked into latest patch again.
Here are two review comment.

1.
> typedef struct
> {
>int64 statementId;
>   int64 substatementId;
Both statementId and substatementId could be negative numbers.
I think these should be uint64 instead.

2.
I got ERROR when executing function uses cursor.

1) create empty table (hoge table)
2) create test function as follows.

create function test() returns int as $$
declare
cur1 cursor for select * from hoge;
tmp int;
begin
open cur1;
fetch cur1 into tmp;
   return tmp;
end$$
language plpgsql ;

3) execute test function (got ERROR)
=# select test();
LOG:  AUDIT: SESSION,6,1,READ,SELECT,,,selecT test();
LOG:  AUDIT: SESSION,6,2,FUNCTION,EXECUTE,FUNCTION,public.test,selecT test();
LOG:  AUDIT: SESSION,6,3,READ,SELECT,,,select * from hoge
CONTEXT:  PL/pgSQL function test() line 6 at OPEN
ERROR:  pg_audit stack is already empty
STATEMENT:  selecT test();

It seems that the item on the stack is already freed by deleting the
pg_audit memory context (in MemoryContextDelete())
before stack_pop is called when the top-level Portal is dropped.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-24 Thread Sawada Masahiko
Hi David,

Thank you for your answer!

On Wed, Mar 25, 2015 at 12:38 AM, David Steele  wrote:
> Hi Sawada,
>
> Thank you for taking the time to look at the patch.
>
> On 3/24/15 10:28 AM, Sawada Masahiko wrote:
>> I've applied these patchese successfully.
>>
>> I looked into this module, and had a few comments as follows.
>> 1. pg_audit audits only one role currently.
>> In the current code, we cannot specify multiple users as auditing users. Why?
>> (Sorry if this topic already has been discussed.)
>
> There is only one master audit role in a bid for simplicity.  However,
> there are two ways you can practically have multiple audit roles (both
> are mentioned in the docs):
>
> 1) The audit role honors inheritance so you can grant all your audit
> roles to the "master" role set in pg_audit.role and all the roles will
> be audited.
>
> 2) Since pg_audit.role is a GUC, you can set a different audit role per
> database by using ALTER DATABASE ... SET.  You can set the GUC per logon
> role as well though that would probably make things very complicated.
> The GUC is SUSET so normal users cannot tamper with it.

I understood.

>> 2. OBJECT auditing does not work before adding acl info to pg_class.rel_acl.
>> In following situation, pg_audit can not audit OBJECT log.
>> $ cat postgresql.conf | grep audit
>> shared_preload_libraries = 'pg_audit'
>> pg_audit.role = 'hoge_user'
>> pg_audit.log = 'read, write'
>> $ psql -d postgres -U hoge_user
>> =# create table hoge(col int);
>> =# select * from hoge;
>> LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;
>>
>> OBJECT audit log is not logged here since pg_class.rel_acl is empty
>> yet. (Only logged SESSION log)
>> So after creating another unrelated role and granting any privilege to that
>> user,
>> the OBJECT audit log is written successfully.
>
> Yes, object auditing does not work until some grants have been made to
> the audit role.
>
>> =# create role bar_user;
>> =# grant select on hoge to bar_user;
>> =# select * from hoge;
>> LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
>> LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;
>>
>> Both the OBJECT and SESSION logs are logged.
>
> Looks right to me.  If you don't want the session logging then disable
> pg_audit.log.
>
> Session and object logging are completely independent from each other:
> one or the other, or both, or neither can be enabled at any time.

It means that an OBJECT log is not written just after creating a table, even
if that table is touched by its owner.
To write an OBJECT log, we need to grant a privilege to the audit role first, right?

>> 3. pg_audit writes an OBJECT log even for the EXPLAIN command.
>> The EXPLAIN command does not actually touch the table, but pg_audit writes
>> an OBJECT audit log.
>> I'm not sure we need to log it. Is it intentional?
>
> This is intentional.  They are treated as queries since in production
> they should be relatively rare (that is, not part of a normal function
> or process) and it's good information to have because EXPLAIN can be
> used to determine the number of rows in a table, and could also be used
> to figure out when data is added or removed from a table.  In essence,
> it is a query even if it does not return row data.

Okay, I understood.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-24 Thread Sawada Masahiko
On Tue, Mar 24, 2015 at 3:17 AM, Alvaro Herrera
 wrote:
> Sawada Masahiko wrote:
>
>> I tied to look into latest patch, but got following error.
>>
>> masahiko [pg_audit] $ LANG=C make
>> gcc -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Wendif-labels
>> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
>> -fwrapv -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE   -c -o
>> pg_audit.o pg_audit.c
>> pg_audit.c: In function 'log_audit_event':
>> pg_audit.c:456: warning: ISO C90 forbids mixed declarations and code
>> pg_audit.c: In function 'pg_audit_ddl_command_end':
>> pg_audit.c:1436: error: 'pg_event_trigger_expand_command' undeclared
>> (first use in this function)
>
> You need to apply my deparsing patch first, last version of which I
> posted here:
> https://www.postgresql.org/message-id/20150316234406.gh3...@alvh.no-ip.org
>

Thank you for the info.
I've applied these patchese successfully.

I looked into this module, and had a few comments as follows.
1. pg_audit audits only one role currently.
In the current code, we cannot specify multiple users as auditing users. Why?
(Sorry if this topic already has been discussed.)

2. OBJECT auditing does not work before ACL info is added to pg_class.relacl.
In following situation, pg_audit can not audit OBJECT log.
$ cat postgresql.conf | grep audit
shared_preload_libraries = 'pg_audit'
pg_audit.role = 'hoge_user'
pg_audit.log = 'read, write'
$ psql -d postgres -U hoge_user
=# create table hoge(col int);
=# select * from hoge;
LOG:  AUDIT: SESSION,3,1,READ,SELECT,,,select * from hoge;

The OBJECT audit log is not written here since pg_class.relacl is still
empty. (Only the SESSION log is written.)
So after creating another unrelated role and granting any privilege to that user,
the OBJECT audit log is written successfully.

=# create role bar_user;
=# grant select on hoge to bar_user;
=# select * from hoge;
LOG:  AUDIT: SESSION,4,1,READ,SELECT,,,select * from hoge;
LOG:  AUDIT: OBJECT,4,1,READ,SELECT,TABLE,public.hoge,select * from hoge;

Both the OBJECT and SESSION logs are logged.

3. pg_audit writes an OBJECT log even for the EXPLAIN command.
The EXPLAIN command does not actually touch the table, but pg_audit writes
an OBJECT audit log.
I'm not sure we need to log it. Is it intentional?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auditing extension for PostgreSQL (Take 2)

2015-03-23 Thread Sawada Masahiko
se
>> the deparse code (which I personally think is a good idea; Álvaro has
>> been working on it, and it looks very nice) to log extra information,
>> using the object access hook inevitably means we have to reimplement
>> the identification/classification code here.
>>
>> In "old" pgaudit, I think that extra effort is justified by the need to
>> be backwards compatible with pre-event trigger releases. In a 9.5-only
>> version, I am not at all convinced that this makes sense.
>>
>> Thoughts?
>
> I was nervous about basing pg_audit on code that I wasn't sure would be
> committed (at the time).  Since pg_event_trigger_get_creation_commands()
> is tied up with deparse, I honestly didn't feel like the triggers were
> bringing much to the table.
>
> That being said, I agree that the deparse code is very useful and now
> looks certain to be committed for 9.5.
>
> I have prepared a patch that brings event triggers and deparse back to
> pg_audit based on the Alvaro's dev/deparse branch at
> git://git.postgresql.org/git/2ndquadrant_bdr.git (commit 0447fc5).  I've
> updated the unit tests accordingly.
>
> I left in the OAT code for now.  It does add detail to one event that
> the event triggers do not handle (creating PK indexes) and I feel that
> it lends an element of safety in case something happens to the event
> triggers.  OAT is also required for function calls so the code path
> cannot be eliminated entirely.  I'm not committed to keeping the
> redundant OAT code, but I'd rather not remove it until deparse is
> committed to master.
>
> I've been hesitant to post this patch as it will not work in master
> (though it will compile), but I don't want to hold on to it any longer
> since the end of the CF is nominally just weeks away.  If you want to
> run the patch in master, you'll need to disable the
> pg_audit_ddl_command_end trigger.
>
> I've also addressed Fujii's concerns about logging parameters - this is
> now an option.  The event stack has been formalized and
> MemoryContextRegisterResetCallback() is used to cleanup the stack on errors.
>
> Let me know what you think.
>

Hi,

I tried to look into the latest patch, but got the following error.

masahiko [pg_audit] $ LANG=C make
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE   -c -o
pg_audit.o pg_audit.c
pg_audit.c: In function 'log_audit_event':
pg_audit.c:456: warning: ISO C90 forbids mixed declarations and code
pg_audit.c: In function 'pg_audit_ddl_command_end':
pg_audit.c:1436: error: 'pg_event_trigger_expand_command' undeclared
(first use in this function)
pg_audit.c:1436: error: (Each undeclared identifier is reported only once
pg_audit.c:1436: error: for each function it appears in.)
make: *** [pg_audit.o] Error 1

Am I missing something?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-12 Thread Sawada Masahiko
On Thu, Mar 12, 2015 at 6:36 AM, Jim Nasby  wrote:
> On 3/11/15 6:33 AM, Sawada Masahiko wrote:
>>>>>>>
>>>>>>> As a refresher, current commands are:
>>>>>>> >>>>>
>>>>>>> >>>>>VACUUM (ANALYZE, VERBOSE) table1 (col1);
>>>>>>> >>>>>REINDEX INDEX index1 FORCE;
>>>>>>> >>>>>COPY table1 FROM 'file.txt' WITH (FORMAT csv);
>>>>>>> >>>>>CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry
>>>>>>> >>>>> WITH
>>>>>>> >>>>>DATA;
>>>>>>> >>>>>CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
>>>>>>> >>>>>CREATE ROLE role WITH LOGIN;
>>>>>>> >>>>>GRANT  WITH GRANT OPTION;
>>>>>>> >>>>>CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
>>>>>>> >>>>>ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
>>>>>>> >>>>>DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;
>>>>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be
>>>>> >>> the
>>>>> >>>most
>>>>> >>>consistent with everything else. Is there a problem with doing that?
>>>>> >>> I
>>>>> >>>know
>>>>> >>>getting syntax is one of the hard parts of new features, but it
>>>>> >>> seems
>>>>> >>>like
>>>>> >>>we reached consensus here...
>>>>
>>>> >>
>>>> >>
>>>> >>Attached is latest version patch based on Tom's idea as follows.
>>>> >>REINDEX { INDEX | ... } name WITH ( options [, ...] )
>>>
>>> >
>>> >
>>> >Are the parenthesis necessary? No other WITH option requires them, other
>>> >than create table/matview (COPY doesn't actually require them).
>>> >
>>
>> I was imagining EXPLAIN syntax.
>> Is there some possibility of supporting multiple options for REINDEX
>> command in future?
>> If there is, syntax will be as follows, REINDEX { INDEX | ... } name
>> WITH VERBOSE, XXX, XXX;
>> I thought style with parenthesis is better than above style.
>
>
> The thing is, ()s are actually an odd-duck. Very little supports it, and
> while COPY allows it they're not required. EXPLAIN is a different story,
> because that's not WITH; we're actually using () *instead of* WITH.
>
> So because almost all commands that use WITH doen't even accept (), I don't
> think this should either. It certainly shouldn't require them, because
> unlike EXPLAIN, there's no need to require them.
>

I understand your point.
The attached patch changes the syntax so that it does not use parentheses.

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..3cea35f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,11 @@ PostgreSQL documentation
  
 
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name WITH options [, ...]
+
+where option can be one of:
+
+VERBOSE
 
  
 
@@ -159,6 +164,15 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } relpersistence);
+	reindex_index(indOid, false, indexRelation->relpersistence, verbose);
 
 	return indOid;
 }
@@ -1761,9 +1775,23 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
  *		Recreate all indexes of a table (and of its toast table, if any)
  */
 Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, List *options)
 {
 	Oid			heapOid;
+	bool		verbose = false;
+	ListCell	*lc;
+
+	/* Parse list of options */
+	foreach(lc, options)
+	{
+		char *opt = (char *) lfirst(lc);
+		if (strcmp(opt, "verbose") == 0)
+			verbose = true;
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("unrecognized REINDEX option \"%s\"", opt)));
+	}
 
 	/* The lock level used here should match reindex_relation(). */
 	heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
@@ -1771,7 +1799,8 @@ ReindexTable(RangeVar *relation)
 
 	if (!reindex_relation(heapOid,
 		  RE

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-12 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 5:05 PM, Jim Nasby  wrote:
> On 3/9/15 9:43 PM, Sawada Masahiko wrote:
>>
>> On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby 
>> wrote:
>>>
>>> On 3/2/15 10:58 AM, Sawada Masahiko wrote:
>>>>
>>>>
>>>> On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby 
>>>> wrote:
>>>>>
>>>>>
>>>>> On 2/24/15 8:28 AM, Sawada Masahiko wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> According to the above discussion, VACUUM and REINDEX should have
>>>>>>> trailing options. Tom seems (to me) suggesting that SQL-style
>>>>>>> (bare word preceded by WITH) options and Jim suggesting '()'
>>>>>>> style options? (Anyway VACUUM gets the*third additional*  option
>>>>>>> sytle, but it is the different discussion from this)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Well, almost everything does a trailing WITH. We need to either stick
>>>>> with
>>>>> that for consistency, or add leading () as an option to those WITH
>>>>> commands.
>>>>>
>>>>> Does anyone know why those are WITH? Is it ANSI?
>>>>>
>>>>> As a refresher, current commands are:
>>>>>
>>>>>VACUUM (ANALYZE, VERBOSE) table1 (col1);
>>>>>REINDEX INDEX index1 FORCE;
>>>>>COPY table1 FROM 'file.txt' WITH (FORMAT csv);
>>>>>CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH
>>>>> DATA;
>>>>>CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
>>>>>CREATE ROLE role WITH LOGIN;
>>>>>GRANT  WITH GRANT OPTION;
>>>>>CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
>>>>>ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
>>>>>DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;
>>>
>>>
>>>
>>> BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the
>>> most
>>> consistent with everything else. Is there a problem with doing that? I
>>> know
>>> getting syntax is one of the hard parts of new features, but it seems
>>> like
>>> we reached consensus here...
>>
>>
>> Attached is latest version patch based on Tom's idea as follows.
>> REINDEX { INDEX | ... } name WITH ( options [, ...] )
>
>
> Are the parenthesis necessary? No other WITH option requires them, other
> than create table/matview (COPY doesn't actually require them).
>
>>>> We have discussed about this option including FORCE option, but I
>>>> think there are not user who want to use both FORCE and VERBOSE option
>>>> at same time.
>>>
>>>
>>>
>>> I find that very hard to believe... I would expect a primary use case for
>>> VERBOSE to be "I ran REINDEX, but it doesn't seem to have done
>>> anything...
>>> what's going on?" and that's exactly when you might want to use FORCE.
>>>
>>
>> In currently code, nothing happens even if FORCE option is specified.
>> This option completely exist for backward compatibility.
>> But this patch add new syntax including FORCE option for now.
>
>
> I forgot that. There's no reason to support it with the new stuff then.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com

Attached is the latest version of the patch, with the syntax changed slightly.
This patch adds the following syntax for now:
 REINDEX { INDEX | ... } name WITH (VERBOSE);

But since the parentheses are still under discussion, this may change.

Regards,

---
Sawada Masahiko


000_reindex_verbose_v4.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-11 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 12:08 PM, Stephen Frost  wrote:
> Sawada,
>
> * Sawada Masahiko (sawada.m...@gmail.com) wrote:
>> Thank you for your review!
>> Attached file is the latest version (without document patch. I making it 
>> now.)
>> As per discussion, there is no change regarding of super user permission.
>
> Ok.  Here's another review.
>

Thank you for your review!

>> diff --git a/src/backend/catalog/system_views.sql 
>> b/src/backend/catalog/system_views.sql
>> index 5e69e2b..94ee229 100644
>> --- a/src/backend/catalog/system_views.sql
>> +++ b/src/backend/catalog/system_views.sql
>> @@ -414,6 +414,11 @@ CREATE RULE pg_settings_n AS
>>
>>  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
>>
>> +CREATE VIEW pg_file_settings AS
>> +   SELECT * FROM pg_show_all_file_settings() AS A;
>> +
>> +REVOKE ALL on pg_file_settings FROM public;
>> +
>
> This suffers from a lack of pg_dump support, presuming that the
> superuser is entitled to change the permissions on this function.  As it
> turns out though, you're in luck in that regard since I'm working on
> fixing that for a mostly unrelated patch.  Any feedback you have on what
> I'm doing to pg_dump here:
>
> http://www.postgresql.org/message-id/20150307213935.go29...@tamriel.snowman.net
>
> Would certainly be appreciated.
>

Thank you for the info.
I will read the discussion and send feedback.

>> diff --git a/src/backend/utils/misc/guc-file.l 
>> b/src/backend/utils/misc/guc-file.l
> [...]
>> +  * Calculate size of guc_array and allocate it. From the secound time 
>> to allcate memory,
>> +  * we should free old allocated memory.
>> +  */
>> + guc_array_size = num_guc_file_variables * sizeof(struct 
>> ConfigFileVariable);
>> + if (!guc_file_variables)
>> + {
>> + /* For the first call */
>> + guc_file_variables = (ConfigFileVariable *) guc_malloc(FATAL, 
>> guc_array_size);
>> + }
>> + else
>> + {
>> + guc_array = guc_file_variables;
>> + for (item = head; item; item = item->next, guc_array++)
>> + {
>> + free(guc_array->name);
>> + free(guc_array->value);
>> + free(guc_array->filename);
>
> It's a minor nit-pick, as the below loop should handle anything we care
> about, but I'd go ahead and reset guc_array->sourceline to be -1 or
> something too, just to show that everything's being handled here with
> regard to cleanup.  Or perhaps just add a comment saying that the
> sourceline for all currently valid entries will be updated.

Fixed.

>
>> + guc_file_variables = (ConfigFileVariable *) guc_realloc(FATAL, 
>> guc_file_variables, guc_array_size);
>> + }
>
> Nasby made a comment, I believe, that we might actually be able to use
> memory contexts here, which would certainly be much nicer as then you'd
> be able to just throw away the whole context and create a new one.  Have
> you looked at that approach at all?  Offhand, I'm not sure if it'd work
> or not (I have my doubts..) but it seems worthwhile to consider.
>

I tried using palloc() and pfree() to allocate and free
guc_file_variables, but those variables seem to have been freed already
by the time I read the values through the pg_file_settings view via SQL.

> Otherwise, it seems like this would address my previous concerns that
> this would end up leaking memory, and even if it's a bit slower than one
> might hope, it's not an operation we should be doing very frequently.
>
>> --- a/src/backend/utils/misc/guc.c
>> +++ b/src/backend/utils/misc/guc.c
> [...]
>>  /*
>> + * Return the total number of GUC File variables
>> + */
>> +int
>> +GetNumConfigFileOptions(void)
>> +{
>> + return num_guc_file_variables;
>> +}
>
> What's the point of this function..?  I'm not convinced it's the best
> idea, but it certainly looks like the function and the variable are only
> used from this file anyway?

It's unnecessary.
Fixed.

>> + if (call_cntr < max_calls)
>> + {
>> + char   *values[NUM_PG_FILE_SETTINGS_ATTS];
>> + HeapTuple   tuple;
>> + Datum   result;
>> + ConfigFileVariable conf;
>> + charbuffer[256];
>
> Isn't that buffer a bit.. excessive in size?

Fixed.

>
>> diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
>> index d3100d1..ebb96cc 100644

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-11 Thread Sawada Masahiko
On Tue, Mar 10, 2015 at 5:05 PM, Jim Nasby  wrote:
> On 3/9/15 9:43 PM, Sawada Masahiko wrote:
>>
>> On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby 
>> wrote:
>>>
>>> On 3/2/15 10:58 AM, Sawada Masahiko wrote:
>>>>
>>>>
>>>> On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby 
>>>> wrote:
>>>>>
>>>>>
>>>>> On 2/24/15 8:28 AM, Sawada Masahiko wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> According to the above discussion, VACUUM and REINDEX should have
>>>>>>> trailing options. Tom seems (to me) suggesting that SQL-style
>>>>>>> (bare word preceded by WITH) options and Jim suggesting '()'
>>>>>>> style options? (Anyway VACUUM gets the*third additional*  option
>>>>>>> sytle, but it is the different discussion from this)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Well, almost everything does a trailing WITH. We need to either stick
>>>>> with
>>>>> that for consistency, or add leading () as an option to those WITH
>>>>> commands.
>>>>>
>>>>> Does anyone know why those are WITH? Is it ANSI?
>>>>>
>>>>> As a refresher, current commands are:
>>>>>
>>>>>VACUUM (ANALYZE, VERBOSE) table1 (col1);
>>>>>REINDEX INDEX index1 FORCE;
>>>>>COPY table1 FROM 'file.txt' WITH (FORMAT csv);
>>>>>CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH
>>>>> DATA;
>>>>>CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
>>>>>CREATE ROLE role WITH LOGIN;
>>>>>GRANT  WITH GRANT OPTION;
>>>>>CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
>>>>>ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
>>>>>DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;
>>>
>>>
>>>
>>> BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the
>>> most
>>> consistent with everything else. Is there a problem with doing that? I
>>> know
>>> getting syntax is one of the hard parts of new features, but it seems
>>> like
>>> we reached consensus here...
>>
>>
>> Attached is latest version patch based on Tom's idea as follows.
>> REINDEX { INDEX | ... } name WITH ( options [, ...] )
>
>
> Are the parenthesis necessary? No other WITH option requires them, other
> than create table/matview (COPY doesn't actually require them).
>

I was imagining the EXPLAIN syntax.
Is there any possibility of supporting multiple options for the REINDEX
command in the future?
If so, the syntax would be: REINDEX { INDEX | ... } name
WITH VERBOSE, XXX, XXX;
I thought the style with parentheses was better than the style above.

>>>> We have discussed about this option including FORCE option, but I
>>>> think there are not user who want to use both FORCE and VERBOSE option
>>>> at same time.
>>>
>>>
>>>
>>> I find that very hard to believe... I would expect a primary use case for
>>> VERBOSE to be "I ran REINDEX, but it doesn't seem to have done
>>> anything...
>>> what's going on?" and that's exactly when you might want to use FORCE.
>>>
>>
>> In currently code, nothing happens even if FORCE option is specified.
>> This option completely exist for backward compatibility.
>> But this patch add new syntax including FORCE option for now.
>
>
> I forgot that. There's no reason to support it with the new stuff then.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-09 Thread Sawada Masahiko
On Fri, Mar 6, 2015 at 11:07 AM, Jim Nasby  wrote:
> On 3/2/15 10:58 AM, Sawada Masahiko wrote:
>>
>> On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby 
>> wrote:
>>>
>>> On 2/24/15 8:28 AM, Sawada Masahiko wrote:
>>>>>
>>>>>
>>>>> According to the above discussion, VACUUM and REINDEX should have
>>>>> trailing options. Tom seems (to me) suggesting that SQL-style
>>>>> (bare word preceded by WITH) options and Jim suggesting '()'
>>>>> style options? (Anyway VACUUM gets the*third additional*  option
>>>>> sytle, but it is the different discussion from this)
>>>
>>>
>>>
>>> Well, almost everything does a trailing WITH. We need to either stick
>>> with
>>> that for consistency, or add leading () as an option to those WITH
>>> commands.
>>>
>>> Does anyone know why those are WITH? Is it ANSI?
>>>
>>> As a refresher, current commands are:
>>>
>>>   VACUUM (ANALYZE, VERBOSE) table1 (col1);
>>>   REINDEX INDEX index1 FORCE;
>>>   COPY table1 FROM 'file.txt' WITH (FORMAT csv);
>>>   CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
>>>   CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
>>>   CREATE ROLE role WITH LOGIN;
>>>   GRANT  WITH GRANT OPTION;
>>>   CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
>>>   ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
>>>   DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;
>
>
> BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the most
> consistent with everything else. Is there a problem with doing that? I know
> getting syntax is one of the hard parts of new features, but it seems like
> we reached consensus here...

Attached is the latest version of the patch, based on Tom's idea:
REINDEX { INDEX | ... } name WITH ( options [, ...] )

>
>> We have discussed about this option including FORCE option, but I
>> think there are not user who want to use both FORCE and VERBOSE option
>> at same time.
>
>
> I find that very hard to believe... I would expect a primary use case for
> VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything...
> what's going on?" and that's exactly when you might want to use FORCE.
>

In the current code, nothing happens even if the FORCE option is
specified; the option exists purely for backward compatibility.
Still, this patch adds the FORCE option to the new syntax for now.

Todo
- tab completion
- reindexdb command

Regards,

---
Sawada Masahiko
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 0a4c7d4..a4109aa 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -22,6 +22,12 @@ PostgreSQL documentation
  
 
 REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name [ FORCE ]
+REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } name WITH ( options [, ...] )
+
+where option can be one of:
+
+FORCE [ boolean ]
+VERBOSE [ boolean ]
 
  
 
@@ -159,6 +165,29 @@ REINDEX { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } boolean
+
+ 
+  Specifies whether the selected option should be turned on or off.
+  You can write TRUE, ON, or
+  1 to enable the option, and FALSE,
+  OFF, or 0 to disable it.  The
+  boolean value can also
+  be omitted, in which case TRUE is assumed.
+ 
+
+   
   
  
 
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f85ed93..786f173 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -63,6 +63,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pg_rusage.h"
 #include "utils/syscache.h"
 #include "utils/tuplesort.h"
 #include "utils/snapmgr.h"
@@ -3130,13 +3131,18 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
+reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+bool verbose)
 {
 	Relation	iRel,
 heapRelation;
 	Oid			heapId;
 	IndexInfo  *indexInfo;
 	volatile bool skipped_constraint = false;
+	int			elevel = verbose ? INFO : DEBUG2;
+	PGRUsage	ru0;
+
+	pg_rusage_init(&ru0);
 
 	/*
 	 * Open and lock the parent heap relation.  ShareLock is sufficient since
@@ -3280,6 +3286,13 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence)
 		heap_close(pg_index, RowExclusiveLock);
 	}
 
+	/* Log what we did */
+	ereport(elevel,
+			(errmsg(&

[HACKERS] Wrong error message in REINDEX command

2015-03-07 Thread Sawada Masahiko
Hi,

I got a wrong error message when I ran the REINDEX SYSTEM command
inside a transaction block, as follows.
It should say "ERROR:  REINDEX SYSTEM cannot run inside a transaction block".
The attached patch fixes it.

[postgres][5432](1)=# begin;
BEGIN
[postgres][5432](1)=# reindex system postgres;
ERROR:  REINDEX DATABASE cannot run inside a transaction block
STATEMENT:  reindex system postgres;

Regards,

---
Sawada Masahiko


fix_reindex_error_message.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-03-05 Thread Sawada Masahiko
On Sat, Feb 28, 2015 at 2:27 PM, Stephen Frost  wrote:
> Sawada,
>
> * Sawada Masahiko (sawada.m...@gmail.com) wrote:
>> Attached patch is latest version including following changes.
>> - This view is available to super use only
>> - Add sourceline coulmn
>
> Alright, first off, to Josh's point- I'm definitely interested in a
> capability to show where the heck a given config value is coming from.
> I'd be even happier if there was a way to *force* where config values
> have to come from, but that ship sailed a year ago or so.  Having this
> be for the files, specifically, seems perfectly reasonable to me.  I
> could see having another function which will report where a currently
> set GUC variable came from (eg: ALTER ROLE or ALTER DATABASE), but
> having a function for "which config file is this GUC set in" seems
> perfectly reasonable to me.
>
> To that point, here's a review of this patch.
>
>> diff --git a/src/backend/catalog/system_views.sql 
>> b/src/backend/catalog/system_views.sql
>> index 6df6bf2..f628cb0 100644
>> --- a/src/backend/catalog/system_views.sql
>> +++ b/src/backend/catalog/system_views.sql
>> @@ -412,6 +412,11 @@ CREATE RULE pg_settings_n AS
>>
>>  GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
>>
>> +CREATE VIEW pg_file_settings AS
>> +   SELECT * FROM pg_show_all_file_settings() AS A;
>> +
>> +REVOKE ALL on pg_file_settings FROM public;
>
> While this generally "works", the usual expectation is that functions
> that should be superuser-only have a check in the function rather than
> depending on the execute privilege.  I'm certainly happy to debate the
> merits of that approach, but for the purposes of this patch, I'd suggest
> you stick an if (!superuser()) ereport("must be superuser") into the
> function itself and not work about setting the correct permissions on
> it.
>
>> + ConfigFileVariable *guc;
>
> As this ends up being an array, I'd call it "guc_array" or something
> along those lines.  GUC in PG-land has a pretty specific single-entity
> meaning.
>
>> @@ -344,6 +346,21 @@ ProcessConfigFile(GucContext context)
>>   PGC_BACKEND, 
>> PGC_S_DYNAMIC_DEFAULT);
>>   }
>>
>> + guc_file_variables = (ConfigFileVariable *)
>> + guc_malloc(FATAL, num_guc_file_variables * 
>> sizeof(struct ConfigFileVariable));
>
> Uh, perhaps I missed it, but what happens on a reload?  Aren't we going
> to realloc this every time?  Seems like we should be doing a
> guc_malloc() the first time through but doing guc_realloc() after, or
> we'll leak memory on every reload.
>
>> + /*
>> +  * Apply guc config parameters to guc_file_variable
>> +  */
>> + guc = guc_file_variables;
>> + for (item = head; item; item = item->next, guc++)
>> + {
>> + guc->name = guc_strdup(FATAL, item->name);
>> + guc->value = guc_strdup(FATAL, item->value);
>> + guc->filename = guc_strdup(FATAL, item->filename);
>> + guc->sourceline = item->sourceline;
>> + }
>
> Uh, ditto and double-down here.  I don't see a great solution other than
> looping through the previous array and free'ing each of these, since we
> can't depend on the memory context machinery being up and ready at this
> point, as I recall.
>
>> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
>> index 931993c..c69ce66 100644
>> --- a/src/backend/utils/misc/guc.c
>> +++ b/src/backend/utils/misc/guc.c
>> @@ -3561,9 +3561,17 @@ static const char *const map_old_guc_names[] = {
>>   */
>>  static struct config_generic **guc_variables;
>>
>> +/*
>> + * lookup of variables for pg_file_settings view.
>> + */
>> +static struct ConfigFileVariable *guc_file_variables;
>> +
>>  /* Current number of variables contained in the vector */
>>  static int   num_guc_variables;
>>
>> +/* Number of file variables */
>> +static int   num_guc_file_variables;
>> +
>
> I'd put these together, and add a comment explaining that
> guc_file_variables is an array of length num_guc_file_variables..
>
>>  /*
>> + * Return the total number of GUC File variables
>> + */
>> +int
>> +GetNumConfigFileOptions(void)
>> +{
>> + return num_guc_file_variables;
>> +}
>
> I don't see the point of this function..
>
>> +Datum
>> +show_all_file_settings(PG_FUNCTION_ARGS)
>> +{
&

Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-03-02 Thread Sawada Masahiko
On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby  wrote:
> On 2/24/15 8:28 AM, Sawada Masahiko wrote:
>>>
>>> According to the above discussion, VACUUM and REINDEX should have
>>> trailing options. Tom seems (to me) suggesting that SQL-style
>>> (bare word preceded by WITH) options and Jim suggesting '()'
>>> style options? (Anyway VACUUM gets the*third additional*  option
>>> sytle, but it is the different discussion from this)
>
>
> Well, almost everything does a trailing WITH. We need to either stick with
> that for consistency, or add leading () as an option to those WITH commands.
>
> Does anyone know why those are WITH? Is it ANSI?
>
> As a refresher, current commands are:
>
>  VACUUM (ANALYZE, VERBOSE) table1 (col1);
>  REINDEX INDEX index1 FORCE;
>  COPY table1 FROM 'file.txt' WITH (FORMAT csv);
>  CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
>  CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
>  CREATE ROLE role WITH LOGIN;
>  GRANT  WITH GRANT OPTION;
>  CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
>  ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
>  DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;

We have discussed this option together with the FORCE option, but I
think there are no users who want to use both FORCE and VERBOSE at the
same time.
Could we add a new syntax without the FORCE option while leaving the
current REINDEX statement syntax as it is?

As a prototype, the attached new version of the patch has the following syntax:
REINDEX { INDEX | TABLE | ... } relname [ FORCE | VERBOSE ];

Regards,

---
Sawada Masahiko


000_reindex_verbose_v2.patch
Description: Binary data



Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-24 Thread Sawada Masahiko
On Fri, Feb 20, 2015 at 12:24 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> I showed an extreme number of examples to include *almost of all*
> variations of existing syntax of option specification. And showed
> what if all variations could be used for all commands. It was
> almost a mess. Sorry for the confusion.
>
> I think the issues at our hands are,
>
>  - Options location: at-the-end or right-after-the-keyword?
>
>  - FORCE options to be removed?
>
>  - Decide whether to allow bare word option if the options are to
>be located right after the keyword.
>
> Optinions or thoughts?
>
> 
> Rethinking from here.
>
> At Wed, 11 Feb 2015 14:34:17 -0600, Jim Nasby  
> wrote in <54dbbcc9.1020...@bluetreble.com>
>> On 2/5/15 12:01 PM, Tom Lane wrote:
> ...
>> > Meh.  Options-at-the-end seems by far the most sensible style to me.
>> > The options-right-after-the-keyword style is a mess, both logically
>> > and from a parsing standpoint, and the only reason we have it at all
>> > is historical artifact (ask Bruce about the origins of VACUUM ANALYZE
>> > over a beer sometime).
> ...
>> I know you were worried about accepting options anywhere because it
>> leads to reserved words, but perhaps we could support it just for
>> EXPLAIN and VACUUM, and then switch to trailing options if people
>> think that would be better.
>
> According to the above discussion, VACUUM and REINDEX should have
> trailing options. Tom seems (to me) suggesting that SQL-style
> (bare word preceded by WITH) options and Jim suggesting '()'
> style options? (Anyway VACUUM gets the *third additional* option
> sytle, but it is the different discussion from this)
>
> VACUUM [tname [(cname, ...)]] [({FULL [bool]|FREEZE | [bool]|...})]
>
>  =# VACUUM t1 (FULL, FREEZE);
>
> VACUUM [tname [(cname, ...)]] [WITH [FULL [bool]]|[FREEZE | [bool]|...]
>
>  =# VACUUM t1 WITH FULL;
>
> IMHO, we are so accustomed to call by the names "VACUUM FULL" or
> "VACUUM FREEZE" that the both of them look a bit uneasy.
>
>
> If the new syntax above is added, REINDEX should have *only* the
> trailing style.
>
> REINDEX [{INDEX|TABLE|...}] name [(VERBOSE [bool]|...)]
>
>  =# REINDEX TABLE t1 (VERBOSE);
>
> REINDEX [{INDEX|TABLE|...}] name [WITH {VERBOSE [bool]|...}]
>
>  =# REINDEX INDEX i_t1_pkey WITH VERBOSE;
>
> Also, both of them seem somewhat unintuitive.
>
>
> As for EXPLAIN, it seems best left as it is.
>
>
> As a result, staying with the current syntax for all of these
> commands seems the best way forward.
>
> Opinions?
>
>
>
> At Wed, 18 Feb 2015 23:58:15 +0900, Sawada Masahiko  
> wrote in 
>> From a consistency perspective, I tend to agree with Robert that the
>> option should go immediately after the command name, as follows.
>> REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name;
>
> I don't object if you prefer it. The remaining issue is the choice
> between the options-at-the-end and the options-right-after-the-keyword
> styles mentioned above. I prefer the messier(:-) one..
>
>
>> Btw, how long will the FORCE option remain available?
>
> The option has been obsolete since 7.4. I think it should have faded
> away long ago, and it is time to remove it. But once the ancient
> option is removed, the above syntax looks a bit awkward and the
> messier syntax looks natural.
>
>
> REINDEX [VERBOSE] {INDEX | ...} name;
>
> What do you think about this?
>

Thank you for summarizing them.

I said the right-after-the-keyword style looks good to me,
but that will be possible only if the FORCE option is removed.
The REINDEX command currently has the FORCE option at the end, so
REINDEX probably should have its options at the end.
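
Side by side, the two placements under discussion would look like this
(an illustrative sketch of the proposed grammar, not committed syntax):

```sql
-- Options right after the keyword, as in modern VACUUM/EXPLAIN:
REINDEX (VERBOSE) TABLE hoge;

-- Trailing options, in the position the legacy FORCE keyword occupies:
REINDEX TABLE hoge (VERBOSE);
```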

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-18 Thread Sawada Masahiko
>   VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)]]
> | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)]]
> | VACUUM [tname [(cname, ...)]] [[WITH ]({FULL [bool]|FREEZE [bool]|...})]
>
> REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)]
>
> EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] 
>

I don't think the "(OptionName [bool])" style, like "(VERBOSE on, FORCE
on)", is needed for the REINDEX command.
The EXPLAIN command has that option style because its FORMAT option can
take values other than ON/TRUE or OFF/FALSE (e.g., TEXT, XML).
But a REINDEX command option can only be ON or OFF, so
I think the option name alone is good enough.
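
To make the contrast concrete (the EXPLAIN line is existing syntax; the
REINDEX line is a sketch of the proposal under discussion):

```sql
-- EXPLAIN needs the (option [value]) style because its FORMAT option
-- takes non-boolean values:
EXPLAIN (ANALYZE, FORMAT json) SELECT * FROM t1;

-- REINDEX options are strictly on/off, so a bare option name suffices:
REINDEX (VERBOSE) TABLE t1;
```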

Next, regarding the location of such options: several maintenance
commands like CLUSTER and VACUUM take their options immediately after
the command name.
From a consistency perspective, I tend to agree with Robert that the
option should go immediately after the command name, as follows.
REINDEX [(VERBOSE | FORCE)] {INDEX | ...} name;

Btw, how long will the FORCE option remain available?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-04 Thread Sawada Masahiko
On Wed, Feb 4, 2015 at 2:44 PM, Tom Lane  wrote:
> Jim Nasby  writes:
>> On 2/3/15 5:08 PM, Tom Lane wrote:
>>> Jim Nasby  writes:
>>>> VACUUM puts the options before the table name, so ISTM it'd be best to
>>>> keep that with REINDEX. Either REINDEX (options) {INDEX | ...} or
>>>> REINDEX {INDEX | ...} (options).
>
>>> Well, I really really don't like the first of those.  IMO the command name
>>> is "REINDEX INDEX" etc, so sticking something in the middle of that is
>>> bogus.
>
>> Actually, is there a reason we can't just accept all 3? Forcing people
>> to remember exact ordering of options has always struck me as silly.
>
> And that's an even worse idea.  Useless "flexibility" in syntax tends to
> lead to unfortunate consequences like having to reserve keywords.
>

As per the discussion, it seems good to go with
REINDEX { INDEX | TABLE | etc } name [ ( option [, option ...] ) ]
or
REINDEX { INDEX | TABLE | etc } [ ( option [, option ...] ) ] name
i.e., the REINDEX options (VERBOSE and FORCE) are put either before or
after the object name.

Because the other maintenance commands put their options before the
object name, I think the latter is better.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-03 Thread Sawada Masahiko
On Tue, Feb 3, 2015 at 12:32 AM, Tom Lane  wrote:
> Sawada Masahiko  writes:
>> On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier
>>  wrote:
>>> Now, I think that it may
>>> be better to provide the keyword VERBOSE before the type of object
>>> reindexed as REINDEX [ VERBOSE ] object.
>
>> Actually, my first WIP version of the patch added the VERBOSE keyword
>> before the object type.
>> I'm finding it difficult to decide on the position of the VERBOSE
>> keyword in the REINDEX statement.
>
> The way that FORCE was added to REINDEX was poorly thought out; let's not
> double down on that with another option added without any consideration
> for future expansion.  I'd be happier if we adopted something similar to
> the modern syntax for VACUUM and EXPLAIN, ie, comma-separated options in
> parentheses.
>

I understood.
I'm imagining the new REINDEX syntax as follows:
- REINDEX (INDEX, VERBOSE) hoge_idx;
- REINDEX (TABLE) hoge_table;

i.e., I will add the following syntax format:
REINDEX ( { INDEX | TABLE | SCHEMA | SYSTEM | DATABASE } , [VERBOSE] )
name [FORCE];

Thoughts?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-02 Thread Sawada Masahiko
On Mon, Feb 2, 2015 at 9:21 PM, Michael Paquier
 wrote:
> On Mon, Feb 2, 2015 at 8:31 PM, Sawada Masahiko  wrote:
>> Attached patch adds VERBOSE option to REINDEX commands.
>> Please give me feedbacks.
> This could provide useful feedback to users.

Thanks.

> Now, I think that it may
> be better to provide the keyword VERBOSE before the type of object
> reindexed as REINDEX [ VERBOSE ] object.

Actually, my first WIP version of the patch added the VERBOSE keyword
before the object type.
I'm finding it difficult to decide on the position of the VERBOSE
keyword in the REINDEX statement.

> In any case, at quick sight,
> the table completion for REINDEX is broken with your patch because by
> typing REINDEX VERBOSE you would show the list of objects and once
> again VERBOSE.

I have also rebased the tab-completion source, so I don't think that
happens. In my environment, it does not show the list of objects and
VERBOSE again after typing REINDEX VERBOSE.

Regards,

---
Sawada Masahiko




[HACKERS] Proposal : REINDEX xxx VERBOSE

2015-02-02 Thread Sawada Masahiko
Hi all,

Attached patch adds VERBOSE option to REINDEX commands.
The other maintenance commands (VACUUM FULL, CLUSTER) have a VERBOSE
option, but REINDEX has never had one.

Some examples follow.

- REINDEX TABLE
[postgres][5432](1)=# REINDEX TABLE VERBOSE hoge;
INFO:  index "hoge_idx" was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  index "hoge2_idx" was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
REINDEX

- REINDEX SCHEMA
[postgres][5432](1)=# REINDEX SCHEMA VERBOSE s;
INFO:  index "hoge_idx" was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "hoge2_idx" was reindexed.
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  indexes of whole table "s.hoge" were reindexed
REINDEX

Please give me feedbacks.

Regards,

---
Sawada Masahiko


000_reindex_verbose_v1.patch
Description: Binary data



Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 3:20 PM, Sawada Masahiko  wrote:
> On Sat, Jan 31, 2015 at 2:00 PM, Sawada Masahiko  
> wrote:
>> On Sat, Jan 31, 2015 at 4:56 AM, Tom Lane  wrote:
>>> David Johnston  writes:
>>>> On Fri, Jan 30, 2015 at 12:05 PM, David Fetter  wrote:
>>>>> Why might the contents of pg_settings be different from what would be
>>>>> in pg_file_settings, apart from the existence of this column?
>>>
>>>> The contents of pg_settings uses the setting name as a primary key.
>>>> The contents of pg_file_setting uses a compound key consisting of the
>>>> setting name and the source file.
>>>
>>> [ raised eyebrow... ]  Seems insufficient in the case that multiple
>>> settings of the same value occur in the same source file.  Don't you
>>> need a line number in the key as well?
>>>
>>
>> Yep, a line number is also needed.
>> The source file and line number would be the primary key of the
>> pg_file_settings view.
>> I'll change it accordingly.
>>
>
> The attached patch is the latest version, including the following changes:
> - This view is available to superusers only
> - Add a sourceline column
>
> Also, I registered it in the CF app.
>

Sorry, I registered unnecessary (extra) threads in the CF app.
How can I delete them?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Proposal: knowing detail of config files via SQL

2015-01-30 Thread Sawada Masahiko
On Sat, Jan 31, 2015 at 2:00 PM, Sawada Masahiko  wrote:
> On Sat, Jan 31, 2015 at 4:56 AM, Tom Lane  wrote:
>> David Johnston  writes:
>>> On Fri, Jan 30, 2015 at 12:05 PM, David Fetter  wrote:
>>>> Why might the contents of pg_settings be different from what would be
>>>> in pg_file_settings, apart from the existence of this column?
>>
>>> The contents of pg_settings uses the setting name as a primary key.
>>> The contents of pg_file_setting uses a compound key consisting of the
>>> setting name and the source file.
>>
>> [ raised eyebrow... ]  Seems insufficient in the case that multiple
>> settings of the same value occur in the same source file.  Don't you
>> need a line number in the key as well?
>>
>
> Yep, a line number is also needed.
> The source file and line number would be the primary key of the
> pg_file_settings view.
> I'll change it accordingly.
>

The attached patch is the latest version, including the following changes:
- This view is available to superusers only
- Add a sourceline column
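
As a usage sketch, the view could then be queried like this (the column
names follow the discussion in this thread and may differ in the final
patch):

```sql
-- List every entry read from the configuration files, keyed by
-- source file and line number so duplicate settings remain visible:
SELECT sourcefile, sourceline, name, setting
FROM pg_file_settings
ORDER BY sourcefile, sourceline;
```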

Also, I registered it in the CF app.

Regards,

---
Sawada Masahiko


000_pg_file_settings_view_v2.patch
Description: Binary data


