Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 05:55 PM, Andres Freund wrote:
> On 2015-10-14 17:46:25 +0300, Amir Rohan wrote:
>> On 10/14/2015 05:35 PM, Andres Freund wrote:
>>> Then your argument about the CF process doesn't seem to make sense.
> 
>> Why? I ask again, what do you mean by "separate process"?
> 
> Not going through the CF and normal release process.
> 
>> Either it's in core (and follows its processes) or it isn't. But you
>> can't say both that you don't want it in core and that you don't
>> want it to follow a "separate process".
> 
> Oh for crying out loud. You write:
> 

Andres, I'm not here looking for ways to quibble with you.
So, please "assume good faith".

>> 4) You can't easily extend the checks performed, without forking
>> postgres or going through the (lengthy, rigorous) cf process.
> 
> and
> 
>>> I don't think we as a community want to do that without review
>>> mechanisms in place, and I personally don't think we want to add
>>> separate processes for this.
> 
>> That's what "contribute" means in my book.
> 
> I don't see how those two statements don't conflict.
> 

Right.

I was saying that "contribute" always implies review before acceptance,
responding to the first half of your sentence. The second half
assumes it makes sense to discuss "review process" as a separate issue
from inclusion in core. It does not make sense, and I said so.


If you have a bone to pick with my comment about CF review being
lengthy, the point was that an independent tool can move more
quickly to accept submissions because:
1. there's less at stake
2. if it's written in a higher-level language, enhancements
are easier.

Those don't hold when adding another check involves changes to the
`postgres` binary.

Fair?

Amir








Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 05:35 PM, Andres Freund wrote:
> On 2015-10-14 16:50:41 +0300, Amir Rohan wrote:
>>> I don't think we as a community want to do that without review
>>> mechanisms in place, and I personally don't think we want to add
>>> separate processes for this.
>>>
>>
>> That's what "contribute" means in my book.
> 
> Then your argument about the CF process doesn't seem to make sense.
> 

Why? I ask again, what do you mean by "separate process"?
Either it's in core (and follows its processes) or it isn't. But you
can't say both that you don't want it in core and that you don't
want it to follow a "separate process".

I think you're simply saying you're -1 on the whole idea. OK.

Regards,
Amir








Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 08:45 PM, Alvaro Herrera wrote:
> Amir Rohan wrote:
> 
>> it does fail the "dependent options" test:
>> $ postgres -C "archive_mode"
>> on
>> $ postgres -C wal_level
>> minimal
>>
>> no errors, great, let's try it:
>> $ pg_ctl restart
>>
>> FATAL:  WAL archival cannot be enabled when wal_level is "minimal"
> 
> This complaint could be fixed if we had a --check-config that runs the
> check hook for every variable, I think.  I think that closes all the
> syntax checks you want; then your tool only needs to worry about
> semantic checks, and can obtain the values by repeated application of -C
>  (or, more conveniently, have a new -C mode that takes no args
> and prints the names and values for all parameters).
> 

That sounds reasonable and convenient. It's important to have one
tool that covers everything, so I'd have to call "-C" and parse
the errors, but that seems possible.

If there were a flag which produced something more like the output of the
pg_settings view in a structured format, as well as the errors, that
would be ideal. And possibly useful for other tool building.
Would such a thing be contrary to the pg spirit?
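
In the meantime, Alvaro's "repeated application of -C" is easy enough to
script. A rough sketch, assuming the GUC names are already known (say,
from a dumped pg_settings list):

import subprocess

def read_settings(pgdata, names):
    # Collect each GUC's effective value via `postgres -C`; config
    # complaints (LOG/HINT/FATAL lines) arrive on stderr.
    values, complaints = {}, {}
    for name in names:
        proc = subprocess.run(
            ["postgres", "-C", name, "-D", pgdata],
            capture_output=True, text=True)
        if proc.returncode == 0:
            values[name] = proc.stdout.strip()
        if proc.stderr.strip():
            complaints[name] = proc.stderr.strip()
    return values, complaints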

Amir





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-07 Thread Amir Rohan
On 10/07/2015 09:27 AM, Michael Paquier wrote:
> On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier
>> <michael.paqu...@gmail.com> wrote:
>>> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote:
>>>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote:
>>>> It seems that these days 'make check' creates a directory in /tmp
>>>> called /tmp/pg_regress-RANDOMSTUFF.  Listening on TCP ports is
>>>> disabled, and the socket goes inside this directory with a name like
>>>> .s.PGSQL.PORT.  You can connect with psql -h
>>>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP.  This basically
>>>> removes the risk of TCP port number collisions, as well as the risk of
>>>> your temporary instance being hijacked by a malicious user on the same
>>>> machine.
>>>
>>> Right, that's for example /var/folders/ on OSX, and this is defined
>>> once per test run via $tempdir_short. PGHOST is set to that as well.
>>
>> Er, mistake here. That's actually once per standard_initdb, except
>> that all the tests I have included in my patch run it just once to
>> create a master node. It seems that it would be wiser to set one
>> socket dir per node then, remove the port assignment stuff, and use
>> tempdir_short as a key to define a node as well as in the connection
>> string to this node. I'll update the patch later today...
> 
> So, my conclusion regarding multiple calls of make_master is that we
> should not allow to do it. On Unix/Linux we could have a separate unix
> socket directory for each node, but not on Windows where
> listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST
> is set by the master node to a tempdir once and for all. Hence, to
> make the code more consistent, I think that we should keep the port
> lookup machinery here. An updated patch is attached.
> 

If parallel tests are supported, get_free_port is still racy even
with last_port_found because it's:
1) process-local.
2) even if it were shared, there's the race window between the
available-check and the listen() I mentioned upthread.

If parallel tests are explicitly disallowed, a comment to that
effect (and a note on things known to break) might help someone
down the road.

Also, the removal of poll_query_until from pg_rewind looks suspiciously
like a copy-paste gone bad. Pardon if I'm missing something.

Amir





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-07 Thread Amir Rohan
On 10/07/2015 10:29 AM, Michael Paquier wrote:
> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> On 10/07/2015 09:27 AM, Michael Paquier wrote:
>>> On Wed, Oct 7, 2015 at 7:51 AM, Michael Paquier
>>> <michael.paqu...@gmail.com> wrote:
>>>> On Wed, Oct 7, 2015 at 7:43 AM, Michael Paquier
>>>> <michael.paqu...@gmail.com> wrote:
>>>>> On Wed, Oct 7, 2015 at 5:58 AM, Robert Haas wrote:
>>>>>> On Sat, Oct 3, 2015 at 7:38 AM, Michael Paquier wrote:
>>>>>> It seems that these days 'make check' creates a directory in /tmp
>>>>>> called /tmp/pg_regress-RANDOMSTUFF.  Listening on TCP ports is
>>>>>> disabled, and the socket goes inside this directory with a name like
>>>>>> .s.PGSQL.PORT.  You can connect with psql -h
>>>>>> /tmp/pg_regress-RANDOMSTUFF -p PORT, but not over TCP.  This basically
>>>>>> removes the risk of TCP port number collisions, as well as the risk of
>>>>>> your temporary instance being hijacked by a malicious user on the same
>>>>>> machine.
>>>>>
>>>>> Right, that's for example /var/folders/ on OSX, and this is defined
>>>>> once per test run via $tempdir_short. PGHOST is set to that as well.
>>>>
>>>> Er, mistake here. That's actually once per standard_initdb, except
>>>> that all the tests I have included in my patch run it just once to
>>>> create a master node. It seems that it would be wiser to set one
>>>> socket dir per node then, remove the port assignment stuff, and use
>>>> tempdir_short as a key to define a node as well as in the connection
>>>> string to this node. I'll update the patch later today...
>>>
>>> So, my conclusion regarding multiple calls of make_master is that we
>>> should not allow to do it. On Unix/Linux we could have a separate unix
>>> socket directory for each node, but not on Windows where
>>> listen_addresses is set to look after 127.0.0.1. On Unix/Linux, PGHOST
>>> is set by the master node to a tempdir once and for all. Hence, to
>>> make the code more consistent, I think that we should keep the port
>>> lookup machinery here. An updated patch is attached.
>>>
>> If parallel tests are supported, get_free_port is still racy even
>> with last_port_found because it's:
>> 1) process-local.
>> 2) even if it were shared, there's the race window between the
>> available-check and the listen() I mentioned upthread.
>>
>> If parallel tests are explicitly disallowed, a comment to that
>> effect (and a note on things known to break) might help someone
>> down the road.
> 
> Actually, no, port lookup will not map and parallel tests would work
> fine thinking more about it, each set of tests uses its own PGHOST to
> a private unix socket directory so even if multiple tests use the same
> port number they won't interact with each other because they connect
> to different socket paths. MinGW is a problem though, and an existing
> one in the perl test scripts, I recall that it can use make -j and
> that's on Windows where PGHOST is mapping to 127.0.0.1 only.
> 

Ah, the portnum is actually a real tcp port only on windows, and
the race is limited to that case as you say. Note that in the
tcp case, using psql to check is wrong:

$ nc -l 8001  # listen on 8001
$ psql -X -h lo -p 8001 postgres < /dev/null
psql: could not connect to server: Connection refused
Is the server running on host "lo" (127.0.0.1) and accepting
TCP/IP connections on port 8001?

The port isn't free, but psql is really only checking whether a postgres
is listening there, so it reports the port as available. That's a fairly
mild issue, though.
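
For what it's worth, a probe that binds the port instead of connecting
to it avoids mistaking a foreign listener for a free port. A small
sketch (the check-then-listen race discussed upthread remains either
way):

import socket

def port_is_free(port, host="127.0.0.1"):
    # Bind the port ourselves instead of probing with psql: a
    # connect-based probe only detects a listening postgres, not nc
    # or any other process squatting on the port. The gap between
    # this check and the server's own listen() is still racy; only
    # retrying on startup failure truly closes it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True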

>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>> like a copy-paste gone bad. Pardon if I'm missing something.
> 
> Perhaps. Do you have a suggestion regarding that? It seems to me that
> this is more useful in TestLib.pm as-is.
> 

My mistake, the patch only shows some internal function being deleted
but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
a better place for it.






Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 08:19 AM, Michael Paquier wrote:
> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote:
>> On 10/07/2015 10:29 AM, Michael Paquier wrote:
>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote:
>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>>>> like a copy-paste gone bad. Pardon if I'm missing something.
>>>
>>> Perhaps. Do you have a suggestion regarding that? It seems to me that
>>> this is more useful in TestLib.pm as-is.
>>>
>>
>> My mistake, the patch only shows some internal function being deleted
>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
>> a better place for it.
> 
> OK. Here is a new patch version. I have removed the restriction
> preventing to call make_master multiple times in the same script (one
> may actually want to test some stuff related to logical decoding or
> FDW for example, who knows...), forcing PGHOST to always use the same
> value after it has been initialized. I have added a sanity check
> though, it is not possible to create a node based on a base backup if
> no master are defined. This looks like a cheap insurance... I also
> refactored a bit the code, using the new init_node_info to fill in the
> fields of a newly-initialized node, and I removed get_free_port,
> init_node, init_node_from_backup, enable_restoring and
> enable_streaming from the list of routines exposed to the users, those
> can be used directly with make_master, make_warm_standby and
> make_hot_standby. We could add them again if need be, somebody may
> want to be able to get a free port, set up a node without those
> generic routines, just that it does not seem necessary now.
> Regards,
> 

If you'd like, I can write up some tests for cascading replication which
are currently missing.

Someone mentioned a daisy chain setup which sounds fun. Anything else in
particular? Also, it would be nice to have some canned way to measure
end-to-end replication latency for a variable number of nodes.
What about going back through the commit log and writing some regression
tests for the real stinkers, if someone cares to volunteer some candidate
bugs?

Amir







Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 10:39 AM, Michael Paquier wrote:
> On Thu, Oct 8, 2015 at 3:59 PM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> On 10/08/2015 08:19 AM, Michael Paquier wrote:
>>> On Wed, Oct 7, 2015 at 5:44 PM, Amir Rohan wrote:
>>>> On 10/07/2015 10:29 AM, Michael Paquier wrote:
>>>>> On Wed, Oct 7, 2015 at 4:16 PM, Amir Rohan wrote:
>>>>>> Also, the removal of poll_query_until from pg_rewind looks suspiciously
>>>>>> like a copy-paste gone bad. Pardon if I'm missing something.
>>>>>
>>>>> Perhaps. Do you have a suggestion regarding that? It seems to me that
>>>>> this is more useful in TestLib.pm as-is.
>>>>>
>>>>
>>>> My mistake, the patch only shows some internal function being deleted
>>>> but RewindTest.pm (obviously) imports TestLib. You're right, TestLib is
>>>> a better place for it.
>>>
>>> OK. Here is a new patch version. I have removed the restriction
>>> preventing to call make_master multiple times in the same script (one
>>> may actually want to test some stuff related to logical decoding or
>>> FDW for example, who knows...), forcing PGHOST to always use the same
>>> value after it has been initialized. I have added a sanity check
>>> though, it is not possible to create a node based on a base backup if
>>> no master are defined. This looks like a cheap insurance... I also
>>> refactored a bit the code, using the new init_node_info to fill in the
>>> fields of a newly-initialized node, and I removed get_free_port,
>>> init_node, init_node_from_backup, enable_restoring and
>>> enable_streaming from the list of routines exposed to the users, those
>>> can be used directly with make_master, make_warm_standby and
>>> make_hot_standby. We could add them again if need be, somebody may
>>> want to be able to get a free port, set up a node without those
>>> generic routines, just that it does not seem necessary now.
>>> Regards,
>>>
>>
>> If you'd like, I can write up some tests for cascading replication which
>> are currently missing.
> 
> 001 is testing cascading, like that node1 -> node2 -> node3.
> 
>> Someone mentioned a daisy chain setup which sounds fun. Anything else in
>> particular? Also, it would be nice to have some canned way to measure
>> end-to-end replication latency for a variable number of nodes.
> 
> Hm. Do you mean comparing the LSN position between two nodes even if
> both nodes are not connected to each other? What would you use it for?
> 

In a cascading replication setup, the typical _time_ it takes for a
COMMIT on master to reach the slave (assuming constant WAL generation
rate) is an important operational metric.

It would be useful to catch future regressions for that metric,
which may happen even when a patch doesn't outright break cascading
replication. Just automating the measurement could be useful if
there's no pg facility that tracks performance over time in
a regimented fashion. I've seen multiple projects which consider
a "benchmark suite" to be part of its testing strategy.


As for the "daisy chain" thing, it was (IIRC) mentioned in a josh berkus
talk I caught on youtube. It's possible to setup cascading replication,
take down the master, and then reinsert it as replicating slave, so that
you end up with *all* servers replicating from the
ancestor in the chain, and no master. I think it was more
a fun hack then anything, but also an interesting corner case to
investigate.

>> What about going back through the commit log and writing some regression
>> tests for the real stinkers, if someone cares to volunteer some candidate
>> bugs?
> 
> I have drafted a list with a couple of items upthread:
> http://www.postgresql.org/message-id/cab7npqsgffsphocrhfoasdanipvn6wsh2nykf1kayrm+9_m...@mail.gmail.com
> So based on the existing patch the list becomes as follows:
> - wal_retrieve_retry_interval with a high value, say setting to for
> example 2/3s and loop until it is applied by checking it is it has
> been received by the standby every second.
> - recovery_target_action
> - archive_cleanup_command
> - recovery_end_command
> - pg_xlog_replay_pause and pg_xlog_replay_resume
> In the list of things that could have a test, I recall that we should
> test as well 2PC with the recovery delay, look at a1105c3d. This could
> be included in 005.

a1105c3 Mar 23 Fix copy & paste error in 4f1b890b137.  Andres Freund
4f1b890 Mar 15 Merge the various forms of transaction commit & abort
records.  Andres Freund

Is that the right commit?

> The advantage of implementing that now is that we

[HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-08 Thread Amir Rohan
Hi,

Related SO question from '13:

https://stackoverflow.com/questions/1619/how-to-syntax-check-postgresql-config-files

Peter Eisentraut answered:

> There is no way to do this that is similar to apache2ctl. If you reload
> the configuration files and there is a syntax error, the PostgreSQL
> server will complain in the log and refuse to load the new file.
> <...>
> (Of course, this won't guard you against writing semantically wrong
> things, but apache2ctl won't do that either.) 

So, at least one person in the history of the universe (besides me)
has wanted a tool that does this.

In addition to a simple syntax check, there's a bunch of "config wisdom"
tidbits I've encountered, scattered through talks, commit
messages, mailing list discussions, and documentation notes
(chapter 17, paragraph 12). These could be collected into a tool that:

- Checks your configuration's syntax
- Checks for semantically legal values ('read committed', not
'read_committed')
- Warns of unrecognized keys ("'hot_standby' is not a valid GUC in v9.1").
- Is version-aware.
- Serves as an "executable" form of past experience.
- Describes the config problem in a structured way (as an expression,
for example)
- Includes a friendly, semi-verbose, description of the problem.
- Describes ways to fix the problem, *right there*.
- Is independent of the server (but reuses the same parser), and in
particular of any life-cycle commands such as restart.

Users who adopt the tool also seem more likely to contribute back
coverage for any new pitfalls they fall into, which can yield feedback
for developers and drive improvements in documentation.

Those inclined can use the tool in their ops repo's CI.

Two talk videos have been particularly valuable sources of examples
of configuration that can bite you:

"...Lag - What's Wrong with My Slave?"  (Samantha Billington, 2015)
https://youtu.be/If22EwtCFKc

"PostgreSQL Replication Tutorial"(Josh berkus,2015 )
https://youtu.be/GobQw9LMEaw

What I've collected so far:

- Quoting rules for recovery.conf and postgresql.conf are different
- 'primary_conninfo' is visible to unprivileged users, so including a
password is a security risk.
- streaming replication + read slave should consider turning on
"hot_standby_feedback".
- replication_timeout < wal_receiver_status_interval is bad.
- setting max_wal_senders too low, so that streaming replication or a
backup with pg_basebackup is blocked
- setting max_wal_senders without raising wal_level
- recently on "weekly news":
  Fujii Masao pushed:

  Document that max_worker_processes must be high enough in standby.
  The setting values of some parameters including max_worker_processes
  must be equal to or higher than the values on the master. However,
  previously max_worker_processes was not listed as such parameter in
  the document. So this commit adds it to that list.  Back-patch to
  9.4 where max_worker_processes was added.

http://git.postgresql.org/pg/commitdiff/1ea5ce5c5f204918b8a9fa6eaa8f3f1374aa8aec

- turning on replication with default max_wal_senders = 0
- FATAL:  WAL archival (archive_mode=on) requires wal_level "archive",
"hot_standby", or "logical"

There must be more, please mention any other checks you feel should
be included.

Note: the server _will_ explicitly complain about the last two, but a
bad "wal_level", for example, is only caught at restart, when the
server is already down:

"LOG:  parameter "wal_level" cannot be changed without restarting the
server"

Implementation: without getting too far ahead, I feel rules should be
stored as independent files in a drop directory, in some lightweight,
structured format (JSON, YAML, custom textual format), with some
mandatory fields (version, severity, description, solution(s)).
This makes it easy for people to add new checks without making any
oblique language demands (Perl isn't as popular as it used
to be).
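
Purely for illustration, a rule in such a drop directory might carry
something like the following, shown here as a Python literal mirroring
what a JSON/YAML file would contain; every field name is hypothetical,
nothing about the format is settled:

RULE = {
    "id": "hot-standby-feedback-off",
    "version": ">= 9.1",
    "severity": "advice",
    "when": "hot_standby == 'on' and hot_standby_feedback == 'off'",
    "description": "A streaming replica serving read queries may have "
                   "queries cancelled by cleanup on the master.",
    "solution": "Consider turning on hot_standby_feedback on the standby.",
}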

Comments?

Amir





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-08 Thread Amir Rohan
On 10/08/2015 04:47 PM, Michael Paquier wrote:
> On Thu, Oct 8, 2015 at 6:03 PM, Amir Rohan wrote:
>> On 10/08/2015 10:39 AM, Michael Paquier wrote:
>>>> Someone mentioned a daisy chain setup which sounds fun. Anything else in
>>>> particular? Also, it would be nice to have some canned way to measure
>>>> end-to-end replication latency for a variable number of nodes.
>>>
>>> Hm. Do you mean comparing the LSN position between two nodes even if
>>> both nodes are not connected to each other? What would you use it for?
>>>
>>
>> In a cascading replication setup, the typical _time_ it takes for a
>> COMMIT on master to reach the slave (assuming constant WAL generation
>> rate) is an important operational metric.
> 
> Hm. You mean the exact amount of time it gets to be sure that a given
> WAL position has been flushed on a cascading standby, be it across
> multiple layers. Er, that's a bit tough without patching the backend
> where I guess we would need to keep a track of when a LSN position has
> been flushed. And calls of gettimeofday are expensive, so that does
> not sound like a plausible alternative here to me...
> 

Wouldn't this work?

1) start timer
2) Grab pg_stat_replication.sent_location from master
3) pg_switch_xlog() # I /think/ we want this, could be wrong
4) Poll slave's pg_last_xlog_replay_location() until LSN shows up
5) stop timer
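
A hedged sketch of that loop, assuming psycopg2 and the 9.x function
names; it reads pg_current_xlog_location() on the master rather than
pg_stat_replication.sent_location, which should serve the same purpose
here:

import time
import psycopg2

def replication_latency(master_dsn, standby_dsn, poll_interval=0.01):
    # Measure how long the master's current WAL position takes to be
    # replayed on a (possibly cascaded) standby.
    master = psycopg2.connect(master_dsn)
    standby = psycopg2.connect(standby_dsn)
    master.autocommit = standby.autocommit = True
    with master.cursor() as cur:
        start = time.monotonic()
        cur.execute("SELECT pg_current_xlog_location()")
        target = cur.fetchone()[0]
        cur.execute("SELECT pg_switch_xlog()")  # push the WAL out now
    with standby.cursor() as cur:
        while True:
            cur.execute(
                "SELECT pg_xlog_location_diff("
                "pg_last_xlog_replay_location(), %s) >= 0", (target,))
            if cur.fetchone()[0]:
                return time.monotonic() - start
            time.sleep(poll_interval)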

Amir







Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-08 Thread Amir Rohan
On 10/08/2015 02:38 PM, Tom Lane wrote:
> Amir Rohan <amir.ro...@zoho.com> writes:
>> Comments?
> 
> ISTM that all of the "functional" parts of this are superseded by
> pg_file_settings; 

To use the new pg_file_settings view you need:
1) 9.5+
2) a running server

The feature is described as follows in a97e0c3354ace5d74:

> The information returned includes the configuration file name, line
> number in that file, sequence number indicating when the parameter is
> loaded (useful to see if it is later masked by another definition of
> the same parameter), parameter name, and what it is set to at that
> point. This information is updated on reload of the server.

None of which has much overlap in purpose with what I was suggesting, so
I don't see why you think it actually makes it redundant.

> or at least, if they aren't, you need to provide a
> rationale that doesn't consist only of pointing to pre-9.5 discussions.

The original rationale already does AFAICT.

> The "advice" parts of it maybe are still useful, but a tool that's just
> meant for that would look quite a bit different.  Maybe what you're really
> after is to update pgtune.
> 

In the sense that pgtune processes .conf files and helps the user choose
better settings, yes there's commonality.
But in that it looks at 5 GUCs or so and applies canned formulas to
silently write out a new .conf without explanation, it's not the
same tool.

> Also, as a general comment, the sketch you provided seems to require
> porting everything the server knows about 

> GUC file syntax,

yes, a parser, possibly sharing code with the server.

> file locations,

specified via command line option, yes.

> variable names and values 

misc/guc.c is eminently parseable in that sense.
For some types, validation is trivial.
For others, we do the best we can and improve incrementally.
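
To make "trivial" concrete for the boolean case (the server accepts
true/false/yes/no/on/off/1/0 and unique prefixes thereof), a toy
validator might look like the following; it approximates the server's
rules rather than porting them byte for byte:

BOOL_WORDS = ("true", "false", "yes", "no", "on", "off", "1", "0")

def valid_bool(value):
    # Accept a value iff it is an unambiguous prefix of exactly one
    # accepted word; "o" alone is ambiguous between "on" and "off".
    v = value.strip().lower()
    if not v:
        return False
    return len({w for w in BOOL_WORDS if w.startswith(v)}) == 1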

> into some other representation that apparently
> is not C, nor Perl either.  

Well, it's data; it probably should be represented as data anyway.

You raise an interesting point: having the schema for the
configuration described in a more congenial format than C code would
have made implementing this easier, and may actually be worthwhile
in its own right.

An aside on parsing: when starting a brand new configuration file was
suggested in
caog9aphycpmtypaawfd3_v7svokbnecfivmrc1axhb40zbs...@mail.gmail.com/
(only so JSON could be used for configuring that feature),
I wrote, as a toy exercise, a parser that extends postgresql.conf to
accept JSON objects as values. It's not rocket science.

So, none of the above seems like a major hurdle to me.

> I think that's likely to be impossible to keep accurate or up to date.  

If significant rot sets in a few releases down the line, there may be
more false positives. But then again, if users found it useful enough
that developers cared about keeping it up to date, it wouldn't get that
way. Again, see my previous note about DRY and the lack of an explicit
schema for GUCs except in code form.

Also, this depends crucially on the GUC churn rate, which admittedly you
are far better qualified to pronounce on than me.

> Just as a thought experiment,
> ask yourself how you'd validate the TimeZone setting in external code,
> bearing in mind that
> anytime you don't give exactly the same answer the server would, your tool
> has failed to be helpful.

A tool may be extremely useful for a hundred checks, and poor
in a few others. That would make it imperfect, not useless.

If TimeZone turns out to be an extremely challenging GUC to validate,
I'd unceremoniously skip the semantic check for it.

Kind Regards,
Amir






Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 07:41 PM, David Fetter wrote:
> On Wed, Oct 14, 2015 at 01:52:21AM +0300, Amir Rohan wrote:

>>
>> I've considered "vendoring", but it seems like enough code surgery
>> would be involved to make this very dubious "reuse". The language is
>> simple enough that writing a parser from scratch isn't a big deal, and
>> there doesn't seem to be much room for divergent parsing either.
> 
> Such room as there is seems worth eliminating if possible.  

IMO, not if it means the tool dodges some minor potential for
harmless bugs but becomes far more difficult to install
in the process.

> There's even a formal name for this issue, which attackers can use, although
> the implications as a source of subtle bugs in the absence of an
> attacker seem more pertinent right now.
> 
> https://www.google.com/?q=parse%20tree%20differential%20attack
> 

Interesting, but the security implications of a "parser attack", or
indeed of different parsing behavior in some corner cases, are roughly
nil in this case.

>> So, the only question is whether reusing the existing parser will
>> bring along some highly useful functionality beyond an AST and
>> a battle-tested validator for bools, etc. I'm not ruling anything
>> out yet, though.
> 
> I suspect that having a single source parser, however painful now,
> will pay large dividends later.
> 

I don't think anything is more important for a productivity tool than
being easy to install and easy to use.
It's a fact of life that all software will have bugs. If they matter,
you fix them.

Still, if there are more concrete benefits to adapting the parser, such
as batteries-included features I'm simply unaware of, I'd love to hear
about them.

Thanks,
Amir







Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 01:30 PM, Andres Freund wrote:
> On 2015-10-14 01:54:46 +0300, Amir Rohan wrote:
>> Andres, please see upthread for quite a bit on what it doesn't do, and
>> why having it in the server is both an advantages and a shortcoming.
> 
> As far as I have skimmed the thread it's only talking about shortcoming
> in case it requires a running server. Which -C doesn't.
> 

That's helpful; no one had suggested -C yet. This was new to me.

Here's what the usage says:

Usage:
  postgres [OPTION]...

Options:
  -C NAME    print value of run-time parameter, then exit

P1. The fact that -C will warn about bad values when you query for an
unrelated name (as it requires you to do) is cunningly and successfully
elided. I expected to have to check every one individually. How about a
no-argument version of "-C"?

P2. Next, that it will print the value of a "run-time" parameter
without an actual server process is another nice surprise, which I
wouldn't have guessed.

The error messages are all one could wish for:

LOG:  invalid value for parameter "wal_level": "archive2"
HINT:  Available values: minimal, archive, hot_standby, logical.
LOG:  parameter "fsync" requires a Boolean value

*and* it prints everything it finds wrong, not stopping at the first
one. Very nice.

it does fail the "dependent options" test:
$ postgres -C "archive_mode"
on
$ postgres -C wal_level
minimal

no errors, great, let's try it:
$ pg_ctl restart

FATAL:  WAL archival cannot be enabled when wal_level is "minimal"
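
To make point (2) below concrete: this kind of cross-parameter rule is
trivial to express outside the server. A hypothetical sketch, where
get_setting() merely wraps the `postgres -C` calls shown above:

import subprocess

def get_setting(name, pgdata):
    return subprocess.check_output(
        ["postgres", "-C", name, "-D", pgdata], text=True).strip()

def check_archive_mode(pgdata):
    # The dependency the server only reports at startup time.
    if (get_setting("archive_mode", pgdata) == "on"
            and get_setting("wal_level", pgdata) == "minimal"):
        return ('archive_mode=on requires wal_level "archive" or '
                "higher; the server will refuse to start")
    return None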

> I don't think there's any fundamental difference between some external
> binary parsing & validating the config file and the postgres binary
> doing it. 

-C does a much better job than pg_file_settings for this task, I agree.
Still, it doesn't answer the already mentioned points of:
1) You need a server (binary, not a running server) to check a config
(reasonable), with a version matching what you'll run on (again, reasonable)
2) It doesn't catch more complicated problems like dependent options.
3) It doesn't catch bad ideas, only errors.
4) You can't easily extend the checks performed, without forking
postgres or going through the (lengthy, rigorous) cf process.
5) Because it checks syntax only, you don't get the benefits of having
an official place for the community to easily contribute rules that
warn you against config pitfalls, so that all users benefit.
See my OP for real-world examples and links about this problem.

> There *is* the question of the utility being able to to
> process options from multiple major releases, but I don't think that's a
> particularly worthwhile goal here.
> 

For one thing, I think it's been made clear that either few people know
about -C or few use it as part of their usual workflow. That in itself
is an issue. Any bloggers reading this?

Next, you may not agree that semantic checks/advice are useful. I'd only
say it's easy to dismiss until it saves your (i.e. a user's) ass
that first time.

I would *highly* recommend the talk mentioned in the OP:

"...Lag - What's Wrong with My Slave?"  (Samantha Billington, 2015)
https://youtu.be/If22EwtCFKc

Not just for the concrete examples (*you* know those by heart), but to
really hear the frustration in a real production user's voice when they
deploy pg in production, hit major operational difficulties, spend lots
of time and effort trying to root-cause, and finally realize they simply
needed to "turn on FOO". Most of these problems can be caught very easily
with a conf linter.

Amir






Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-14 Thread Amir Rohan
On 10/14/2015 04:24 PM, Andres Freund wrote:
> On 2015-10-14 16:17:55 +0300, Amir Rohan wrote:
>> it does fail the "dependent options" test:
>> $ postgres -C "archive_mode"
>> on
>> $ postgres -C wal_level
>> minimal
> 
> Yea, because that's currently evaluated outside the config
> mechanism. It imo would be good to change that independent of this
> thread.
> 

I agree.

>> 5) Because it checks syntax only, you don't get the benefits of having
>> an official place for the community to easily contribute rules that
>> warn you against config pitfalls, so that all users benefit.
>> See my OP for real-world examples and links about this problem.
> 
> I don't think we as a community want to do that without review
> mechanisms in place, and I personally don't think we want to add
> separate processes for this.
> 

That's what "contribute" means in my book. I'm getting mixed signals
about what the "community" wants. I certainly think if adding rules
involves modifying the postgres server code, that is far too high
a bar and no one will.

I'm not sure what you mean by "separate process". My original
pitch was to have this live in core or contrib, and no one wanted
it. If you don't want it in core, but people think it's a good idea to
have (with review), what would you suggest?

Amir





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-10 Thread Amir Rohan
On 10/10/2015 04:32 PM, Michael Paquier wrote:
> On Sat, Oct 10, 2015 at 9:04 PM, Amir Rohan wrote:
>> Now that v9 fixes the problem, here's a summary from going over the
>> entire thread one last time:
> 
> Thanks a lot for the summary of the events.
> 
>> # Windows and TAP sets
>> Noah (2015-03) mentioned TAP doesn't work on windows, and hoped
>> this would include some work on that.
>> IIUC, the facilities and tests do run on windows, but the focus was on
>> those rather than on the preexisting TAP suite.
> 
> They do work on Windows, see 13d856e.
> 

Thanks, I did not know that.

>> # Test coverage (in the future)
>> Andres wanted a test for xid/multixid wraparound which also raises
>> the question of the tests that will need to be written in the future.
> 
> I recall that this would have needed extra functions on the backend...
> 
>> The patch focuses on providing facilities, while providing new coverage
>> for several features. There should be a TODO list on the wiki (bug
>> tracker, actually), where the list of tests to be written can be managed.
>> Some were mentioned in the thread (multi/xid wraparound
>> hot_standby_feedback, max_standby_archive_delay and
>> max_standby_streaming_delay? recovery_target_action? some in your
>> original list?), but threads
>> are precisely where these things get lost in the cracks.
> 
> Sure, that's an on-going task.
> 
>> # Directory structure
>> I suggested keeping backup/log/PGDATA per instance, rejected.
> 
> I guess that I am still flexible on this one, the node information
> (own PGDATA, connection string, port, etc.) is logged as well so this
> is not a big deal to me...
> 
>> # Parallel tests and port collisions
>> Lots about this. Final result is no port races are possible because
>> dedicated dirs are used per test, per instance. And because tcp
>> isn't used for connections on any platform (can you confirm that's
>> true on windows as well? I'm not familiar with sspi and what OSI
>> layer it lives on)
> 
> On Windows you remain with the problem that all nodes initialized
> using TestLib.pm will listen to 127.0.0.1, sspi being used to ensure
> that the connection at user level is secure (additional entries in
> pg_hba.conf are added).
> 
>> # decouple cleanup from node shutdown
>> Added (in latest patches?)
> 
> Yes this was added.
> 
>> Michael, is there anything else to do here or shall I mark this for
>> committer review?
> 
> I have nothing else. Thanks a lot!
> 

Ok, marked for committer, I hope I'm following "correct" cf procedure.

Regards,
Amir






Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-11 Thread Amir Rohan
On 10/11/2015 01:19 PM, Michael Paquier wrote:
> On Sun, Oct 11, 2015 at 4:44 PM, Amir Rohan wrote:
>> On 10/11/2015 02:47 AM, Michael Paquier wrote:
>> I apologize -- that didn't come out right.
>> What I meant to suggest was "open an issue" to track
>> any work that needs to be done. But I guess that's
>> not the PG way.
> 
> No problem. I was not clear either. We could create a new item in the
> TODO list (https://wiki.postgresql.org/wiki/Todo) and link it to
> dedicated page on the wiki where all the potential tests would be
> listed.
> 

It couldn't hurt, but it may also just be a waste of your time.
I'm just realizing how central an issue tracker is to how I work and
how much not having one irritates me. Tough luck for me I guess.

Regards,
Amir





Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-11 Thread Amir Rohan
On 10/11/2015 02:47 AM, Michael Paquier wrote:
> On Sat, Oct 10, 2015 at 10:52 PM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> On 10/10/2015 04:32 PM, Michael Paquier wrote:
>> I was arguing that it's an on-going task that would do
>> better if it had a TODO list, instead of "ideas for tests"
>> being scattered across 50-100 messages spanning a year or
>> more in one thread or another. You may disagree.
> 
> Let's be clear. I am fully in line with your point.
> 

I apologize -- that didn't come out right.
What I meant to suggest was "open an issue" to track
any work that needs to be done. But I guess that's
not the PG way.

Amir






Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-12 Thread Amir Rohan
On 10/13/2015 02:02 AM, Robert Haas wrote:
> On Fri, Oct 9, 2015 at 4:38 PM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> It does catch bad syntax, but in most cases all you get is
>> "The setting could not be applied".  that's not great for enums
>> or a float instead of an int. I guess a future version will fix that
>> (or not).
> 
> I expect we would consider patches to improve the error messages if
> you (or someone else) wanted to propose such.  But you don't have to
> want to do that.
> 

>> You need a running server to run a check. You need to monkey
>> with said server's configuration in place to run a check. You must be on
>> 9.5+. The checking mechanism isn't extensible. Certainly not as easily
>> as dropping a new rule file somewhere. It doesn't check (AFAICT) for bad
>> combinations of values, for example it will tell you that you can't
>> change `wal_archive` without restart (without showing source location
>> btw, bug?), but not that you better set `wal_level` *before* you
>> restart.  It doesn't do any semantic checks. It won't warn you
>> about things that are not actually an error, just a bad idea.
> 
> So, I'm not saying that a config checker has no value.  In fact, I
> already said the opposite.  You seem to be jumping all over me here
> when all I was trying to do is explain what I think Tom was getting
> at.   I *do* think that pg_file_settings is a helpful feature that is
> certainly related to what you are trying to do, but I don't think that
> it means that a config checker is useless.  Fair?
> 

That wasn't my intention. Perhaps I'm overreacting to a long-standing
"Tom Lane's bucket of cold water" tradition. I'm new here.
I understand your point and I was only reiterating what in particular
makes the conf checker distinctly useful IMO, and what it could
provide that pg_settings doesn't.

I've looked at parts of the pg_settings implementation and indeed some
of that code (and legwork) could be reused so the mundane parts
of writing this will be less hassle. I might have missed that if Tom and
you hadn't pointed that out.

So, Fair, and thanks.

Amir





[HACKERS] Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore

2015-10-13 Thread Amir Rohan
On 10/11/2015 03:20 AM, Peter Geoghegan wrote:
> On Thu, Sep 3, 2015 at 5:35 PM, David Rowley
>  wrote:
>> My test cases are:
> 
> Note that my text caching and unsigned integer comparison patches have
> moved the baseline down quite noticeably. I think that my mobile
> processor out-performs the Xeon you used for this, which seems a
> little odd even taken the change in baseline performance into account.
> 

To add a caveat not yet mentioned, the idea behind prefetching is to
sacrifice spare memory bandwidth for performance. That can be
a winning bet on a quiet box (the one we benchmark on), but can backfire
on a production db where the extra memory pressure can degrade all
running queries.  Something to test for, or at least keep in mind.

>> set work_mem ='1GB';
>> create table t1 as select md5(random()::text) from
>> generate_series(1,1000);
>>
>> Times are in milliseconds. Median and average over 10 runs.
>>
>> Test 1
>

I am the reluctant owner of outmoded hardware. Namely a core2 from
around 2007 on plain spinning metal. My results (linux 64bit):

--
Test 1
--
set work_mem ='1GB';
select count(distinct md5) from t1;

== Master ==
42771.040 ms <- outlier?
41704.570 ms
41631.660 ms
41421.877 ms

== Patch ==
42469.911 ms  <- outlier?
41378.556 ms
41375.870 ms
41118.105 ms
41096.283 ms
41095.705 ms

--
Test 2
--
select sum(rn) from (select row_number() over (order by md5) rn from t1) a;

== Master ==
44186.775 ms
44137.154 ms
44111.271 ms
44061.896 ms
44109.122 ms

== Patch ==
44592.590 ms
44403.076 ms
44276.170 ms


A very slight difference in an ambiguous direction, but also no perf
catastrophe.

> It's worth considering that for some (possibly legitimate) reason, the
> built-in function call is ignored by your compiler, since GCC has
> license to do that. You might try this on both master and patched
> builds:
> 
> ~/postgresql/src/backend/utils/sort$ gdb -batch -ex 'file tuplesort.o'
> -ex 'disassemble tuplesort_gettuple_common' > prefetch_disassembly.txt
> 
> ...
> 
> Notably, there is a prefetchnta instruction here.
> 

I have verified the prefetch is emitted in the disassembly.

An added benefit of owning outmoded hardware is that the MSR for this
generation is public and I can disable individual prefetcher units by
twiddling a bit.
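
(For the curious: on Linux the twiddling can be scripted through the
msr driver. A hedged sketch follows; the register and bit positions,
IA32_MISC_ENABLE at 0x1A0 with bit 9 for the HW prefetcher and bit 37
for the DCU prefetcher, are the Core-2-era documented ones, so verify
against the Intel SDM for your exact model before poking anything.)

import glob
import os
import struct

IA32_MISC_ENABLE = 0x1A0
HW_PREFETCH_DISABLE = 1 << 9    # "HW prefetch" unit
DCU_PREFETCH_DISABLE = 1 << 37  # "DCU prefetch" unit

def set_prefetch_disable(mask, disable=True):
    # Needs `modprobe msr` and root; the msr device is addressed by
    # seeking to the register number.
    for dev in glob.glob("/dev/cpu/[0-9]*/msr"):
        fd = os.open(dev, os.O_RDWR)
        try:
            val, = struct.unpack("<Q", os.pread(fd, 8, IA32_MISC_ENABLE))
            val = val | mask if disable else val & ~mask
            os.pwrite(fd, struct.pack("<Q", val), IA32_MISC_ENABLE)
        finally:
            os.close(fd)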

Disabling the "HW prefetch" or the "DCU prefetch" units on a patched
version gave results that look relatively unchanged, which seems promising.
Disabling them both at once on an unpatched version shows a slowdown of
5-6% in test1 (43347.181, 43898.705, 43399.428). That gives an
indication of maximum potential gains in this direction, for this box
at least.

Finally, I notice my results are 4x slower than everyone else's.
That can be very tough on a man's pride, let me tell you.

Amir





Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-09 Thread Amir Rohan
On 10/09/2015 09:55 PM, Robert Haas wrote:
> On Thu, Oct 8, 2015 at 9:06 AM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> On 10/08/2015 02:38 PM, Tom Lane wrote:
>>> Amir Rohan <amir.ro...@zoho.com> writes:
>>>> Comments?
>>>
>>> ISTM that all of the "functional" parts of this are superseded by
>>> pg_file_settings;
>>
>> To use the new pg_file_settings view you need:
>> 1) 9.5+
>> 2) a running server
>>
>> The feature is described as follows in a97e0c3354ace5d74:
>>
>>> The information returned includes the configuration file name, line
>>> number in that file, sequence number indicating when the parameter is
>>> loaded (useful to see if it is later masked by another definition of
>>> the same parameter), parameter name, and what it is set to at that
>>> point. This information is updated on reload of the server.
>>
>> None of which has much overlap in purpose with what I was suggesting, so
>> I don't see why you think it actually makes it redundant.
> 
> I think Tom's point is that if you want to see whether the file
> reloads without errors, you can use this view for that.  That would
> catch stuff like wal_level=lgocial and
> default_transaction_isolation='read_committed'.
> 


It does catch bad syntax, but in most cases all you get is
"The setting could not be applied".  that's not great for enums
or a float instead of an int. I guess a future version will fix that
(or not). You need a running server to run a check. You need to monkey
with said server's configuration in place to run a check. You must be on
9.5+. The checking mechanism isn't extensible. Certainly not as easily
as dropping a new rule file somewhere. It doesn't check (AFAICT) for bad
combinations of values, for example it will tell you that you can't
change `wal_archive` without restart (without showing source location
btw, bug?), but not that you better set `wal_level` *before* you
restart.  It doesn't do any semantic checks. It won't warn you
about things that are not actually an error, just a bad idea.

Forgive me if any of the above betrays an ignorance of some of
pg_file_settings's finer points. I have only read the documentation and
tried it in a shell.

There was also concern about the prohibitive maintenance burden such a
tool would impose. ISTM all you actually need is a JSON file generated
from the output of `select * from pg_settings` to make the really
tedious bit completely automatic. The semantic checks are by their
nature "best effort", and it's my impression (and only that) that
they are more stable, in the sense that bad ideas remain so.

That's why I disagree with Tom's suggestion that pg_file_settings
obviates the need for the tool I proposed. Which isn't at all to
say it doesn't solve another problem very well.

Amir








Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-10 Thread Amir Rohan
On 10/10/2015 02:43 PM, Michael Paquier wrote:
> On Fri, Oct 9, 2015 at 8:53 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote:
>>> Ok, I've put myself down as reviewer in cfapp. I don't think I can
>>> provide any more useful feedback that would actually result in changes
>>> at this point, but I'll read through the entire discussion once last
>>> time and write down final comments/notes. After that I have no problem
>>> marking this for a committer to look at.
>>
>> OK. If you have any comments or remarks, please do not hesitate at all!
> 
> So, to let everybody know the issue, Amir has reported me offlist a
> bug in one of the tests that can be reproduced more easily on a slow
> machine:
> 

Yeah, I usually stick to the list for discussion, but I ran an earlier
version without issues and thought this might be a problem with my
system as I've changed things a bit this week.

Now that v9 fixes the problem, here's a summary from going over the
entire thread one last time:

# Windows and TAP sets
Noah (2015-03) mentioned TAP doesn't work on windows, and hoped
this would include some work on that.

IIUC, the facilities and tests do run on windows, but the focus was on
those rather than on the preexisting TAP suite.

# Test coverage (in the future)
Andres wanted a test for xid/multixid wraparound which also raises
the question of the tests that will need to be written in the future.

The patch focuses on providing facilities, while providing new coverage
for several features. There should be a TODO list on the wiki (bug
tracker, actually), where the list of tests to be written can be managed.

Some were mentioned in the thread (multi/xid wraparound
hot_standby_feedback, max_standby_archive_delay and
max_standby_streaming_delay? recovery_target_action? some in your
original list?), but threads
are precisely where these things get lost in the cracks.

# Interactive use vs. TAP tests

Early on the goal was also to provide something for interactive use
in order to test scenarios. The shift has focused to the TAP tests
and some of the choices in the API reflect that. Interactive use
is possible, but wasn't a central requirement.

# Directory structure

I suggested keeping backup/log/PGDATA per instance, rejected.

# Parallel tests and port collisions

Lots about this. Final result is no port races are possible because
dedicated dirs are used per test, per instance. And because tcp
isn't used for connections on any platform (can you confirm that's
true on windows as well? I'm not familiar with sspi and what OSI
layer it lives on)

# Allow test to specify shutdown mode

Added

# decouple cleanup from node shutdown

Added (in latest patches?)

# Conveniences for test writing vs. running

My suggestions weren't picked up, but for one thing setting CLEANUP=0
in the lib (which means editing it...) can be useful for writers.

# blocking until server ready

pg_isready wrapper added.

# Multiple masters

back and forth, but supported in latest version.

That's it. I ran the latest (v9) tests and they passed on my system
(fedora 64bit) and also under docker with --cpu-quota=1, which
simulates a slow machine.

Michael, is there anything else to do here or shall I mark this for
committer review?

Regards,
Amir











Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-10 Thread Amir Rohan
On 10/10/2015 04:32 PM, Michael Paquier wrote:
> On Sat, Oct 10, 2015 at 9:04 PM, Amir Rohan wrote:
>> The patch focuses on providing facilities, while providing new coverage
>> for several features. There should be a TODO list on the wiki (bug
>> tracker, actually), where the list of tests to be written can be managed.
>> Some were mentioned in the thread (multi/xid wraparound
>> hot_standby_feedback, max_standby_archive_delay and
>> max_standby_streaming_delay? recovery_target_action? some in your
>> original list?), but threads
>> are precisely where these things get lost in the cracks.
> 
> Sure, that's an on-going task.
>  

I was arguing that it's an on-going task that would do
better if it had a TODO list, instead of "ideas for tests"
being scattered across 50-100 messages spanning a year or
more in one thread or another. You may disagree.

Amir





Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-13 Thread Amir Rohan
On 10/13/2015 09:20 PM, Robert Haas wrote:
> On Mon, Oct 12, 2015 at 8:01 PM, Amir Rohan <amir.ro...@zoho.com> wrote:
>> That wasn't my intention. Perhaps I'm overreacting to a long-standing
>> "Tom Lane's bucket of cold water" tradition. I'm new here.
>> I understand your point and I was only reiterating what in particular
>> makes the conf checker distinctly useful IMO, and what it could
>> provide that pg_settings doesn't.
>>
>> I've looked at parts of the pg_settings implementation and indeed some
>> of that code (and legwork) could be reused so the mundane parts
>> of writing this will be less hassle. I might have missed that if Tom and
>> you hadn't pointed that out.
>>
>> So, Fair, and thanks.
> 
> No worries. I'm not upset with you, and I see where you're coming
> from.  But I since I'm trying to be helpful I thought I would check
> whether it's working.  Sounds like yes, which is nice.
> 
> It would be spiffy if we could use the same config-file parser from
> outside postgres itself, but it seems hard.  I assume the core lexer
> and parser could be adapted to work from libcommon with some
> non-enormous amount of effort, but check-functions can and do
> assume that they are running in a backend environment; one would lose
> a lot of sanity-checking if those couldn't be executed, 

I've been considering that. Reusing the parser would ensure no errors
are introduced by having a different implementation, but on the other
hand, involving the pg build in installing what's intended as a
lightweight, independent tool would hurt.
Because it's dubious whether this will end up in core, I'd like
"pip install pg_confcheck" to be all that is required.

Also, anything short of building the tool in-tree with an unmodified
parser amounts to forking the parser code and maintaining it alongside
upstream, per version, etc. I'm not eager to do that.

> and checking GUCs created by loadable modules seems impossible.  Still, even a
> partial move toward making this code accessible in frontend code would
> be welcome from where I sit.
> 

Dumping the output of the pg_settings view into JSON has already
provided all the type/enum information needed to type-check values and
flag unrecognized GUCs. I don't really see that as involving a heroic
amount of work; the language seems extremely simple.
There aren't that many types, type-checking them isn't that much work,
and once that's written the same checks should apply to all GUCs
registered with the server, assuming the pg_settings view is
exhaustive in that sense.
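
For illustration, generating that file once per major version might
look like this (a sketch assuming psycopg2 and a running server; the
checker itself would then ship only the JSON):

import json
import psycopg2

def dump_guc_schema(dsn, path):
    # Pull the GUC "schema" (types, ranges, enum values) out of
    # pg_settings and write it as JSON for the offline checker.
    conn = psycopg2.connect(dsn)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT name, vartype, unit, min_val, max_val,
                   enumvals, boot_val
              FROM pg_settings
        """)
        cols = [d[0] for d in cur.description]
        schema = {row[0]: dict(zip(cols, row)) for row in cur.fetchall()}
    conn.close()
    with open(path, "w") as f:
        json.dump(schema, f, indent=2, default=str)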

Any modules outside the main server can provide their own data
in a similar format, if it comes to that. I doubt whether such
a tool has that much appeal, especially if it isn't bundled with
the server.

So, I'll think about this some more, and write up a description of how I
think it's going to look for some feedback. Then I'll code something.

Regards,
Amir






Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-13 Thread Amir Rohan
On 10/14/2015 12:14 AM, Alvaro Herrera wrote:
> Amir Rohan wrote:
> 
>> I've been considering that. Reusing the parser would ensure no errors
>> are introduced by having a different implementation, but on the other
>> hand involving the pg build in installing what's intended as a
>> lightweight, independent tool would hurt.
>> Because it's dubious whether this will end up in core, I'd like
>> "pip install pg_confcheck" to be all that is required.
> 
> Maybe just compile a single file in a separate FRONTEND environment?
> 

You mean refactoring postgres the way rhaas suggested? Could you elaborate?

I believe most people get pg as provided by their distro or PaaS,
and not by compiling it. Involving a pg build environment is asking a
whole lot of someone who has pg all installed and who just wants to lint
their conf.

It's also legitimate for someone to be configuring postgres
without having a clue how to compile it.

If this were in core, this wouldn't be an issue, because packages
would also include the tool. Note I'm not actually pushing for that;
it's just that it has implications for what can be assumed to be available.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-13 Thread Amir Rohan
On 10/14/2015 01:12 AM, Alvaro Herrera wrote:
> Amir Rohan wrote:
>> On 10/14/2015 12:14 AM, Alvaro Herrera wrote:
>>> Amir Rohan wrote:
>>>
>>>> I've been considering that. Reusing the parser would ensure no errors
>>>> are introduced by having a different implementation, but on the other
>>>> hand involving the pg build in installing what's intended as a
>>>> lightweight, independent tool would hurt.
>>>> Because it's dubious whether this will end up in core, I'd like
>>>> "pip install pg_confcheck" to be all that is required.
>>>
>>> Maybe just compile a single file in a separate FRONTEND environment?
>>
>> You mean refactoring postgres the way rhaas suggested? Could you elaborate?
>>
>> I believe most people get pg as provided by their distro or PaaS,
>> and not by compiling it.
> 
> I mean the utility would be built by using a file from the backend
> source, just like pg_xlogdump does.  We have several such cases.
> I don't think this is impossible to do outside core, either.
> 

I've considered "vendoring", but it seems like enough code surgery
would be involved to make this very dubious "reuse". The language is
simple enough that writing a parser from scratch isn't a big deal, and
there doesn't seem to be much room for divergent parsing either.
So, the only question is whether reusing the existing parser would
bring along some highly useful functionality beyond an AST and
a battle-tested validator for bools, etc. I'm not ruling anything
out yet, though.
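
Just to give a sense of the scale involved, the core of a from-scratch
parser for the "name = value" line format comes out to something like this
(a toy sketch: it ignores include directives and some quoting corner cases):

import re

# Toy sketch: handles "name = value  # comment" lines only.
LINE = re.compile(r"""^\s*
    (?P<name>[A-Za-z_][A-Za-z0-9_.]*)    # GUC name
    \s*=?\s*                             # '=' is optional in the real format
    (?P<value>'(?:[^']|'')*'|[^#\s]+)    # quoted or bare value
    \s*(?:\#.*)?$""", re.VERBOSE)

def parse_conf(text):
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if not m:
            raise ValueError("line %d: cannot parse %r" % (lineno, raw))
        value = m.group("value")
        if value.startswith("'"):
            value = value[1:-1].replace("''", "'")  # unescape doubled quotes
        settings[m.group("name")] = value           # last setting wins
    return settings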

Amir



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files

2015-10-13 Thread Amir Rohan
On 10/14/2015 01:16 AM, Andres Freund wrote:
> On October 13, 2015 11:14:19 PM GMT+02:00, Alvaro Herrera 
> <alvhe...@2ndquadrant.com> wrote:
>> Amir Rohan wrote:
>>
>>> I've been considering that. Reusing the parser would ensure no errors
>>> are introduced by having a different implementation, but on the other
>>> hand involving the pg build in installing what's intended as a
>>> lightweight, independent tool would hurt.
>>> Because it's dubious whether this will end up in core, I'd like
>>> "pip install pg_confcheck" to be all that is required.
>>
>> Maybe just compile a single file in a separate FRONTEND environment?
> 
> Maybe I'm missing something here - but doesn't the posted binary do nearly 
> all of this for you already? There's the option to display the value of a 
> config option, and that checks the validity of the configuration. Might need 
> to add an option to validate hba.conf et al as well and allow to display all 
> values. That should be pretty simple?
> 

Andres, please see upthread for quite a bit on what it doesn't do, and
why having it in the server is both an advantage and a shortcoming.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-09 Thread Amir Rohan
On 10/09/2015 02:12 PM, Michael Paquier wrote:
> On Fri, Oct 9, 2015 at 8:25 AM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Thu, Oct 8, 2015 at 11:28 PM, Amir Rohan wrote:
>>> Wouldn't this work?
>>> 1) start timer
>>> 2) Grab pg_stat_replication.sent_location from master
>>> 3) pg_switch_xlog() # I /think/ we want this, could be wrong
>>
>> For a warm standby, you would want that, but this depends on two factors:
>> - The moment master completes archiving of this segment
>> - The moment standby restores it.
>> On slow machines, those two things become by far the bottleneck,
>> imagine a PI restricted on I/O with a low-class SD card in the worst
>> case (I maintain one, with a good card, still the I/O is a
>> bottleneck).
>>
>>> 4) Poll slave's pg_last_xlog_replay_location() until LSN shows up
>>> 5) stop timer
>>
>> That's not really solid, there is an interval of time between the
>> moment the LSN position is taken from the master and the standby. An
>> accurate method is to log/store on master when a given WAL position
>> has been flushed to disk, and do the same on slave at replay for this
>> LSN position. In any case this is doing to flood badly the logs of
>> both nodes, and as the backend cares about the performance of
>> operations in this code path we won't want to do that anyway.
>>
>> To make it short, it seems to me that simply waiting until the LSN a
>> test is waiting for has been replayed is just but fine for this set of
>> tests to ensure their run consistency, let's not forget that this is
>> the goal here.
> 
> In terms of features, it seems that this patch has everything it needs
> to allow one to design tests to work on both Linux and Windows, and it
> is careful regarding CVE-2014-0067. Thoughts about moving that as
> "Ready for committer"?
> 

Ok, I've put myself down as reviewer in cfapp. I don't think I can
provide any more useful feedback that would actually result in changes
at this point, but I'll read through the entire discussion once last
time and write down final comments/notes. After that I have no problem
marking this for a committer to look at.

Amir



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-10-01 Thread Amir Rohan
> On 09/30/2015 03:27 PM, Tom Lane wrote:
>> Josh Berkus  writes:
>>> On 09/30/2015 03:10 PM, Tom Lane wrote:
 I'd be feeling a lot more positive about this whole thread if any people
 had stepped up and said "yes, *I* will put in a lot of grunt-work to make
 something happen here".  The lack of any volunteers suggests strongly
 that this thread is a waste of time, just as the several similar ones
 before it have been.
>> 
>>> Hmmm?  Frost volunteered to stand up debbugs.
>> 
>> So he did, and did anyone volunteer to put data into it, or to do ongoing
>> curation of said data?  If we simply connect it up to the mailing lists,
>> and then stand back and wait for magic to happen, we will not ever have
>> anything that's any more useful than the existing mailing list archives.

On 10/01/2015 01:49 AM, Alvaro Herrera wrote:
> Josh Berkus wrote:
> 
>> Well, it's hard for anyone to volunteer when we don't know what the
>> actual volunteer tasks are.  I certainly intend to do *something* to
>> support the bug tracker system, but I don't know yet what that something is.
> 
> I volunteer to do something too, as long as I don't have to click on
> endless web forms.
> 

I volunteer to help populate the new tracker from whatever form the
current data exists in, and will also put into action any reasonable
scraping/augmentation ideas put forth to make it a more productive tool.

Amir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-03 Thread Amir Rohan
On 10/03/2015 03:50 PM, Amir Rohan wrote:
> On 10/03/2015 02:38 PM, Michael Paquier wrote:
>> On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote:
>>> On 10/02/2015 03:33 PM, Michael Paquier wrote:
>>>
>>> Granted, you have to try fairly hard to shoot yourself in the leg,
>>> but since the solution is so simple, why not? If we never reuse ports
>>> within a single test, this goes away.
>>
>> Well, you can reuse the same port number in a test. Simply teardown
>> the existing node and then recreate a new one. I think that port
>> number assignment to a node should be transparent to the caller, in
>> our case the perl test script holding a scenario.
>>
> 
> What part of "Never assign the same port twice during one test"
> makes this "not transparent to the user"?
> 
> If you're thinking about parallel tests, I don't think you
> need to worry. Availability checks take care of one part,

Except now that I think of it, that's definitely a race:

Thread1: is_available(5432) -> True
Thread2: is_available(5432) -> True
Thread1: listen(5432) -> True
Thread2: listen(5432) -> #$@#$&@#$^&$#@&

I don't know if parallel tests are actually supported, though.
If they are, you're right that this is a shared global
resource wrt concurrency.
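
If parallel runs do need to be supported, one way to close that window
(a sketch of the usual trick, not something the patch does) is to have
the OS pick the port and hold the socket until the server is actually
launched:

import socket

def reserve_port():
    # Ask the kernel for a free port and keep the socket open, so nobody
    # else can grab it meanwhile. There's still a small window between
    # close() and postgres binding the port, but it's far narrower than
    # check-then-listen.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    return port, s  # caller closes s just before starting the node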

Amir



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-03 Thread Amir Rohan
On 10/03/2015 02:38 PM, Michael Paquier wrote:
> On Fri, Oct 2, 2015 at 11:10 PM, Amir Rohan wrote:
>> On 10/02/2015 03:33 PM, Michael Paquier wrote:
>>> Any server instances created during the tests should never use a
>>> user-defined port for portability. Hence using those ports as keys
>>> just made sense. We could have for example custom names, that have
>>> port values assigned to them, but that's actually an overkill and
>>> complicates the whole facility.
>>>
>>
>> Something like:
>>
>> global nPortsAssigned = 0;
>> AssignPort() -> return is_ok(nPortsAssigned++)
>>
>> was what I used.
> 
> Why do you need that. Creating a node is in the most basic way a
> matter of calling make_master, make_*_standby where a port number, or
> identifier gets uniquely assigned to a node. The main point of the
> approach taken by this patch is to make port assignment transparent
> for the caller.
> 

See next.

>> Granted, you have to try fairly hard to shoot yourself in the leg,
>> but since the solution is so simple, why not? If we never reuse ports
>> within a single test, this goes away.
> 
> Well, you can reuse the same port number in a test. Simply teardown
> the existing node and then recreate a new one. I think that port
> number assignment to a node should be transparent to the caller, in
> our case the perl test script holding a scenario.
> 

What part of "Never assign the same port twice during one test"
makes this "not transparent to the user"?

If you're thinking about parallel tests, I don't think you
need to worry. Availability checks take care of one part,
and the portnum-as-map-key-is-test-local takes care of the
other.

But, see next.

> 
>>>> 4) Port assignment relies on liveness checks on running servers.
>>>> If a server is shut down and a new one instantiated, the port will get
>>>> reused, data will get trashed, and various confusing things can happen.
>>>
>>> Right. The safest way to do that is to check in get_free_port if a
>>> port number is used by a registered node, and continue to loop in if
>>> that's the case. So done.
>>>
>>
>> That eliminates the "sweet and gentle" variant of the scenario, but it's
>> susceptible to the "ABA problem":
>> https://en.wikipedia.org/wiki/ABA_problem
>> https://youtu.be/CmxkPChOcvw?t=786
> 
> I learnt a new thing here. That's basically an existing problem even
> with the existing perl test infrastructure relying on TestLib.pm when
> tests are run in parallel. What we would need here is a global mapping
> file storing all the port numbers used by all the nodes currently in
> the tests.
> 

Yeah, a poorman's way to ensure ports aren't reused (I wasn't very
clear at the top of the post) is something like:

BASE_PORT = 5432        # the lowest port we use (placeholder value)
n_ports_assigned = 0

def assign_port():
    global n_ports_assigned
    port = BASE_PORT + n_ports_assigned
    while not available(port):   # same availability probe as before
        port += 1
    # record how far we've gone so the same port is never handed out twice
    n_ports_assigned = port - BASE_PORT + 1
    return port

It has its glaring faults, but would probably work ok.
In any case, I'm sure you can do better.

>>
>> Granted, you have to try fairly hard to shoot yourself in the leg,
>> but since the solution is so simple, why not? If we never reuse ports
>> within a single test, this goes away.
> 
> Well, you can reuse the same port number in a test. Simply teardown
> the existing node and then recreate a new one. I think that port
> number assignment to a node should be transparent to the caller, in
> our case the perl test script holding a scenario.
> 

I was saying you *never* want to reuse port numbers. That is, as long
as the lib ensures we never reuse ports within one test, all kinds
of corner cases are eliminated.

>>>> 5) Servers are shutdown with -m 'immediate', which can lead to races
>>>> in the script when archiving is turned on. That may be good for some
>>>> tests, but there's no control over it.
>>>
>>> I hesitated with fast here actually. So changed this way. We would
>>> want as well a teardown command to stop the node with immediate and
>>> unregister the node from the active list.
>>>
>>
>> In particular, I was shutting down an archiving node and the archiving
>> was truncated. I *think* smart doesn't do this. But again, it's really
>> that the test writer can't easily override, not that the default is wrong.
> 
> Ah, OK. Then fast is just fine. It shuts down the node correctly.
> "smart" would wait for all the current connections to finish but I am
> wondering if it currently matters here: I don't see yet a clear use
> case yet where it woul

Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-04 Thread Amir Rohan
On 10/04/2015 04:29 PM, Andres Freund wrote:
> On October 4, 2015 3:27:00 PM GMT+02:00, Amir Rohan <amir.ro...@zoho.com> 
> wrote:
> 
>> Perhaps it would help a little if you posted the latest patch here as
>> well? So that at least the app picks it up again.
> 
> You can add additional threads in the cf app.
> 

Done, thank you.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-04 Thread Amir Rohan
On 10/02/2015 03:33 PM, Michael Paquier wrote:
>
>

Michael, I'm afraid my email bungling has damaged your thread.

I didn't include an "In-reply-To" header when I posted:

trinity-b4a8035d-59af-4c42-a37e-258f0f28e44a-1443795007012@3capp-mailcom-lxa08.

And we subsequently had our discussion over there instead of here, where
the commitfest app is tracking it.

https://commitfest.postgresql.org/6/197/

Perhaps it would help a little if you posted the latest patch here as
well? So that at least the app picks it up again.

Apologies for my ML n00bness,
Amir





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-10-02 Thread Amir Rohan
On 10/02/2015 03:33 PM, Michael Paquier wrote:

> Any server instances created during the tests should never use a
> user-defined port for portability. Hence using those ports as keys
> just made sense. We could have for example custom names, that have
> port values assigned to them, but that's actually an overkill and
> complicates the whole facility.
> 

Something like:

global nPortsAssigned = 0;
AssignPort() -> return is_ok(nPortsAssigned++)

was what I used.

>> 2) Behaviour (paths in particular) is hardwired rather than overridable
>> defaults.
> 
> This is the case of all the TAP tests. We could always use the same
> base directory for all the nodes and then embed a sub-directory whose
> name is decided using the port number. But I am not really sure if
> that's a win.
> 

I understand, but it eliminates the kind of scenarios this convenience
package lets you express... conveniently.

>> This is exactly what I needed to test, problems:
>> 3) Can't stop server without clearing its testing data (the maps holding
>> paths and things). But that data might be specifically
>> needed, in particular the backup shouldn't disappear when the
>> server melts down or we have a very low-grade DBA on our hands.
> 
> OK, you have a point here. You may indeed want routines to enable
> and disable a node completely decoupled from start and stop, with
> something like enable_node and disable_node that basically registers
> or unregisters it from the list of active nodes. I have updated the
> patch this way.
> 

Excellent.

>> 4) Port assignment relies on liveness checks on running servers.
>> If a server is shut down and a new one instantiated, the port will get
>> reused, data will get trashed, and various confusing things can happen.
> 
> Right. The safest way to do that is to check in get_free_port if a
> port number is used by a registered node, and continue to loop in if
> that's the case. So done.
> 

That eliminates the "sweet and gentle" variant of the scenario, but it's
susceptible to the "ABA problem":
https://en.wikipedia.org/wiki/ABA_problem
https://youtu.be/CmxkPChOcvw?t=786

Granted, you have to try fairly hard to shoot yourself in the leg,
but since the solution is so simple, why not? If we never reuse ports
within a single test, this goes away.

>> 5) Servers are shutdown with -m 'immediate', which can lead to races
>> in the script when archiving is turned on. That may be good for some
>> tests, but there's no control over it.
> 
> I hesitated with fast here actually. So changed this way. We would
> want as well a teardown command to stop the node with immediate and
> unregister the node from the active list.
> 

In particular, I was shutting down an archiving node and the archiving
was truncated. I *think* smart doesn't do this. But again, it's really
that the test writer can't easily override, not that the default is wrong.

>> Other issues:
>> 6. Directory structure: uses one directory per thing, but it's more logical
>> to place all things related to an instance under a single directory,
>> and name them according to role (57333_backup, and so on).
> 
> Er, well. The first version of the patch did so, and then I switched
> to an approach closer to what the existing TAP facility is doing. But
> well let's simplify things a bit.
> 

I know, I know, but:
1) an instance is a "thing" in your script, so having its associated
paraphernalia in one place makes more sense (maybe only to me).
2) That's often how folks (well, how I) arrange things in deployment,
at least with archive/backup as symlinks to the NAS.

Alternatively, naming the dirs with a prefix (srv_foo_HASH,
backup_foo_backupname_HASH, etc') would work as well.

>> 7. enable_restoring() uses "cp -i" 'archive_command', not a good fit
>> for an automated test.
> 
> This seems like a good default to me, and actually that's portable on
> Windows easily. One could always append a custom archive_command in a
> test when for example testing conflicting archiving when archive_mode
> = always.
> 

Ok, I wasn't sure about this, but specifically activating a switch that
asks for input from the user during a test? hmm.

>> 8. No canned way to output a pretty-printed overview
>> (paths, ports, for manual checking).
> 
> Hm. Why not... Are you suggesting something like print_current_conf
> that goes through all the registered nodes and outputs that? How would
> you use it?
>

For one thing, I could open a few terminals and `$(print_env_for_server
5437)`, so psql just worked.

I wish PEG had that as well.
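
Something along these lines, say (a hypothetical helper; the output
format is made up):

def print_env_for_server(port, host="127.0.0.1"):
    # Emit a shell snippet; `eval $(print_env_for_server 5437)` in a
    # terminal then points psql at the node on that port.
    print("export PGHOST=%s" % host)
    print("export PGPORT=%d" % port)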


>> 10. If a test passes/fails or dies due to a bug, everything is cleaned.
>> Great for testing, bad for postmortem.
> 
> That's something directly related to TestLib.pm where
> File:Temp:tempdir creates a temporary path with CLEANUP = 1. We had
> discussions regarding that actually...
> 

>> 11. a canned "server is responding to queries" helper would be convenient.
> 
> Do you mean a wrapper on pg_isready? Do you have use cases in mind 

[HACKERS] Patch: Revised documentation on base backups

2015-09-27 Thread Amir Rohan
On 09/28/2015 06:33 AM, Michael Paquier wrote:
> On Sun, Sep 27, 2015 at 11:27 AM, Amir Rohan wrote:
>> Further editing. See attached V3.
>
> I think that you should consider adding this patch to the next commit
> fest so as we do not lose track of it:
> https://commitfest.postgresql.org/7/
>


I've recently picked up the base backup section of the documentation:
http://www.postgresql.org/docs/9.4/static/continuous-archiving.html
(Section 24.3.2) and went through it like a tutorial. I'm a relative
newcomer to postgres, so this was the real deal.

The docs seem to be in need of some improvement, primarily because
they have the feel of having been monkey-patched repeatedly in the
past. The rewrite needs someone with more experience to read through it
and comment on technical accuracy and any knee-slapper errors I
might have introduced.


One pain point in rewriting is the definition of terms, or lack
thereof, in the manual. I didn't find a section defining terms
conclusively for consistent referencing throughout the documentation.
For example, how should one refer to the directory housing
"postgresql.conf"? Is it:

1. The data directory
2. The storage directory
3. The PGDATA directory
4. The cluster directory
5. The server root directory
6. The config file directory
7. etc'

This lack of clear guidelines and uniformly applied terms
makes for some awkward verbal manoeuvring and compromises the
quality of the documentation (but admittedly, not its copiousness).

Amir

>From f763259e6e91d69dd1c41f67169fa0666b1f82d9 Mon Sep 17 00:00:00 2001
From: root <root@localhost.localdomain>
Date: Sat, 26 Sep 2015 11:50:43 +0300
Subject: [PATCH] Rewritten documentation on base backups V2


diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 7413666..e2727e1 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -427,8 +427,8 @@ tar -cf backup.tar /usr/local/pgsql/data
   
If simultaneous snapshots are not possible, one option is to shut down
the database server long enough to establish all the frozen snapshots.
-   Another option is to perform a continuous archiving base backup () because such backups are immune to file
+   Another option is to create a base backup using the continuous archiving feature
+   () because such backups are immune to file
system changes during the backup.  This requires enabling continuous
archiving just during the backup process; restore is done using
continuous archive recovery ().
@@ -752,60 +752,58 @@ test ! -f /mnt/server/archivedir/000100A90065 && cp pg_xlog/
Making a Base Backup
 

-The easiest way to perform a base backup is to use the
- tool. It can create
-a base backup either as regular files or as a tar archive. If more
-flexibility than  can provide is
-required, you can also make a base backup using the low level API
-(see ).
+A base backup consists of one or more WAL files and a file-system-level copy of the cluster's files.
+Together they are sufficient to recreate the database's state at some point in the past.
+Once a base backup is made, the WAL files that precede its creation are no longer
+necessary in order to recover the database to some later point in time.
+   
+
+   
+The interval between base backups should usually be
+chosen based on how much storage you want to expend on archived WAL
+files, since you must keep all the archived WAL files back to your
+last base backup.
+You should also consider how long you are prepared to spend
+recovering, if recovery should be necessary: the system will have to
+replay all those WAL segments, and that could take a while if it has
+been a long time since the last base backup.

 

-It is not necessary to be concerned about the amount of time it takes
-to make a base backup. However, if you normally run the
-server with full_page_writes disabled, you might notice a drop
-in performance while the backup runs since full_page_writes is
-effectively forced on during backup mode.
+Creating a base backup may be a lengthy process if you have a lot of data.
+Be aware that if you normally run the server with full_page_writes
+disabled, you might notice a drop in performance while the backup runs since
+full_page_writes is effectively forced on during backup mode.

 

 To make use of the backup, you will need to keep all the WAL
 segment files generated during and after the file system backup.
-To aid you in doing this, the base backup process
-creates a backup history file that is immediately
-stored into the WAL archive area. This file is named after the first
-WAL segment file that you need for the file system backup.
-For example, if the starting WAL file is
-0001123455CD the backup history file will be
-named something like
-0001123455CD.007C9330.backup. (The second
-part of the file name 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-09-22 Thread Amir Rohan

>On 07/16/15, Robert Haas wrote:
>    
>>> * Developers will immediately understand the format
>>
>>I doubt it.  I think any format that we pick will have to be carefully
>>documented.  People may know what JSON looks like in general, but they
>>will not immediately know what bells and whistles are available in
>>this context.
>>
>>> * Easy to programmatically manipulate in a range of languages
>>
>> <...> I think it will be rare to need to parse the postgresql.conf string,
>> manipulate it programatically, and then put it back.
>
>On Sun, Jul 19, 2015 at 4:16 PM, Tom Lane  wrote:
>> Josh Berkus  writes:
>>> On 07/17/2015 04:36 PM, Jim Nasby wrote:
 I'm guessing it'd be really ugly/hard to support at least this GUC being
 multi-line?
>>
>>> Mind you, multi-line GUCs would be useful otherwise, but we don't want
>>> to hinge this feature on making that work.
>>
>> Do we really want such a global reduction in friendliness to make this
>> feature easier?
>
>Maybe shoehorning this into the GUC mechanism is the wrong thing, and
>what we really need is a new config file for this.  The information
>we're proposing to store seems complex enough to justify that.

 

It seems like:

1) There's a need to support structured data in configuration for future
needs as well; it isn't specific to this feature.
2) There should/must be a better way to validate configuration than
restarting the server in search of syntax errors.

Creating a whole new configuration file for just one feature *and* in a
different format seems suboptimal. What happens when the next 20 features
need structured config data, where does that go? Will there be additional
JSON config files *and* perhaps new mini-language values in .conf as
development continues? How many dedicated configuration files is too many?


Now, about JSON (Earlier Upthread):

 
On 07/01/15, Peter Eisentraut wrote:

> On 6/26/15 2:53 PM, Josh Berkus wrote:
> > I would also suggest that if I lose this battle and
> > we decide to go with a single stringy GUC, that we at least use JSON
> > instead of defining our out, proprietary, syntax?
>  
> Does JSON have a natural syntax for a set without order?

 

No. Nor timestamps. It doesn't even distinguish integer from float
(though parsers do it for you in dynamic languages). It's all because
of its unsightly JavaScript roots.
 


The current patch is now forced by JSON to conflate sets and lists, so
un/ordered semantics are no longer tied to type but to the specific
configuration keys. So if a feature ever needs a key where the difference
between set and list matters and needs to support both, you'll need
separate keys (both with lists, but meaning different things) or a
separate "mode" key or something. Not terrible, just iffy.

 


Others have found JSON unsatisfactory before. For example, the Clojure
community has made (at least) two attempts at alternatives, complete with
the meh adoption rates you'd expect despite being more capable formats:

http://blog.cognitect.com/blog/2014/7/22/transit
https://github.com/edn-format/edn

 

There's also YAML, TOML, etc., none as universal as JSON. But to reiterate,
JSON itself has lackluster type support (no sets, no timestamps), is
verbose, is easy to malform when editing (missed a curly brace, shouldn't
use a single quote), isn't extensible, and my personal pet peeve is that it
doesn't allow non-string or bare-string keys in maps (a.k.a. "death by
double-quotes").

Python has the very natural {1,2,3} syntax for sets, but of course that's
not part of JSON.
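
To make the set point concrete (Python here only to illustrate):

import json

print(json.dumps(["A", "B", "C"]))  # works, but order is now significant
json.dumps({"A", "B", "C"})         # raises TypeError: sets aren't JSON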

 

If JSON wins out despite all this, one alternative not discussed is to
extend the .conf parser to accept JSON dicts as a fundamental type, e.g.:

 

###

data_directory = 'ConfigDir'   
port = 5432
work_mem = 4MB
hot_standby = off
client_min_messages = notice
log_error_verbosity = default
autovacuum_analyze_scale_factor = 0.1
synch_standby_config = {
  "sync_info": {
    "nodes": [
      { "priority": 1, "group": "cluster1" },
      "A"
    ],
    "quorum": 3
  },
  "groups": {
    "cluster1": [ "B", "C" ]
  }
}

 

This *will* break someone's Perl, I would guess. Ironically, those scripts
wouldn't have broken if some structured format were in use for the
configuration data when they were written...
`postgres --describe-config` is also pretty much tied to a line-oriented
configuration.

 

Amir

 

p.s. The MIA configuration validation tool/switch should probably get a
thread too.

 




[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-09-25 Thread Amir Rohan
On 09/25/2015 01:47 PM, Michael Paquier wrote:
> On Fri, Sep 25, 2015 at 5:57 PM, Amir Rohan <amir.ro...@mail.com> wrote:
>>

> That's the kind of thing that each serious developer on this mailing
> list already has in a rather different shape but with the same final
> result: 

Oh, I guess I'll have to write one then. :)


> offer a way to set up a cluster ready for hacking in a single
> command, and being able to switch among them easily. I am not sure we
> would really find a cross-platform script generic enough to satisfy
> all the needs of folks here and integrate it directly in the tree :)
> 

Yes, Perl/bash seems to be the standard and both have their shortcomings,
in either convenience or familiarity... a static binary +
scripts/configuration would be the path of least resistance, and it
doesn't actually have to live in the tree, but it needs to be good enough
out of the box that someone new will prefer to use/improve it over rolling
their own from scratch, and people need to know it's there.

Coming back to the recovery testing package...
I was investigating some behaviour having to do with recovery
and tried your new library to write a repro case. This uncovered some
implicit assumptions in the package that can make things difficult
when violated. I had to rewrite nearly every function I used.

Major pain points:

1) Lib stores metadata (ports, paths, etc.) using ports as keys.
-> Assumes ports aren't reused.
-> Because it assumes servers keep running until teardown.

And

2) Behaviour (paths in particular) is hardwired rather than overridable
defaults.

This is exactly what I needed to test, problems:
3) Can't stop server without clearing its testing data (the maps holding
paths and things). But that data might be specifically
needed, in particular the backup shouldn't disappear when the
server melts down or we have a very low-grade DBA on our hands.
4) Port assignment relies on liveness checks on running servers.
If a server is shut down and a new one instantiated, the port will get
reused, data will get trashed, and various confusing things can happen.
5) Servers are shut down with -m 'immediate', which can lead to races
in the script when archiving is turned on. That may be good for some
tests, but there's no control over it.

Other issues:
6. Directory structure: uses one directory per thing, but it's more logical
to place all things related to an instance under a single directory,
and name them according to role (57333_backup, and so on).
7. enable_restoring() uses "cp -i" as the 'archive_command', not a good fit
for an automated test.

Aside from running the tests, the convenience of writing them
needs to be considered. My Perl is very weak (it's been at least
a decade), but it was difficult to make progress because everything
is geared toward a batch "pass/fail" run. Stdout is redirected,
and the log files can't be followed with 'tail --retry -f' in another
terminal, because they're clobbered at every run. Also:
8. No canned way to output a pretty-printed overview of the running system
(paths, ports, for manual checking).
9. Finding things is difficult, See 6.
10. If a test passes/fails or dies due to a bug, everything is cleaned.
Great for testing, bad for postmortem.
11. a canned "server is responding to queries" helper would be convenient.
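
For 11, even something as small as this sketch (polling pg_isready)
would do:

import subprocess
import time

def server_is_ready(port, timeout=30):
    # Poll pg_isready until the node on `port` accepts connections;
    # exit status 0 means it does.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.call(["pg_isready", "-q", "-p", str(port)]) == 0:
            return True
        time.sleep(0.5)
    return False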

It might be a good idea to:
1) Never reuse ports during a test. Liveness checking is used
to avoid collisions, but should not determine order of assignment.
2) Decouple cleanup from server shutdown. Do the cleanup at the end of
the test only, and allow the user to keep things around.
3) Adjust the directory structure to one top directory per server with
(PGDATA, backup, archive) subdirs.
4) Instead of passing ports around as keys, have explicit functions
which can be called directly by the user (I'd like the backup *HERE*
please), with the current functions refactored to merely invoke them
by interpolating in the values associated with the port they were given.
4b) Server shutdown should perhaps be "smart" by default, or segmented
into calmly_bring_to_a_close(), pull_electric_plug() and
drop_down_the_stairs_into_swimming_pool().

Regards,
Amir







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-09-23 Thread Amir Rohan
> Sent: Thursday, September 24, 2015 at 3:11 AM
> 
> From: "Tom Lane" 
> Robert Haas  writes:
> > Well, I think that if we create our own mini-language, it may well be
> > possible to make the configuration for this compact enough to fit on
> > one line. If we use JSON, I think there's zap chance of that. But...
> > that's just what *I* think.
>> 

I've implemented a parser that reads your mini-language and dumps a JSON
equivalent. Once you start naming groups the line fills up quite quickly,
and on the other hand the JSON is verbose and fiddly.
But implementing a mechanism that can be used by other features in
the future seems the deciding factor here, rather than the brevity of a
bespoke mini-language.
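
For a flavor of the translation, with a toy grammar standing in for the
real one (the actual syntax in the patch is richer, with groups and
priorities):

import json
import re

# Toy stand-in: "3(A, B, cluster1)" means a quorum of 3 over the list.
def minilang_to_json(spec):
    m = re.match(r"\s*(\d+)\s*\(\s*([^)]*)\)\s*$", spec)
    if not m:
        raise ValueError("cannot parse: %r" % spec)
    nodes = [n.strip() for n in m.group(2).split(",") if n.strip()]
    return json.dumps({"sync_info": {"quorum": int(m.group(1)),
                                     "nodes": nodes}}, indent=2)

print(minilang_to_json("3(A, B, cluster1)"))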

> 
> <...> we're best off avoiding the challenges of dealing with multi-line 
> postgresql.conf entries.
> 
> And I'm really not much in favor of a separate file; if we go that way
> then we're going to have to reinvent a huge amount of infrastructure
> that already exists for GUCs.
> 
> regards, tom lane

Adding support for JSON objects (or some other kind of composite data type) 
to the .conf parser would negate the need for one, and would also solve the
problem being discussed for future cases.
I don't know whether that would break some tooling you care about, 
but if there's interest, I can probably do some of that work.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-09-25 Thread Amir Rohan
On 08/14/2015 06:32 AM, Michael Paquier wrote:
> On Fri, Aug 14, 2015 at 12:54 AM, Michael Paquier
>  wrote:
>> On Mon, Jun 29, 2015 at 10:11 PM, Michael Paquier
>>  wrote:
>>> On Wed, Mar 18, 2015 at 1:59 PM, Michael Paquier
>>>  wrote:
 Feedback is of course welcome, but note that I am not seriously
 expecting any until we get into 9.6 development cycle and I am adding
 this patch to the next CF.
>>> I have moved this patch to CF 2015-09, as I have enough patches to
>>> take care of for now... Let's focus on Windows support and improvement
>>> of logging for TAP in the first round. That will be already a good
>>> step forward.
>> OK, attached is a new version of this patch, that I have largely
>> reworked to have more user-friendly routines for the tests. The number
>> of tests is still limited still it shows what this facility can do:
>> that's on purpose as it does not make much sense to code a complete
>> and complicated set of tests as long as the core routines are not
>> stable, hence let's focus on that first.
>> I have not done yet tests on Windows, I am expecting some tricks
>> needed for the archive and recovery commands generated for the tests.
> Attached is v3. I have tested and fixed the tests such as they can run
> on Windows. archive_command and restore_command are using Windows'
> copy when needed. There was also a bug with the use of a hot standby
> instead of a warm one, causing test 002 to fail.
> I am rather happy with the shape of this patch now, so feel free to review 
> it...
> Regards,

Michael, I've run these and they worked fine for me.
See attached patch with a couple of minor fixes.

Amir
diff --git a/src/test/recovery/RecoveryTest.pm b/src/test/recovery/RecoveryTest.pm
index c015f3b..5cc977d 100644
--- a/src/test/recovery/RecoveryTest.pm
+++ b/src/test/recovery/RecoveryTest.pm
@@ -11,7 +11,7 @@ package RecoveryTest;
 # It is then up to the test to decide what to do with the newly-created
 # node.
 #
-# Environmenment configuration of each node is available through a set
+# Environment configuration of each node is available through a set
 # of global variables provided by this package, hashed depending on the
 # port number of a node:
 # - connstr_nodes connection string to connect to this node
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 757a639..2c59b73 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -31,9 +31,9 @@ psql $connstr_nodes{ $port_master },
 my $current_lsn = psql $connstr_nodes{ $port_master },
 	"SELECT pg_current_xlog_location();";
 my $caughtup_query = "SELECT '$current_lsn'::pg_lsn <= replay_location FROM pg_stat_replication";
-poll_query_until($caughtup_query, $connstr_nodes{ $port_master })
-	or die "Timed out while waiting for standby 1 to catch up";
 poll_query_until($caughtup_query, $connstr_nodes{ $port_standby_1 })
+	or die "Timed out while waiting for standby 1 to catch up";
+poll_query_until($caughtup_query, $connstr_nodes{ $port_standby_2 })
 	or die "Timed out while waiting for standby 2 to catch up";

 my $result = psql $connstr_nodes{ $port_standby_1 },
--
2.4.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-09-25 Thread Amir Rohan
On 09/25/2015 09:29 AM, Michael Paquier wrote:
> 
> 
> On Fri, Sep 25, 2015 at 3:11 PM, Amir Rohan <amir.ro...@mail.com
> <mailto:amir.ro...@mail.com>> wrote:
> 
> On 08/14/2015 06:32 AM, Michael Paquier wrote:

> > I am rather happy with the shape of this patch now, so feel free
> to review it...
> > Regards,
> 
> Michael, I've ran these and it worked fine for me.
> See attached patch with a couple of minor fixes.
> 
> 
> Thanks! I still think that we could improve a bit more the way
> parametrization is done in postgresql.conf when a node is initialized by
> appending a list of parameters or have a set of hardcoded behaviors
> including a set of default parameters and their values... But well
> feedback is welcome regarding that. I also arrived at the conclusion
> that it would be better to place the new package file in src/test/perl
> instead of src/test/recovery to allow any users of the TAP tests to have
> it in their PERL5LIB path and to be able to call the new routines to
> create and manipulate nodes.
> -- 
> Michael 

Having a subcommand in Greg's PEG (http://github.com/gregs1104/peg)
that allows you to create one of several "canned" clusters would be
convenient as well, for manual testing and fooling around with features.

Amir


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers