[GENERAL] Feature request: separate logging

2016-11-18 Thread otheus uibk
A glaring weakness in PostgreSQL for production systems is that the
administrator has no way of controlling what types of logs go where. There
are at least two types of logs: errors and statement logs. (I could also
add: connection, syntax error, query duration, audit.) It has become
increasingly important in the Linux world, with its over-reliance on
systemd, that we admins be able to distinguish between these log outputs.
Systemd wants to control all the daemons and services run on a Linux host.
To do this effectively, it needs to capture the stdout and stderr, or
possibly parse the external logfile, of the daemon it spawns. The benefits
of systemd's journald subsystem include being able to automatically
identify daemons that are failing to start properly. (The merits of systemd
are beyond the scope of this discussion; it is now the all-but-ubiquitous
standard on Linux distributions and is effectively unavoidable.)

The PostgreSQL startup process could greatly benefit from systemd. As it is
now, PGDG distributes PostgreSQL with init scripts which really do very
little to check whether PostgreSQL bombed shortly after startup. I have improved
upon that script to do some smart detection of a failed startup, but the
result is having two different files to monitor. On the one hand, I want to
use PostgreSQL's built-in file or CSV logging. On the other, I need to
check the output of the startup process. These logs have different formats,
especially for the smart DBA who wants to customize the log prefix with a
more usable timestamp and other fields. Logging to external files is nice
because PostgreSQL rotates the logs for us automatically. (External log
rotation is problematic and risky because it requires a configuration
reload, which may lead to undesirable side-effects. Right?)

One alternative is to capture everything into the local logging system, or
to send everything to stdout/stderr and capture it with journald (or runit's
svlogd or something). But then you have the following problem: if I enable
statement-level logging or set log_min_duration_statement=0, the in-memory
journald will quickly be overrun with statement logs. The journald
subsystem will then become useless to the other daemons and subsystems.

One solution is proposed in these forums:
https://www.postgresql.org/message-id/flat/etPan.53397e77.643c9869.1a0%40palos#etPan.53397e77.643c9869.1a0@palos
While pgbadger has some nice features, it doesn't really solve the problem
of allowing postgresql to be used with systemd+journald.

What I do today is configure PostgreSQL to write csvlogs. Stdout/stderr
are captured by journald. A custom Perl script with the Text::CSV module
and tail -F semantics continuously processes the csvlog file, ignores
query, DML, and detail log lines, and sends the rest via syslog() (which
journald then handles).
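
Very roughly, the pipeline is the sketch below -- a simplification, not the
actual script: the real one uses Text::CSV precisely because naive line
filtering like this mishandles quoted multi-line CSV fields, and the log
path and filter patterns here are only placeholders.

   tail -F "$PGDATA/pg_log/postgresql.csv" \
     | grep -v -E 'statement: |duration: |automatic (vacuum|analyze)|,"DETAIL",' \
     | logger -t postgresql -p local0.info    # journald picks these up via syslog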

It's not the right way.

I would like to see PostgreSQL have the ability to:

   1. Write to a csvlog with one set of selectors
   2. Write to stdout/stderr with a different set of selectors (no statements,
no autovacuum, etc.) using a purely line-oriented output that:

   2.1. has the kind of detail contained in the CSV. Currently,
log_line_prefix does not offer some of the information provided in the
CSV logs. Really, the CSV log should simply be an implementation of the
same prefix mechanism.
   2.2. collapses multi-line queries into one line (newlines and tabs
escaped with backslashes or the 0x1B character).
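
To make the request concrete, here is a purely hypothetical postgresql.conf
fragment. Except for log_destination, none of these parameters exist today;
they only illustrate the split being asked for:

   # HYPOTHETICAL -- illustration only
   log_destination = 'csvlog,stderr'
   csvlog_min_messages   = info      # statements, durations, autovacuum -> CSV file only
   stderr_min_messages   = warning   # errors and startup failures -> stderr, i.e. journald
   stderr_collapse_newlines = on     # one line per event; newlines/tabs escaped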

Finally, if these changes can be implemented, would it be possible to backport
them to prior versions, say 9.1 and up? If I wrote a patch, under what
conditions would the patch be accepted for inclusion in official releases
of older versions?


-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] Unexpected trouble from pg_basebackup

2016-10-04 Thread otheus uibk
After a 3 to 4 minute delay, pg_basebackup started doing its thing and
finished within a few minutes. So now the question is: why the startup
delay?


[GENERAL] Unexpected trouble from pg_basebackup

2016-10-04 Thread otheus uibk
I recently updated my systems from pg 9.1.8 to 9.5.3. A pg_dumpall was used
to migrate the data. Now I'm trying to re-establish replication between
master and slave. I'm getting stuck.

When I run pg_basebackup (via a script which worked flawlessly on 9.1.8,
AND via the command line, a la "manual mode"), it creates the PGDATA directory
structure and some files, but soon gets stuck. It writes fewer than 40 MB of a
20 GB database. Further, using tcpdump to watch packets on the wire, I
confirm no data is going across.

However, it's clear a connection has been made and the process was started.
On the master, I see this in the process table:

postgres: wal sender process pgsync A.B.C.D(42821) sending backup

Meanwhile, on the client, I see no output updates. Here's the command:

pg_basebackup -x -P -v -D $PGDATA -w

PGUSER, PGPASSWORD and PGHOST are set accordingly. The user has the
REPLICATION attribute. Just in case, I dropped and re-created the user.
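
(For reference, the drop/re-create was roughly the following -- a sketch:
'pgsync' is the role shown in the process-table line above, while the
superuser name and password here are placeholders.)

   psql -U postgres -c "DROP ROLE IF EXISTS pgsync;"
   psql -U postgres -c "CREATE ROLE pgsync LOGIN REPLICATION PASSWORD 'xxxxxxxx';"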

So that's question A.

Question B is related. In attempting to verify that the permissions and HBA
were set correctly, I attempted to do a pg_dump using the same PGUSER and
PGHOST. What I got surprised me:

  $ pg_dump -s onyxeditor
  pg_dump: [archiver (db)] query failed: ERROR:  permission denied for
relation licence
  pg_dump: [archiver (db)] query was: LOCK TABLE public.licence IN ACCESS
SHARE MODE

This is the replication user. Other schemas worked just fine, but not this
one. Is this expected behavior?

 Kind regards,
 Otheus

-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] How to qualify or quantify risk of loss in asynchronous replication

2016-03-19 Thread otheus uibk
On Wednesday, March 16, 2016, Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:
> In asynchronous replication, the primary writes to the WAL and flushes
> the disk.  Then, for any standbys that happen to be connected, a WAL sender
> process trundles along behind feeding new WAL down the socket as soon as
> it can, but it can be running arbitrarily far behind or not running at all
> (the network could be down or saturated, the standby could be temporarily
> down or up but not reading the stream fast enough, etc etc).

Thanks for your help on finding the code. To be more precise, in the 9.1.8
code, I see this:

 1. [backend] WAL is flushed to disk
 2. [backend] WAL senders are sent SIGUSR1 to wake up
 3. [backend] waits for responses from any SyncRep receivers (effectively
    skipped if there are none)
    [wal-sender] wakes up
 4. [backend] end-of-xact cycle
    [wal-sender] reads WAL (XLogRead), up to MAX_SEND_SIZE bytes at a time
 5. [backend] ? is an ACK sent to the client here?
    [wal-sender] sends the chunk to the WAL receiver using
    pq_putmessage_noblock
 6. [wal-sender] repeats the read-send loop

So if the WAL record is bigger than whatever MAX_SEND_SIZE is (in my
source I see 8 kB * 16 = 128 kB, roughly 1 Mbit), the WAL sender may end up
sleeping between iterations of 5 and 6.
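
(For anyone following along, the constant can be checked in whatever source
tree is at hand -- this only shows where to look, not what the value is in
any particular release:)

   grep -n -A1 'define MAX_SEND_SIZE' src/backend/replication/walsender.c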

On Wed, Mar 16, 2016 at 10:21 AM, otheus uibk <otheus.u...@gmail.com> wrote:

> Section 25.2.5. "The standby connects to the primary, which streams WAL
> records to the standby as they're generated, without waiting for the WAL
> file to be filled."

 Section 25.2.6 "If the primary server crashes then some transactions that
> were committed may not have been replicated to the standby server, causing
> data loss. The amount of data loss is proportional to the replication delay
> at the time of failover."


Both these statements, then, from a documentation perspective, are
incorrect, at least to a pedant. For 25.2.5, the primary streams WAL
records to the standby after they've been flushed to disk, but without
waiting for the file to be filled. For 25.2.6 it's not clear: it is possible
for transactions to have been *written* to the local WAL and reported as
committed but not yet *sent* to the standby server.

Somehow, the documentation misled me into believing the async replication
algorithm at least guarantees WAL records are *sent* before responding
"committed" to the client. I now know this is not the case. *grumble*.

How can I help make the documentation clearer on this point?

-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] How to qualify or quantify risk of loss in asynchronous replication

2016-03-16 Thread otheus uibk
Apologies for the double reply... This is to point out the ambiguity
between the example you gave and the documentation as stated.

On Wednesday, March 16, 2016, Thomas Munro 
wrote:

>
> Waiting for the transaction to be durably stored (flushed to disk) on
> two servers before COMMIT returns means that you can avoid this
> situation:
>
> 1.  You commit a transaction, and COMMIT returns as soon as the WAL is
> flushed to disk on the primary.
> 2.  You communicate a fact based on that transaction to a third party
> ("Thank you Dr Bowman, you are booked in seat A4, your reservation
> number is JUPITER123").
> 3.  Your primary computer is destroyed by a meteor, and its WAL sender
> hadn't yet got around to sending that transaction to the standby


Section 25.2.5. "The standby connects to the primary, which streams WAL
records to the standby as they're generated, without waiting for the WAL
file to be filled."

This suggests that the record is on the network stack possibly before a
flush to disk.

 Section 25.2.6 "If the primary server crashes then some transactions that
were committed may not have been replicated to the standby server, causing
data loss. The amount of data loss is proportional to the replication delay
at the time of failover."

Whence this replication delay? If the standby server is caught up and
streaming asynchronously, what delays *in receiving* might there be other
than network delays?

Note: I am totally unconcerned with the possibility that both primary and
standby go down at the same time.


-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] How to qualify or quantify risk of loss in asynchronous replication

2016-03-16 Thread otheus uibk
Thomas, thanks for your input... but I'm not quite getting the answer I
need.

> But what precisely is the algorithm and timing involved with streaming
> WALs?
> >
> > Is it:
> >   * client issues COMMIT
> >   * master receives commit
> >   * master processes transaction internally
> >   * master creates WAL record
> >   | master appends WAL to local WAL file, flushes disk
> >   | master sends WAL record to all streaming clients
> >   * master sends OK to client
> >   * master applies WAL
> >
> > So is this correct? Is it correct to say: PG async guarantees that the
> WAL
> > is *sent* to the receivers, but not that they are received, before the
> > client receives acknowledgement?
>
> Async replication doesn't guarantee anything at all about receivers,
> or even that there is one right at this moment.  Did you mean to write
> "synchronous" instead of "asynchronous"?


I'm only concerned with async (for this thread).


> In asynchronous replication,
> the primary writes to the WAL and flushes the disk.  Then, for any
> standbys that happen to be connected, a WAL sender process trundles
> along behind feeding new WAL down the socket as soon as it can, but
> it can be running arbitrarily far behind or not running at all (the
> network could be down or saturated, the standby could be temporarily
> down or up but not reading the stream fast enough, etc etc).



This is the *process* I want more detail about. The question is the same as
above:
> (is it true that) PG async guarantees that the WAL
> is *sent* to the receivers, but not that they are received, before the
> client receives acknowledgement?

But I will refine what I mean by "sent"... does PostgreSQL write the WAL to
the socket and flush the socket before acknowledging the transaction to
the client? Does it *always* do this? Or does it make a best effort? Or
do the write to the socket and the return to the client happen asynchronously?

I realize that the data might not be *seen* at the client, I realize
buffered data may take time to reach the network, and I realize various
levels of synchronous replication provide higher guarantees. But for the
purposes of this topic, I'm interested to know what PG actually does. I can't
tell that from the documentation (because it is not clearly stated and
because it is self-contradictory).






-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] Replaying xlogs from beginning

2016-02-25 Thread otheus uibk
> You're assuming that the only significant aspect of initdb's output that
can vary from run to run is the database system ID.

I prefer to call it "optimistic prediction". But yes. :)

> If you're lucky this technique will work, but it's not reliable and not
supported.  You really need to take an initial base backup after running
initdb.

Thanks, but in retrospect it's too late for that. Since that is the case, to
whom can I make a feature request that initdb have an implicitly enabled
option to make this base backup?
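
(In the meantime, the requested behavior can be approximated by hand right
after initdb -- a rough sketch with placeholder paths and settings, since a
fresh cluster first needs replication enabled before pg_basebackup will
connect:)

   initdb -D "$PGDATA"
   echo "wal_level = archive"          >> "$PGDATA/postgresql.conf"
   echo "max_wal_senders = 2"          >> "$PGDATA/postgresql.conf"
   echo "local replication all trust"  >> "$PGDATA/pg_hba.conf"
   pg_ctl -D "$PGDATA" -w start
   pg_basebackup -x -D "$PGDATA.initial-base"
   pg_ctl -D "$PGDATA" -w stop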

OK, back to the question. Provided the system is the same, the PostgreSQL
executables and libraries are identical, and the locales are unchanged, we have:

1.  bitwise variances in the system catalog

-- can be solved with David's suggestion of full dump/restore after the
steps I listed. Confer?

2.  other things you couldn't think of yet.

-- anything come to mind yet?


-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] Replaying xlogs from beginning

2016-02-17 Thread otheus uibk
I came up with an answer to the _second_ question (how do I do this from a
new instance?).

In the new instance directory:
1. Hack the system ID in the global/pg_control file to that of the original
instance.
   1a. Use pg_controldata to get the system identifier as little-endian hex bytes:
 $ pg_controldata | perl -lne 'if (/^Database system identifier:\s+(\d+)$/)
     { $sysid = sprintf("%x", $1); print join(" ", reverse $sysid =~ m/../g); exit; }'
 fb fb 59 56 b9 31 58 53
   1b. Use a hex editor (vim with xxd / xxd -r will do) to replace the
first eight bytes with these values.

2. The new control file will be unusable because its CRC won't match. Fix
that with pg_resetxlog:
$ pg_resetxlog -f $PGDATA

3. Copy the very first pg_xlog file into place. The rest can be copied by
hand or found using whatever you use for the restore_command.

4. Create a recovery.conf file.

5. Start the instance.
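
Condensed into commands, the procedure looks roughly like this (a sketch of
what I did, not a supported recipe: the paths are placeholders, the original
system identifier can also be taken from the startup error message, and
pg_resetxlog -f is destructive, so only try it on a throwaway instance):

   cd "$NEW_PGDATA"
   # 1a. original system identifier as little-endian hex bytes
   pg_controldata "$OLD_PGDATA" | perl -lne \
     'if (/^Database system identifier:\s+(\d+)$/) { $h=sprintf("%x",$1); print join(" ", reverse $h=~m/../g); exit }'
   # 1b. patch those eight bytes into the start of global/pg_control
   xxd global/pg_control > pg_control.hex     # edit the first 8 bytes by hand
   xxd -r pg_control.hex > global/pg_control
   # 2. recompute the control-file CRC
   pg_resetxlog -f "$NEW_PGDATA"
   # 3. put the very first WAL segment (typically 000000010000000000000001) in place
   cp "$XLOG_ARCHIVE"/000000010000000000000001 pg_xlog/
   # 4. minimal recovery.conf
   echo "restore_command = 'cp $XLOG_ARCHIVE/%f \"%p\"'" > recovery.conf
   # 5. start the instance
   pg_ctl -D "$NEW_PGDATA" -w start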

Again, this worked for me. What I want to be sure of is: does this really
work? And I still want to know: can I do this without creating a new
instance?


[GENERAL] Replaying xlogs from beginning

2016-02-17 Thread otheus uibk
I'm looking for answers to this question, but so far haven't turned up a
usable answer. Perhaps I'm asking it the wrong way.

I want to replay the xlogs from the beginning of time up until a particular
time. The problem is, the time is before the first base backup. But I have
all the xlogs since the database was initialized.

To test this, I initialized a new DB and put a few xlogs in its pg_xlog
directory, and I created a simple recovery.conf ("restore_command=false").
When I start it up, I get something like "DETAIL: WAL file database system
identifier is 6221786353392811102, pg_control database system identifier is
6252279422905597461."
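
(The test setup was essentially this -- a sketch with placeholder paths:)

   initdb -D /tmp/pgtest
   cp /path/to/saved/xlogs/* /tmp/pgtest/pg_xlog/
   echo "restore_command = 'false'" > /tmp/pgtest/recovery.conf
   pg_ctl -D /tmp/pgtest -w start    # fails with the DETAIL message quoted above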

The most important question is: how do I recover to a point in time, starting
from initialization, without a base backup, provided I have all the xlog files?
The secondary question is: how can I test this with a new instance?

-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


[GENERAL] pg_start_backup does not actually allow for consistent, file-level backup

2015-06-08 Thread otheus uibk
In the manual and on this mailing list, the claim is made that consistent,
file-level backups may be made by bracketing the file-copy operation with
the PostgreSQL pg_start_backup and pg_stop_backup operations. Many people,
including myself, have found that in some circumstances, using tar to copy
these files will result in an error if one of the data files changes during
the tar operation. The responses to those queries on this mailing list are
unsatisfactory ("everything is fine, trust us").

For example:
http://www.postgresql.org/message-id/flat/aanlktikg70ip91tpuwvuf9gb0xts4fukakm4qxltw...@mail.gmail.com#aanlktikg70ip91tpuwvuf9gb0xts4fukakm4qxltw...@mail.gmail.com

and
http://www.postgresql.org/message-id/flat/201004201350.o3kdokr02...@momjian.us#201004201350.o3kdokr02...@momjian.us
(quoted below)

On Fri, Apr 16, 2010 at 11:55 AM, raghavendra t
 raagavendra(dot)rao(at)gmail(dot)com wrote:
  Hi All,
 
  For some setups reason, i started taking Hot backup. In this course I
 have first issued pg_start_backup('backup') and went to the data directory
 for backing up in OS format using the command tar -cf backup.tar  /data .
 When i issued this command , tar was generating some errors as show below.
 
  bash-3.00# tar -cf 16aprilstandby.tar /db-data/
  tar: Removing leading `/' from member names
  tar: /db-data/base/24643/151667: file changed as we read it
  tar: /db-data/base/24643/151764.2: file changed as we read it
  tar: /db-data/base/24643/151766: file changed as we read it
  tar: /db-data/base/24643/151768: file changed as we read it
  tar: /db-data/base/66412/151969: file changed as we read it
 
  After sometime tar has ended and i also issued pg_stop_backup() and i
 continued the process.
 
  My question here is, is this errors generated by tar are to worrisome or
 whats happening in the backend.
  Is tar file is safewell to use. Could you please tell me.
 Those are not errors, they are warnings. As long as you use
 pg_start_backup() and pg_stop_backup() before and after the tar, they
 are perfectly harmless, and can be ignored.


The above scenario is exactly what I saw, albeit with less frequency and
severity.

I decided to test the claim that these messages are "perfectly harmless
and can be ignored":

1. I executed pg_start_backup() on the server
2. Ran md5sum recursively through PG's data directories
3. Waited a split second
4. Ran md5sum recursively through PG's data directories as in step 2
5. Compared the output from #2 and #4
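
(Steps 2 and 4 were essentially the following -- a sketch; the real run
excluded the directories listed further below:)

   find /var/lib/pgsql/data -type f \
     -not -path '*/pg_xlog/*' -not -path '*/pg_clog/*' \
     -not -path '*/pg_subtrans/*' -not -path '*/pg_stat_tmp/*' \
     -print0 | sort -z | xargs -0 md5sum > /tmp/sums.1
   # ...wait, repeat into /tmp/sums.2, then:
   diff /tmp/sums.1 /tmp/sums.2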

As you can see below, the contents of these files changed between steps 2 and 4.

< a1a571bfd1e4a98b20245edbdfce6d9a  /var/lib/pgsql/data/base/41514/809275
---
> 21de5b864019c96c55e81a38fa1c9ccf  /var/lib/pgsql/data/base/41514/809275
1783c1783
< 8eb4a578ecb56667e1698174f89c462c  /var/lib/pgsql/data/base/41514/809280
---
> b4c7b4ef30dda9543181465f53a85d72  /var/lib/pgsql/data/base/41514/809280

Such changes occurred EVEN WHEN TAR DID NOT WARN of changed files. Further,
when step 3 involved an actual backup taking minutes, not milliseconds,
dozens of differences to files in data/base/... were reported. To be clear,
I excluded from consideration all files in pg_xlog, pg_clog, pg_subtrans,
and pg_stat_tmp.

If these files are changing between pg_start_backup() and
pg_stop_backup(), then exactly what is the purpose of those two calls? Might
the files be changing during the tar, as tar thinks? How may an operator be
assured the snapshot is consistent (unless one stops the database)? Will the
redo logs restore the files to a consistent state, no matter when these
files are changed? I find it hard to believe that would be the case.

This test was performed using PostgreSQL 9.1.8. A scan of the CHANGELOG
since then indicates that if this is a bug, it has not been reported as
fixed.



-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at


Re: [GENERAL] pg_start_backup does not actually allow for consistent, file-level backup

2015-06-08 Thread otheus uibk
Thank you, all.  The manual for 9.4 is indeed clearer on this point than
the 9.1 version.


Re: [GENERAL] pg_start_backup does not actually allow for consistent, file-level backup

2015-06-08 Thread otheus uibk
On Mon, Jun 8, 2015 at 3:13 PM, otheus uibk otheus.u...@gmail.com wrote:

 Thank you, all.  The manual for 9.4 is indeed clearer on this point than
 the 9.1 version.


Just to nit-pick, I see nowhere in either version of the manual any
indication that it is normal for PostgreSQL to continue updating files in
its data directory between pg_start_backup and pg_stop_backup. The closest I
see comes in this paragraph:

** Some file system backup tools emit warnings or errors if the files they
are trying to copy change while the copy proceeds. When taking a base
backup of an active database, this situation is normal and not an error.

Does "this situation" refer to the tools emitting warnings, or to the fact
that PostgreSQL is updating the files? It might be the case, for instance,
that timestamps are updated but not the contents of the files (this is what
I had assumed prior to today).

-- 
Otheus
otheus.u...@gmail.com
otheus.shell...@uibk.ac.at