[HACKERS] Proposal: ALTER EXTENSION SET OPTION

2017-11-08 Thread Chris Travers
Hi all;

One of the annoyances we currently deal with regarding analytics extensions
in a PG environment with mixed versions is that there is no facility right
now to conditionally apply modifications to functions that may be required
to make certain features work properly.

The case that comes to mind right now is in marking some functions parallel
safe for PostgreSQL 9.6 and above while taking no action for 9.5.
Currently we have a few options:

1.  Mark on a best-effort basis.
2.  Drop support for 9.5 and below.
3.  Come up with some much more complicated version graph.

It would be very nice to be able to define some options which could be set
for an extension without affecting its functions' arguments or return types
(such as marking functions parallel safe), and then have scripts which run
to apply these changes.

My thinking is one would have a syntax for a few specified options, such as:

ALTER EXTENSION foo SET OPTION PARALLEL SAFETY;
or if the extension supports the reverse:
ALTER EXTENSION foo UNSET OPTION PARALLEL SAFETY;

Over time more options could be added, but would be mapped to a file
convention.  In this case, we could expect:

foo--[version]--set-parallel-safety.sql and
foo--[version]--unset-parallel-safety.sql
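
As a rough sketch, a set script might contain nothing but ALTER FUNCTION
statements (the extension name, version, and function signatures below are
all hypothetical):

-- foo--1.2--set-parallel-safety.sql
ALTER FUNCTION foo_hash(text) PARALLEL SAFE;
ALTER FUNCTION foo_distance(point, point) PARALLEL SAFE;

and the corresponding unset script would mark the same functions PARALLEL
UNSAFE again.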

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] proposal: schema variables

2017-11-03 Thread Chris Travers
Some thoughts on this.

On Thu, Nov 2, 2017 at 4:48 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Nico Williams <n...@cryptonector.com> writes:
> > With access controls, GUCs could become schema variables, and settings
> > from postgresql.conf could move into the database itself (which I think
> > would be nice).
>
> People re-propose some variant of that every so often, but it never works,
> because it ignores the fact that some of the GUCs' values are needed
> before you can access system catalogs at all, or in places where relying
> on system catalog access would be a bad idea.
>

I think the basic point to take away here is that no matter how much
unification you do, you still have some things in the database and some
things outside it.

I would rather look at how GUCs could be improved at a functional/use-case
level before we look at the question of a technical solution.

One major use case today would be restricting how high various users can
set something like work_mem.  As it stands, there isn't really a way to
control this with any granularity, so some of the proposals regarding
granting access to a session variable would be very handy if they could
also be applied to GUC variables.
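
For context, the closest thing available today is a per-role default, which
is only a default rather than a cap (role name here is made up):

ALTER ROLE analyst SET work_mem = '64MB';  -- default for new sessions only
-- nothing stops that role from overriding it in its own session:
SET work_mem = '2GB';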

>
> Sure, we could have two completely different configuration mechanisms
> so that some of the variables could be "inside the database", but that
> doesn't seem like a net improvement to me.  The point of the Grand Unified
> Configuration mechanism was to be unified, after all.
>

+1

>
> I'm on board with having a totally different mechanism for session
> variables.  The fact that people have been abusing GUC to store
> user-defined variables doesn't make it a good way to do that.
>

What about having a more clunky syntax as:

SET VARIABLE foo='bar';

Perhaps one can have a short form of:

SET VAR foo = 'bar';

vs

SET foo = 'bar'; -- GUC



>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Oracle to PostGre

2017-11-01 Thread Chris Travers
As a brief note, this is probably not the best list for this.  You would do
better to ask questions like this on -general where you have more
application developers and so forth.  This is more of an SQL question so
asking people who are hacking the codebase may not be the best way to get
it answered.

Also, it is Postgres or PostgreSQL.  People will assume you are totally new
if you call it Postgre.

On Wed, Nov 1, 2017 at 12:55 PM, Brahmam Eswar <brahmam1...@gmail.com>
wrote:

> Hi,
>
> App is moving to Postgre from Oracel . After migrating the store procedure
> is throwing an error with collection type.
>
> *Oracle :*
>
> create or replace PROCEDURE"PROC1"
>  (
>
>  , REQ_CURR_CODE IN VARCHAR2
>  , IS_VALID OUT VARCHAR2
>  , ERROR_MSG OUT VARCHAR2
>  ) AS
>
>
>TYPE INV_LINES_RT IS RECORD(
>  VENDOR_NUM AP.CREATURE_TXN_LINE_ITEMS.VENDOR_NUM%TYPE,
>  VENDOR_SITE_CODE AP.CREATURE_TXN_LINE_ITEMS.
> VENDOR_SITE_CODE%TYPE,
>  INVOICE_NUM AP.CREATURE_TXN_LINE_ITEMS.INVOICE_NUM%TYPE,
>  TXN_CNT NUMBER
>);
>TYPE INV_LINES_T IS TABLE OF INV_LINES_RT;
>L_INV_LINES INV_LINES_T;
>IS_MULTI_VENDOR FINO_APRVL_BUS_CHANN_DEFAULTS.
> MULTI_VENDOR_REQ_ALLOWED%TYPE;
>BUS_CHANNEL_RECORD FINO_APRVL_BUS_CHANN_DEFAULTS%ROWTYPE;
> CAL_APRVL_AMT_BY_TOTAL_AMT FINO_APRVL_BUS_CHANN_DEFAULTS.
> CAL_APR_AMT_BY_INV%TYPE;
>
>
> Postgre :
>
> create or replace FUNCTION AP.VALIDATE_CRTR_LINE_ITEMS
> (
> REQ_CURR_CODE IN VARCHAR,
> IS_VALID OUT VARCHAR,
> ERROR_MSG OUT VARCHAR
> ) AS $$
>
> DECLARE
>
> INV_LINES_T ap.validate_crtr_line_items$inv_lines_rt ARRAY;
> L_INV_LINES INV_LINES_T%TYPE;
> L_INV_LINES$temporary_record ap.validate_crtr_line_items$inv_lines_rt;
> IS_MULTI_VENDOR AP.FINO_APRVL_BUS_CHANN_DEFAULTS.MULTI_VENDOR_REQ_
> ALLOWED%TYPE;
> BUS_CHANNEL_RECORD ap.fino_aprvl_bus_chann_defaults%ROWTYPE;
>  CAL_APRVL_AMT_BY_TOTAL_AMT AP.FINO_APRVL_BUS_CHANN_
> DEFAULTS.CAL_APR_AMT_BY_INV%TYPE;
>
>
> but it's throwing an error as : 0 SQLState: 42P01 Message: ERROR:
> relation "l_inv_lines" does not exist
> --
> Thanks & Regards,
> Brahmeswara Rao J.
>

When you ask on -general, please include the query which is actually
causing the problem.  My guess is that either you didn't declare the type
properly or there is some other error in your function, but the information
provided is not sufficient to answer it.
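
For what it is worth, the usual translation of an Oracle RECORD plus TABLE
OF pair into PL/pgSQL is a composite type plus an array of that type.  A
minimal sketch, with made-up column types standing in for the %TYPE
references and assuming the ap schema already exists:

CREATE TYPE ap.inv_lines_rt AS (
    vendor_num       text,
    vendor_site_code text,
    invoice_num      text,
    txn_cnt          numeric
);

DO $$
DECLARE
    l_inv_lines ap.inv_lines_rt[] := '{}';  -- plays the role of TABLE OF
    l_line      ap.inv_lines_rt;
BEGIN
    l_line := ROW('V001', 'S01', 'INV-1', 1)::ap.inv_lines_rt;
    l_inv_lines := array_append(l_inv_lines, l_line);
    RAISE NOTICE '% line(s) collected', array_length(l_inv_lines, 1);
END;
$$;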

Best of luck asking on -general.

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-11-01 Thread Chris Travers
Attached is the patch as submitted for commitfest.

Please note, I am not averse to adding an additional --include-path
directive if that avoids backwards-compatibility problems.  However, the
patch is complex enough that I would really prefer review of the rest of it
to start first.  This doesn't strike me as perfectly trivial, and I think it
is easier to review the pieces separately.  I have already started on the
--include-path directive and could probably get it attached to a later
version of the patch very soon.

I would also like to address a couple of important points here:

1.  I think default restrictions plus additional paths is the best, safest
way forward.  Excluding files via shell globs doesn't solve the "I need this
particular config file" problem very well, particularly if we want to
support this outside of an internal environment.  Shell globs also tend to
match too much, so if you exclude based on them, you run the risk that your
rewind is corrupt because it excluded files it needed.

2.  I would propose that any need for additional paths be met with an
--include-path directive.  This could be specified multiple times and could
point to a file or directory which would be added to the paths rewound.  I
have a patch for this mostly done, but I am concerned that these sorts of
changes result in a combination of changes that are easier to review
separately than together.  So if needed, I can add it here or in a separate
follow-up patch.

3.  I think it would be a mistake to tie backup solutions in non-replicated
environments to replication use cases, and vice versa.  Things like
replication slots (used for streaming backups) have different
considerations in different environments.  Rather than build the same
infrastructure first, I think it is better to support different use cases
well and then build common infrastructure to support the different cases.
I am not against building common infrastructure for pg_rewind and
pg_basebackup.  I am very much against having the core guarantees be exactly
the same.

Best Wishes,
Chris Travers

On Sat, Oct 28, 2017 at 1:22 PM, Chris Travers <chris.trav...@adjust.com>
wrote:

> Hi;
>
> There are still some cleanup bits needed here but I wanted to get feedback
> on my general direction.
>
> I hope to submit for commit fest soon if the general feedback is good.
> Tests are passing (with adjustments intended for change of behaviour in one
> test script).  I want to note that Crimson Thompson (also with Adjust) has
> provided some valuable discussion on this code.
>
> The Problem:
>
> pg_rewind makes no real guarantees regarding the final state of non-data
> files or directories.  It checks to see if the timeline has incremented
> (and therefore guarantees that, if successful, the data directories are on
> the same timeline), but non-data files are clobbered if we rewind and left
> intact if not.  These other files include postgresql.auto.conf, replication
> slots, and can include log files.
>
> Copying logs over to the new slave is something one probably never wants
> to do (same with replication slots), and the behaviours here can mask
> troubleshooting of why a particular master failed, cause WAL segments to
> build up, silently change settings, and produce other undesirable
> behaviours.  Together these make pg_rewind very difficult to use properly
> and push tasks onto replication management tooling that the management
> tools are not in a good position to handle correctly.
>
> Designing the Fix:
>
> Two proposed fixes were considered and one selected:  one was to whitelist
> directories and rewind only those; the other was to allow shell globs to be
> provided that could be used to exclude files.  The whitelisting solution
> was chosen for a number of reasons.
>
> When pg_rewind "succeeds" but not quite correctly the result is usually a
> corrupted installation which requires a base backup to replace it anyway.
> In a recovery situation, sometimes pressures occur which render human
> judgment less effective, and therefore glob exclusion strikes me as
> something which would probably do more harm than good, but maybe I don't
> understand the use case (comments as to why some people look at the other
> solution as preferable would be welcome).
>
> In going with the whitelisting solution, we chose to include all
> directories with WAL-related information.  This allows more predictable
> interaction with other parts of the replication chain.  Consequently we
> copy not only pg_wal and pg_xact but also commit timestamps and so forth.
>
> The Solution:
>
> The solution is a whitelist of directories specified which are the only
> ones which are synchronised.  The relevant part of this patch is:
>
> +/* List of directories to synchronize:
>
> + * base data dirs (a

Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories

2017-11-01 Thread Chris Travers
On Tue, Oct 31, 2017 at 1:38 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers <chris.trav...@adjust.com>
> wrote:
> > The attached patch is cleaned up and filed for the commit fest this next
> > month:
>
> It's generally better to post the patch on the same message as the
> discussion thread, or at least link back to the discussion thread from
> the new one.  Otherwise people may have trouble understanding the
> history.
>

Fair point.  I was mostly concerned about the WIP marker there; this is my
first time through this process.

I will then retire this thread and attach the patch on the other one.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[HACKERS] Anyone have experience benchmarking very high effective_io_concurrency on NVME's?

2017-10-31 Thread Chris Travers
Hi;

After Andres's excellent talk at PGConf we tried benchmarking
effective_io_concurrency on some of our servers and found that those which
have a number of NVME storage volumes could not fill the I/O queue even at
the maximum setting (1000).

Before we start benchmarking patched versions of PostgreSQL to bump this
setting up to ever higher values, I am wondering if anyone has done this
yet and if so if you would be willing to share results.
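
For reference, a rough sketch of the sort of check involved (table and
column names are invented, and effective_io_concurrency mainly influences
prefetching during bitmap heap scans, so the test query has to be one the
planner executes that way):

SET effective_io_concurrency = 1000;   -- current maximum

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
  FROM events
 WHERE event_time BETWEEN '2017-10-01' AND '2017-10-07';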

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[HACKERS] Patch: restrict pg_rewind to whitelisted directories

2017-10-30 Thread Chris Travers
The attached patch is cleaned up and filed for the commit fest this next
month:

Here's the full commit message via Mercurial.  I will likely have a new
branch per version the patch since that's the closest thing to a rebase in
this version control system.

changeset:   60492:47f87a2d2fa1

tag: mine/pg_rewind_restrict_dirs

parent:  60446:e638ba9c3c11

user:Chris Travers <chris.trav...@gmail.com>

date:Mon Oct 30 12:25:18 2017 +0100

files:   doc/src/sgml/ref/pg_rewind.sgml src/bin/pg_rewind/copy_fetch.c
src/bin/pg_rewind/fetch.c src/bin/pg_rewind/fetch.h
src/bin/pg_rewind/libpq_fetch.c src/bin/pg_rewind/pg_rewind.c
src/bin/pg_rewind/t/003_extrafiles.pl

description:

Restrict pg_rewind to whitelisted directories.


This is intended to be a minimum working version and in fact builds and
passes tests.

Note that tests for extra files have been changed to reflect new behavior
and additional

debugging information added to the output in case of failure.


The patch iterates through a fixed set of directories and synchronizes only
those.  This improves

predictability of the complete state of the system after a rewind.


One important outstanding question here is whether we need to ensure the
possibility of backing

up other files if they exist via an --include-path command line switch
(this would not be a glob).

In the thread discussing this patch, Michael Paquier has expressed concern
about configuration

files created by extensions or other components not being copied.  I could
add such a switch but

the patch is long enough, and it is unclear enough to the extent this is
needed at present, so

I am leaving it at the reviewer's discretion whether I should add this here
or submit a second

patch later to add the ability to add additional paths to the filemap.

Either way, it is worth noting that I expect to have a subsequent patch
either incorporated here or in a further submission that takes this and
adds the ability to include additional directories or files via a command
line flag.  This will *not* be a shell glob but one directory or file per
invocation of the switch (similar to -t in pg_dump).

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


pg_rewind_restrict_dirs.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
> <chris.trav...@adjust.com> wrote:
> > This also brings up a fairly major concern more generally about control
> by
> > the way.  A lot of cases where pg_rewind is called, the user doesn't
> > necessarily have much control on how it is called.  Moreover in many of
> > these cases, the user is probably not in a position to understand the
> > internals well enough to grasp what to check after.
>
> Likely they are not.
>

So I am submitting the current patch to the commitfest.  I am open
to adding a new
--include-path option in this patch, but I am worried about atomicity
concerns, and I am still
not really sure what the impact is for you (I haven't heard whether you
expect this file to be
there before the rewind, i.e. whether it would be on both master and slave,
or just on the slave).

So there is the question of under what specific circumstances this would
break for you.

>
> > Right, but there is a use case difference between "I am taking a backup
> > of a server" and "I need to get the server into a state where it can
> > rejoin replication as a standby."
>
> The intersection of the first and second categories is not empty. Some
> take backups and use them to deploy standbys.
>

Sure, but it is not complete either.

>
> > A really good example of where this is a big problem is with replication
> > slots.  On a backup I would expect you want replication slots to be
> copied
> > in.
>
> I would actually expect the contrary, and please note that replication
> slots are not taken in a base backup, which is what the documentation
> says as well:
> https://www.postgresql.org/docs/10/static/protocol-replication.html
> "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
> pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
> they are symbolic links)."
>

Many of these are emptied on server restart, but pg_replslot is not.  What
is the logic for excluding it from backups, other than to avoid problems
with replication provisioning?

I mean if I have a replication slot for taking backups with barman (and I
am not actually doing replication) why would I *not* want that in my base
backup if I might have to restore to a new machine somewhere?
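
Concretely, the sort of slot I mean is just a physical slot created once on
the master so the backup tool can stream WAL from it; the slot name here is
arbitrary:

SELECT pg_create_physical_replication_slot('barman');
SELECT slot_name, active FROM pg_replication_slots;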

>
> Some code I have with 9.6's pg_basebackup removes replication slots
> manually, as this logic is new in 10 ;)
>
> >> The pattern that base backups now use is an exclude list. In the
> >> future I would rather see base backups and pg_rewind using the same
> >> infrastructure for both things:
> >> - pg_rewind should use the replication protocol with BASE_BACKUP.
> >> Having it rely on root access now is crazy.
> >> - BASE_BACKUP should have an option where it is possible to exclude
> >> custom paths.
> >> What you are proposing here would make both diverge more, which is in
> >> my opinion not a good approach.
> >
> > How does rep mgr or other programs using pg_rewind know what to exclude?
>
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.
>

Two points that further occur to me:

Shell globs seem to me to be foot-guns in this area.  I think special paths
should be given as one path per invocation of the option, not as something
like "--exclude=pg_w*", since the latter is just asking for problems as time
goes on and things get renamed.

It also seems to me that adding specific paths is far safer than removing
specific paths.

> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
On Mon, Oct 30, 2017 at 10:57 AM, Michael Paquier <michael.paqu...@gmail.com
> wrote:

> On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris.trav...@adjust.com>
> wrote:
> > Are there any cases right now where you have features added by
> extensions that write to directories which are required for a rewind?
>
> In some of the stuff I maintain, I actually have one case now of a
> configuration file included with include_if_exists by postgresql.conf
> and is expected to be generated by a component that my code doing the
> rewind has no direct access on... I can control how pg_rewind kicks
> in, but I think that you would actually break silently the current
> code, which is scary especially if I don't control the code where
> pg_rewind is called when Postgres 11 gets integrated into the product
> I am thinking about, and even more scary if there is no way to include
> something.
>

Ok, so there is an argument that there needs to be a way to include
additional paths in this patch.  One important question I would have in
these cases is whether you expect these files to be present only on the
master.  If not, is that a problem, and how do you handle the fail-back
case?

This also brings up a fairly major concern more generally about control by
the way.  A lot of cases where pg_rewind is called, the user doesn't
necessarily have much control on how it is called.  Moreover in many of
these cases, the user is probably not in a position to understand the
internals well enough to grasp what to check after.

>
> > The problem with an exclude list is that we cannot safely exclude
> > configuration files or logs (because these could be named anything),
> right?
>
> You have the exact same problem with base backups now, and any
> configuration files created by extensions are a per-case problem...
>

Right, but there is a use case difference between "I am taking a backup of
a server" and "I need to get the server into a state where it can rejoin
replication as a standby."

A really good example of where this is a big problem is with replication
slots.  On a backup I would expect you want replication slots to be copied
in.  However when setting yourself up as a slave you most certainly do not
want to just copy these over from the master unless you have infinite disk
space.  I would argue that under *no* circumstance should pg_rewind *ever*
copy replication slots.  But pg_basebackup really *should* (and when
provisioning a new slave you should clear these as soon as it is set up).
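
A minimal sketch of that clean-up step on the freshly provisioned standby,
assuming any slots copied over from the master are unwanted there:

SELECT pg_drop_replication_slot(slot_name)
  FROM pg_replication_slots
 WHERE NOT active;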

The pattern that base backups now use is an exclude list. In the
> future I would rather see base backups and pg_rewind using the same
> infrastructure for both things:
> - pg_rewind should use the replication protocol with BASE_BACKUP.
> Having it rely on root access now is crazy.
> - BASE_BACKUP should have an option where it is possible to exclude
> custom paths.
> What you are proposing here would make both diverge more, which is in
> my opinion not a good approach.
>

How does rep mgr or other programs using pg_rewind know what to exclude?

Again I am not convinced setting up a replica and taking a backup for
disaster recovery are the same use case or have the same requirements.

--
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-30 Thread Chris Travers
First, thanks for your thoughts on this, and I am interested in probing
them more.

On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris.trav...@adjust.com>
> wrote:
> > The Solution:
> > The solution is a whitelist of directories specified which are the only
> ones
> > which are synchronised.  The relevant part of this patch is:
> >
> > +/* List of directories to synchronize:
> > + * base data dirs (and tablespaces)
> > + * wal/transaction data
> > + * and that is it.
> > + *
> > + * This array is null-terminated to make
> > + * it easy to expand
> > + */
> >
> > +const char *rewind_dirs[] = {
> > +"base",
> > +"global",
> > +"pg_commit_ts",
> > +"pg_logical",
> > +"pg_multixact",
> > +"pg_serial",
> > +"pg_subtrans",
> > +"pg_tblspc",
> > +"pg_twophase",
> > +"pg_wal",
> > +"pg_xact",
> > +NULL
> > +};
>
> The problem with a white list, which is one reason why I do not favour
> it, is in the case where a feature adds in the data folder a new path
> for its data, particularly in the case where this is critical for a
> base backup. If this folder is not added in this list, then a rewind
> may be silently corrupted as well. There are also a set of directories
> in this list that we do not care about, pg_serial being one.
> pg_subtrans is a second, as it gets zeroed on startup.
>

The problem with an exclude list is that we cannot safely exclude
configuration files or logs (because these could be named anything), right?

Are there any cases right now where you have features added by extensions
that write to directories which are required for a rewind?  I am asking
because I would like to see if this is the minimum working change or if
this change is fundamentally broken currently and must be extended to allow
custom paths to be sync'd as well.

>
> And if you think about it, pg_rewind is actually the *same* processing
> as a base backup taken using the replication protocol. So instead of
> this patch I would recommend using excludeFiles and excludeDirContents
> by moving this list to a common header where frontend and backend can
> refer to. Note that basebackup.h includes replnodes.h, so my thoughts
> is that you should split the current header with something like
> basebackup_internal.h which is backend-only, and have pg_rewind fetch
> the list of directories to automatically ignore as those ones. You can
> also simplify a bit the code of pg_rewind a bit so as things like
> postmaster.opts. On top of that, there is visibly no need to tweak the
> SQL query fetching the directory list (which is useful for debugging
> as shaped now actually), but just change process_source_file so as its
> content is not added in the file map.
>

I am not sure it *should* be the same, however.  In a backup we probably
want to back up postgresql.auto.conf, but on a failover, I don't think
we want to clobber configuration.  We certainly do not want to sometimes
clobber configuration but not other times (which is what happens right now
in some cases).  And under no circumstances do we want to clobber logs on a
failed server with logs on a working server.  That's asking for serious
problems in my view.

If you think about it, there's a huge difference in use case in backing up
a database cluster (Including replication slots, configs in the dir etc)
and re-syncing the data so that replication can resume, and I think there
are some dangers that come up when assuming these should be closely tied
together.

>
> Then there is a second case where you do not want a set of folders to
> be included, but those can be useful by default, an example here being
> pg_wal where one might want to have an empty path to begin with. A
> third case is a set of folders that you do not care about, but depends
> on the deployment, being here "log" for example. For those you could
> just add an --exclude-path option which simply piles up paths into a
> linked list when called multiple times. This could happen with a
> second patch.
>

Agreed.  And one could add an "--include-path" option to allow for unusual
cases where you want extra directories, such as replication slots, or the
like.

I think another patch could also specifically empty the WAL directories and
perhaps create a replication slot, allowing one to bring things back up via
streaming replication more safely.

>
> > From there we iterate over this array for directory-based approaches in
> > copy_fetch.c and with one query per directory in libpq

Re: [HACKERS] proposal: schema variables

2017-10-29 Thread Chris Travers
On Sat, Oct 28, 2017 at 4:56 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

>
>>
> The creating database objects and necessary infrastructure is the most
> simple task of this project. I'll be more happy if there are zero
> intersection because variables and GUC are designed for different purposes.
> But due SET keyword the intersection there is.
>
> When I thinking about it, I have only one, but important reason, why I
> prefer design new type of database object -the GUC are stack based with
> different default granularity - global, database, user, session, function.
> This can be unwanted behave for variables - it can be source of hard to
> detected bugs. I afraid so this behave can be too messy for usage as
> variables.
>
> @1 I have not clean opinion about it - not sure if rights are good enough
> - probably some user limits can be more practical - but can be hard to
> choose result when some user limits and GUC will be against
>

I was mostly thinking that users can probably set things like work_mem and
possibly this might be a problem.


> @2 With variables typed custom GUC are not necessary
>

I don't know about that.  For example with the geoip2lookup extension it is
nice that you could set the preferred language for translation on a per
user basis or the mmdb path on a per-db basis.


> @3 Why you need it? It is possible with set_config function now.
>

Yeah you could do it safely with set_config and a CTE, but suppose I have:

with a (id, value) as (values (1::int, 'foo'), (2, 'bar'), (3, 'baz'))
SELECT set_config('custom.val', value, false) from a where id = 2;

What is the result out of this?  I would *expect* that this would probably
run set_config 3 times and filter the output.


>
> Regards
>
> Pavel
>
>
>
>
>>
>>
>>> regards
>>>
>>> Pavel
>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Chris Travers
>> Database Administrator
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] proposal: schema variables

2017-10-28 Thread Chris Travers
On Thu, Oct 26, 2017 at 9:21 AM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> Hi,
>
> I propose a new database object - a variable. A variable is a persistent
> object that holds an unshared, session-based, non-transactional, in-memory
> value of any type, like variables in other languages. The persistence is
> required to make static checks possible, but it can be limited to the
> session - the variables can be temporary.
>
> My proposal is related to session variables from Sybase, MSSQL or MySQL
> (based on prefix usage @ or @@), or package variables from Oracle (access
> is controlled by scope), or schema variables from DB2. Any design is coming
> from different sources, traditions and has some advantages or
> disadvantages. The base of my proposal is usage schema variables as session
> variables for stored procedures. It should to help to people who try to
> port complex projects to PostgreSQL from other databases.
>
> The Sybase (T-SQL) design is good for interactive work, but it is weak
> for usage in stored procedures - static checks are not possible, and it is
> not possible to set access rights on variables.
>
> The ADA design (used in Oracle), based on scope, is great, but our
> environment is not nested. And we should support PLs other than PLpgSQL
> more strongly.
>
> There is not too much other possibilities - the variable that should be
> accessed from different PL, different procedures (in time) should to live
> somewhere over PL, and there is the schema only.
>
> The variable can be created by CREATE statement:
>
> CREATE VARIABLE public.myvar AS integer;
> CREATE VARIABLE myschema.myvar AS mytype;
>
> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
>   [ DEFAULT expression ] [[NOT] NULL]
>   [ ON TRANSACTION END { RESET | DROP } ]
>   [ { VOLATILE | STABLE } ];
>
> It is dropped by command DROP VARIABLE  [ IF EXISTS] varname.
>
> The access rights is controlled by usual access rights - by commands
> GRANT/REVOKE. The possible rights are: READ, WRITE
>
> The variables can be modified by SQL command SET (this is taken from
> standard, and it natural)
>
> SET varname = expression;
>
> Unfortunately we use the SET command for different purpose. But I am
> thinking so we can solve it with few tricks. The first is moving our GUC to
> pg_catalog schema. We can control the strictness of SET command. In one
> variant, we can detect custom GUC and allow it, in another we can disallow
> a custom GUC and allow only schema variables. A new command LET can be
> alternative.
>
> The variables should be used in queries implicitly (without JOIN)
>
> SELECT varname;
>
> The SEARCH_PATH is used, when varname is located. The variables can be
> used everywhere where query parameters are allowed.
>
> I hope so this proposal is good enough and simple.
>
> Comments, notes?
>


I have a question on this.  Since one can issue set commands on arbitrary
settings (and later ALTER database/role/system on settings you have created
in the current session) I am wondering how much overlap there is between a
sort of extended GUC with custom settings and variables.

Maybe it would be simpler to treat variables and GUC settings as similar
and see what can be done to extend GUCs in this way?

I mean if instead we allowed restricting SET to known settings then we
could have a CREATE SETTING command which would behave like this and then
use SET the same way across both.

In essence I am wondering if this really needs to be as separate from GUC
as you are proposing.

If done this way then:

1.  You could issue GRANT or REVOKE on GUC settings, allowing some users
but not others to set things like work_mem for their queries.
2.  You could specify allowed types in custom settings.
3.  In a subsequent stage you might be able to SELECT <value> INTO
setting_name FROM <query>, allowing setting writes based on queries.
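
To make the idea concrete, purely hypothetical syntax (none of this exists
today) might look like:

CREATE SETTING myapp.batch_size AS integer DEFAULT 100;
GRANT SET ON SETTING myapp.batch_size TO batch_worker;
REVOKE SET ON SETTING work_mem FROM reporting_user;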



> regards
>
> Pavel
>
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2017-10-28 Thread Chris Travers
Hi;

There are still some cleanup bits needed here but I wanted to get feedback
on my general direction.

I hope to submit for commit fest soon if the general feedback is good.
Tests are passing (with adjustments intended for change of behaviour in one
test script).  I want to note that Crimson Thompson (also with Adjust) has
provided some valuable discussion on this code.

The Problem:

pg_rewind makes no real guarantees regarding the final state of non-data
files or directories.  It checks to see if the timeline has incremented
(and therefore guarantees that, if successful, the data directories are on
the same timeline), but non-data files are clobbered if we rewind and left
intact if not.  These other files include postgresql.auto.conf, replication
slots, and can include log files.

Copying logs over to the new slave is something one probably never wants to
do (same with replication slots), and the behaviours here can mask
troubleshooting of why a particular master failed, cause WAL segments to
build up, silently change settings, and produce other undesirable
behaviours.  Together these make pg_rewind very difficult to use properly
and push tasks onto replication management tooling that the management
tools are not in a good position to handle correctly.

Designing the Fix:

Two proposed fixes were considered and one selected:  one was to whitelist
directories and rewind only those; the other was to allow shell globs to be
provided that could be used to exclude files.  The whitelisting solution was
chosen for a number of reasons.

When pg_rewind "succeeds" but not quite correctly the result is usually a
corrupted installation which requires a base backup to replace it anyway.
In a recovery situation, sometimes pressures occur which render human
judgment less effective, and therefore glob exclusion strikes me as
something which would probably do more harm than good, but maybe I don't
understand the use case (comments as to why some people look at the other
solution as preferable would be welcome).

In going with the whitelisting solution, we chose to include all
directories with WAL-related information.  This allows more predictable
interaction with other parts of the replication chain.  Consequently we
copy not only pg_wal and pg_xact but also commit timestamps and so forth.

The Solution:

The solution is a whitelist of directories specified which are the only
ones which are synchronised.  The relevant part of this patch is:

+/* List of directories to synchronize:

+ * base data dirs (and tablespaces)

+ * wal/transaction data

+ * and that is it.

+ *

+ * This array is null-terminated to make

+ * it easy to expand

+ */

+

+const char *rewind_dirs[] = {

+"base",

+"global",

+"pg_commit_ts",

+"pg_logical",

+"pg_multixact",

+"pg_serial",

+"pg_subtrans",

+"pg_tblspc",

+"pg_twophase",

+"pg_wal",

+"pg_xact",

+NULL

+};


From there we iterate over this array for directory-based approaches in
copy_fetch.c and with one query per directory in libpq_fetch.c.  This also
means shifting from the basic interface from PQexec to PQexecParams.  It
would be possible to move to binary formats too, but this was not done
currently in order to simplify code review (that could be a separate
independent patch at a later time).
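
To illustrate the shape of the change (this is not the patch code itself,
and pg_ls_dir() is only a one-level stand-in for the patch's real recursive
file-listing query), each whitelisted directory name is now bound as a
parameter instead of being interpolated into the SQL text:

-- executed once per entry in rewind_dirs[], with $1 bound through
-- PQexecParams to 'base', 'global', 'pg_wal', and so on
SELECT pg_ls_dir($1);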

Testing Done:

The extra files tests correctly test this change in behaviour.  The tests
have been modified, and diagnostics in cases of failures expanded, in this
case.  The other tests provide good coverage of whether pg_rewind does what
it is supposed to do.

Cleanup still required:

I accidentally left Carp::Always in the .pm file for this Perl module.  It
will be fixed.

I renamed one of the functions to have a more descriptive name but have not
yet removed the old function.


Feedback is very welcome.  pg_rewind is a very nice piece of software.  I
am hoping that these sorts of changes will help ensure that it is easier to
use and provides more predictable results.
-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


pg_rewind_restrict.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add Roman numeral conversion to to_number

2017-09-17 Thread Chris Travers
On Sun, Sep 17, 2017 at 6:43 PM, David Fetter <da...@fetter.org> wrote:

> On Sat, Sep 16, 2017 at 10:42:49PM +, Douglas Doole wrote:
> > Oliver, I took a look at your tests and they look thorough to me.
> >
> > One recommendation, instead of having 3999 separate selects to test every
> > legal roman numeral, why not just do something like this:
> >
> > do $$
> > declare
> > i int;
> > rn text;
> > rn_val int;
> > begin
> > for i in 1..3999 loop
> > rn := trim(to_char(i, 'rn'));
> > rn_val := to_number(rn, 'rn');
> > if (i <> rn_val) then
> > raise notice 'Mismatch: i=% rn=% rn_val=%', i, rn, rn_val;
> > end if;
> > end loop;
> > raise notice 'Tested roman numerals 1..3999';
> > end;
> > $$;
> >
> > It's a lot easier to maintain than separate selects.
>
> Why not just one SELECT, as in:
>
> SELECT i, to_char(i, 'rn'), to_number(to_char(i, 'rn'), 'rn')
> FROM generate_series(1,3999) i;
>

Question:  What is our definition of a legal Roman numeral?

For example sometimes IXX appears in the corpus to refer to 19 even though
our standardised notation would be XIX.
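
For what it is worth, the output side is unambiguous, since to_char() always
produces the standardised subtractive form:

SELECT trim(to_char(19, 'RN'));   -- XIX, never IXX

so the question is really what to_number() should be willing to accept on
input.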

>
> Best,
> David.
> --
> David Fetter <david(at)fetter(dot)org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] pg_rewind proposed scope and interface changes

2017-09-15 Thread Chris Travers
On Wed, Sep 13, 2017 at 6:28 AM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Sep 12, 2017 at 11:52 PM, Chris Travers
> <chris.trav...@adjust.com> wrote:
> > Additionally the wal, xact, timestamp and logical directories must be
> > processed in some way.
>
> To what does the term "logical directories" refer to?
>
> >   * if --wal=sync the directories are processed the way they are today
> >   * if --wal=clear then the contents of the directories are cleared and
> > replication is assumed to be used to bring the system up after.  Note
> this
> > will need to come with warning about the need for replication slots.
>
> Hm. I am not sure in what --wal=clear is helpful. Keeping around WAL
> segments from the point of the last checkpoint where WAL forked up to
> the point where WAL has forked is helpful, because you don't need to
> copy again those WAL segments, be they come from an archive or from
> streaming. Copying a set of WAL segments during the rewind of the new
> timeline is helpful as well because you don't need to do the copy
> again. One configuration where this is helpful is that there is
> already an archive local to the target server available with the
> segments of the new timeline available.
>

So maybe --wal=clear should just clear diverged WAL files.  That makes some
sense.

>
> > Base, global, pg_tablespace
> >
> > With
> > pg_wal, pg_xact, pg_commit_ts, pg_logical added if wal strategy is set to
> > sync.
>
> Skipping some directories in a way similar to what a base backup does
> would be nicer I think. We already have a list of those in
> basebackup.c in the shape of excludeDirContents and excludeFiles. I
> think that it would be a good idea to export those into a header that
> pg_rewind could include, and refer to in order to exclude them when
> fetching a set of files. At the end of the day, a rewind is a kind of
> base backup in itself, and this patch would already serve well a lot
> of people.
>

I think there are several reasons not to just do this based on base
backups.  A base backup may copy some things that we do not want pg_rewind
to restore.  Configuration files, logs, and
replication slots are the ones that immediately come to mind.  Given that
configuration files can have arbitrary names and locations, expecting third
party failover tooling like repmgr or patroni to pick up on such policies
strikes me as asking a bit much.  I think we are better off with a program
that solves one problem well and solves it right and the problem is
rewinding a system so that it can follow a replica that was previously
following it.

>
> Having on top of that a way to exclude a wanted set of files and the
> log directory (for example: should we look at log_directory and
> exclude it from the fetched paths if it is not an absolute path?),
> which is smart enough to take care of not removing paths critical for
> a rewind like anything in base/, then you are good to go with a
> full-blown tool that I think would serve the purposes you are looking
> for.
>

If you are going to go this route, I think you need to look at the config
files themselves, parse them, and possibly look at the command line by
which PostgreSQL is started.  Otherwise I don't know how you expect this
to be configured.


> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[HACKERS] pg_rewind proposed scope and interface changes

2017-09-12 Thread Chris Travers
Hi all;


After working with pg_rewind a bit more, it has become clear to me that
some aspects of the ideal scope of the program are a bit unclear.  This is
in part because, in the majority of cases where you use the utility, a small
amount of data loss is accepted as a tradeoff for continuity.


Basic Responsibility:


pg_rewind’s core responsibility is to restore data directories so that they
are compatible with following another server.


Expanded Options:

As with pg_basebackup, WAL files and related data may need to be specified
separately.


Pre-run checks:


PG_VERSION must be the same on target and source or else we stop.


We then check the timelines and if they have not diverged don’t do anything.


Processed directories:


pg_rewind must *always* process the global, base, and pg_tablespace
directories regardless of any WAL handling option.


Additionally, the WAL, xact, commit timestamp, and logical directories must
be processed in some way.


  * if --wal=sync the directories are processed the way they are today

  * if --wal=clear then the contents of the directories are cleared and
replication is assumed to be used to bring the system up after.  Note this
will need to come with warning about the need for replication slots.


No other files will be clobbered.


This means that the basic processing file list will look like this:


Base, global, pg_tablespace


With

pg_wal, pg_xact, pg_commit_ts, pg_logical added if wal strategy is set to
sync.


Proposed patch roadmap:

   1. Improve WAL handling by allowing unlinking of all regular files in
   those directories during the rewind process.  Make --wal=sync the default
   for now because that is backwards compatible.  Longer-run we may consider
   switching the default.
   2. Restrict main operation to the directories listed above.

Problems that we will not try to solve:

 * Rewinding past table creation orphans the table's file.  This is a
complex topic on its own and probably needs a separate utility.

Thoughts?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-07 Thread Chris Travers
On Thu, Sep 7, 2017 at 2:47 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org>
wrote:

> After reading this discussion, I agree that pg_rewind needs to become
> smarter in order to be fully useful in production environments; right
> now there are many warts and corner cases that did not seem to have been
> considered during the initial development (which I think is all right,
> taking into account that its mere concept was complicated enough; so we
> need not put any blame on the original developers, rather the contrary).
>

Agreed with this assessment.  And as a solution to the problem of "base
backups take too long to take and transfer" the solution and the corner
cases make a lot of sense.

>
> I think we need to make the program simple to use (i.e. not have the
> user write shell globs for the common log file naming patterns) while
> remaining powerful (i.e. not forcibly copy any files that do not match
> hardcoded globs).


I would add that well-defined tasks are a key aspect of powerful software
in my view and here the well defined task is to restore data states to a
particular usable timeline point taken from another system.  If that is
handled well, that opens up new uses and makes some problems that are
difficult right now much easier to solve.


>   Is the current dry-run mode enough to give the user
> peace of mind regarding what would be done in terms of testing globs
> etc?  If not, maybe the debug mode needs to be improved (for instance,
> have it report the file size for each file that would be copied;
> otherwise you may not notice it's going to copy the 20GB log file until
> it's too late ...)
>

The dry-run facility solves one problem in one circumstance, namely a
manually invoked run of the software along with the question of "will this
actually rewind?"  I suppose software developers might be able to use it
to back up and restore things that are to be clobbered (but is anyone likely
to on the software development side?).  I don't see anything in that corner
that can be improved without over-engineering the solution.

There are two reasons I am skeptical that a dry-run mode will ever be
"enough."

The first is that pg_rewind is often integrated into auto-failover/failback
tools, and the chance of running it in a dry-run mode before it is
automatically triggered is more or less nil.  These are also the cases
where you won't notice it does something bad until much later.

The second is that there are at least some corner cases we may need to
define as outside the responsibility of pg_rewind.  The one that comes to
mind here is if I am rewinding back past the creation of a small table.  I
don't see an easy or safe way to address that from inside pg_rewind without
a lot of complication.  It might be better to have a dedicated tool for
that.


>
> Now, in order for any of this to happen, there will need to be a
> champion to define what the missing pieces are, write all those patches
> and see them through the usual (cumbersome, slow) procedure.  Are you,
> Chris, willing to take the lead on that?
>

Yeah.  I think the first step would be a list of the corner cases and a
proposal for how I think it should work.  Then maybe a roadmap of patches,
and then submitting them as they become complete.


> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-07 Thread Chris Travers
One more side to this which is relevant to other discussions.

If I am rewinding back to before when a table was created, the current
algorithm as well as any proposed algorithms will delete the reference to
the relfilenode in the catalogs but not the file itself.  I don't see how
an undo subsystem would fix this.

Is this a reason to revisit the idea that a pg_fsck utility, run immediately
after a rewind, might be useful?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 2:40 PM, Vladimir Borodin <r...@simply.name> wrote:

>
> On 5 Sep 2017, at 14:04, Michael Paquier <michael.paqu...@gmail.com>
> wrote:
>
> For example, in archive_command we put WALs for archiving from
> pg_xlog/pg_wal into another directory inside PGDATA and than another cron
> task makes real archiving. This directory ideally should be skipped by
> pg_rewind, but it would not be handled by proposed change.
>
>
> I would be curious to follow the reasoning for such a two-phase
> archiving (You basically want to push it in two places, no? But why
> not just use pg_receivexlog then?). This is complicated to handle from
> the point of view of availability and backup reliability + durability.
>
>
> We do compress WALs and send them over network. Doing it via
> archive_command in single thread is sometimes slower than new WALs are
> written under heavy load.
>

How would this work when it comes to rewinding against a file directory?

>
> --
> May the force be with you…
> https://simply.name
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 1:04 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Tue, Sep 5, 2017 at 7:54 PM, Vladimir Borodin <r...@simply.name> wrote:
> > On 5 Sep 2017, at 12:31, Chris Travers <chris.trav...@adjust.com>
> > wrote:
> >
> > I think the simplest solution for now is to skip any files ending in
> .conf,
> > .log, and serverlog.
>
> This is not a portable solution. Users can include configuration files
> with the names they want. So the current patch as proposed is
> definitely not something worth it.
>

Actually that is exactly why I think the long-term solution is to figure
out what we need to copy and not copy anything we don't recognise.

That means the following directories as far as I can see:
 * base
 * global
 * pg_xlog/pg_wal
 * pg_clog/pg_xact
 * pg_commit_ts
 * pg_twophase
 * pg_snapshots?

Are there any other directories I am missing?


At any rate, I think the current state makes it very difficult to test
pg_rewind adequately, and it makes it extremely difficult to use in a
non-trivial environment, because you have to handle replication slots,
configuration files, and so forth yourself.  You have to be aware that
these *may* or *may not* be consistently clobbered by a rewind, so you have
to have some way of re-applying another set of files after a rewind.

If nothing else we ought to *at least* special-case recovery.conf,
postgresql.auto.conf, and pg_replslot, because these are always located in
the data directory and should never be clobbered.


>
> > For example, in archive_command we put WALs for archiving from
> > pg_xlog/pg_wal into another directory inside PGDATA and than another cron
> > task makes real archiving. This directory ideally should be skipped by
> > pg_rewind, but it would not be handled by proposed change.
>
> I would be curious to follow the reasoning for such a two-phase
> archiving (You basically want to push it in two places, no? But why
> not just use pg_receivexlog then?). This is complicated to handle from
> the point of view of availability and backup reliability + durability.
>
> > While it is definitely an awful idea the user can easily put something
> > strange (i.e. logs) to any important directory in PGDATA (i.e. into base
> or
> > pg_wal). Or how for example pg_replslot should be handled (I asked about
> it
> > a couple of years ago [1])? It seems that a glob/regexp for things to
> skip
> > is a more universal solution.
> >
> > [1]
> > https://www.postgresql.org/message-id/flat/8DDCCC9D-450D-
> 4CA2-8CF6-40B382F1F699%40simply.name
>
> Well, keeping the code simple is not always a bad thing. Logs are an
> example that can be easily countered, as well as archives in your
> case.
>




> --
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Tue, Sep 5, 2017 at 12:54 PM, Vladimir Borodin <r...@simply.name> wrote:

>
> On 5 Sep 2017, at 12:31, Chris Travers <chris.trav...@adjust.com>
> wrote:
>
> I think the simplest solution for now is to skip any files ending in
> .conf, .log, and serverlog.
>
>
> Why don’t you want to solve the problem once? It is a bit harder to get
> consensus on a way how to do it, but it seems that there are no reasons to
> make temporary solution here.
>
> For example, in archive_command we put WALs for archiving from
> pg_xlog/pg_wal into another directory inside PGDATA and than another cron
> task makes real archiving. This directory ideally should be skipped by
> pg_rewind, but it would not be handled by proposed change.
>

Ok, let's back up a bit in terms of what I see as the proper long-term fix.
Simple code, by the way, is important, but at least as important are
programs which solve simple, well-defined problems.  The current state is:

1.  pg_rewind makes *no guarantee* as to whether differences in logs,
config files, etc. are clobbered.  They may be (if a rewind is needed) or
may not be (if the timelines haven't diverged).  Therefore the behavior of
these sorts of files across an invocation of pg_rewind is not really very
well defined.  That's a fairly big issue in an operational environment.

2.  There are files which *may* be copied (i.e. are copied if the timelines
have diverged) which *may* have side effects on replication topology, WAL
archiving, etc.  Replication slots are a good example.

The problem I think pg_rewind should solve is "give me a consistent data
environment from the timeline on that server."  I would think that access
to the xlog/clog files would indeed be relevant to that.  If I were
rewriting the application now I would include those.  Just because
something can be handled separately doesn't mean it should be, and I would
prefer not to assume that archiving is properly set up and working.

>
>
> Long run, it would be nice to change pg_rewind from an opt-out approach to
> an approach of processing the subdirectories we know are important.
>
>
> While it is definitely an awful idea the user can easily put something
> strange (i.e. logs) to any important directory in PGDATA (i.e. into base or
> pg_wal). Or how for example pg_replslot should be handled (I asked about it
> a couple of years ago [1])? It seems that a glob/regexp for things to skip
> is a more universal solution.
>

I am not convinced it is a universal solution unless you take an arbitrary
number of regexes to check and loop through checking all of them.  Then the
chance of getting something catastrophically wrong in a critical
environment goes way up, and you may end up in an inconsistent state at the
end.

Simple code is good.  A program that solves simple problems reliably (and
in simple ways) is better.

The problem I see is that pg_rewind gets incorporated into other tools
which don't really provide the user with before or after hooks, and
therefore it isn't really fair to say, for example, that repmgr has the
responsibility to copy server logs out if present, or to make sure that
configuration files are not in the directory.

The universal solution is to only touch the files that we know are needed
and therefore work simply and reliably in a demanding environment.


>
> [1] https://www.postgresql.org/message-id/flat/8DDCCC9D-
> 450D-4CA2-8CF6-40B382F1F699%40simply.name
>
>
> --
> May the force be with you…
> https://simply.name
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-05 Thread Chris Travers
On Mon, Sep 4, 2017 at 3:38 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org>
wrote:

> Chris Travers wrote:
> > On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >
> > > On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier
> > > <michael.paqu...@gmail.com> wrote:
> > > > A simple idea would be to pass as a parameter a regex on which we
> > > > check files to skip when scanning the directory of the target
> remotely
> > > > or locally. This needs to be used with care though, it would be easy
> > > > to corrupt an instance.
> > >
> > > I actually shortcut that with a strategy similar to base backups: logs
> > > are on another partition, log_directory uses an absolute path, and
> > > PGDATA has no reference to the log path.
> >
> > Yeah, it is quite possible to move all these out of the data directory,
> but
> > bad things can happen when you accidentally copy configuration or logs
> over
> > those on the target and expecting that all environments will be properly
> > set up to avoid these problems is not always a sane assumption.
>
> I agree that operationally it's better if these files weren't in PGDATA
> to start with, but from a customer support perspective, things are
> frequently not already setup like that, so failing to support that
> scenario is a loser.
>
> I wonder how portable fnmatch() is in practice (which we don't currently
> use anywhere).  A shell glob seems a more natural interface to me for
> this than a regular expression.
>

I think the simplest solution for now is to skip any files ending in .conf,
.log, and serverlog.

Long run, it would be nice to change pg_rewind from an opt-out approach to
an approach of processing the subdirectories we know are important.

It is worth noting further that if you rewind in the wrong way in a
cascading replication environment, you can accidentally change your
replication topology if you clobber the recovery.conf from another replica,
and there is no way to ensure that this file is not in the data directory,
since it MUST be put there.

Best Wishes,
Chris Travers


>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
On Mon, Sep 4, 2017 at 12:23 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Mon, Sep 4, 2017 at 7:21 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
> > A simple idea would be to pass as a parameter a regex on which we
> > check files to skip when scanning the directory of the target remotely
> > or locally. This needs to be used with care though, it would be easy
> > to corrupt an instance.
>
> I actually shortcut that with a strategy similar to base backups: logs
> are on another partition, log_directory uses an absolute path, and
> PGDATA has no reference to the log path.
>

Yeah, it is quite possible to move all these out of the data directory, but
bad things can happen when you accidentally copy configuration or logs over
those on the target, and expecting that all environments will be properly
set up to avoid these problems is not always a sane assumption.

So consequently, I think it would be good to fix this in the tool.  The
fundamental question is whether there is any reason someone would actually
want to copy config files over.


--
> Michael
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
Ok, so I have a proof of concept patch here.

This is a proof of concept only.  It does not change documentation or the
like.

The purpose of the patch is discussion on the "do we want this" side.

The patch is fairly trivial but I have not added test cases or changed docs
yet.

Intention of the patch:
pg_rewind is an important backbone tool for recovering data directories
following a switchover.  However, it is currently over-inclusive as to what
it copies.  This patch excludes any file ending in "serverlog", ".conf",
and ".log" because these are never directly related to the rewind and they
add a great deal of complexity to switchovers.
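
The check itself amounts to a suffix match along these lines (a standalone
sketch of the idea only, not the actual patch or pg_rewind's internal
file-map code):

/* Sketch: return true if pg_rewind should leave this file alone. */
#include <stdbool.h>
#include <string.h>

static bool
ends_with(const char *name, const char *suffix)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);

    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

static bool
skip_config_and_logs(const char *name)
{
    return ends_with(name, ".conf") ||
           ends_with(name, ".log") ||
           ends_with(name, "serverlog");
}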

.conf files are excluded for two major reasons.  The first is that often we
may want to put postgresql.conf and other files in the data directory, and
if we change these during switchover this can change, for example, the port
the database is running on or other things that can break production or
testing environments.  This is usually a problem with testing environments,
but it could happen with production environments as well.

A much larger concern with .conf files though is the recovery.conf.  This
file MUST be put in the data directory, and it helps determine the
replication topology regarding cascading replication and the like.  If you
rewind from an upstream replica, you suddenly change the replication
topology and that can have wide-ranging impacts.

I think we are much better off insisting that .conf files be copied
separately, because the scenarios where you want to do so are more limited
and the concern is often separate from rewinding the timeline itself.

The second major exclusion added is files ending in "serverlog" and
".log".  I can find no good reason why server logs from the source should
*ever* clobber those on the target.  If you do this, you lose historical
information relating to problems and introduce management issues.


Backwards-incompatibility scenarios:
If somehow one has a workflow that depends on copying .conf files, this
would break that.  I cannot think of any cases where anyone would actually
want to do this but that doesn't mean they aren't out there.  If people
really want to, then they need to copy the configuration files they want
separately.

Next Steps:

If people like this idea I will add test cases and edit documentation as
appropriate.

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


pg_rewind_log_conf_patch.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: pg_rewind to skip config files

2017-09-04 Thread Chris Travers
In some experiments with pg_rewind and repmgr I noticed that local testing
is complicated by the fact that pg_rewind appears to copy configuration
files from the source to the target directory.

I would propose to make a modest patch to exclude postgresql.conf,
pg_hba.conf, and pg_ident.conf from the file tree traversal.

Any feedback before I create a proof of concept?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: global index

2017-08-25 Thread Chris Travers
On Fri, Aug 25, 2017 at 12:15 PM, Petr Jelinek <petr.jeli...@2ndquadrant.com
> wrote:

> On 25/08/17 10:28, Chris Travers wrote:
> >
> >
> > On Thu, Aug 24, 2017 at 9:44 PM, Andres Freund <and...@anarazel.de
> > <mailto:and...@anarazel.de>> wrote:
> >
> > Hi,
> >
> > On 2017-08-18 12:12:58 +0300, Ildar Musin wrote:
> > > While we've been developing pg_pathman extension one of the most
> frequent
> > > questions we got from our users was about global index support. We
> cannot
> > > provide it within an extension. And I couldn't find any recent
> discussion
> > > about someone implementing it. So I'm thinking about giving it a
> shot and
> > > start working on a patch for postgres.
> >
> > FWIW, I personally think for constraints the better approach is to
> make
> > the constraint checking code cope with having to check multiple
> > indexes. Initially by just checking all indexes, over the longer term
> > perhaps pruning the set of to-be-checked indexes based on the values
> in
> > the partition key if applicable.   The problem with creating huge
> global
> > indexes is that you give away some the major advantages of
> partitioning:
> > - dropping partitions now is slow / leaves a lof of garbage again
> > - there's no way you can do this with individual partitions being
> remote
> >   or such
> > - there's a good chunk of locality loss in global indexes
> >
> > The logic we have for exclusion constraints checking can essentially
> be
> > extended to do uniqueness checking over multiple partitions.
> Depending
> > on the desired deadlock behaviour one might end up doing speculative
> > insertions in addition.  The foreign key constraint checking is
> fairly
> > simple, essentially one "just" need to remove the ONLY from the
> > generated check query.
> >
>
> +1 (or +as much as I am allowed to get away with really ;) )
>
> >
> > To be clear, this would still require a high-level concept of a global
> > index and the only question is whether it gets stored as multiple
> > partitions against partitioned tables vs stored in one giant index,
> right?
> >
> No, just global constraints. For example, if you consider unique index
> to be implementation detail of a unique constraint, there is nothing
> stopping us to use multiple such indexes (one per partition) as
> implementation detail to single global unique constraint. No need for
> global index at all.
>

Ok so in this case a global constraint needs to track partitioned indexes,
right?

How does this differ, in practice, from a "global but partitioned index"?
This seems like splitting hairs, but I am trying to see if there is
disagreement here that goes beyond language.

For example, could I have a global, partial unique constraint the way I can
do things with a single table currently (something like create global
unique index foo_id_idxuf on foo(Id) where id > 12345)?  Is this something
the discussion here would foreclose?

It seems to me that you can get both of these (and more) by adding the
concept of a global index which means:
1.  Index is on parent table
2.  Index is inherited to child tables and managed on parent.
3.  Writes to children that hit the inherited unique index ALSO must check
(with exclusion constraints, etc.) ALL other tables in the inheritance tree
of the index.

That would also have a few important side benefits:
1.  Easier management of indexes where all partitions (or other children
since there are other uses for table inheritance than partitioning) must be
indexed
2.  Ability to have partial unique indexes enforced consistently across an
inheritance tree.

An alternative might be to generalise partial unique indexes into partial
unique constraints. (alter table foo add unique (bar) where id > 12345)


>
> --
>   Petr Jelinek  http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Thoughts on unit testing?

2017-08-25 Thread Chris Travers
On Thu, Aug 24, 2017 at 9:07 AM, Torsten Zuehlsdorff <
mailingli...@toco-domains.de> wrote:

>
>
> On 13.08.2017 21:19, Peter Geoghegan wrote:
>
>> On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>
>>> The current regression tests, isolation tests and TAP tests are very
>>> good (though I admit my experience with TAP is limited), but IMHO we
>>> are lacking support for C-level unit testing.  Complicated, fiddly
>>> things with many states, interactions, edge cases etc can be hard to
>>> get full test coverage on from the outside.  Consider
>>> src/backend/utils/mmgr/freepage.c as a case in point.
>>>
>>
I don't want to see full test coverage of the code btw.  I think that
undermines the benefit from testing.

I would like to see better test coverage though for reasons below.

>
>> It is my understanding that much of the benefit of unit testing comes
>> from maintainability.
>>
>
> I never had this understanding. I write tests to test expected behavior
> and not the coded one. If possible i separate the persons writing
> unit-tests from the ones writing the function itself. Having someone really
> read the documentation or translate the expectation into a test-case, makes
> sure, the function itself works well.
>

Right.  I would go so far as to say I test to the documentation, not to the
code.  Usually when I test my own code, I write docs, then code, then I
re-read the docs, and write test cases from the docs.

In my not-so-humble opinion, the point of tests is to make sure you don't
break other people's stuff, people who are reading the docs and using them
when they write their code.


>
> Also it really saves time in the long run. Subtle changes / bugs can be
> caught with unit-tests. Also a simple bug-report can be translated into a
> unit-test to make sure that the bugfix really works and that no regression
> will happen later. I literally once saved a week of work with a single
> unit-test.
>

Also, comparing to the docs means we have a way to reason about where
misbehaviour is occurring, which means better maintainability and less code
entropy in the course of fixing bugs.

>
> There are many other advantages, but to earn them the code need to be
> written to be testable. And this is often not the case. Most literature
> advises to Mocking, mixins or other techniques, which most times just
> translate into "this code is not written testable" or "the technique /
> language / concept / etc is not very good in being testable".
>

Agreed here.  Usually you end up with better, more stable, more carefully
designed components as a result.  But as I say, documentation, design, and
testing are actually harder to get right than coding.

So with the above being said, the fact is that a lot of advanced stuff can
be done by writing C libraries that get preloaded into PostgreSQL. These C
libraries risk being broken by cases where behaviour does not match
documentation.  So if I want to translate an oid into a db name or vice
versa, I might call the internal functions to do this.  Having a stable C
API would be of great help in ensuring longer-term maintainability of such
libraries.  So I think it would be good to add some unit tests here.

Of course it also means we get to decide what functionality is sufficiently
stable to test and guarantee, and that saves both time in maintenance, and
it improves the safety of moving forward.

But I still think the question of what to test ought to be geared around
"what are we willing to try to guarantee as behaviour for some years, not
just to ourselves but to third parties."

>
> Greetings,
> Torsten
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: global index

2017-08-25 Thread Chris Travers
On Thu, Aug 24, 2017 at 9:44 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2017-08-18 12:12:58 +0300, Ildar Musin wrote:
> > While we've been developing pg_pathman extension one of the most frequent
> > questions we got from our users was about global index support. We cannot
> > provide it within an extension. And I couldn't find any recent discussion
> > about someone implementing it. So I'm thinking about giving it a shot and
> > start working on a patch for postgres.
>
> FWIW, I personally think for constraints the better approach is to make
> the constraint checking code cope with having to check multiple
> indexes. Initially by just checking all indexes, over the longer term
> perhaps pruning the set of to-be-checked indexes based on the values in
> the partition key if applicable.   The problem with creating huge global
> indexes is that you give away some the major advantages of partitioning:
> - dropping partitions now is slow / leaves a lof of garbage again
> - there's no way you can do this with individual partitions being remote
>   or such
> - there's a good chunk of locality loss in global indexes
>
> The logic we have for exclusion constraints checking can essentially be
> extended to do uniqueness checking over multiple partitions. Depending
> on the desired deadlock behaviour one might end up doing speculative
> insertions in addition.  The foreign key constraint checking is fairly
> simple, essentially one "just" need to remove the ONLY from the
> generated check query.
>


To be clear, this would still require a high-level concept of a global
index and the only question is whether it gets stored as multiple
partitions against partitioned tables vs stored in one giant index, right?

Also, for foreign key constraints, does it make sense, for
backwards-compatibility reasons, to introduce a new syntax for checking all
child tables?



>
> Greetings,
>
> Andres Freund
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: global index

2017-08-23 Thread Chris Travers
On Aug 21, 2017 07:47, "Simon Riggs"  wrote:

On 18 August 2017 at 15:40, Alvaro Herrera  wrote:
> Ildar Musin wrote:
>
>> While we've been developing pg_pathman extension one of the most frequent
>> questions we got from our users was about global index support. We cannot
>> provide it within an extension. And I couldn't find any recent discussion
>> about someone implementing it. So I'm thinking about giving it a shot and
>> start working on a patch for postgres.
>>
>> One possible solution is to create an extended version of item pointer
which
>> would store relation oid along with block number and position:
>
> I've been playing with the index code in order to allow indirect tuples,
> which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.

> It's very easy to add an OID to that struct,
> which then allows to include the OID in either an indirect index tuple
> or a regular one.

If there is a unique index then there is no need for that. Additional
data to the index makes it even bigger and even less useful, so we
need to count that as a further disadvantage of global indexes.

I have a very clear statement from a customer recently that "We will
never use global indexes", based upon their absolute uselessness in
Oracle.


It is worth noting that the only use case I can see where global indexes
fill a functionality gap is with unique indexes which allow you to enforce
uniqueness across an inheritance tree where the uniqueness is orthogonal to
any partition key.

I could find large numbers of uses for that.  That could also allow
referential integrity to check against a root table rather than force
partition explosion.

Otherwise the following mostly works:

CREATE TABLE foo_child (LIKE foo INCLUDING ALL) INHERITS (foo);

So the gap this addresses is very real even if it is narrow.
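
To make the gap concrete, here is a minimal sketch (foo_child, the columns,
and the data are made up for illustration) showing that the copied
constraint is enforced per table only:

CREATE TABLE foo (id int PRIMARY KEY, val text);
CREATE TABLE foo_child (LIKE foo INCLUDING ALL) INHERITS (foo);

-- the copied primary key index lives on foo_child alone, so uniqueness is
-- enforced per table rather than across the inheritance tree:
INSERT INTO foo VALUES (1, 'parent');
INSERT INTO foo_child VALUES (1, 'child');  -- also succeeds

A SELECT * FROM foo afterwards returns two rows with id = 1, which is
exactly what a globally enforced unique constraint would prevent.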


> Then, wherever we're using IndexTupleData in the index AM code, we would
> replace it with InMemoryIndexTuple.  This should satisfy both your use
> case and mine.

Global indexes are a subset of indirect indexes use case but luckily
not the only use.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-21 Thread Chris Travers
On Sun, Aug 20, 2017 at 4:10 AM, MauMau <maumau...@gmail.com> wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

Sure, but stored functions can take arguments, such as a query string which
gets handled by the language handler.  There's absolutely no reason you
cannot declare a function in C that takes in a Cypher query and returns a
set of tuples.  And you can do a whole lot with preloaded shared libraries
if you need to.
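
At the SQL level the declaration could be as simple as the following sketch
(the function name, the shared library, and the Cypher handling behind it
are all hypothetical):

-- a C-language function wrapping a hypothetical Cypher engine
CREATE FUNCTION graph_search(cypher_query text)
RETURNS SETOF record
AS 'graph_engine', 'graph_search'
LANGUAGE C STRICT;

-- callers supply a column definition list for the result
SELECT *
FROM graph_search('MATCH (p:Person) RETURN p.name, p.age')
     AS g(name text, age int);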

The planning bit is more difficult, but see below as to where I see major
limits here.

>
> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

That sounds like a footgun.  I would probably think of those cases as
being ideal for a custom background worker, similar to Mongress.
Expecting to be able to switch query languages on the fly strikes me as
adding totally needless complexity everywhere, to be honest.  Having
different listeners on different ports simplifies this a lot, and
additionally having query languages for ad-hoc mixing via language handlers
might already get you most of what you want.

>
> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  To take the query example in my first mail, which joins a
> relational table and the result of a graph query.  The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

It seems like all you really need is a planner hook for user-defined
languages (i.e. "how many rows does this function return with these
parameters", right?).  Right now we allow hints, but they are static.  I
wonder how hard this would be using preloaded, shared libraries.
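
For reference, the static hints in question are the existing COST/ROWS
clauses on functions; reusing the hypothetical graph_search() from above,
the estimate cannot depend on the query text actually passed in, which is
exactly the limitation here:

-- the planner assumes 1000 result rows no matter what Cypher query string
-- the caller supplies
ALTER FUNCTION graph_search(text) ROWS 1000 COST 10000;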


>
> So in_region() and cast_region() are not functions to be executed
> during execution phase, but are syntax constructs that are converted,
> during analysis phase, into calls to another region's parser/analyzer
> and an inter-model cast routine.
>

So basically they work like immutable functions except that you cannot
index the output?

>
> 1. The relational parser finds in_region('cypher_graph', 'graph
> query') and produces a parse node InRegion(region_name, query) in the
> parse tree.
>
> 2. The relational analyzer looks up the system catalog to checks if
> the specified region exists, then calls its parser/analyzer to produce
> the query tree for the graph query fragment.  The relational analyser
> attaches the graph query tree to the InRegion node.
>
> 3. When the relational planner finds the graph query tree, it passes
> the graph query tree to the graph planner to produce the graph
> execution plan.
>
> 4. The relational planner produces a join plan node, based on the
> costs/statistics of the relational table scan and graph query.  The
> graph execution plan is attached to the join plan node.
>
> The parse/query/plan nodes have a label to denote a region, so that
> appropriate region's routines can be called.
>

It would be interesting to see how much of what you want you can get with
what we currently have and what pieces are really missing.

Am I right that if you wrote a function in C to take a Cypher query, plan
and analyse it, and execute it, the only thing really missing would be
feedback to the PostgreSQL planner regarding the number of rows expected?

>
> Regards
> MauMau
>
>


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-19 Thread Chris Travers
On Sat, Aug 19, 2017 at 4:29 PM, MauMau <maumau...@gmail.com> wrote:

> Hello,
>
> Please forgive me for asking such a stupid and rough question.
>
> I'm thinking of making PostgreSQL a multi-model database by supporting
> data models other than the current relational model.  A data model
> consists of a query language (e.g. SQL for relational model, Cypher
> for graph model), a parser and analyzer to transform a query into a
> query tree, a planner to transform the query tree into an execution
> plan, an executor, and a storage engine.
>
> To promote the data model development, I want to make data models
> pluggable.  The rough sketch is:
>
> 1) A data model developer implements the parser, analyzer,
> transformer, planner, executor, and storage engine functions in a
> shared library.
>
> 2) The DBA registers the data model.
>
>   CREATE QUERY LANGUAGE Cypher (
> PARSER = 
>   );
>
>   CREATE DATA MODEL graph (
> QUERY LANGUAGE = Cypher,
> ANALYZER = ,
> TRANSFORMER = ,
> PLANNER = ,
> EXECUTOR = ,
> STORAGE ENGINE = ,
>   );
>
>   CREATE REGION cypher_graph (
> QUERY LANGUAGE = Cypher,
> DATA MODEL = graph
>   );
>

Why can't you do all of this in a language handler and treat it as a
user-defined function?

>
> The region is just a combination of a query language and a data model,
> much like a locale is a combination of a language and a country.  This
> is because there may be multiple popular query languages for a data
> model.
>
> 3) The application connects to the database, specifying a desired
> region.  The specified region's query language becomes the default
> query language for the session.
>
>
> The application can use the data of multiple data models in one query
> by specifying another region and its query via in_region().  For
> example, the following query combines the relational restaurant table
> and a social graph to get the five chinese restaurants in Tokyo that
> are most popular among friends of John and their friends.
>
>   SELECT r.name, g.num_likers
>   FROM restaurant r,
> cast_region(
>   in_region('cypher_graph',
> 'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
>   (friend)-[:LIKES]->(restaurant:Restaurant)
>   RETURN restaurant.name, count(*)'),
> 'relational', 'g', '(name text, num_likers int')
>   WHERE r.name = g.name AND
> r.city = 'Tokyo' AND r.cuisine = 'chinese'
>   ORDER BY g.num_likers DESC LIMIT 5


If you have a language handler for Cypher, why do you need in_region or
cast_region?  Why not just have a graph_search() function which takes in a
Cypher query and returns a set of records?

> ;
>
>
> What do you think would be difficult to make data models pluggable,
> especially related to plugging the parser, planner, executor, etc?
> One possible concern is that various PostgreSQL components might be
> too dependent on the data model being relational, and it would be
> difficult to separate tight coupling.
>

I guess I am missing why the current language handler structure is not
enough.  Maybe I am missing something?


>
> Regards
> MauMau
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Proposal: global index

2017-08-18 Thread Chris Travers
I would really like to see global indexes.  It would make things a lot
easier for things like unique constraints across table inheritance trees.

On Fri, Aug 18, 2017 at 11:12 AM, Ildar Musin <i.mu...@postgrespro.ru>
wrote:

> Hi hackers,
>
> While we've been developing pg_pathman extension one of the most frequent
> questions we got from our users was about global index support. We cannot
> provide it within an extension. And I couldn't find any recent discussion
> about someone implementing it. So I'm thinking about giving it a shot and
> start working on a patch for postgres.
>
> One possible solution is to create an extended version of item pointer
> which would store relation oid along with block number and position:
>
> struct ItemPointerExt
> {
> Oid ip_relid;
> BlockIdData ip_blkid;
> OffsetNumber ip_posid;
> };
>
> and use it in global index (regular index will still use old version).
> This will require in-depth refactoring of existing index nodes to make them
> support both versions. Accordingly, we could replace ItemPointer with
> ItemPointerExt in index AM to make unified API to access both regular and
> global indexes. TIDBitmap will require refactoring as well to be able to
> deal with relation oids.
>

So, to be clear, on-disk representations would be unchanged for old indexes
(ensuring that pg_upgrade would not be broken), right?

>
> It seems to be quite an invasive patch since it requires changes in
> general index routines, existing index nodes, catalog, vacuum routines and
> syntax. So I'm planning to implement it step by step. As a first prototype
> it could be:
>
> * refactoring of btree index to be able to store both regular and extended
> item pointers;


Do you foresee any performance implications of handling both?


>

* refactoring of TIDBitmap;
> * refactoring of general index routines (index_insert, index_getnext, etc)
> and indexAM api;
> * catalog (add pg_index.indisglobal attribute and/or a specific relkind as
> discussed in [1] thread);
> * syntax for global index definition. E.g., it could be oracle-like syntax:
>
> CREATE INDEX my_idx ON my_tbl (key) GLOBAL;
>
> If it goes well, then I’ll do the rest of indexes and vacuuming. If you
> have any ideas or concerns I’ll be glad to hear it.
>
> [1] https://www.postgresql.org/message-id/c8fe4f6b-ff46-aae0-89e
> 3-e936a35f0cfd%40postgrespro.ru
>
> Thanks!
>
> --
> Ildar Musin
> i.mu...@postgrespro.ru
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-16 Thread Chris Travers
On Wed, Aug 16, 2017 at 7:15 PM, Andres Freund <and...@anarazel.de> wrote:

>
>
> I think this entirely is the wrong approach.  We shouldn't add weird
> check commands that require locks on pg_class, we should avoid leaving
> the orphaned files in the first place.  I've upthread outlined
> approached how to do so.
>

And in the meantime I suppose there is no reason that the approach I have
outlined cannot be used by an external program.  I think it is sane to
draw the line at having the db responsible for cleaning up after itself
when something goes wrong, and leave external programs to do after-the-fact
review and cleanup.

>
> Greetings,
>
> Andres Freund
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-16 Thread Chris Travers
So having thought about this a bit more, and having had some real-world
experience with the script now, I have an idea that might work and some
questions to make it succeed.

My thinking is to add a new form of vacuum called VACUUM FSCK.

This would:
1. lock pg_class in exclusive mode (or do I need exclusive access?), as
this is needed to solve the race conditions.  As I see it, this seems to
bring the database to a screeching halt concurrency-wise (but unlike my
script it would allow other databases to be accessed normally).
2. read the files whose names consist only of digits out of the filesystem
and compare them with the oids and relfilenodes from pg_class.
3. any file not found in that list would then be unlinked, along with any
files matching that name followed by an underscore or period (see the
sketch below).
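
For steps 2 and 3, the comparison could conceptually look like the query
below (a sketch only; the real check would have to run inside the backend
while holding the lock rather than as plain SQL):

-- digit-named files in this database's directory that no pg_class row
-- accounts for, either by relfilenode or by oid.  Mapped catalogs store
-- relfilenode = 0, so their oid is used as an approximation here; a real
-- implementation would need to consult the relation mapper as well.
SELECT f AS orphan_candidate
FROM pg_ls_dir('base/' || (SELECT oid::text FROM pg_database
                           WHERE datname = current_database())) AS f
WHERE f ~ '^[0-9]+$'
  AND f NOT IN (SELECT relfilenode::text FROM pg_class
                UNION
                SELECT oid::text FROM pg_class);

Anything that turns up there, plus its underscore- and dot-suffixed
segments, would be a candidate for unlinking in step 3.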

This would mean that the following cases would not be handled:

If the first extent is gone but later extents are present, we check only
for the first extent and so would not see the later ones.  The same goes
for visibility maps and other helper files.

If you add a file in the directory which has a name like 34F3A222BC, that
would never get cleaned up because it contains non-digits.

So this leads to the following questions:

1.  Is locking pg_class enough to avoid race conditions?  Is exclusive mode
sufficient or do I need exclusive access mode?
2.  Would it be preferable to move the file to a directory rather than
unlinking it?
3.  Should I perform any sort of check on the tables at the end to make
sure everything is ok?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-15 Thread Chris Travers
On Tue, Aug 15, 2017 at 3:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Chris Travers <chris.trav...@adjust.com> writes:
> > I wonder about a different solution.  Would it be possible to special
> case
> > vacuum to check for and remove (or just move to where they can be
> removed)
> > files when vacuuming pg_class?  At the point we are vacuuming pg_class,
> we
> > ought to be able to know that a relfilenode shouldn't be used anymore,
> > right?
>
> I don't think so.  It's not clear to me whether you have in mind "scan
> pg_class, collect relfilenodes from all live tuples, then zap all files
> not in that set" or "when removing a dead tuple, zap the relfilenode
> it mentions".  But neither one works.  The first case has a race condition
> against new pg_class entries.  As for the second, the existence of a dead
> tuple bearing relfilenode N isn't evidence that some other live tuple
> can't have relfilenode N.
>

Ah, because if the file never made it onto disk the number could be
reused.

>
> Another problem for the second solution is that in the case you're worried
> about (ie, PANIC due to out-of-WAL-space during relation's creating
> transaction), there's no very good reason to expect that the relation's
> pg_class tuple ever made it to disk at all.
>
> A traditional low-tech answer to this has been to keep the WAL on a
> separate volume from the main data store, so that it's protected from
> out-of-space conditions in the main store and temp areas.  The space
> needs for WAL proper are generally much more predictable than the main
> store, so it's easier to keep the dedicated space from overflowing.
> (Stalled replication/archiving processes can be hazardous to your
> health in this scenario, though, if they prevent prompt recycling of
> WAL files.)
>

Yeah, most of our dbs here have WAL on a separate volume, but not this
system.  This system is also unusual in that disk usage varies wildly (and
I am not 100% sure that this is the only case which causes it, though I can
reproduce it consistently when the WAL writer runs out of disk space, with
symptoms exactly matching what I found).

So for now that leaves my fallback approach as a way to fix it when I see
it.

I have written a shell script which does the following:
1.  starts Postgres in single user mode with a data directory, or dies
(won't run if Postgres seems to be already running)
2.  gets the oid of the current database
3.  lists all files consisting of only digits in the base/[dboid] directory
4.  asks Postgres (in single user mode again) for all relfilenodes and oids
of tables (in my testing both were required because there were some cases
where relfilenodes were not set for some system catalogs)
5.  loops through the file nodes gathered, checks against the relfilenode
entries, and zaps $f, $f_*, and $f.*.  Currently, for testing, "zaps" means
moving to a lostnfound folder for inspection following the script.
The logic here is not perfect and is very slightly under-inclusive, but
better that than the other way.

Then we can start Postgres again.  I cannot find a better way to avoid race
conditions, I guess.  At any rate, it sounds like preventing the problem
more generally may be something beyond what I would feel comfortable trying
to do as a patch at my current level of familiarity with the source code.

The full script is included inline below my signature in case it is of
interest to anyone on the list.


> regards, tom lane
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com

Saarbrücker Straße 37a, 10405 Berlin

---

#!/bin/bash

datadir=$1
database=$2

pg_ctl -D $datadir stop

dboidfile="$PWD/cleanupdb.oid"
reloidfile="$PWD/refilenodes.list"

echo "COPY (select oid from pg_database where datname = current_database())
TO '$dboidfile'" | postgres --single -D $datadir $database > /dev/null


if (($?))
then
   echo "FATAL: Could not start Postgres in single user mode"
   exit 1
fi

dboid=`cat $dboidfile`
filenodes=`(cd $datadir/base/$dboid; ls [0-9]*[0-9] | grep -v '\.' | sort -n)`
#echo $filenodes

echo "COPY (select relfilenode from pg_class union select oid as
relfilenode from pg_class) TO '$reloidfile'" | postgres --single -D
$datadir $database > /dev/null
relfilenodes=`cat $reloidfile`
#echo $relfilenodes
if [[ -z "$relfilenodes" ]]
then
   echo "FATAL: did not get any relfilenodes"
   exit 2
fi

mkdir lostnfound;
for f in $filenodes
do
  if [[ -z `echo $relfilenodes | grep -w $f` ]]
  then
  echo moving $f to lostnfound
  mv $datadir/base/$dboid/$f lostnfound
  mv $datadir/base/$dboid/${f}_* lostnfound 2> /dev/null
  mv $datadir/base/$dboid/${f}.* lostnfound 2> /dev/null
  fi
done
rm $dboidfile
rm $reloidfile
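
For reference, the script takes the data directory and the database name as
its two arguments, so an invocation would look something like this
(assuming it were saved as cleanup_orphans.sh, a name made up here):

./cleanup_orphans.sh /var/lib/postgresql/9.6/data mydb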


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-15 Thread Chris Travers
There's another side to this, which is that I am not sure it is a backend crash.

Here is what I did to reproduce:

2 virtual disk images: 100 MB for main data, 40 MB for WAL.  work_mem set
to 256MB.  The idea is to test different out-of-space conditions.

Create table as ...; drop table; select
pg_size_pretty(pg_current_xlog_location() - '0/0');

I played around with parameters to determine how different kinds of out of
space errors were handled.

1.  Running out of temp space was cleaned up without a server restart being
needed.
2.  A relation running out of disk space *seemed* to get cleaned up.
3.  Running out of WAL space left *both* temp and non-temp files.

I wonder about a different solution.  Would it be possible to special case
vacuum to check for and remove (or just move to where they can be removed)
files when vacuuming pg_class?  At the point we are vacuuming pg_class, we
ought to be able to know that a relfilenode shouldn't be used anymore,
right?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Mon, Aug 14, 2017 at 8:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

>
>
> It would be possible to have orphaned non-temp tables if you'd suffered
> a crash during the transaction that created those tables.  Ordinarily
> a newly-created table file wouldn't be that large, but if your workflow
> created tables and shoved boatloads of data into them in the same
> transaction, it's not so hard to see this becoming an issue.
>

I think the working theory is that these were very likely a number of very
large (multi-hundred-GB) materialised views.

>
> The core problem with zapping non-temp table files is that you can't
> do that unless you're sure you have consistent, up-to-date pg_class
> data that nobody else is busy adding to.  It's hard to see an external
> application being able to do that safely.  You certainly can't do it
> at the point in the postmaster startup cycle where we currently do
> the other things --- for those, we rely only on filesystem naming
> conventions to identify what to zap.


Yeah, that occurred to me.  At this point I would settle for something I
could run with Postgres in single user mode, although that is very far
from ideal.  So what I wonder is whether at least a short-term solution
might be a utility that starts Postgres in single user mode, where we
insist that PostgreSQL is otherwise not running beforehand.

I am certainly not feeling qualified at present for more advanced
solutions, but that much I might be able to do.

>
>         regards, tom lane
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2017-08-14 14:12:22 +0200, Chris Travers wrote:
> > Problem:
> > The system this came up on is PostgreSQL 9.6.3 and has had repeated
> trouble
> > with disk space.  Querying pg_database_size, as well as du on the
> > subdirectory of base/ show total usage to be around 3.8TB.  Summing up
> the
> > size of the relations in pg_class though shows around 2.1TB.
> >
> > Initial troubleshooting found around 150 GB of space in pg_temp which had
> > never been cleared and was at least several days old.  Restarting the
> > server cleared these up.
> >
> > Poking around the base/[oid] directory, I found a large number of files
> > which did not correspond with a pg_class entry.  One of the apparent
> > relations was nearly 1TB in size.
> >
> > What I think happened:
> > I think various pg_temp/* and orphaned relation files (In base/[oid])
> were
> > created when PostgreSQL crashed due to running out of space in various
> > operations including creating materialised views.
> >
> > So my question is if there is a way we can safely clean these up on
> server
> > restart?  If not does it make sense to try to create a utility that can
> > connect to PostgreSQL, seek out valid files, and delete the rest?
>
> I think the fix here is to call RemovePgTempFiles() during
> crash-restarts, instead of just full starts. The previously stated need
> to be able to inspect temp files after a crash can be less impactfully
> fulfilled with restart_after_crash = false.
>
But that only clears temp files, right?

I am less concerned about the temp files because a restart clears them.

The bigger issue I see is with the orphaned base files.  It looks like
files in base/[oid] don't get cleaned up either, if I read my output
correctly, which would explain why we saw 1.7TB of discrepancy between
relations and database size.  Safety-wise it seems like the best way out of
that is a dump/restore, but doing that with a 2.1TB database is annoying.


> Greetings,
>
> Andres Freund
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Aug 14, 2017 14:12, "Chris Travers" <chris.trav...@adjust.com> wrote:

Hi all;

I am trying to track down a problem we are seeing that looks very similar
to bug #12050, and would certainly consider trying to contribute a fix if
we agree on one.  (I am not sure we can, so absent that, the next question
is whether it makes sense to create a utility to fix the problem when it
comes up so that a dump/restore is not needed).

The system:
PostgreSQL 9.6.3
Gentoo Linux.

Problem:
The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
with disk space.  Querying pg_database_size, as well as du on the
subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
size of the relations in pg_class though shows around 2.1TB.

Initial troubleshooting found around 150 GB of space in pg_temp which had
never been cleared and was at least several days old.  Restarting the
server cleared these up.

Poking around the base/[oid] directory, I found a large number of files
which did not correspond with a pg_class entry.  One of the apparent
relations was nearly 1TB in size.

What I think happened:
I think various pg_temp/* and orphaned relation files (In base/[oid]) were
created when PostgreSQL crashed due to running out of space in various
operations including creating materialised views.

So my question is if there is a way we can safely clean these up on server
restart?  If not does it make sense to try to create a utility that can
connect to PostgreSQL, seek out valid files, and delete the rest?


Ok, I have identified one case where the symptoms I am seeing can be
reproduced.  I am currently working on a Mac so there may be quirks in my
repro.  However:

When the WAL writer runs out of disk space, no cleanup is done.

So I will be looking at possible solutions next.


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


[HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
Hi all;

I am trying to track down a problem we are seeing that looks very similar
to bug #12050, and would certainly consider trying to contribute a fix if
we agree on one.  (I am not sure we can, so absent that, the next question
is whether it makes sense to create a utility to fix the problem when it
comes up so that a dump/restore is not needed).

The system:
PostgreSQL 9.6.3
Gentoo Linux.

Problem:
The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
with disk space.  Querying pg_database_size, as well as du on the
subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
size of the relations in pg_class though shows around 2.1TB.
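
For reference, the comparison can be made with something along these lines
(a sketch; not necessarily the exact queries used here):

-- on-disk size of the database vs. what the relations in pg_class account
-- for (tables, indexes, toast tables, and materialised views)
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size,
       pg_size_pretty(sum(pg_relation_size(c.oid))::bigint) AS relations_total
FROM pg_class c
WHERE c.relkind IN ('r', 'i', 't', 'm');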

Initial troubleshooting found around 150 GB of space in pg_temp which had
never been cleared and was at least several days old.  Restarting the
server cleared these up.

Poking around the base/[oid] directory, I found a large number of files
which did not correspond with a pg_class entry.  One of the apparent
relations was nearly 1TB in size.

What I think happened:
I think various pg_temp/* and orphaned relation files (In base/[oid]) were
created when PostgreSQL crashed due to running out of space in various
operations including creating materialised views.

So my question is if there is a way we can safely clean these up on server
restart?  If not does it make sense to try to create a utility that can
connect to PostgreSQL, seek out valid files, and delete the rest?

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Funny WAL corruption issue

2017-08-11 Thread Chris Travers
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote:

> On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote:
> >
> >
> > The bitwise comparison is interesting.  Remember the error was:
> >
> > pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
> > 1146880
> ...
> > Since this didn't throw a checksum error (we have data checksums
> disabled but wal records ISTR have a separate CRC check), would this
> perhaps indicate that the checksum operated over incorrect data?
>
> No checksum error and this "unexpected pageaddr" doesn't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a wal file and the old wal stream had a record boundary
> on the same byte position. So the previous record checksum passed and
> the following record checksum passes but the record header is for a
> different wal stream position.
>
> I think you could actually hack xlogdump to ignore this condition and
> keep outputting and you'll see whether the records that follow appear
> to be old wal log data.  I haven't actually tried this though.
>

For better or worse, I get a different error at the same spot if I try this:

Doing so involved disabling the check in the backend wal reader.

pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: invalid
contrecord length 4509 at 1E39C/E1117FF8

If I hack it to ignore all errors on that record, no further records come
out though it does run over the same records.

This leads me to conclude there are no further valid records.


>
> --
> greg
>



-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: [HACKERS] Funny WAL corruption issue

2017-08-11 Thread Chris Travers
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark <st...@mit.edu> wrote:

> On 10 August 2017 at 15:26, Chris Travers <chris.trav...@gmail.com> wrote:
> >
> >
> > The bitwise comparison is interesting.  Remember the error was:
> >
> > pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: unexpected
> > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
> > 1146880
> ...
> > Since this didn't throw a checksum error (we have data checksums
> disabled but wal records ISTR have a separate CRC check), would this
> perhaps indicate that the checksum operated over incorrect data?
>
> No checksum error and this "unexpected pageaddr" doesn't necessarily
> mean data corruption. It could mean that when the database stopped logging
> it was reusing a wal file and the old wal stream had a record boundary
> on the same byte position. So the previous record checksum passed and
> the following record checksum passes but the record header is for a
> different wal stream position.
>

I expect to test this theory shortly.

Assuming it is correct, what can we do to prevent restarts of slaves from
running into it?


> I think you could actually hack xlogdump to ignore this condition and
> keep outputting and you'll see whether the records that follow appear
> to be old wal log data.  I haven't actually tried this though.
>
> --
> greg
>



-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
On Thu, Aug 10, 2017 at 3:17 PM, Vladimir Rusinov <vrusi...@google.com>
wrote:

>
>
> On Thu, Aug 10, 2017 at 1:48 PM, Aleksander Alekseev <
> a.aleks...@postgrespro.ru> wrote:
>
>> I just wanted to point out that a hardware issue or third party software
>> issues (bugs in FS, software RAID, ...) could not be fully excluded from
>> the list of suspects. According to the talk by Christophe Pettus [1]
>> it's not that uncommon as most people think.
>
>
> This still might be the case of hardware corruption, but it does not look
> like one.
>

Yeah, I don't think so either.  The systems were not restarted, only the
service, so I don't think this is a lie-on-write case.  We have ECC with
full checks, etc.  It really looks like something I initiated caused it,
but I am not sure what, and I am really not interested in trying to
reproduce it on a db of this size.

>
> Likelihood of two different persons seeing similar error message just a
> year apart is low. From our practice hardware corruption usually looks like
> a random single bit flip (most common - bad cpu or memory), bunch of zeroes
> (bad storage), or bunch of complete garbage (usually indicates in-memory
> pointer corruption).
>
> Chris, if you still have original WAL segment from the master and it's
> corrupt copy from standby, can you do bit-by-bit comparison to see how they
> are different? Also, if you can please share some hardware details.
> Specifically, do you use ECC? If so, are there any ECC errors logged? Do
> you use physical disks/ssd or some form of storage virtualization?
>

Straight on bare metal, ECC with no errors logged.  SSD for both data and
WAL.

The bitwise comparison is interesting.  Remember the error was:

pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: unexpected
pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
1146880


Starting with the good segment:

Good wal segment; I think the record starts at 003b:


0117fb0     003b   

0117fc0 7f28 e111 e39c 0001 0940  cb88 db01

0117fd0 0200  067f  4000  2249 0195

0117fe0 0001  8001  b5c3  05ff 

0117ff0  0003   008c   

0118000 d093 0005 0001  8000 e111 e39c 0001

0118010 0084    7fb8 e111 e39c 0001

0118020 0910  ccac 2eba 2000 0056 067f 

0118030 4000  2249 0195 b5c4  08ff 0001

0118040 0002 0003 0004 0005 0006 0007 0008 0009

0118050 000a 000b 000c 000d 000e 000f 0010 0011

0118060 0012 0013 0014 0015 0016 0017 0018 0019

0118070 001a 001b 001c 001d 001e 001f 0020 0021


And the corrupted copy from the standby:

0117fb0     003b   

0117fc0 7f28 e111 e39c 0001 0940  cb88 db01

0117fd0 0200  067f  4000  2249 0195

0117fe0 0001  8001  b5c3  05ff 

0117ff0  0003   4079 ce05 1cce ecf9

0118000 d093 0005 0001  8000 6111 e375 0001

0118010 119d    cfd4 00cc ca00 0410

0118020 1800 7c00 5923 544b dc20 914c 7a5c afec

0118030 db45 0060 b700 1910 1800 7c00 791f 2ede

0118040 c573 a110 5a88 e1e6 ab48 0034 9c00 2210

0118050 1800 7c00 4415 400d 2c7e b5e3 7c88 bcef

0118060 4666 00db 9900 0a10 1800 7c00 7d1d b355
0118070 d432 8365 de99 4dba 87c7 00ed 6200 2210

I think the divergence is interesting here.  Up through 0117ff8, they are
identical.  Then the last half of the line differs.
The first half of the next line is the same (but up through 011800a this
time), but the last 6 bytes differ (those six hold what appear to be the
memory address causing the problem), and we only have a few bits different
in the rest of the line.

It looks like some data and some flags were overwritten, perhaps while the
process exited.  Very interesting.


> Also, in absolute majority of cases corruption is caught by checksums. I
> am not familiar with WAL protocol - do we have enough checksums when
> writing it out and on the wire? I suspect there are much more things
> PostgreSQL can do to be more resilient, and at least detect corruptions
> earlier.
>

Since this didn't throw a checksum error (we have data checksums disabled
but wal records ISTR have a separate CRC check), would this perhaps
indicate that the checksum operated over incorrect data?

>
> --
> Vladimir Rusinov
> PostgreSQL SRE, Google Ireland
>
> Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
> Registered in Dublin, Ireland
> Registration Number: 368047
>



-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
> Yes.  Exactly the same output until a certain point where pg_xlogdump dies
> with an error.  That is the same LSN where PostgreSQL died with an error on
> restart.
>

One other thing that is possibly relevant after reading through the last
bug report is the error pg_xlogdump gives:

pg_xlogdump: FATAL:  error in WAL record at 1E39C/E1117FB8: unexpected
pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset
1146880



> --
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: [HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
Sorry, meant to reply all.

On Thu, Aug 10, 2017 at 2:19 PM, Vladimir Borodin <r...@simply.name> wrote:

> Hi, Chris.
>
> 10 авг. 2017 г., в 15:09, Chris Travers <chris.trav...@gmail.com>
> написал(а):
>
> Hi;
>
> I ran into a funny situation today regarding PostgreSQL replication and
> wal corruption and wanted to go over what I think happened and what I
> wonder about as a possible solution.
>
> Basic information is custom-built PostgreSQL 9.6.3 on Gentoo, on a ~5TB
> database with variable load.  Master database has two slaves and generates
> 10-20MB of WAL traffic a second.  The data_checksum option is off.
>
>
> The problem occurred when I attempted to restart the service on the slave
> using pg_ctl (I believe the service had been started with sys V init
> scripts).  On trying to restart, it gave me a nice "Invalid memory
> allocation request" error and promptly stopped.
>
> The main logs showed a lot of messages like before the restart:
> 2017-08-02 11:47:33 UTC LOG:  PID 19033 in cancel request did not match
> any process
> 2017-08-02 11:47:33 UTC LOG:  PID 19032 in cancel request did not match
> any process
> 2017-08-02 11:47:33 UTC LOG:  PID 19024 in cancel request did not match
> any process
> 2017-08-02 11:47:33 UTC LOG:  PID 19034 in cancel request did not match
> any process
>
> On restart, the following was logged to stderr:
> LOG:  entering standby mode
> LOG:  redo starts at 1E39C/8B77B458
> LOG:  consistent recovery state reached at 1E39C/E1117FF8
> FATAL:  invalid memory alloc request size 3456458752
> LOG:  startup process (PID 18167) exited with exit code 1
> LOG:  terminating any other active server processes
>
> LOG:  database system is shut down
>
> After some troubleshooting I found that the WAL segment had become
> corrupt; I copied the correct one from the master and everything came up to
> the present.
>
> So it seems like somewhere, something crashed big time on the back end and,
> when we tried to restart, the WAL ended in an invalid way.
>
>
> We have reported the same thing [1] nearly a year ago. Could you please
> check with pg_xlogdump that both WALs (normal from master and corrupted)
> are exactly the same until some certain LSN?
>
> [1] https://www.postgresql.org/message-id/20160614103415.
> 5796.6885%40wrigleys.postgresql.org
>

Yes.  Exactly the same output until a certain point where pg_xlogdump dies
with an error.  That is the same LSN where PostgreSQL died with an error on
restart.

>
>
> I am wondering what can be done to prevent these sorts of things from
> happening in the future if, for example, a replica dies in the middle of a
> wal fsync.
> --
> Best Wishes,
> Chris Travers
>
> Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
> lock-in.
> http://www.efficito.com/learn_more
>
>
>
> --
> May the force be with you…
> https://simply.name
>
>


-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


[HACKERS] Funny WAL corruption issue

2017-08-10 Thread Chris Travers
Hi;

I ran into a funny situation today regarding PostgreSQL replication and wal
corruption and wanted to go over what I think happened and what I wonder
about as a possible solution.

Basic information is custom-built PostgreSQL 9.6.3 on Gentoo, on a ~5TB
database with variable load.  Master database has two slaves and generates
10-20MB of WAL traffic a second.  The data_checksum option is off.


The problem occurred when I attempted to restart the service on the slave
using pg_ctl (I believe the service had been started with sys V init
scripts).  On trying to restart, it gave me a nice "Invalid memory
allocation request" error and promptly stopped.

The main logs showed a lot of messages like before the restart:

2017-08-02 11:47:33 UTC LOG:  PID 19033 in cancel request did not match any
process

2017-08-02 11:47:33 UTC LOG:  PID 19032 in cancel request did not match any
process

2017-08-02 11:47:33 UTC LOG:  PID 19024 in cancel request did not match any
process

2017-08-02 11:47:33 UTC LOG:  PID 19034 in cancel request did not match any
process


On restart, the following was logged to stderr:

LOG:  entering standby mode

LOG:  redo starts at 1E39C/8B77B458

LOG:  consistent recovery state reached at 1E39C/E1117FF8

FATAL:  invalid memory alloc request size 3456458752

LOG:  startup process (PID 18167) exited with exit code 1

LOG:  terminating any other active server processes

LOG:  database system is shut down

After some troubleshooting I found that the WAL segment had become corrupt;
I copied the correct one from the master and everything came up to the present.

So it seems like somewhere, something crashed big time on the back end and,
when we tried to restart, the WAL ended in an invalid way.

I am wondering what can be done to prevent these sorts of things from
happening in the future if, for example, a replica dies in the middle of a
wal fsync.
-- 
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more


Re: [HACKERS] 9.3 Json Array's

2013-09-24 Thread Chris Travers


 On 24 September 2013 at 13:46 Andrew Dunstan and...@dunslane.net wrote:
 
 
 Feel free to ask questions.
 
 The heart of the API is the event handlers defined in this struct in
 include/utils/jsonapi.h:
 
 typedef struct JsonSemAction
 {
  void   *semstate;
  json_struct_action object_start;
  json_struct_action object_end;
  json_struct_action array_start;
  json_struct_action array_end;
  json_ofield_action object_field_start;
  json_ofield_action object_field_end;
  json_aelem_action array_element_start;
  json_aelem_action array_element_end;
  json_scalar_action scalar;
 } JsonSemAction;
 
 
 Basically there is a handler for the start and end of each non-scalar
 structural element in JSON, plus a handler for scalars.
 
 There are several problems that will be posed by processing nested
 arrays and objects, including:
 
   * in effect you would need to construct a stack of state that could be
 pushed and popped

True.

   * JSON arrays aren't a very good match for SQL arrays - they are
 unidimensional and heterogeneous.

This is true, but I think one would have to start with the assumption that the
data is valid for the target SQL type and then check once the conversion is
done.  JSON is a pretty flexible format, which makes it a poor match for SQL
types in many cases.  But I think the precedent so far (with
json_populate_recordset) is a good one, namely a best-effort conversion.

 
 
 I'm not saying this can't be done - it will just take a bit of effort.

Yeah, looking through the code, I think it will be more work than I originally
thought but that just means it will take longer.
 
 cheers
 
 andrew
 
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] 9.3 Json Array's

2013-09-23 Thread Chris Travers



 On 23 September 2013 at 23:37 Adam Jelinek ajeli...@gmail.com wrote:
 
  I am sure I am doing something wrong here, or this is an unsupported feature,
 but I wanted to double check. I was hoping that if I did a json_agg(x) and
 then copied that output of that and passed it into a json_populate_recordset
 that I would get the record back.  I know I can make things work using a CTE
 and other functions like json_each, but I was hoping for a simple one liner.
 



Yeah, I had the same experience.  It is not supported.  I am looking at trying
to add support for nested objects and better support for arrays.  Interested in
collaborating?

 
 
  CREATE SCHEMA varrm;
 
  CREATE SEQUENCE varrm.item_id_seq;
 
  CREATE TABLE varrm.item
(item_id            bigint DEFAULT nextval('varrm.item_id_seq')
,title  text
,short_desc text
,long_desc  text
,tags   text[]
 

^^^ That is what it chokes on.



,external_api_key   text
,trans_timestamp    timestamp without time zone DEFAULT now()
,upsert_timestamp   timestamp without time zone DEFAULT clock_timestamp()
,end_timestamp  timestamp without time zone DEFAULT '-12-31
 23:59:59.99'::timestamp without time zone
,CONSTRAINT item_primary_key    PRIMARY KEY (item_id)
);
 
  INSERT INTO varrm.item (title, short_desc, long_desc, tags, external_api_key)
 values ('My Title', 'My Short Desc', 'My Very Long Desc', '{GAME, WII, PS4,
 ACTION, FIRST PERSON SHOOTER}', '1235467');
 
  SELECT json_agg(t1) FROM (SELECT title, short_desc, long_desc, tags,
 external_api_key FROM varrm.item) AS t1
  --output is
  --[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]

  SELECT '[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]'::JSON

  SELECT * FROM json_populate_recordset(null::varrm.item, '[{"title":"My Title","short_desc":"My Short Desc","long_desc":"My Very Long Desc","tags":["GAME","WII","PS4","ACTION","FIRST PERSON SHOOTER"],"external_api_key":null}]'::JSON)
  /**
  ERROR:  cannot call json_populate_recordset on a nested object
 


I am still in the process of wrapping my head around the current JSON logic.  I
hope to produce a proof of concept that can later be turned into a patch.  See
my previous post on this topic.  Again collaboration is welcome.


Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] Minmax indexes

2013-09-16 Thread Chris Travers


 On 16 September 2013 at 11:03 Heikki Linnakangas hlinnakan...@vmware.com
 wrote:


 Something like this seems completely sensible to me:

 create index i_accounts on accounts using minmax (ts) where valid = true;

 The situation where that would be useful is if 'valid' accounts are
 fairly well clustered, but invalid ones are scattered all over the
 table. The minimum and maximum stored in the index would only concern
 valid accounts.

Here's one that occurs to me:

CREATE INDEX i_billing_id_mm ON billing(id) WHERE paid_in_full IS NOT TRUE;

Note that this would be a frequently moving target and over years of billing,
the subset would be quite small compared to the full system (imagine, say, 50k
rows out of 20M).

Best Wishes,
Chris Travers

 - Heikki


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] Proposal: json_populate_record and nested json objects

2013-09-16 Thread Chris Travers


 On 16 September 2013 at 14:43 Merlin Moncure mmonc...@gmail.com wrote:


 Huge +1 on on this. Couple random thoughts:

 *) Hard to see how you would structure this as an extension as you're
 adjusting the behaviors of existing functions, unless you wanted to
 introduce new function names for testing purposes?

Yeah, and reading the source, it looks like some parts of the JSON parsing code
will have to be rewritten, because the nested-object errors are thrown quite
deep in the parsing stage.  It looks to me as if a POC will require copying a
significant amount of code into a new file with different publicly exposed
function names.

 *) Would like to at least consider being able to use casting syntax as
 a replacement for populate_record and (the missing) populate_array
 for most usages.

Yes.  I am trying to figure out how best to do this at present.  Initially I
think I would be happy to settle for casts wrapping functions which themselves
just wrap the call to populate_record.
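
For illustration, a minimal sketch of what such a cast wrapper could look like,
assuming a composite type named item (the type and the function json_to_item
are invented names, not part of any existing API):

-- json_to_item simply delegates to json_populate_record
CREATE FUNCTION json_to_item(js json) RETURNS item AS $$
    SELECT json_populate_record(null::item, js);
$$ LANGUAGE sql STABLE;

CREATE CAST (json AS item) WITH FUNCTION json_to_item(json) AS ASSIGNMENT;

Whether the parser picks such a cast up everywhere one would like is a separate
question; the above is only meant to show the shape of the wrapper.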

What I will probably do for my POC is expose the following methods:

1.  json_populate_type()
2.  json_populate_array()

Then we can talk about whether to merge the changes into the core.

This will be an interesting way to get into PostgreSQL hacking.

Best Wishes,
Chris Travers


 merlin


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] Proposal: json_populate_record and nested json objects

2013-09-15 Thread Chris Travers


 On 15 September 2013 at 18:42 Andrew Dunstan and...@dunslane.net wrote:



 On 09/14/2013 10:27 PM, chris travers wrote:

 Well, you could fairly easily build it as an extension as a POC. The
 main point of the API this is built on was to allow for extensions.

 The logic changes might be a bit tricky. I'll be interested to see what
 you come up with.

 If we're going to do this we should make these handle arrays as well as
 objects.

Yes, arrays are necessary for my use case.  I probably should have been explicit
that I needed that too.

Best Wishes,
Chris Travers

 cheers

 andrew


Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

[HACKERS] Proposal: json_populate_record and nested json objects

2013-09-14 Thread chris travers
Hi all;

Currently json_populate_record and json_populate_recordset cannot work with
nested json objects.  This creates two fundamental problems when trying to use
JSON as an interface format.

The first problem is that you can't easily embed a json data type in a json object
and have it populate a record.  This means that storing extended attributes in
the database is somewhat problematic if you accept the whole row in as a json
object.

The second problem is that nested data structures and json don't go together
well.  You can't have a composite type which has, as an attribute, an array of
another composite type and populate this from a json object.  In its current
shape this makes json little more than an alternative to hstore for interfaces.

I would propose handling the json_populate_record and friends as such:

1. Don't throw errors initially as a pre-check if the json object is nested.
2. If one comes to a nested fragment, check the attribute type it is going into
first.
2.1 If it is a json type, put the nested fragment there.
2.2 If it is a composite type (i.e. anything in pg_class), push it through
another json_populate_record run
2.3 If it is neither, then see if a json::[type] cast exists, if so call it.
2.4 Otherwise raise an exception
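
To make the intended behaviour concrete, a hedged sketch (the type and column
names below are invented purely for illustration):

CREATE TYPE line_item AS (sku text, qty int);
CREATE TYPE order_doc AS (id int, items line_item[], extra json);

-- Today the following raises an error about nested objects.  Under the
-- proposal, "extra" would receive the json fragment as-is (rule 2.1) and
-- "items" would be pushed through another populate pass (rule 2.2; an array
-- of composites would need analogous handling).
SELECT *
  FROM json_populate_record(null::order_doc,
       '{"id": 1,
         "items": [{"sku": "abc", "qty": 2}, {"sku": "def", "qty": 1}],
         "extra": {"note": "gift wrap"}}');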

I have a few questions before I go on to look at creating a patch.

1.  Are there any problems anyone spots with this approach?

2.  Is anyone working on something like this?

3.  Would it be preferable to build something like this first as an extension
(perhaps with different function names) or first as a patch?

Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] Proposal: PL/PgSQL strict_mode

2013-09-13 Thread chris travers
A few thoughts about this.

 On 14 September 2013 at 05:28 Marko Tiikkaja ma...@joh.to wrote:


 Hi,

 After my previous suggestion for adding a STRICT keyword got shot
 down[1], I've been thinking about an idea Andrew Gierth tossed out:
 adding a new strict mode into PL/PgSQL. In this mode, any query which
 processes more than one row will raise an exception. This is a bit
 similar to specifying INTO STRICT .. for every statement, except
 processing no rows does not trigger the exception. The need for this
 mode comes from a few observations I make almost every day:
 1) The majority of statements only deal with exactly 0 or 1 rows.
 2) Checking row_count for a statement is ugly and cumbersome, so
 often it just isn't checked. I often use RETURNING TRUE INTO STRICT _OK
 for DML, but that a) requires an extra variable, and b) isn't possible
 if 0 rows affected is not an error in the application logic.
 3) SELECT .. INTO only fetches one row and ignores the rest. Even
 row_count is always set to 0 or 1, so there's no way to fetch a value
 *and* to check that there would not have been more rows. This creates
 bugs which make your queries return wrong results and which could go
 undetected for a long time.

I am going to suggest that STRICT is semantically pretty far from what is meant
here in common speech, so STRICT here would be confusing.  It would be
especially confusing for people coming from Perl or MySQL backgrounds.

May I suggest SINGLE as a key word instead?  It might be worth having it
attached to INSERT, UPDATE, and DELETE statements.

I am thinking something like:

DELETE SINGLE FROM foo WHERE f1  1000;

would be clearer.  Similarly one could have:

INSERT SINGLE INTO foo SELECT * from foo2;

and

UPDATE SINGLE foo

You could even use SELECT SINGLE, but I am not sure what the use case would be
where unique indexes are not sufficient.
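
For comparison, the kind of manual check that SINGLE (or a strict mode) would
replace looks roughly like this today (a sketch only; the table and condition
are illustrative):

CREATE OR REPLACE FUNCTION delete_single_foo(p_limit int) RETURNS void AS $$
DECLARE
    n bigint;
BEGIN
    DELETE FROM foo WHERE f1 < p_limit;
    GET DIAGNOSTICS n = ROW_COUNT;
    IF n > 1 THEN
        RAISE EXCEPTION 'query processed more than one row';
    END IF;
END;
$$ LANGUAGE plpgsql;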



 Attached is a proof-of-concept patch (no docs, probably some actual code
 problems too) to implement this as a compile option:

 =# create or replace function footest() returns void as $$
 $# #strict_mode strict
 $# begin
 $# -- not allowed to delete more than one row
 $# delete from foo where f1  100;
 $# end$$ language plpgsql;
 CREATE FUNCTION
 =# select footest();
 ERROR: query processed more than one row
 CONTEXT: PL/pgSQL function footest() line 5 at SQL statement

 Now while I think this is a step into the right direction, I do have a
 couple of problems with this patch:
 1) I'm not sure what would be the correct behaviour with EXECUTE.
 I'm tempted to just leave EXECUTE alone, as it has slightly different
 rules anyway.
 2) If you're running in strict mode and you want to
 insert/update/delete more than one row, things get a bit uglier; a wCTE
 would work for some cases. If strict mode doesn't affect EXECUTE (see
 point 1 above), that could work too. Or maybe there could be a new
 command which runs a query, discards the results and ignores the number
 of rows processed.

Yeah, I am worried about this one.  I am concerned that if you can't disable it
on a statement-by-statement basis, you end up removing the mode from the whole
function, and then it becomes much harder for everyone maintaining the function
to have a clear picture of what is going on.  I am further worried that ugly
workarounds will introduce maintenance pain points worse than the problem being
solved.

 I'll be adding this to the open commitfest in hopes of getting some
 feedback on this idea (I'm prepared to hear a lot of you're crazy!),
 but feel free to comment away any time you please.

Well, I don't know if my feedback above is helpful, but there it is ;-)


 Regards,
 Marko Tiikkaja

 [1]: http://www.postgresql.org/message-id/510bf731.5020...@gmx.net

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
Best Wishes,
Chris Travers
http://www.2ndquadrant.com
PostgreSQL Services, Training, and Support

Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-11 Thread Chris Travers
On Mon, Jul 11, 2011 at 12:49 PM, David Johnston pol...@yahoo.com wrote:

 I do not see how recursive queries (really iteration of records) even enters
 the picture...

I agree, FWIW.  If the feature was that desirable, we could look at
questions of implementation to make recursion either unnecessary or at
least well managed.

 Right now I can emulate a hierarchical schema structure via a naming scheme
 - for example  schemabase_sub1_sub2_etc.  I am simply looking for a formal
 way to do the above AND also tell the system that I want all schemas under
 schemabase to be in the search path.  Heck, I guess just allowing for
 simply pattern matching in search_path would be useful in this case
 regardless of the presence of an actual schema hierarchy.  Using LIKE
 syntax say: SET search_path TO schemabase_sub1_% or something similar.
 The only missing ability becomes a way for graphical tools to represent the
 schema hierarchy using a tree-structure with multiple depths.

Right.  Semantically myapp_schemaname_subschemaname is no less
hierarchical than myapp.schemaname.subschemaname.  The larger issue is
that of potential ambiguity wrt cross-database references (I don't
have a lot of experience reading the SQL standards, but seeing how
different db's implement cross-db references suggests that the
standards contemplate semantic meaning to depth of the namespace).


 I can see how adding . and .. and relative paths would confuse the issue
 those are not necessary features of a multi-level schema depth.

 The above, combined with a different separator for intra-level
 namespace/schema delineation, would allow for an unambiguous way to define
 and use a hierarchical schema with seemingly minimal invasion into the
 current way of doing things. You could almost implement it just by requiring
 a specific character to act as the separator and then construct the actual
 schema using single-level literals and supporting functions that can convert
 them into an hierarchy.  In other words, the schema table would still only
 contain one field with the full parent!child as opposed to (schema,
 parent) with (VALUES('parent',null),('child','parent')).

 In other words, if we use ! as the separator, any schema named
 parent!child  could be stored and referenced as such but then if you run a
 getChildren(parent) function it would return child along with any other
 schemas of the form parent!%.  In this case the % sign could maybe only
 match everything except ! and the * symbol could be used to match ! as
 well.

Agreed that this would be helpful.  I would personally have a lot of
use for this sort of feature, particularly with managing large numbers
of stored procedures.  Right now I am using a double underscore which
is error-prone.
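
As a stopgap, something along these lines can approximate the LIKE-style
search_path today (a sketch only; the function name and the underscore
convention are mine, not an existing facility):

CREATE OR REPLACE FUNCTION set_search_path_like(p_pattern text)
RETURNS void AS $$
DECLARE
    paths text;
BEGIN
    -- collect all schemas whose names match the pattern
    SELECT string_agg(quote_ident(nspname), ', ')
      INTO paths
      FROM pg_namespace
     WHERE nspname LIKE p_pattern;
    IF paths IS NOT NULL THEN
        EXECUTE 'SET search_path TO ' || paths;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- e.g.  SELECT set_search_path_like('schemabase_sub1_%');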

Best Wishes,
Chris Travers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question about the TODO, numerics, and division

2007-03-20 Thread Chris Travers

Hi all;

I have been looking at the TODO and have found something that I find 
sort of odd and we should probably reconsider:


One of the items under data types is:

   * Add NUMERIC division operator that doesn't round?

 Currently NUMERIC _rounds_ the result to the specified precision.
 This means division can return a result that multiplied by the
 divisor is greater than the dividend, e.g. this returns a value > 10:

SELECT (10::numeric(2,0) / 6::numeric(2,0))::numeric(2,0) * 6;

This does not seem to me to be a division op issue but rather a simple
casting mistake.  Note that the result of 10/6 is cast as numeric(2,0) 
and then multiplied by 6.
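
Breaking the original expression into steps makes this plain (results as I
would expect them; the displayed scale of the division may vary):

SELECT 10::numeric(2,0) / 6::numeric(2,0);                  -- about 1.6667
SELECT (10::numeric(2,0) / 6::numeric(2,0))::numeric(2,0);  -- rounds to 2
SELECT 2 * 6;                                               -- 12, hence > 10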


The following example shows that the problem is with the query and 
casting, not with the division op:

SELECT ((10::numeric(2,0) / 6::numeric(2,0)) * 6)::numeric(2,0);
numeric
-
 10
(1 row)


Am I missing something?

Best Wishes,
Chris Travers




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] [HACKERS] 'a' == 'a '

2005-10-20 Thread Chris Travers

Dann Corbit wrote:


Let me make something clear:
When we are talking about padding here it is only in the context of a
comparison operator and NOT having anything to do with storage.
 

IIRC, varchar and bpchar are stored in a similar way but are presented
differently when retrieved; i.e. storage is separate from presentation in this
case.  The padding in bpchar occurs when the value is presented and is stripped
when it is stored.


Again, I am happy solving this simply by documenting it, since that would
answer any questions of interpretation and implementation of the standard.  So
far what I (and I am sure others) have not heard is a strong case for changing
the behavior, given that it is in line with a reasonable interpretation of the
standards.
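
For reference, the behaviour being discussed, as I understand it:

SELECT 'a'::char(4) = 'a '::char(2);   -- true: trailing blanks are not
                                       -- significant when comparing char(n)
SELECT 'a'::varchar = 'a '::varchar;   -- false: varchar/text keep trailing
                                       -- blanks significant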



Given two strings of different length in a comparison, most database systems
(by default) will blank pad the shorter string so that they are the same
length before performing the comparison.
 

Understood, but what gain do you have in a case like this that might 
justify the effort that would go into making it, say, an initdb option?  
How often does this behavior cause problems?


Best Wishes,
Chris Travers
Metatron Technology Consulting

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] [HACKERS] 'a' == 'a '

2005-10-19 Thread Chris Travers

Josh Berkus wrote:


Dann,

 


I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.
   



AFAICT, the standard says "implementation-specific".   So we're standard.

The main cost for comparing trimmed values is performance; factoring an 
rtrim into every comparison will add significant overhead to the already 
CPU-locked process of, for example, creating indexes.  We're looking for 
ways to make the comparison operators lighter-weight, not heavier.
 

If I understand the spec correctly, it seems to indicate that this is 
specific to the locale/character set.  Assuming that the standard 
doesn't have anything to do with any character sets, it should be 
possible to make this available for those who want it as an initdb 
option.  Whether or not this is important enough to offer or not is 
another matter.


Personally my questions are:

1)  How many people have been bitten by this badly?
2)  How many people have been bitten by joins that depend on padding?

Personally, unlike case folding, this seems to be an area where a bit of
documentation (i.e. all collations are assumed to have the NO PAD option from
the SQL standard) would be sufficient to answer questions of standards
compliance.


My general perspective on this is that if trailing blanks are a significant 
hazard for your application, then trim them on data input.  That requires 
a *lot* less performance overhead than doing it every time you compare 
something.  
 

In general I agree.  But I am not willing to jump to the conclusion that 
it will never be warranted to add this as an initdb option.  I am more 
interested in what cases people see where this would be required.  But I 
agree that the bar is much higher than it is in many other cases.
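
For completeness, the trim-on-input approach is cheap to set up; a sketch
(table and column names are invented):

CREATE OR REPLACE FUNCTION trim_text_cols() RETURNS trigger AS $$
BEGIN
    NEW.name := rtrim(NEW.name);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trim_text_cols_biu
    BEFORE INSERT OR UPDATE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE trim_text_cols();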


Best Wishes,
Chris Travers
Metatron Technology Consulting

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [pgsql-advocacy] Spikewatch testing

2005-08-26 Thread Chris Travers

Josh Berkus wrote:




Is it worth trying to promote this as a way to promote PostgreSQL? Also,
is it worth trying to improve our test coverage?
   



Actually, they'll be running a contest (with prizes up to $2500) for 
improved test coverage for OSS applications.   I've been trying to get 
someone to commit to helping me on the contest, so that PostgreSQL can 
participate.


 

What is involved in this?  Maybe if you give specifics one of us can 
commit to helping :-)


Best Wishes,
Chris Travers
Metatron Technology Consulting

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [pgsql-advocacy] [HACKERS] Increased company involvement

2005-05-02 Thread Chris Travers
Andrew Dunstan wrote:
I've deliberately let the dust settle slightly on this.
One thing that might help is a more open sponsorship clearing house. 
Example (not meant as a bid, but just to illustrate): the JDBC driver 
needs a scanner overhaul - it breaks on dollar quoting and a bunch of 
other stuff. I could do that work (as could others, of course) but I 
don't have time, unless someone buys some of my professional time. 
Someone might want to do just that, but how would they find me?

Regarding the secret code stuff - I predict that it will quickly bite 
whoever does it, unless they are extremely lucky.

I like this idea.
There is another issue too.  In general, there is a feeling like one 
needs to produce something that works and not wait for the slower 
movement of the community's approval.  I don't think several open source 
forks of the project necessarily produce problems if most of these are 
used as experimental projects.  The example which comes to mind is 
Samba-TNG.   So some of this concern may be overblown.

This is also the way things work with the SQL Standard:  The various 
vendors (PostgreSQL included) go out and start with the base, extend 
that feature set, and eventually come back together to try to build the 
next standard based on everyone's experience.  This embrace and extend 
process is indeed critical for the further development of the standard.

At the same time, I agree with Bruce's main point-- that the lack of 
communication is a threat to this progress.  So at least some note of "best
practices" regarding these extensions or contributions would be a 
help.  Adding a clearing house to this would also add a critical tool 
and would also have the side effect of increasing the pace of 
development.  Maybe have it divided into two sections:  Bids and Bounties.

Best Wishes,
Chris Travers
Metatron Technology Consulting
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] [GENERAL] CREATE DATABASE on the heap with PostgreSQL?

2004-06-07 Thread Chris Travers
Albretch wrote:
Gaetano Mendola [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
 

If you access a table more frequently then other and you have enough
RAM your OS will mantain that table on RAM, don't you think ?
BTW if you trust on your UPS I'm sure you are able to create a RAM
disk and place that table in RAM.
Regards
Gaetano Mendola
   

RAMdisks still need a hard disk drive to operate. I am talking here
about entirely diskless configurations.
 

I asked this question not long after 7.4 debuted.  In general the basic 
answer I got was:

1)  Especially with 7.5 and the ARC, small tables which can be stored 
entirely in RAM and are frequently used will end up being fully cached 
there anyway.  Presumably, complex updates would still cause I/O 
bottlenecks, but read performance should not be any different than for a 
RAM-based table.

2)  Given the upcoming release of ARC, there is no real reason to 
consider having a table reside only in memory (doing so may impact the 
performance of other tables in the database as well).

3)  HEAP tables are not planned.  PostgreSQL is focused on data 
integrity and reliability, and this is a can of worms regarding these 
topics which is best left untouched.

Best Wishes,
Chris Travers
Metatron Technology Consulting
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] [pgsql-advocacy] What can we learn from MySQL?

2004-04-27 Thread Chris Travers
Jim C. Nasby wrote:
Maybe also a more generic section about how PGSQL is different from
other databases. Maybe I'm just dense, but it took me a long time to
figure out the whole lack of stored procedures thing (yes, PGSQL
obviously has the functionality, but many experienced DBAs won't
associate functions with stored procs). Pointing out the documentation
on MVCC and how it changes how you want to use the database would be
good, as would links to documentation on what postgresql.conf settings
you want to change out of the box.
 

I think this is a good idea.  And you seem to be suggesting that it 
includes information on differences in nomenclature as well.

On the other topics...
I think the biggest service PGSQL could provide to the open source
community is a resource that teaches people with no database experience
the fundamentals of databases. If people had an understanding of what a
RDBMS should be capable of and how it should be used, they wouldn't pick
MySQL.
 

I think that this is incredibly important.  Many many developers choose 
MySQL because MySQL really does make the effort in this regard.  This 
strategy has helped both MySQL and Red Hat become the commercial 
successes they are today.

Having a windows port is critical for 'student mindshare'. If PGSQL can't
play on windows, professors can't use it. Likewise, installation on OS X
should be made as easy as possible.
 

PostgreSQL *can* play on Windows (via Cygwin), and I am not sure that 
this is so important to student mindshare.  However, it is important for 
another reason: a Windows port (even one labeled for development use 
only) would go a LONG way towards recruiting new faces into our 
community, as it would lower the barrier to entry for using the database 
(yes, the Cygwin installer, because of the IPC stuff, is a reasonable 
barrier to entry).

Best Wishes,
Chris Travers
Metatron Technology Consulting
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [pgsql-advocacy] [HACKERS] What can we learn from MySQL?

2004-04-27 Thread Chris Travers
Alexey Borzov wrote:
Hi!
Tim Conrad wrote:
My favourite part of it is:

MySQL uses traditional row-level locking. PostgreSQL uses something 
called Multi Version Concurrency Control (MVCC) by default. MVCC is 
a little different from row-level locking in that transactions on 
the database are performed on a snapshot of the data and then 
serialized. New versions of PostgreSQL support standard row-level 
locking as an option, but MVCC is the preferred method.


Nice that you point out that I incorrectly stated something. Even
nicer that you don't tell me what the correct answer would be.
Unfortunately, that's the best I could come up with doing
research with the documentation I could find on the subject. MVCC
does a lot more than can be easily contained in a sentence. 

The problem is that in MySQL
1) MyISAM does table-level locking;
2) BDB does row-level locking;
3) InnoDB does MVCC (mostly) like PostgreSQL.
PostgreSQL does support row-level locking (SELECT ... FOR UPDATE), 
table-level locking (LOCK TABLE ...), though this does not *replace* 
MVCC, as one may understand from the quotation.

MySQL's roadmap is complete bullshit. Subselects were first promised 
in 4.0, which was not that far away [1] back in 1998! Well, they 
are in 4.1, which is still alpha in 2004.

I realize this.  I also realize that having a nicely defined roadmap 
would
give Postgres a hands up in this category. 

A hands up in *what* category? In bragging?
Should PostgreSQL developers write something along the lines of 
PostgreSQL 9i (available Really Soon Now) will also be able to make 
coffee?

Well, as you know about coffee now, why don't you add make coffee to 
your comparison table, with empty space in MySQL's and commercial 
DBMSs' columns and in 9i in PostgreSQL's one?

Maybe.  Just for jest-- If you read the Linux Coffee how-to, write a C 
module, get the right hardware, etc. Yes, PostgreSQL can make coffee!  
Of course, this would occur outside any sort of transactional control...

Seriously, though...  I think that it would be helpful to have a list of 
features which are under active development (not just the TODO list, which 
is features we want to develop).  We could also have contact info for 
leads (or maybe contact via a web form, etc.) as well as a status for each 
feature.  As the lead of a project whose roadmap has changed many times 
due to paid contracts, I don't really see the value of published roadmaps 
in general.

Best Wishes,
Chris Travers
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [pgsql-advocacy] [HACKERS] What can we learn from MySQL?

2004-04-26 Thread Chris Travers
Bruno Wolff III wrote:
On Fri, Apr 23, 2004 at 16:36:57 -0400,
 [EMAIL PROTECTED] wrote:
 

Ease of use is VERY important, but few suggestions that address this are
ever really accepted. Yes, focusing on the functionality is the primary
concern, but how you set it up and deploy it is VERY important. You guys
need to remember, people are coming from a world where MySQL, Oracle, and
MSSQL all have nice setup programs.
   

nice must be in the eye of the beholder. I have used Oracle's installer
to install a client and was not amused by it needing hundreds of megabytes
to do a client install.
 

I second that.  I have not found *anybody* who has used Oracle's 
installer to install the actual database server on Linux or Solaris who 
has described their installation procedure as either "nice" or "easy".  
In fact, even reading the installation instructions is enough to give you 
second thoughts.  MS SQL does have a nice installer, however, as do 
most binary open source products for Windows.  I am completely confident 
that PostgreSQL for Windows, when it arrives, will have a nice GUI-based 
installer.

Best Wishes,
Chris Travers
Metatron Technology Consulting


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [GENERAL] Should we consider empty fields as NULL values when

2003-12-11 Thread Chris Travers

On Thu, 2003-12-11 at 22:36, Nagib Abi Fadel wrote:
 HI,
 let's say we have the following table :
  
 # CREATE TABLE tempo (col1 varchar(3) not null);
 CREATE TABLE
 
 # insert INTO tempo VALUES ('');
 INSERT 11420541 1
  
 the insert command works. 
  
 The issue is that since the column col1 is defined as character with
 not null attribute,
 shouldn't we deny such inserts (i mean inserting empty fields)???
 (PS: i am using postresql 7.3.2)

NULL has a special meaning, and CAST(NULL AS varchar) != ''

Remember that NULL is a special value and does not equate to any other
value.  For example you may KNOW that the value of a string is '', but
if you don't know what the value is, then NULL is the value which
represents that unknown.

For this reason, 
NULL || 'Mystring' IS NULL
'' || 'MyString' = 'MyString'

In the first case, we are appending 'Mystring' to an unknown string
(hence the result is unknown), and in the second, we append 'MyString'
to an empty string.  Hence the value is the same.
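
The same point in runnable form (the last line is the heart of the original
question):

SELECT NULL || 'Mystring' IS NULL;       -- true
SELECT '' || 'MyString' = 'MyString';    -- true
SELECT '' IS NULL;                       -- false, so NOT NULL does not reject ''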

  
 When using script languages (like PHP) if by mistake the variable is
 not defined such insert is possible (let's say we have a variable
 $col1_value and after a long day of work we make a mistake and write
 it $col_value).

The only way to handle this is to write your own routines to check the
values and substitute as appropriate.  That is not the answer you were
looking for, but...

For example (PHP):

function db_quote($db_var){
    if ($db_var === NULL){
        // Unquoted NULL keyword for SQL
        return 'NULL';
    } else {
        // Wrap the value in single quotes for use in an SQL statement
        // (double quotes so $db_var is interpolated, unlike '$db_var';
        // a real version would also escape embedded quotes)
        return "'" . $db_var . "'";
    }
}

This will enclose your variable in single quotes unless it is not set, in
which case it will return the unquoted string NULL, which can be used in your
database queries.

Best Wishes,
Chris Travers

  
 This problem is solved by adding the constraint:
  ALTER TABLE tempo add constraint col1_check check(col1!='');
  


---(end of broadcast)---
TIP 8: explain analyze is your friend