Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Brendan Jurd
On 15 June 2013 16:18, Craig Ringer  wrote:
> On 06/15/2013 02:08 PM, Brendan Jurd wrote:
>> On 15 June 2013 14:43, Craig Ringer  wrote:
>>> The #1 question I see on Stack Overflow has to be confusion about
>>> pg_hba.conf, mostly from people who have no idea it exists, don't understand
>>> how to configure it, etc.
>> The totally non-obvious name of the file probably has something to do
>> with that.  It should be called 'auth.conf'.
> Not convinced; since it only controls one facet of auth - it doesn't
> define users, passwords, grants, etc ...

When somebody is setting up postgres for the first time, and they list
the contents of the config directory, you want them to have some idea
what each of the files is for.  If they see something called
'auth.conf', they'll get the right general idea.  An understanding of
the nuances (like that it doesn't control user accounts) will come
once they open up the file -- which they may well do, because it is
called 'auth.conf', and 'auth' is a thing you want to configure.

If they see something called 'pg_hba.conf', they may very reasonably
assume that it is some internal/advanced stuff that they don't need to
worry about just yet, because what the heck is a 'pg_hba'?  The 'pg'
is unnecessary and the 'hba' is an internal jargon term that we've
ill-advisedly allowed to leak out into the filename.

If you really feel that 'auth.conf' is too imprecise, maybe something
like 'conn-auth.conf' would be more your style.

Cheers,
BJ




Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Craig Ringer
On 06/15/2013 02:08 PM, Brendan Jurd wrote:
> On 15 June 2013 14:43, Craig Ringer  wrote:
>> The #1 question I see on Stack Overflow has to be confusion about
>> pg_hba.conf, mostly from people who have no idea it exists, don't understand
>> how to configure it, etc.
> The totally non-obvious name of the file probably has something to do
> with that.  It should be called 'auth.conf'.
Not convinced; since it only controls one facet of auth - it doesn't
define users, passwords, grants, etc - that'd probably be just as confusing.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Brendan Jurd
On 15 June 2013 14:43, Craig Ringer  wrote:
> The #1 question I see on Stack Overflow has to be confusion about
> pg_hba.conf, mostly from people who have no idea it exists, don't understand
> how to configure it, etc.

The totally non-obvious name of the file probably has something to do
with that.  It should be called 'auth.conf'.

Cheers,
BJ




Re: [HACKERS] [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

2013-06-14 Thread Brendan Jurd
On 14 June 2013 03:53, David E. Wheeler  wrote:
> Similar things should have dissimilar names. I propose:
>
> 
>
>  Old          | New
> --------------+---------------
>  array_dims   | array_desc

array_bounds?

>  array_ndims  | array_depth
>  array_length | array_size
>  array_lower  | array_start
>  array_upper  | array_finish
>
> The last two are meh, but it’s a place to start…

I think that even with the most dissimilar names we can come up with,
this is going to confuse people.  But it is still better than doing
nothing.

I wonder whether, if we go in this direction, we could still use some
of the work I did on deprecating zero-D arrays.  Let's say the old
functions keep doing what they do now, and we teach them to treat all
empty arrays the same way they currently treat zero-D arrays (return
NULL).  The new functions treat zero-D arrays as though they were 1-D
empty with default bounds, and we add CARDINALITY per ArrayGetNItems.
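
Concretely, the intended behaviour would be something like this (a
sketch; array_size is from David's table above):

  SELECT array_length('{}'::int[], 1); -- old function: NULL, as zero-D today
  SELECT array_size('{}'::int[], 1);   -- new function: 0 (zero-D as 1-D empty)
  SELECT cardinality('{}'::int[]);     -- 0, per ArrayGetNItems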

This way, applications would not be broken by upgrading, and we'd be
giving people a way to opt-in to a better API.

Cheers,
BJ




Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Craig Ringer
On 06/15/2013 02:16 AM, Josh Berkus wrote:
> On 06/12/2013 02:03 PM, Joshua D. Drake wrote:
>> What concerns me is we seem to be trying to make this "easy". It isn't
>> supposed to be easy. This is hard stuff. Smart people built it and it
>> takes a smart person to run it. When did it become a bad thing to be
>> something that smart people need to run?
> 1997, last I checked.
>
> Our unofficial motto: "PostgreSQL: making very hard things possible, and
> simple things hard."
>
> It *is* hard.  But that's because we've *made* it hard to understand and
> manage, not because the problem is inherently hard.

I have to agree with all this... Pg has some of the best docs around, a
really nice SQL level interface, and some truly shocking usability
outside that nice zone.

Once a user steps into the "admin zone" they're confronted with a lot of
settings they'll really struggle to understand and manage.

I don't want this to be used as an argument not to commit early stages
of work, though. I think iterative development with exposure to
real-world testing and experience is necessary when you're getting to
the complexity of things that are now going into Pg. It's more that
"committed" != "done"; right now, once it's usable at that power-user
stage, further management and improvement gets farmed out to external
tools and the usability of the core feature stays rather ... rough.

Some examples:

fsync=off
---------

We have a giant foot-cannon in the config files, "fsync" with the "off"
option neatly documented alongside all the others. No note saying
"setting fsync=off is equivalent to setting yes_you_can_eat_my_data=on".
No WARNING in the logs, not that a user who'd set that without
understanding it would look at the logs. The fsync section of
http://www.postgresql.org/docs/current/static/runtime-config-wal.html is
ok, though it could do with a more prominent warning... but the user
needs to know where to look. I've repeatedly asked to change this -
usually after yet another user comes onto -general with data loss due to
not understanding fsync=off - and haven't been able to get agreement on
even a config file comment.

Proposed fix for 9.3: a config file comment saying "Warning: fsync=off may
cause data loss; see the user manual."

Proposed fix for 9.4+: remove fsync=off from the docs. Leave the GUC enum
there, but have the postmaster exit with FATAL when it sees it, with a
message saying "fsync=off has been replaced with unsafe_writes=on, please
change your postgresql.conf". Add the corresponding new GUC.

max_connections
---------------

max_connections is another one. I see systems with max_connections=3000
in the wild... performing terribly, as you'd expect. Yet there's no
indication (even in the docs) that this is often a terrible idea, and
that you should really look into a connection pooler if you're going
above a few hundred (hardware/workload dependent).
http://www.postgresql.org/docs/current/static/runtime-config-connection.html
doesn't mention it, there's no config file comment, etc.

Proposed fix: Comment in the config file saying something like "See the
documentation before raising this above a few hundred". In the docs, a
note about the perf impact of high max_connections with a brief mention
of external connection pooling and links to pgbouncer/pgpool-II, mention
that many app frameworks have built-in connection pools. Brief comment
about there being an optimum workload-and-hardware dependent level of
concurrency above which performance degrades. I'll be happy to write a
draft patch for this if there's agreement on the idea.

vacuum/autovacuum
-----------------

autovacuum tuning. We've just had this conversation and there seems to
be agreement that it needs some love, but unlike the above two there's
no easy fix and it's an ongoing process. I don't have any right to
complain about it unless I do more to help fix it.

Bloat
-----

Table bloat. Table bloat has been a major issue with PostgreSQL
users/admins for years. Anyone care to explain to me in a simple
paragraph how to find out if you have table or index bloat issues in
your database and what to do about it? (Maybe we need
"pg_catalog.pg_index_bloat" and "pg_catalog.pg_table_bloat" views
including FILLFACTOR correction?)

I think I'll draft up a patch to add exactly that.
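
In the meantime, about the best generic answer I can give is a rough
heuristic along these lines (a sketch only -- the 24-byte row overhead is
an assumption, and it ignores fillfactor, dead line pointers and index
bloat entirely):

  SELECT c.relname,
         pg_size_pretty(pg_relation_size(c.oid)) AS actual_size,
         pg_size_pretty((c.reltuples * (24 + s.row_width))::bigint)
             AS est_live_size
    FROM pg_class c
    JOIN (SELECT tablename, sum(avg_width) AS row_width
            FROM pg_stats
           GROUP BY tablename) s ON s.tablename = c.relname
   WHERE c.relkind = 'r'
   ORDER BY pg_relation_size(c.oid) DESC;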

Dump/restore and globals
------------------------

Dump and restore. The standard advice I give is to do a "pg_dumpall
--globals-only" followed by a "pg_dump -Fc" of each database, since we
don't have "pg_dumpfall -Fc". Users often seem to do single-DB dumps
then find themselves having trouble restoring them due to missing user
accounts, etc. Or they do a pg_dumpall then want to restore just one
DB/table.

There's also a lot of confusion around restoring dumps due to the
different formats. This has improved now that pg_restore tells

Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Craig Ringer
On 06/13/2013 05:35 PM, Dean Rasheed wrote:
> Hi,
>
> Attached is a patch implementing a new aggregate function md5_agg() to
> compute the aggregate MD5 sum across a number of rows. This is
> something I've wished for a number of times. I think the primary use
> case is to do a quick check that 2 tables, possibly on different
> servers, contain the same data, using a query like
>
>   SELECT md5_agg(foo.*::text) FROM (SELECT * FROM foo ORDER BY id) foo;
>
> or
>
>   SELECT md5_agg(foo.*::text ORDER BY id) FROM foo;

That's a very useful thing to be able to do, but I'm hesitant to make
the fact that it uses md5 too prominent in the name if it doesn't
produce a result that an external user could reasonably expect from
md5'ing the same data.

I imagine having an md5_agg(text) and md5(bytea) that were the more
efficient, streaming equivalents of:

md5(string_agg(the_col,''))

would be rather handy.
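
That is, under that definition these two queries would be required to
agree (a sketch -- the table and columns are illustrative):

  SELECT md5(string_agg(col, '' ORDER BY id)) FROM t;  -- concat, then hash
  SELECT md5_agg(col ORDER BY id) FROM t;              -- streaming equivalent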

It'd be less useful for other types (floats, integers, etc.) unless we
had a way to get the binary representations of those in a well-defined
form, like int8le(1). Casting to 'text' would be sufficient for most of
the purposes I can imagine, though, and for those where it wouldn't,
things would quickly get so complicated that you'd want to be using a
PL/something function anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Craig Ringer


On 06/14/2013 09:40 PM, Stephen Frost wrote:
> Where I'd take this is actually in a completely different direction..
> I'd like the aggregate to be able to match the results of running the
> 'md5sum' unix utility on a file that's been COPY'd out.
Until I started looking at the follow-up discussion I didn't realise it
wasn't supposed to.

If it isn't the md5sum of the ordered rows, it shouldn't be called
'md5'. It might still be useful, but please don't call it md5.

The proposals to make it produce the same result with different row
orderings sound useful in the context of SQL; if it produced a different
result with different orderings I'd want a way to force an explicit
ORDER BY clause on the aggregate and error out if one wasn't present.
Making it ignore order means that it's no longer md5, though.

row_sums(...) ?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] [PATCH] Add transforms feature

2013-06-14 Thread Craig Ringer
On 06/14/2013 11:11 AM, Peter Eisentraut wrote:
> A transform is an SQL object that supplies to functions for converting
> between data types and procedural languages.  For example, a transform
> could arrange that hstore is converted to an appropriate hash or
> dictionary object in PL/Perl or PL/Python.
>
> Externally visible changes:
>
> - new SQL commands CREATE TRANSFORM and DROP TRANSFORM
>
> - system catalog pg_transform
>
> - transform support in PL/Perl and PL/Python
>
> - PL/Perl and PL/Python install their header files for use by external
>   types
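
As a usage sketch of the new commands (the conversion function names here
are hypothetical placeholders, not the patch's actual signatures):

  CREATE TRANSFORM FOR hstore LANGUAGE plpythonu (
      FROM SQL WITH FUNCTION hstore_to_plpython(internal),
      TO SQL WITH FUNCTION plpython_to_hstore(internal)
  );
  DROP TRANSFORM FOR hstore LANGUAGE plpythonu;
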
I wonder if that should be extended to install headers for hstore,
ltree, and while we're at it, intarray as well?

You handle getting access to the headers by using include path flags:

PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec)
-I$(top_srcdir)/contrib/hstore

which is fine for in-tree builds but AFAIK means that pgxs cannot build
these extensions. If the extension modules installed their headers that
problem would go away; you'd just

#include "contrib/hstore/hstore.h"

Many modules already have useful non-static functions that work with raw
types and aren't callable via the fmgr, as well as the macros and
typedefs that make working with them easier. intarray in particular would
be extremely useful to use from other extensions, but in the past I've
ended up copying it and then adding the extra functions I needed
because I couldn't access its header in a PGXS build.

Peter's already answered my main worry on this, which is whether any
platform would prevent us from doing the required runtime symbol lookup
from shared libs loaded at the same level. I knew RTLD_GLOBAL allowed it
for dlopen(...), but wasn't sure about other platforms.

Thoughts? Should contrib modules be able to install headers and use each
others' symbols? I've seen interest in this in the wild from people who
want to use hstore in their own extensions and I've had uses for it
myself with intarray.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-14 Thread Craig Ringer
On 06/13/2013 11:16 AM, Peter Eisentraut wrote:
> This has served no purpose except to
>
> 1. take up space
> 2. confuse users
> 3. produce broken external extension modules that take contrib as an example
> 4. break builds of PostgreSQL when users try to fix 3. by exporting USE_PGXS

Weak -1 from me; I find being able to build contrib modules using PGXS
from the source tree extremely useful in development. It allows me to
easily install an extension from a newer Pg version into an older
server, test against older servers, etc.

What about assuming USE_PGXS=1 if it's unset, and have recursive make
set USE_PGXS=0 explicitly when invoking contrib makefiles?

There'd be a behaviour change, in that you'd have to write USE_PGXS=0
when you wanted the module to build against the tree it was currently in
instead of with pgxs. Maybe that's just as confusing.

If everyone hates that idea I can just write a 'pgxs' command that
copies the Makefile, hacks it to use PGXS, and runs it, so it's not like
I lose anything critical.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] pluggable compression support

2013-06-14 Thread Joshua D. Drake


On 06/14/2013 06:56 PM, Robert Haas wrote:


On Fri, Jun 14, 2013 at 8:45 PM, Andres Freund  wrote:

On 2013-06-14 17:35:02 -0700, Josh Berkus wrote:



No. I think as long as we only have pglz and one new algorithm (even if
that is lz4 instead of the current snappy) we should just always use the
new algorithm. Unless I missed it nobody seemed to have voiced a
contrary position?
For testing/evaluation the guc seems to be sufficient.


Then it's not "pluggable", is it?  It's "upgradable compression
support", if anything.  Which is fine, but let's not confuse people.


The point is that it's pluggable on the storage level, in the sense that
several different algorithms can coexist and new ones can be added
relatively easily.
That part is what seems to have blocked progress for quite a while
now. So fixing that seems to be the interesting thing.

I am happy enough to do the work of making it configurable if we want it
to be... But I have zero interest in doing it only to throw it away in the
end because we decide we don't need it.


I don't think we need it.  I think what we need to decide is which
algorithm is legally OK to use.  And then put it in.

In the past, we've had a great deal of speculation about that legal
question from people who are not lawyers.  Maybe it would be valuable
to get some opinions from people who ARE lawyers.  Tom and Heikki both
work for real big companies which, I'm guessing, have substantial
legal departments; perhaps they could pursue getting the algorithms of
possible interest vetted.  Or, I could try to find out whether it's
possible do something similar through EnterpriseDB.


We have IP legal representation through Software in the Public Interest,
who pretty much specializes in this type of thing.


Should I follow up? If so, I need a summary of the exact question 
including licenses etc.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




Re: [HACKERS] pluggable compression support

2013-06-14 Thread Robert Haas
On Fri, Jun 14, 2013 at 8:45 PM, Andres Freund  wrote:
> On 2013-06-14 17:35:02 -0700, Josh Berkus wrote:
>>
>> > No. I think as long as we only have pglz and one new algorithm (even if
>> > that is lz4 instead of the current snappy) we should just always use the
>> > new algorithm. Unless I missed it nobody seemed to have voiced a
>> > contrary position?
>> > For testing/evaluation the guc seems to be sufficient.
>>
>> Then it's not "pluggable", is it?  It's "upgradable compression
>> support", if anything.  Which is fine, but let's not confuse people.
>
> The point is that it's pluggable on the storage level, in the sense that
> several different algorithms can coexist and new ones can be added
> relatively easily.
> That part is what seems to have blocked progress for quite a while
> now. So fixing that seems to be the interesting thing.
>
> I am happy enough to do the work of making it configurable if we want it
> to be... But I have zero interest in doing it only to throw it away in the
> end because we decide we don't need it.

I don't think we need it.  I think what we need to decide is which
algorithm is legally OK to use.  And then put it in.

In the past, we've had a great deal of speculation about that legal
question from people who are not lawyers.  Maybe it would be valuable
to get some opinions from people who ARE lawyers.  Tom and Heikki both
work for real big companies which, I'm guessing, have substantial
legal departments; perhaps they could pursue getting the algorithms of
possible interest vetted.  Or, I could try to find out whether it's
possible do something similar through EnterpriseDB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dynamic background workers

2013-06-14 Thread Michael Paquier
On Sat, Jun 15, 2013 at 6:00 AM, Robert Haas  wrote:

> The second patch, dynamic-bgworkers-v1.patch, revises the background
> worker API to allow background workers to be started dynamically.
> This requires some communication channel from ordinary workers to the
> postmaster, because it is the postmaster that must ultimately start
> the newly-registered workers.  However, that communication channel has
> to be designed pretty carefully, lest a shared memory corruption take
> out the postmaster and lead to inadvertent failure to restart after a
> crash.  Here's how I implemented that: there's an array in shared
> memory of a size equal to max_worker_processes.  This array is
> separate from the backend-private list of workers maintained by the
> postmaster, but the two are kept in sync.  When a new background
> worker registration is added to the shared data structure, the backend
> adding it uses the existing pmsignal mechanism to kick the postmaster,
> which then scans the array for new registrations.  I have attempted to
> make the code that transfers the shared_memory state into the
> postmaster's private state as paranoid as humanly possible.  The
> precautions taken are documented in the comments.  Conversely, when a
> background worker flagged as BGW_NEVER_RESTART is considered for
> restart (and we decide against it), the corresponding slot in the
> shared memory array is marked as no longer in use, allowing it to be
> reused for a new registration.
>
> Since the postmaster cannot take locks, synchronization between the
> postmaster and other backends using the shared memory segment has to
> be lockless.  This mechanism is also documented in the comments.  An
> lwlock is used to prevent two backends that are both registering a new
> worker at about the same time from stomping on each other, but the
> postmaster need not care about that lwlock.
>
> This patch also extends worker_spi as a demonstration of the new
> interface.  With this patch, you can CREATE EXTENSION worker_spi and
> then call worker_spi_launch(int4) to launch a new background worker,
> or combine it with generate_series() to launch a bunch at once.  Then
> you can kill them off with pg_terminate_backend() and start some new
> ones.  That, in my humble opinion, is pretty cool.
>
This looks really interesting, +1. I'll test the patch if possible next
week.
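
For anyone else following along, the launch sequence described above would
look something like this (a sketch, assuming the worker_spi_launch(int4)
signature from the patch):

  CREATE EXTENSION worker_spi;
  SELECT worker_spi_launch(1);                              -- one worker
  SELECT worker_spi_launch(g) FROM generate_series(2, 5) g; -- a batch
  -- and pg_terminate_backend() on their PIDs to kill them off again
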
-- 
Michael


Re: [HACKERS] pluggable compression support

2013-06-14 Thread Andres Freund
On 2013-06-14 17:35:02 -0700, Josh Berkus wrote:
> 
> > No. I think as long as we only have pglz and one new algorithm (even if
> > that is lz4 instead of the current snappy) we should just always use the
> > new algorithm. Unless I missed it nobody seemed to have voiced a
> > contrary position?
> > For testing/evaluation the guc seems to be sufficient.
> 
> Then it's not "pluggable", is it?  It's "upgradable compression
> support", if anything.  Which is fine, but let's not confuse people.

The point is that it's pluggable on the storage level, in the sense that
several different algorithms can coexist and new ones can be added
relatively easily.
That part is what seems to have blocked progress for quite a while
now. So fixing that seems to be the interesting thing.

I am happy enough to do the work of making it configurable if we want it
to be... But I have zero interest in doing it only to throw it away in the
end because we decide we don't need it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pluggable compression support

2013-06-14 Thread Josh Berkus

> No. I think as long as we only have pglz and one new algorithm (even if
> that is lz4 instead of the current snappy) we should just always use the
> new algorithm. Unless I missed it nobody seemed to have voiced a
> contrary position?
> For testing/evaluation the guc seems to be sufficient.

Then it's not "pluggable", is it?  It's "upgradable compression
support", if anything.  Which is fine, but let's not confuse people.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [RFC] Minmax indexes

2013-06-14 Thread Greg Stark
On Fri, Jun 14, 2013 at 11:28 PM, Alvaro Herrera
 wrote:
> Re-summarization is relatively expensive, because the complete page range has
> to be scanned.

That doesn't sound too bad to me. It just means there's a downside to
having larger page ranges. I would expect the page ranges to be
something in the ballpark of 32 pages --  scanning 32 pages to
resummarize doesn't sound that painful but sounds like it's large
enough that the resulting index would be a reasonable size.

But I don't understand why an insert would invalidate a tuple. An insert
can just update the min and max incrementally. It's a delete that
invalidates the range, but as you note it doesn't really invalidate it,
just marks it as needing a refresh -- and even then only if the value
being deleted is equal to either the min or max.

> Same-size page ranges?
> Current related literature seems to consider that each "index entry" in a
> minmax index must cover the same number of pages.  There doesn't seem to be a

I assume the reason for this in the literature is the need to quickly
find the summary for a given page when you're handling an insert or
delete. If you have some kind of metadata structure that lets you
find it (which I gather is what the validity map is?) then you
wouldn't need it. But that seems like a difficult cost to justify
compared to just having a 1:1 mapping from block to bitmap tuple.

-- 
greg




[HACKERS] [PATCH] Revive line type

2013-06-14 Thread Peter Eisentraut
Complete the implementations of line_in, line_out, line_recv,
line_send.  Remove comments and error messages about the line type not
being implemented.  Add regression tests for existing line operators
and functions.
---
This just revives existing functionality, doesn't add anything new.
One thing that the original code did not settle was how to convert a
line in the form Ax+By+C=0 to the two-points output form.  Obviously, you
can just pick two random points on the line, but I wonder whether there
is a more standard solution.
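
(One conventional choice, as a sketch: assuming B != 0, take the points at
x = 0 and x = 1, i.e. (x1,y1) = (0, -C/B) and (x2,y2) = (1, -(A+C)/B); for
a vertical line with B = 0, take (-C/A, 0) and (-C/A, 1).)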

 doc/src/sgml/datatype.sgml |   34 +++-
 doc/src/sgml/func.sgml |6 +
 src/backend/utils/adt/geo_ops.c|  108 +
 src/include/catalog/pg_type.h  |3 +-
 src/include/utils/geo_decls.h  |7 -
 src/test/regress/expected/geometry.out |3 -
 src/test/regress/expected/line.out |  243 
 src/test/regress/expected/sanity_check.out |3 +-
 src/test/regress/output/misc.source|3 +-
 src/test/regress/parallel_schedule |2 +-
 src/test/regress/serial_schedule   |1 +
 src/test/regress/sql/geometry.sql  |4 -
 src/test/regress/sql/line.sql  |   77 +
 13 files changed, 408 insertions(+), 86 deletions(-)
 create mode 100644 src/test/regress/expected/line.out
 create mode 100644 src/test/regress/sql/line.sql

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index f73e6b2..ecbbdd8 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -3066,7 +3066,7 @@ Geometric Types

 line
 32 bytes
-Infinite line (not fully implemented)
+Infinite line
 ((x1,y1),(x2,y2))


@@ -3142,6 +3142,38 @@ Points

 

+Lines
+
+
+ line
+
+
+
+ Lines (line) are specified by pairs of points.
+ Values of type line are specified using any of the following
+ syntaxes:
+
+
+[ ( x1 , y1 ) , ( x2 , y2 ) ]
+( ( x1 , y1 ) , ( x2 , y2 ) )
+  ( x1 , y1 ) , ( x2 , y2 )
+x1 , y1   ,   x2 , y2
+
+
+ where (x1,y1)
+ and (x2,y2)
+ are two (different) points on the line.
+
+
+
+ Lines are output using the first syntax.  The points used in the output
+ are not necessarily the points used on input.
+
+   
+
+   
 Line Segments
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4c5af4b..835a189 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -8070,6 +8070,12 @@ Geometric Type Conversion Functions
 circle(polygon '((0,0),(1,1),(2,0))')


+line(point, point)
+line
+points to line
+line(point '(-1,0)', point '(1,0)')
+   
+   
 
  
   lseg
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index ad18cf0..61a1900 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -933,13 +933,8 @@
 Datum
 line_in(PG_FUNCTION_ARGS)
 {
-#ifdef ENABLE_LINE_TYPE
char   *str = PG_GETARG_CSTRING(0);
-#endif
LINE   *line;
-
-#ifdef ENABLE_LINE_TYPE
-   /* when fixed, modify "not implemented", catalog/pg_type.h and SGML */
LSEGlseg;
int isopen;
char   *s;
@@ -950,15 +945,13 @@
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
 errmsg("invalid input syntax for type line: \"%s\"", str)));
 
+   if (FPeq(lseg.p[0].x, lseg.p[1].x) && FPeq(lseg.p[0].y, lseg.p[1].y))
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+errmsg("invalid line specification: must be two distinct points")));
+
line = (LINE *) palloc(sizeof(LINE));
line_construct_pts(line, &lseg.p[0], &lseg.p[1]);
-#else
-   ereport(ERROR,
-   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-errmsg("type \"line\" not yet implemented")));
-
-   line = NULL;
-#endif
 
PG_RETURN_LINE_P(line);
 }
@@ -967,66 +960,31 @@
 Datum
 line_out(PG_FUNCTION_ARGS)
 {
-#ifdef ENABLE_LINE_TYPE
LINE   *line = PG_GETARG_LINE_P(0);
-#endif
-   char   *result;
-
-#ifdef ENABLE_LINE_TYPE
-   /* when fixed, modify "not implemented", catalog/pg_type.h and SGML */
LSEGlseg;
 
if (FPzero(line->B))
{   /* vertical */
/* use "x = C" */
-   result->A = -1;
-   result->B = 0;
-   result->C = pt1->x;
-#ifdef GEODEBUG
-   printf("line_out- line is vertical\n");
-#endif
-#ifdef NOT_USED
-   result->m = DBL_MAX;
-#endif
-
+   lseg.p[0].x = lseg.p[1].x = line->C/line-

Re: [HACKERS] pluggable compression support

2013-06-14 Thread Andres Freund
On 2013-06-14 17:12:01 -0700, Josh Berkus wrote:
> On 06/14/2013 04:01 PM, Andres Freund wrote:
> > It still contains a guc as described in the above message to control the
> > algorithm used for compressing new tuples but I think we should remove
> > that guc after testing.
> 
> Did you add the storage attribute?

No. I think as long as we only have pglz and one new algorithm (even if
that is lz4 instead of the current snappy) we should just always use the
new algorithm. Unless I missed it nobody seemed to have voiced a
contrary position?
For testing/evaluation the guc seems to be sufficient.

If we want to make it configurable on a per-column basis, I think the way
to go is to add a new column to pg_attribute and split compression-related
things out of attstorage into attcompression.
That's a fair amount of work and it includes a minor compatibility break
in the catalog format, so I'd prefer not to do it until there's a good
reason to do so.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pluggable compression support

2013-06-14 Thread Josh Berkus
On 06/14/2013 04:01 PM, Andres Freund wrote:
> It still contains a guc as described in the above message to control the
> algorithm used for compressing new tuples but I think we should remove
> that guc after testing.

Did you add the storage attribute?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Andres Freund
On 2013-06-14 19:14:15 -0400, Alvaro Herrera wrote:
> Andres Freund escribió:
> 
> > Here's the updated version. It shouldn't contain any obvious WIP pieces
> > anymore, although I think it needs some more documentation. I am just
> > not sure where to add it yet, postgres.h seems like a bad place :/
> 
> How about a new file, say src/include/access/toast.h?

Well, the question is whether that buys us all that much; we need the varlena
definitions to be available pretty much everywhere. Except for section 3
- which we have reduced to be pretty darn small these days - of postgres.h,
pretty much all of it is concerned with Datums, a good part of them being
varlenas.
We could move section 1 into its own file and unconditionally include
it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Alvaro Herrera
Andres Freund escribió:

> Here's the updated version. It shouldn't contain any obvious WIP pieces
> anymore, although I think it needs some more documentation. I am just
> not sure where to add it yet, postgres.h seems like a bad place :/

How about a new file, say src/include/access/toast.h?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Minmax indexes

2013-06-14 Thread Tom Lane
Josh Berkus  writes:
>> To avoid this, a table having a minmax index would be
>> configured so that inserts only go to the page(s) at the end of the table; this
>> avoids frequent invalidation of ranges in the middle of the table.  We provide
>> a table reloption that tweaks the FSM behavior, so that summarized pages are
>> not candidates for insertion.

> We haven't had an index type which modifies table insertion behavior
> before, and I'm not keen to start now; imagine having two indexes on the
> same table each with their own, conflicting, requirements.

I agree; such a restriction is a nonstarter for a secondary index.  I
don't believe that hacking the FSM would be sufficient to guarantee the
required behavior, either.

We've talked a lot about index-organized tables in the past.  How much
of the use case for this would be subsumed by a feature like that?

regards, tom lane




Re: [HACKERS] extensible external toast tuple support

2013-06-14 Thread Andres Freund
On 2013-05-31 23:42:51 -0400, Robert Haas wrote:
> On Thu, May 30, 2013 at 7:42 AM, Andres Freund  wrote:
> > In
> > http://archives.postgresql.org/message-id/20130216164231.GA15069%40awork2.anarazel.de
> > I presented the need for 'indirect' toast tuples which point into memory
> > instead of a toast table. In the comments to that proposal, off-list and
> > in-person talks the wish to make that a more general concept has
> > been voiced.
> >
> > The previous patch used varattrib_1b_e.va_len_1be to discern between
> > different types of external tuples. That obviously only works if the
> > data sizes of all possibly stored datum types are distinct which isn't
> > nice. So what the newer patch now does is to rename that field into
> > 'va_tag' and decide based on that what kind of Datum we have. To get the
> > actual length of that datum there now is a VARTAG_SIZE() macro which
> > maps the tags back to size.
> > To keep on-disk compatibility the size of an external toast tuple
> > containing a varatt_external is used as its tag value.
> >
> > This should allow for fairly easy development of a new compression
> > scheme for out-of-line toast tuples. It will *not* work for compressed
> > inline tuples (i.e. VARATT_4B_C). I am not convinced that that is a
> > problem or that if it is, that it cannot be solved separately.
> >
> > FWIW, in some quick microbenchmarks I couldn't find any performance
> > difference due to the slightly more complex size computation which I do
> > *not* find surprising.
> >
> > Opinions?
> 
> Seems pretty sensible to me.  The patch is obviously WIP but the
> direction seems fine to me.

Here's the updated version. It shouldn't contain any obvious WIP pieces
anymore, although I think it needs some more documentation. I am just
not sure where to add it yet, postgres.h seems like a bad place :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 654e24e9a615dcacea4d9714cf8cdbf6953983d5 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 11 Jun 2013 23:25:26 +0200
Subject: [PATCH] Add support for multiple kinds of external toast datums

There are several use cases where our current representation of external toast
datums is limiting:
* adding new compression schemes
* avoidance of repeated detoasting
* externally decoded toast tuples

To support that, add 'tags' on external (varattrib_1b_e) varlenas, repurposing
the current va_len_1be field to store the tag (or type) of a varlena. To
determine the actual length, a macro VARTAG_SIZE(tag) is added which can be
used to map from a tag to the actual length.

This patch adds support for 'indirect' tuples which point to some externally
allocated memory containing a toast tuple. It also implements the stub for a
different compression algorithm.
---
 src/backend/access/heap/tuptoaster.c | 100 +++
 src/include/c.h  |   2 +
 src/include/postgres.h   |  83 +
 3 files changed, 153 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..99044d0 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -128,7 +128,7 @@ heap_tuple_fetch_attr(struct varlena * attr)
 struct varlena *
 heap_tuple_untoast_attr(struct varlena * attr)
 {
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/*
 		 * This is an externally stored datum --- fetch it back from there
@@ -145,6 +145,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
 			pfree(tmp);
 		}
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		attr = (struct varlena *)redirect.pointer;
+		Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+
+		attr = heap_tuple_untoast_attr(attr);
+	}
 	else if (VARATT_IS_COMPRESSED(attr))
 	{
 		/*
@@ -191,7 +200,7 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 	char	   *attrdata;
 	int32		attrsize;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
 
@@ -204,6 +213,13 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
 		/* fetch it back (compressed marker will get set automatically) */
 		preslice = toast_fetch_datum(attr);
 	}
+	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+	{
+		struct varatt_indirect redirect;
+		VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+		return heap_tuple_untoast_attr_slice(redirect.pointer,
+			 sliceoffset, slicelength);
+	}
 	else
 		preslice = attr;
 
@@ -267,7 +283,7 @@ toast_raw_datum_size(Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	Size		result;
 
-	if (VARATT_IS_EXTERNAL(attr))
+	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		/* va_rawsize is the size of the original datum -- including header *

[HACKERS] GIN improvements part 3: ordering in index

2013-06-14 Thread Alexander Korotkov
Hackers,

Attached is a patch implementing ordering inside a GIN index. This is the
third patch of the GIN improvements; see the previous two:
http://www.postgresql.org/message-id/capphfduxv-il7aedwpw0w5fxrwgakfxijwm63_hzujacrxn...@mail.gmail.com
http://www.postgresql.org/message-id/CAPpHfdvftaJq7www381naLw1=4u0h+qpxgwvnhceb9hmvyw...@mail.gmail.com

This patch introduces a new GIN interface method which takes the same
arguments as consistent but returns float8:

float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
nullFlags[], Datum addInfo[], bool addInfoIsNull[])

This patch implements the gingettuple method, which can return ordering
data using the KNN infrastructure. It also introduces a >< operator for
FTS which supports ordering in the GIN index. Some example:

postgres=# explain analyze select * from dblp_titles2 where tsvector @@
to_tsquery('english', 'statistics') order by tsvector ><
to_tsquery('english', 'statistics') limit 10;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120 rows=10 loops=1)
   ->  Index Scan using dblp_titles2_idx on dblp_titles2  (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115 rows=10 loops=1)
         Index Cond: (tsvector @@ '''statist'''::tsquery)
         Order By: (tsvector >< '''statist'''::tsquery)
 Total runtime: 7.556 ms
(5 rows)

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] [RFC] Minmax indexes

2013-06-14 Thread Josh Berkus
Alvaro,

This sounds really interesting, and I can see the possibilities.
However ...

> Value changes in columns that are part of a minmax index, and tuple insertion
> in summarized pages, would invalidate the stored min/max values.  To support
> this, each minmax index has a validity map; a range can only be considered in a
> scan if it hasn't been invalidated by such changes (A range "not considered" in
> the scan needs to be returned in whole regardless of the stored min/max values,
> that is, it cannot be pruned per query quals).  The validity map is very
> similar to the visibility map in terms of performance characteristics: quick
> enough that it's not contentious, allowing updates and insertions to proceed
> even when data values violate the minmax index conditions.  An invalidated
> range can be made valid by re-summarization (see below).

This begins to sound like these indexes are only useful on append-only
tables.  Not that there aren't plenty of those, but ...

> Re-summarization is relatively expensive, because the complete page range has
> to be scanned.

Why?  Why can't we just update the affected pages in the index?

>  To avoid this, a table having a minmax index would be
> configured so that inserts only go to the page(s) at the end of the table; this
> avoids frequent invalidation of ranges in the middle of the table.  We provide
> a table reloption that tweaks the FSM behavior, so that summarized pages are
> not candidates for insertion.

We haven't had an index type which modifies table insertion behavior
before, and I'm not keen to start now; imagine having two indexes on the
same table each with their own, conflicting, requirements.  This is
sounding a lot more like a candidate for our prospective pluggable
storage manager.  Also, the above doesn't help us at all with UPDATEs.

If we're going to start adding reloptions for specific table behavior,
I'd rather think of all of the optimizations we might have for a
prospective "append-only table" and bundle those, rather than tying it
to whether a certain index exists or not.

Also, I hate the name ... if this feature goes ahead, I'm going to be
lobbying to change it.  But that's pretty minor compared to the update
issues.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] stray SIGALRM

2013-06-14 Thread Tom Lane
Richard Poole  writes:
> In 9.3beta1, a backend will receive a SIGALRM after authentication_timeout
> seconds, even if authentication has been successful. Most of the time
> this doesn't hurt anyone, but there are cases, such as when the backend
> is doing the open() of a backend copy, when it breaks things and results
> in an error getting reported to the client. In particular, if you're doing
> a copy from a FIFO, it is normal for open() to block until the process at
> the other end has data ready, so you're very likely to have it interrupted
> by the SIGALRM and fail.

> To see the SIGALRM just run psql then determine your backend's pid,
> attach an strace to it, and wait 60 seconds, or whatever you've got
> authentication_timeout set to.

> This behaviour appears in 6ac7facdd3990baf47efc124e9d7229422a06452 as a
> side-effect of speeding things up by getting rid of setitimer() calls;
> it's not obvious what's a good way to fix it without losing the benefits
> of that commit.

Ugh.  It doesn't sound very practical to try to guarantee that every
single kernel call in the backend is set up to recover from EINTR,
even though ideally they should all be able to cope.  Maybe we have to
revert those signal-handling changes.

regards, tom lane




[HACKERS] GIN improvements part2: fast scan

2013-06-14 Thread Alexander Korotkov
Hackers,

Attached is a patch implementing the "fast scan" technique for GIN. This
is the second patch of the GIN improvements; see the first one here:
http://www.postgresql.org/message-id/capphfduxv-il7aedwpw0w5fxrwgakfxijwm63_hzujacrxn...@mail.gmail.com
This patch allows skipping parts of posting trees when scanning them is
not necessary. In particular, it solves the "frequent_term & rare_term"
problem of FTS.
It introduces a new interface method pre_consistent which behaves like
consistent, but:
1) allows false positives on input (check[])
2) is allowed to return false positives

Some example: "frequent_term & rare_term" becomes pretty fast.

create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,100));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);

postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=942.75..7280.63 rows=5000 width=17) (actual time=0.458..0.461 rows=10 loops=1)
   Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
   ->  Bitmap Index Scan on test_idx  (cost=0.00..941.50 rows=5000 width=0) (actual time=0.449..0.449 rows=10 loops=1)
         Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
 Total runtime: 0.516 ms
(5 rows)


--
With best regards,
Alexander Korotkov.


gin_fast_scan.1.patch.gz
Description: GNU Zip compressed data



[HACKERS] logical changeset generation v5

2013-06-14 Thread Andres Freund
Hi!

I am rather pleased to announce the next version of the changeset
extraction patchset. Thanks to help from a large number of people I
think we are slowly getting to the point where it is getting
committable.

Since the last submitted version
(20121115002746.ga7...@awork2.anarazel.de) a large number of fixes and
the result of good amount of review has been added to the tree. All
bugs known to me have been fixed.

Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, do it only for catalog ones.
* arbitrarily large transaction support by spilling large transactions
  to disk
* spill snapshots to disk, so we can restart without waiting for a new
  snapshot to be built
* Don't read all WAL from the establishment of a logical slot
* tests via SQL interface to changeset extraction

The todo list includes:
* morph the "logical slot" interface into being "replication slots" that
  can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely
  duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/.
* user/developer level documentation

The patch series currently has two interfaces to logical decoding. One -
which is primarily useful for pg_regress style tests and playing around
- is SQL based, the other one uses a walsender replication connection.

A quick demonstration of the SQL interface (server needs to be started
with wal_level = logical and max_logical_slots > 0):
=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
    slotname     | xlog_position
-----------------+---------------
 regression_slot | 0/17D5908
(1 row)

=# CREATE TABLE foo(id serial primary key, data text);

=# INSERT INTO foo(data) VALUES(1);

=# UPDATE foo SET id = -id, data = ':'||data;

=# DELETE FROM foo;

=# DROP TABLE foo;

=# SELECT * FROM start_logical_replication('regression_slot', 'now', 
'hide-xids', '0');
 location  | xid |                                     data
-----------+-----+--------------------------------------------------------------------------------
 0/17D59B8 | 695 | BEGIN
 0/17D59B8 | 695 | COMMIT
 0/17E8B58 | 696 | BEGIN
 0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
 0/17E8B58 | 696 | COMMIT
 0/17E8CA8 | 697 | BEGIN
 0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple: id[int4]:-1 data[text]::1
 0/17E8CA8 | 697 | COMMIT
 0/17E8E50 | 698 | BEGIN
 0/17E8E50 | 698 | table "foo": DELETE: id[int4]:-1
 0/17E8E50 | 698 | COMMIT
 0/17E9058 | 699 | BEGIN
 0/17E9058 | 699 | COMMIT
(13 rows)

=# SELECT * FROM pg_stat_logical_decoding ;
    slot_name    |    plugin     | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
 regression_slot | test_decoding |    12042 | f      |  695 | 0/17D58D0
(1 row)

=# SELECT * FROM stop_logical_replication('regression_slot');
 stop_logical_replication
--------------------------
                        0

The walsender interface has the same calls
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';

The only difference is that START_LOGICAL_REPLICATION can stream changes
and it can support synchronous replication.

The output seen in the 'data' column is produced by a so called 'output
plugin' which users of the facility can write to suit their needs. They
can be written by implementing 5 functions in the shared object that's
passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)

The most interesting function, pg_decode_change, gets passed a structure
containing old/new versions of the row, the 'struct Relation' belonging
to it, and meta-information about the transaction.

The output plugin can rely on syscache lookups et al. to decode the
changed tuple in whatever fashion it wants.

I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 12 which contains most of the code

When reading the code, the information flow during decoding might be
interesting:
---
  +---+
  | XLogReader|
  +---+
  |
XLOG Records
  |
  v
  +---+
  | decode.c  |
  +---+
 |   |
 |   |
 v   |
+---+|
| snapbuild.c   |  HeapTupleData
+---+|
 |   |
  catalog snapshots  |
 |   |
 v   v
  +---+
  |reorderbuffer.c|
  +---+
 |
HeapTuple & Metadata

[HACKERS] stray SIGALRM

2013-06-14 Thread Richard Poole
In 9.3beta1, a backend will receive a SIGALRM after authentication_timeout
seconds, even if authentication has been successful. Most of the time
this doesn't hurt anyone, but there are cases, such as when the backend
is doing the open() of a backend copy, when it breaks things and results
in an error getting reported to the client. In particular, if you're doing
a copy from a FIFO, it is normal for open() to block until the process at
the other end has data ready, so you're very likely to have it interrupted
by the SIGALRM and fail.

To see the SIGALRM just run psql then determine your backend's pid,
attach an strace to it, and wait 60 seconds, or whatever you've got
authentication_timeout set to.

This behaviour appears in 6ac7facdd3990baf47efc124e9d7229422a06452 as a
side-effect of speeding things up by getting rid of setitimer() calls;
it's not obvious what's a good way to fix it without losing the benefits
of that commit.

Thanks Alvaro and Andres for helping me get from "why is my copy getting
these signals" to understanding what's actually going on.

Richard

-- 
Richard Poole http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] single-user vs standalone in docs and messages

2013-06-14 Thread Robert Haas
On Thu, Jun 13, 2013 at 6:10 PM, Jeff Janes  wrote:
> Some places in the docs and elog hints refer to "standalone" backends, while
> the official name as used in app-postgres.html is "single-user mode", and in
> fact "standalone" does not appear on that page.
>
> This tries to standardize the other locations to use "single-user".  I think
> I did the right thing with the message translation files, but I can't figure
> out how to test that.
>
> I made no attempt to change code-comments, just the user-facing parts.

I think you could tell people to use "single-user mode" instead of "a
standalone backend", but telling them to use "a single-user backend"
just seems weird.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] [RFC] Minmax indexes

2013-06-14 Thread Alvaro Herrera
Hi,

This is a preliminary proposal for Minmax indexes.  I'm experimenting
with the code, but it's too crude to post yet, so here's a document
explaining what they are and how they work, so that reviewers can poke
holes to have the design improved.  My intention is to have a patch to
show for CF2, so please do have a look at this and comment.

This is part of the AXLE project http://www.axleproject.eu and the
intention is to support tables of very large size.  In a quick
experiment, I have a table of ~12 GB and its corresponding index is 65
kB in size, making the time to do the equivalent of a seqscan a small
fraction of that taken by a real seqscan.  This technique sits between a
bitmap scan of a normal btree and a seqscan: the minmax index tells the
bitmap heap scan what pages to seqscan, allowing it to skip a large
fraction of pages that are known not to contain tuples matching the
query quals.  This is a huge win for large data warehouses.  

Without further ado, here's what I propose.


Minmax Range Indexes
====================

Minmax indexes are a new access method intended to enable very fast scanning of
extremely large tables.

The essential idea of a minmax index is to keep track of the min() and max()
values in consecutive groups of heap pages (page ranges).  These values can be
used by constraint exclusion to avoid scanning such pages, depending on query
quals.
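
For example (a sketch; 'a' here is just an indexed column, as in the
grammar example below):

  -- With a minmax index on foo(a), only page ranges whose stored
  -- [min(a), max(a)] interval intersects [1000, 2000] need to be read;
  -- every other range is skipped entirely.
  SELECT * FROM foo WHERE a BETWEEN 1000 AND 2000;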

The main drawback of this is having to update the stored min/max values of each
page range as tuples are inserted into them.

Other database systems already have this feature. Some examples:

* Oracle Exadata calls this "storage indexes"
  http://richardfoote.wordpress.com/category/storage-indexes/

* Netezza has "zone maps"
  http://nztips.com/2010/11/netezza-integer-join-keys/

* Infobright has this automatically within their "data packs"
  
http://www.infobright.org/Blog/Entry/organizing_data_and_more_about_rough_data_contest/

* MonetDB seems to have it
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.2662
  "Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS"

Grammar
-------

To create a minmax index, we use

  CREATE INDEX foo_minmax_idx ON foo USING MINMAX (a, b, e);

Partial indexes are not supported; since an index is concerned with minimum and
maximum values of the involved columns across all the pages in the table, it
doesn't make sense to exclude values.  Another way to see "partial" indexes
here would be those that only considered some pages in the table instead of all
of them; but this would be difficult to implement and manage and, most likely,
pointless.

Expressional indexes can probably be supported in the future, but we disallow
them initially for conceptual simplicity.

Having multiple minmax indexes in the same table is acceptable, though most of
the time it would make more sense to have a single index covering all the
interesting columns.  Multiple indexes might be useful for columns added later.

Access Method Design
--------------------

Since item pointers are not stored inside indexes of this type, it is not
possible to support the amgettuple interface.  Instead, we only provide
amgetbitmap support; scanning a relation using this index requires a recheck
node on top.  The amgetbitmap routine would return a TIDBitmap comprising all
the pages in those page groups that comply with the query quals; the recheck node
prunes tuples that are not visible per snapshot and those that are not visible
per query quals.
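
As an illustration of the intended plan shape (a sketch only, not output
from any existing patch), a query with a range qual would be expected to
look something like:

  EXPLAIN SELECT * FROM foo WHERE a BETWEEN 100 AND 200;
  --  Bitmap Heap Scan on foo
  --    Recheck Cond: ((a >= 100) AND (a <= 200))
  --    ->  Bitmap Index Scan on foo_minmax_idx
  --          Index Cond: ((a >= 100) AND (a <= 200))

Only page ranges whose stored min/max for "a" can overlap [100, 200] end up
in the TIDBitmap; the remaining heap pages are never read.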

For each supported datatype, we need an opclass with the following catalog
entries:

- support functions (pg_amproc)
  * pg_proc entry for min()
  * pg_proc entry for max()
- support operators (pg_amop): same as btree (<, <=, =, >=, >)

The min() and max() support functions are used during index construction.
The support operators are used in the optimizer, so that the index is chosen
when queries on the indexed table are planned.  (Also, we use them in the
amgetbitmap routine, to compare ScanKeys and decide whether to emit a certain
block or not).
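
For concreteness, a hypothetical opclass for int8 might look like this
(illustrative only; int8smaller/int8larger are the existing transition
functions behind min(int8) and max(int8)):

  CREATE OPERATOR CLASS int8_minmax_ops
  DEFAULT FOR TYPE int8 USING minmax AS
    OPERATOR 1 <,
    OPERATOR 2 <=,
    OPERATOR 3 =,
    OPERATOR 4 >=,
    OPERATOR 5 >,
    FUNCTION 1 int8smaller(int8, int8),
    FUNCTION 2 int8larger(int8, int8);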

In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
  * min() value across all tuples in the range
  * max() value across all tuples in the range
  * nulls present in any tuple?

With the default INDEX_MAX_KEYS of 32, and considering columns of 8-byte length
types (timestamptz, bigint), each tuple would be 524 bytes in length, which
seems reasonable.  Of course, larger columns are possible, such as varchar, but
creating minmax indexes on such columns seems of little practical usefulness.

This maximum index tuple size is calculated as:
BlockNumber (4 bytes) * 2 + data value (8 bytes) * 32 * 2 + null bitmap (4 bytes)


Block ranges mentioned in index entries shouldn't overlap. However, there can
be gaps where some pages have no covering index entry. (In particular, the last
few pages of the table would commonly not be summarized.)

In 

Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Fabien COELHO



I think that the weirdness really comes from the way transactions times
are measured, their interactions with throttling, and latent bugs in the
code.


measurement times, no; interactions with throttling, no.  If it was either of 
those I'd have finished this off days ago.  Latent bugs, possibly.  We may 
discover there's nothing wrong with your code at the end here,


To summarize my point: I think my v10 code does not take into account all 
of the strangeness in doCustom, and I'm pretty sure that there is no point
in including throttle sleeps in latency measures, which was more or less 
the case. So it is somehow a "bug" which only shows up if you look at the 
latency measures, but the tps are fine.


that it just makes hitting this bug more likely. Unfortunately today is 
the day *some* bug is popping up, and I want to get it squashed before 
I'll be happy.


The lag is actually happening during a kernel call that isn't working as 
expected.  I'm not sure whether this bug was there all along if \sleep was 
used, or if it's specific to the throttle sleep.


The throttle sleep is inserted outside of the state machine. That is why in 
the "test" patch I added a goto to ensure that it is always taken at the 
right time, that is when state==0 and before txn_begin is set, and not 
possibly between other states when doCustom happens to be recalled after a 
return.


I added a bunch more logging as pgbench steps through its work to track down 
where it's stuck at.  Until the end all transactions look like this:


1371238832.084783 client 10 throttle lag 2 us
1371238832.084783 client 10 executing \setrandom aid 1 :naccounts
1371238832.084803 client 10 sending SELECT abalance FROM pgbench_accounts 
WHERE aid = 753099;

1371238832.084840 calling select
1371238832.086539 client 10 receiving
1371238832.086539 client 10 finished

All clients who hit lag spikes at the end are going through this sequence
instead:


1371238832.085912 client 13 throttle lag 790 us
1371238832.085912 client 13 executing \setrandom aid 1 :naccounts
1371238832.085931 client 13 sending SELECT abalance FROM pgbench_accounts 
WHERE aid = 564894;

1371238832.086592 client 13 receiving
1371238832.086662 calling select
1371238832.235543 client 13 receiving
1371238832.235543 client 13 finished


Note the "calling select" here that wasn't in the normal length transaction 
before it.  The client is receiving something here, but rather than it 
finishing the transaction it falls through and ends up at the select() system 
call outside of doCustom.  All of the clients that are sleeping when the 
system slips into one of these long select() calls are getting stuck behind 
it.  I'm not 100% sure, but I think this only happens when all remaining 
clients are sleeping.


Note: in both the slow cases there is a "receiving" between "sending" and 
"select". This suggests that the "goto top" at the very end of doCustom is 
followed in one case but not the other.


ISTM that there is a timeout passed to select which is computed based on 
the current sleeping time of each client. I'm pretty sure that's not a
well-tested path...


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Greg Smith

On 6/14/13 3:50 PM, Fabien COELHO wrote:

I think that the weirdness really comes from the way transactions times
are measured, their interactions with throttling, and latent bugs in the
code.


measurement times, no; interactions with throttling, no.  If it was 
either of those I'd have finished this off days ago.  Latent bugs, 
possibly.  We may discover there's nothing wrong with your code at the 
end here, that it just makes hitting this bug more likely. 
Unfortunately today is the day *some* bug is popping up, and I want to 
get it squashed before I'll be happy.


The lag is actually happening during a kernel call that isn't working as 
expected.  I'm not sure whether this bug was there all along if \sleep 
was used, or if it's specific to the throttle sleep.


> Also, flag st->listen is set to 1 but *never* set back to 0...

I noticed that st-listen was weird too, and that's on my short list of 
suspicious things I haven't figured out yet.


I added a bunch more logging as pgbench steps through its work to track 
down where it's stuck at.  Until the end all transactions look like this:


1371238832.084783 client 10 throttle lag 2 us
1371238832.084783 client 10 executing \setrandom aid 1 :naccounts
1371238832.084803 client 10 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 753099;

1371238832.084840 calling select
1371238832.086539 client 10 receiving
1371238832.086539 client 10 finished

All clients who hit lag spikes at the end are going through this 
sequence instead:


1371238832.085912 client 13 throttle lag 790 us
1371238832.085912 client 13 executing \setrandom aid 1 :naccounts
1371238832.085931 client 13 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 564894;

1371238832.086592 client 13 receiving
1371238832.086662 calling select
1371238832.235543 client 13 receiving
1371238832.235543 client 13 finished

Note the "calling select" here that wasn't in the normal length 
transaction before it.  The client is receiving something here, but 
rather than it finishing the transaction it falls through and ends up at 
the select() system call outside of doCustom.  All of the clients that 
are sleeping when the system slips into one of these long select() calls 
are getting stuck behind it.  I'm not 100% sure, but I think this only 
happens when all remaining clients are sleeping.


Here's another one, it hits the receive that doesn't finish the 
transaction earlier (1371238832.086587) but then falls into the same 
select() call at 1371238832.086662:


1371238832.085884 client 12 throttle lag 799 us
1371238832.085884 client 12 executing \setrandom aid 1 :naccounts
1371238832.085903 client 12 sending SELECT abalance FROM 
pgbench_accounts WHERE aid = 299080;

1371238832.086587 client 12 receiving
1371238832.086662 calling select
1371238832.231032 client 12 receiving
1371238832.231032 client 12 finished

Investigation is still going here...

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Fabien COELHO


Hello Greg,

I think that the weirdness really comes from the way transactions times 
are measured, their interactions with throttling, and latent bugs in the 
code.


One issue is that the throttling time was included in the measurement, but
not the first time, because "txn_begin" is not set at the beginning of
doCustom.


Also, flag st->listen is set to 1 but *never* set back to 0...
 sh> grep listen pgbench.c
int listen;
if (st->listen)
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;
st->listen = 1;

ISTM that I can fix the "weirdness" by inserting an ugly "goto top;", but 
I would feel better about it by removing all gotos and reordering some 
actions in doCustom in a more logical way. However that would be a bigger 
patch.


Please find attached 2 patches:

 - the first is the full throttle patch which ensures that the
   txn_begin is taken at a consistent point, after throttling,
   which requires resetting "listen". There is an ugly goto.
   I've also put times in a consistent format in the log,
   "789.012345" instead of "789 12345".

 - the second patch just shows the diff between v10 and the first one.

--
Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8c202bf..dc4f819 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * When threads are throttled to a given rate limit, this is the target delay
+ * to reach that rate in usec.  0 is the default and means no throttling.
+ */
+int64		throttle_delay = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -205,6 +211,7 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	bool		throttled;  /* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -222,6 +229,10 @@ typedef struct
 	instr_time *exec_elapsed;	/* time spent executing cmds (per Command) */
 	int		   *exec_count;		/* number of cmd executions (per Command) */
 	unsigned short random_state[3];		/* separate randomness for each thread */
+int64   throttle_trigger;  /* previous/next throttling (us) */
+	int64   throttle_lag;  /* total transaction lag behind throttling */
+	int64   throttle_lag_max;  /* max transaction lag */
+
 } TState;
 
 #define INVALID_THREAD		((pthread_t) 0)
@@ -230,6 +241,8 @@ typedef struct
 {
 	instr_time	conn_time;
 	int			xacts;
+	int64   throttle_lag;
+	int64   throttle_lag_max;
 } TResult;
 
 /*
@@ -355,6 +368,8 @@ usage(void)
 		   "  -n   do not run VACUUM before tests\n"
 		   "  -N   do not update tables \"pgbench_tellers\" and \"pgbench_branches\"\n"
 		   "  -r   report average latency per command\n"
+		   "  -R SPEC, --rate SPEC\n"
+		   "   target rate in transactions per second\n"
 		   "  -s NUM   report this scale factor in output\n"
 		   "  -S   perform SELECT-only transactions\n"
 	 "  -t NUM   number of transactions each client runs (default: 10)\n"
@@ -898,17 +913,56 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 {
 	PGresult   *res;
 	Command   **commands;
+	booldo_throttle = false;
 
 top:
 	commands = sql_files[st->use_file];
 
+	/* handle throttling once per transaction by inserting a sleep.
+	 * this is simpler than doing it at the end.
+	 */
+	if (throttle_delay && ! st->throttled)
+	{
+		/* compute delay to approximate a Poisson distribution
+		 * 100 => 13.8 .. 0 multiplier
+		 *  10 => 11.5 .. 0
+		 *   1 =>  9.2 .. 0
+		 *1000 =>  6.9 .. 0
+		 * if transactions are too slow or a given wait shorter than
+		 * a transaction, the next transaction will start right away.
+		 */
+		int64 wait = (int64)
+			throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
+
+		thread->throttle_trigger += wait;
+
+		st->until = thread->throttle_trigger;
+		st->sleeping = 1;
+		st->throttled = true;
+		if (debug)
+			fprintf(stderr, "client %d throttling "INT64_FORMAT" us\n",
+	st->id, wait);
+	}
+
 	if (st->sleeping)
 	{			/* are we sleeping? */
 		instr_time	now;
+		int64 now_us;
 
 		INSTR_TIME_SET_CURRENT(now);
-		if (st->until <= INSTR_TIME_GET_MICROSEC(now))
+		now_us = INSTR_TIME_GET_MICROSEC(now);
+		if (st->until <= now_us)
+		{
 			st->sleeping = 0;	/* Done sleeping, go ahead with next command */
+			if (throttle_delay && st->state==0)
+			{
+/* measure lag of throttled transaction */
+int64 lag = now_us - st->until;
+thread->throttle_lag += lag;
+if (lag > thread->throttle_lag_max)
+	thr

Re: [HACKERS] SPGist "triple parity" concept doesn't work

2013-06-14 Thread Tom Lane
Teodor Sigaev  writes:
>> Anyway I now think that we might be better off with the other idea of
>> abandoning an insertion and retrying if we get a lock conflict.

> done, look at the patch.

Looks good, committed with some cosmetic adjustments.

> We definitely need a new idea of locking protocol and I'll return to this
> problem in autumn (sorry, I haven't time in summer to do this
> research).

OK.  I think the performance of this way will be okay, actually, in most
cases anyhow.  It'll do till we have a better idea.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Joshua D. Drake


On 06/14/2013 11:16 AM, Josh Berkus wrote:


On 06/12/2013 02:03 PM, Joshua D. Drake wrote:

What concerns me is we seem to be trying to make this "easy". It isn't
supposed to be easy. This is hard stuff. Smart people built it and it
takes a smart person to run it. When did it become a bad thing to be
something that smart people need to run?


1997, last I checked.

Our unofficial motto: "PostgreSQL: making very hard things possible, and
simple things hard."

It *is* hard.  But that's because we've *made* it hard to understand and
manage, not because the problem is inherently hard.  For example: can
you explain to me in 10 words or less how to monitor to see if archiving
is falling behind?  I'll bet you can't, and that's because we've
provided no reliable way to do so.


Hey, I never said we shouldn't have a complete feature set. I agree with 
you. IMO it should not have even been committed without the ability to
actually know what is going on, and we have had it since (in theory) 8.1?


My primary concern is: Don't make it stupid.

I liked Claudio's comment, "More than easy, it should be obvious.".

It should be obvious from a review of the documentation how to manage 
this stuff. It isn't, and worse even if we wrote the documentation it 
still isn't because the feature is not complete.


With great power comes great responsibility :P

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hard to Use WAS: Hard limit on WAL space

2013-06-14 Thread Josh Berkus
On 06/12/2013 02:03 PM, Joshua D. Drake wrote:
> What concerns me is we seem to be trying to make this "easy". It isn't
> supposed to be easy. This is hard stuff. Smart people built it and it
> takes a smart person to run it. When did it become a bad thing to be
> something that smart people need to run?

1997, last I checked.

Our unofficial motto: "PostgreSQL: making very hard things possible, and
simple things hard."

It *is* hard.  But that's because we've *made* it hard to understand and
manage, not because the problem is inherently hard.  For example: can
you explain to me in 10 words or less how to monitor to see if archiving
is falling behind?  I'll bet you can't, and that's because we've
provided no reliable way to do so.

It's normal when you're developing features for the ability to utilize
them to go from hacker --> high-end user --> regular user.  We suck at
moving to that last stage, partly because whenever someone on this list
introduces the idea of making a feature not just great but easy to use,
people actually object to the idea that anything should be easy to use.
  It's like we're afraid of being polluted by the unwashed DevOps masses.

In the meantime, Mongo kicks our butts at new user adoption.  Why?  Their
features suck, but the features they do have are easy to use.  You'd
think we would have learned something from MySQL.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Fabien COELHO


pgbench already has a "\sleep" command, and the way that delay is 
handled happens inside threadRun() instead.  The pausing of the rate 
limit throttle needs to operate in the same place.


It does operate at the same place. The throttling is performed by 
inserting a "sleep" first thing when starting a new transaction. So if 
there is a weirdness, it should show up as well without throttling but with a 
fixed \sleep instead?
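
For instance, a custom script along these lines (a sketch using only
meta-commands pgbench already supports) should exercise the same sleep
path with throttling disabled:

  \setrandom aid 1 :naccounts
  \sleep 5 ms
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;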


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Joshua D. Drake


On 06/14/2013 11:01 AM, Peter Geoghegan wrote:


On Fri, Jun 14, 2013 at 10:54 AM, Joshua D. Drake  
wrote:

Now, with the error previously shown, which one_idx needs to be reindexed?


Well, you didn't show an actual error message.



ERROR:  index "foo_idx"

Is not an error message? Granted I didn't show the whole error message 
but my point is, it should ALWAYS be fully qualified.




But if you "\set
VERBOSITY verbose" within psql while connected to a 9.3 server, you'll
get fully qualified details of the constraint blamed for the error, if
any. Example:

postgres=# insert into a(a, b) values (3, 'test');
ERROR:  23505: duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (a)=(3) already exists.
SCHEMA NAME:  public
TABLE NAME:  a
CONSTRAINT NAME:  a_pkey
LOCATION:  _bt_check_unique, nbtinsert.c:398




I was looking in the logs.


JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-14 Thread Jeff Davis
On Fri, 2013-06-14 at 13:21 -0400, Greg Smith wrote:
> I'm planning to duplicate Jon's test program on a few machines here, and 
> then see if that turns into a useful latency improvement for clients. 
> I'm trying to get this pgbench rate limit stuff working first though, 
> because one of the tests I had in mind for WAL creation overhead would 
> benefit from it.

Great, I'll wait on those results.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Peter Geoghegan
On Fri, Jun 14, 2013 at 10:54 AM, Joshua D. Drake  
wrote:
> Now, with the error previously shown, which one_idx needs to be reindexed?

Well, you didn't show an actual error message. But if you "\set
VERBOSITY verbose" within psql while connected to a 9.3 server, you'll
get fully qualified details of the constraint blamed for the error, if
any. Example:

postgres=# insert into a(a, b) values (3, 'test');
ERROR:  23505: duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (a)=(3) already exists.
SCHEMA NAME:  public
TABLE NAME:  a
CONSTRAINT NAME:  a_pkey
LOCATION:  _bt_check_unique, nbtinsert.c:398


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Joshua D. Drake


On 06/14/2013 10:47 AM, Peter Geoghegan wrote:


I think you'll need to better describe what you mean here.



postgres=# create schema foo;
CREATE SCHEMA
postgres=# create schema bar;
CREATE SCHEMA
postgres=# create table foo.foo(id serial);
NOTICE:  CREATE TABLE will create implicit sequence "foo_id_seq" for 
serial column "foo.id"

CREATE TABLE
postgres=# create table bar.bar(id serial);
NOTICE:  CREATE TABLE will create implicit sequence "bar_id_seq" for 
serial column "bar.id"

CREATE TABLE
postgres=# create index one_idx on foo.foo(id);
CREATE INDEX
postgres=# create index one_idx on bar.bar(id);
CREATE INDEX
postgres=#


Now, with the error previously shown, which one_idx needs to be reindexed?
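
For illustration: once the message is schema-qualified, the remedy becomes
unambiguous (standard REINDEX syntax, sketched here):

-- one of these, depending on which schema the message names:
REINDEX INDEX foo.one_idx;
REINDEX INDEX bar.one_idx;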

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Peter Geoghegan
I think you'll need to better describe what you mean here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Michael Glaesemann


On Jun 14, 2013, at 13:38, "Joshua D. Drake"  wrote:

> 
> ERROR:  index "foo_idx"
> 
> We should probably add the schema.

I've noticed similar issues with functions. I'd like to see those 
schema-qualified as well.


> 
> JD
> -- 
> Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
> PostgreSQL Support, Training, Professional Services and Development
> High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
> For my dreams of your image that blossoms
>   a rose in the deeps of my heart. - W.B. Yeats
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Greg Smith
I don't have this resolved yet, but I think I've identified the cause. 
Updating here mainly so Fabien doesn't duplicate my work trying to track 
this down.  I'm going to keep banging at this until it's resolved now 
that I got this far.


Here's a slow transaction:

1371226017.568515 client 1 executing \set naccounts 10 * :scale
1371226017.568537 client 1 throttling 6191 us
1371226017.747858 client 1 executing \setrandom aid 1 :naccounts
1371226017.747872 client 1 sending SELECT abalance FROM pgbench_accounts 
WHERE aid = 268721;

1371226017.789816 client 1 receiving

That confirms it is getting stuck at the "throttling" step.  Looks like 
the code pauses there because it's trying to overload the "sleeping" 
state that was already in pgbench, but handle it in a special way inside 
of doCustom(), and that doesn't always work.


The problem is that pgbench doesn't always stay inside doCustom when a 
client sleeps.  It exits there to poll for incoming messages from the 
other clients, via select() on a shared socket.  It's not safe to assume 
doCustom will be running regularly; that's only true if clients keep 
returning messages.


So as long as other clients keep banging on the shared socket, doCustom 
is called regularly, and everything works as expected.  But at the end 
of the test run that happens less often, and that's when the problem 
shows up.


pgbench already has a "\sleep" command, and the way that delay is 
handled happens inside threadRun() instead.  The pausing of the rate 
limit throttle needs to operate in the same place.  I have to redo a few 
things to confirm this actually fixes the issue, as well as look at 
Fabien's later updates to this since I wandered off debugging.  I'm sure 
it's in the area of code I'm poking at now though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add visibility map information to pg_freespace.

2013-06-14 Thread Peter Geoghegan
On Fri, Jun 14, 2013 at 7:23 AM, Andres Freund  wrote:
> 3). All the others seem to inflict unnecessary pain for not all that
> much gain.

+1. You might want to add a "historical note" about the name to the
pg_freespace documentation, though.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] another error perhaps to be enhanced

2013-06-14 Thread Joshua D. Drake


ERROR:  index "foo_idx"

We should probably add the schema.

JD
--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-14 Thread Jeff Davis
On Tue, 2013-06-11 at 12:58 -0400, Stephen Frost wrote:
> My main question is really: would this be useful for extending
> *relations*?  Apologies if it's already been discussed; I do plan to go
> back and read the threads about this more fully, but I wanted to voice
> my support for using posix_fallocate, when available, in general.

+1, though separate from this patch.

Andres also pointed out that we can try to track a point in the file
that is below any place where a zero page might still exist. That would
allow us to treat zero pages as invalid unless they are related to a recent
extension, addressing a weakness in the current checksums code.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-14 Thread Greg Smith

On 6/14/13 1:06 PM, Jeff Davis wrote:

Why have a GUC here at all? Perhaps this was already discussed, and I
missed it? Is it just for testing purposes, or did you intend for it to
be in the final version?


You have guessed correctly!  I suggested it stay in there only to make 
review benchmarking easier.



I started looking at this patch and it looks like we are getting a
consensus that it's the right approach. Microbenchmarks appear to show a
benefit, and (thanks to Noah's comment) it seems like the change is
safe. Are there any remaining questions or objections?


I'm planning to duplicate Jon's test program on a few machines here, and 
then see if that turns into a useful latency improvement for clients. 
I'm trying to get this pgbench rate limit stuff working first though, 
because one of the tests I had in mind for WAL creation overhead would 
benefit from it.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] request a new feature in fuzzystrmatch

2013-06-14 Thread David Fetter
On Fri, Jun 14, 2013 at 10:14:14AM -0700, Joshua D. Drake wrote:
> On 06/14/2013 10:11 AM, David Fetter wrote:
> 
> >>>ok, thanks, I will wait.
> >>Hi Joe,
> >>
> >>Do you have some time in the weekend to help me submit the patch?
> >>Thanks,
> >>
> >>Liming
> >
> >Liming,
> >
> >Is your git skill good enough to create a patch vs. PostgreSQL's git
> >master?  If so, send that and once it's hit the mailing list, record
> >same on commitfest.postgresql.org in the current open commitfest.  If
> >not, let us know where in that process you got stuck.
> >
> 
> This sounds like a wiki page FAQ in the making.

With utmost respect, this sounds like several pages should be
consolidated into one and clarified.  Yet another page will just make
matters more confusing.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] request a new feature in fuzzystrmatch

2013-06-14 Thread Joshua D. Drake


On 06/14/2013 10:11 AM, David Fetter wrote:


ok, thanks, I will wait.

Hi Joe,

Do you have some time in the weekend to help me submit the patch?
Thanks,

Liming


Liming,

Is your git skill good enough to create a patch vs. PostgreSQL's git
master?  If so, send that and once it's hit the mailing list, record
same on commitfest.postgresql.org in the current open commitfest.  If
not, let us know where in that process you got stuck.

Cheers,
David.



This sounds like a wiki page FAQ in the making.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] request a new feature in fuzzystrmatch

2013-06-14 Thread David Fetter
On Fri, Jun 14, 2013 at 10:08:24AM -0700, Liming Hu wrote:
> On Tue, Jun 11, 2013 at 3:19 PM, Liming Hu  wrote:
> > On Tue, Jun 11, 2013 at 2:56 PM, Joe Conway  wrote:
> >>
> >> On 06/11/2013 02:23 PM, Liming Hu wrote:
> >>> On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera
> >>>  wrote:
>  Liming Hu escribió:
> 
> > I have implemented the code according to Joe's suggestion, and
> > put the code at:
> > https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1
> 
> 
> >
> >> Please submit a proper patch so it can be seen on our mailing list
>  archives.
> 
> >>> Hi Alvaro,
> >>>
> >>> I am kind of new to the Postgresql hacker community, Can you
> >>> please help me on submit the patch?
> >>
> >> Hi Liming,
> >>
> >> I might be able to help, but it will be at least a couple of days
> >> before I have the time to look at this,
> >>
> >> Joe
> >>
> >>
> > ok, thanks, I will wait.
> Hi Joe,
> 
> Do you have some time in the weekend to help me submit the patch?
> Thanks,
> 
> Liming

Liming,

Is your git skill good enough to create a patch vs. PostgreSQL's git
master?  If so, send that and once it's hit the mailing list, record
same on commitfest.postgresql.org in the current open commitfest.  If
not, let us know where in that process you got stuck.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] request a new feature in fuzzystrmatch

2013-06-14 Thread Liming Hu
On Tue, Jun 11, 2013 at 3:19 PM, Liming Hu  wrote:
> On Tue, Jun 11, 2013 at 2:56 PM, Joe Conway  wrote:
>>
>> On 06/11/2013 02:23 PM, Liming Hu wrote:
>>> On Tue, Jun 11, 2013 at 1:57 PM, Alvaro Herrera
>>>  wrote:
 Liming Hu escribió:

> I have implemented the code according to Joe's suggestion, and
> put the code at:
> https://github.com/liminghu/fuzzystrmatch/tree/fuzzystrmatchv1.1


>
>> Please submit a proper patch so it can be seen on our mailing list
 archives.

>>> Hi Alvaro,
>>>
>>> I am kind of new to the Postgresql hacker community, Can you
>>> please help me on submit the patch?
>>
>> Hi Liming,
>>
>> I might be able to help, but it will be at least a couple of days
>> before I have the time to look at this,
>>
>> Joe
>>
>>
> ok, thanks, I will wait.
Hi Joe,

Do you have some time in the weekend to help me submit the patch?
Thanks,

Liming
>
> Liming
>> --
>> Joe Conway
>> credativ LLC: http://www.credativ.us
>> Linux, PostgreSQL, and general Open Source
>> Training, Service, Consulting, & 24x7 Support
>
>
>
> --
> Liming Hu
> cell: (435)-512-4190
> Seattle Washington



--
Liming Hu
cell: (435)-512-4190
Seattle Washington


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-06-14 Thread Jeff Davis
On Sat, 2013-05-25 at 13:55 -0500, Jon Nelson wrote:
> Ack.  I've revised the patch to always have the GUC (for now), default
> to false, and if configure can't find posix_fallocate (or the user
> disables it by way of pg_config_manual.h) then it remains a GUC that
> simply can't be changed.

Why have a GUC here at all? Perhaps this was already discussed, and I
missed it? Is it just for testing purposes, or did you intend for it to
be in the final version?

If it's supported, it seems like we always want it. I doubt there are
cases where it hurts performance; but if there are, it's pretty hard for
a DBA to know what those cases are, anyway.

Also:

* The other code assumes that no errno means ENOSPC. We should be
consistent about that assumption, and do the same thing in the fallocate
case.

* You check for the presence of posix_fallocate at configure time, but
don't #ifdef the call site. It looks like you removed this from the v2
patch, was there a reason for that? Won't that cause build errors for
platforms without it?

I started looking at this patch and it looks like we are getting a
consensus that it's the right approach. Microbenchmarks appear to show a
benefit, and (thanks to Noah's comment) it seems like the change is
safe. Are there any remaining questions or objections?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)

2013-06-14 Thread Tom Lane
Andres Freund  writes:
> On 2013-06-14 11:59:04 -0400, Tom Lane wrote:
>> Ah, you are right, I forgot the #ifndef CHECKSUM_IMPL_H dance.  Will fix
>> in a bit.

> That won't help against errors if it's included in two different
> files/translation units though.

Good point, but there's not any real reason to do that --- only
checksum.h should ever be #include'd in more than one file.  Any program
using this stuff is expected to #include checksum_impl.h in exactly one
place.  So maybe it's fine as-is.

> E.g. in rmgrlist.h we have the following comment:
> /* there is deliberately not an #ifndef RMGRLIST_H here */
> and I think the reasoning behind that comment applies here as well.

Well, that's a different case: there, and also in kwlist.h, there's an
idea that it could actually be useful to #include the file more than
once, redefining the PG_RMGR() macro each time.  There's no such use
case that I can see for checksum_impl.h.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Dean Rasheed
On 14 June 2013 16:09, Hannu Krosing  wrote:
> What skytools/pgq/londiste uses for comparing tables on master
> and slave is query like this
>
> select sum(hashtext(t.*::text)) from  t;
>
> This is non-modulo sum and does not use md5 but relies on
> whatever the hashtext() du jour is :)
>
> So it is not comparable to anything external (like the md5sum
> compatible idea above) but is usually good enough for fast
> checks of compatible tables.
>
> As tables are unordered by definition anyway, this should be
> good enough for most SQL.
>
> The speed comes from both fast(er) hashtext() function and
> avoiding the sort.
>

That sounds like a pretty good approach. We could do that if we had a
version of md5() that returned numeric. My impression is that numeric
computations are pretty fast compared to the sorting overhead.

On the other hand, if there is a usable index, select md5_agg(..) from
(sub-query) will do an index scan rather than a sort, making it much
faster than using an ORDER BY in the aggregate.
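
To make that concrete (md5_agg() is the proposal under discussion, and
md5_numeric() is a purely hypothetical numeric-returning md5):

select md5_agg(t.*::text order by id) from t;   -- ordered, md5sum-comparable
select sum(md5_numeric(t.*::text)) from t;      -- order-independent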

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_restore -l with a directory archive

2013-06-14 Thread Fujii Masao
Hi,

When I ran pg_restore -l with a directory archive as input, I found that
its format is wrongly reported as UNKNOWN.

$ pg_dump -F d -f hoge
$ pg_restore -l hoge
;
; Archive created at Sat Jun 15 01:38:14 2013
; dbname: postgres
; TOC Entries: 9
; Compression: -1
; Dump Version: 1.12-0
; Format: UNKNOWN
; Integer: 4 bytes
; Offset: 8 bytes
; Dumped from database version: 9.3beta1
; Dumped by pg_dump version: 9.3beta1
;
;
; Selected TOC Entries:


In this case, the format should be reported as DIRECTORY.
The attached patch fixes this problem.

Regards,

-- 
Fujii Masao


pg_restore_tocsummary_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Andres Freund
On 2013-06-14 09:21:12 -0700, Jeff Davis wrote:
> On Fri, 2013-06-14 at 16:10 +0200, Andres Freund wrote:
> > Jeff Davis has a patch pending
> > (1365493015.7580.3240.camel@sussancws0025) that passes the buffer_std
> > flag down to MarkBufferDirtyHint() for exactly that reason. I thought we
> > were on track committing that, but rereading the thread it doesn't look
> > that way.
> > 
> > Jeff, care to update that patch?
> 
> Rebased and attached. Changed so all callers use buffer_std=true except
> those in freespace.c and fsmpage.c.
> 
> Simon, did you (or anyone else) have an objection to this patch? If not,
> I'll go ahead and commit it tomorrow morning.

I'd like to see a comment around the memcpys in XLogSaveBufferForHint()
that mentions that they are safe in a non std buffer due to
XLogCheckBuffer setting an appropriate hole/offset. Or make an explicit
change of the copy algorithm there.

Btw, if you touch that code, I'd vote for renaming XLOG_HINT to XLOG_FPI
or something like that. I find the former name confusing...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Jeff Davis
On Fri, 2013-06-14 at 16:10 +0200, Andres Freund wrote:
> Jeff Davis has a patch pending
> (1365493015.7580.3240.camel@sussancws0025) that passes the buffer_std
> flag down to MarkBufferDirtyHint() for exactly that reason. I thought we
> were on track committing that, but rereading the thread it doesn't look
> that way.
> 
> Jeff, care to update that patch?

Rebased and attached. Changed so all callers use buffer_std=true except
those in freespace.c and fsmpage.c.

Simon, did you (or anyone else) have an objection to this patch? If not,
I'll go ahead and commit it tomorrow morning.

Regards,
Jeff Davis

*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
***
*** 287,293  hashgettuple(PG_FUNCTION_ARGS)
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf);
  		}
  
  		/*
--- 287,293 
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf, true);
  		}
  
  		/*
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***
*** 262,268  heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer);
  		}
  	}
  
--- 262,268 
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer, true);
  		}
  	}
  
*** a/src/backend/access/nbtree/nbtinsert.c
--- b/src/backend/access/nbtree/nbtinsert.c
***
*** 413,421  _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
  	 * crucial. Be sure to mark the proper buffer dirty.
  	 */
  	if (nbuf != InvalidBuffer)
! 		MarkBufferDirtyHint(nbuf);
  	else
! 		MarkBufferDirtyHint(buf);
  }
  			}
  		}
--- 413,421 
  	 * crucial. Be sure to mark the proper buffer dirty.
  	 */
  	if (nbuf != InvalidBuffer)
! 		MarkBufferDirtyHint(nbuf, true);
  	else
! 		MarkBufferDirtyHint(buf, true);
  }
  			}
  		}
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***
*** 1052,1058  restart:
  opaque->btpo_cycleid == vstate->cycleid)
  			{
  opaque->btpo_cycleid = 0;
! MarkBufferDirtyHint(buf);
  			}
  		}
  
--- 1052,1058 
  opaque->btpo_cycleid == vstate->cycleid)
  			{
  opaque->btpo_cycleid = 0;
! MarkBufferDirtyHint(buf, true);
  			}
  		}
  
*** a/src/backend/access/nbtree/nbtutils.c
--- b/src/backend/access/nbtree/nbtutils.c
***
*** 1789,1795  _bt_killitems(IndexScanDesc scan, bool haveLock)
  	if (killedsomething)
  	{
  		opaque->btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so->currPos.buf);
  	}
  
  	if (!haveLock)
--- 1789,1795 
  	if (killedsomething)
  	{
  		opaque->btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so->currPos.buf, true);
  	}
  
  	if (!haveLock)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 7681,7692  XLogRestorePoint(const char *rpName)
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
-  *
-  * Note that this only works for buffers that fit the standard page model,
-  * i.e. those for which buffer_std == true
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
--- 7681,7689 
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
***
*** 7708,7714  XLogSaveBufferForHint(Buffer buffer)
  	 * and reset rdata for any actual WAL record insert.
  	 */
  	rdata[0].buffer = buffer;
! 	rdata[0].buffer_std = true;
  
  	/*
  	 * Check buffer while not holding an exclusive lock.
--- 7705,7711 
  	 * and reset rdata for any actual WAL record insert.
  	 */
  	rdata[0].buffer = buffer;
! 	rdata[0].buffer_std = buffer_std;
  
  	/*
  	 * Check buffer while not holding an exclusive lock.
*** a/src/backend/commands/sequence.c
--- b/src/backend/commands/sequence.c
***
*** 1118,1124  read_seq_tuple(SeqTable elm, Relation rel, Buffer *buf, HeapTuple seqtuple)
  		HeapTupleHeaderSetXmax(seqtuple->t_data, InvalidTransactionId);
  		seqtuple->t_data->t_infomask &= ~HEAP_XMAX_COMMITTED;
  		seqtuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
! 		MarkBufferDirtyHint(*buf);
  	}
  
  	seq = (Form_pg_sequence) GETSTRUCT(seqtuple);
--- 1118,1124 
  		HeapTupleHeaderSetXmax(seqtuple-

Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)

2013-06-14 Thread Andres Freund
On 2013-06-14 11:59:04 -0400, Tom Lane wrote:
> Jeff Davis  writes:
> > I have a question about the commit though: shouldn't both functions be
> > static if they are in a .h file? Otherwise, it could lead to naming
> > conflicts. I suppose it's wrong to include the implementation file
> > twice, but it still might be confusing if someone tries. Two ideas that
> > come to mind are:
> >   * make both static and then have a trivial wrapper in checksum.c
> >   * export one or both functions, but use #ifndef CHECKSUM_IMPL_H to
> > prevent redefinition
> 
> Ah, you are right, I forgot the #ifndef CHECKSUM_IMPL_H dance.  Will fix
> in a bit.

That won't help against errors if it's included in two different
files/translation units though. I don't really see a valid case where it
could be validly be included multiple times in one TU?
If anything we should #error in that case, but I am not sure it's worth
bothering.
E.g. in rmgrlist.h we have the following comment:
/* there is deliberately not an #ifndef RMGRLIST_H here */
and I think the reasoning behind that comment applies here as well.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] refresh materialized view concurrently

2013-06-14 Thread Kevin Grittner
Attached is a patch for REFRESH MATERIALIZED VIEW CONCURRENTLY for
9.4 CF1.  The goal of this patch is to allow a refresh without
interfering with concurrent reads, using transactional semantics.

It is my hope to get this committed during this CF to allow me to
focus on incremental maintenance for the rest of the release cycle.

I didn't need to touch very much outside of matview-specific files
for this.  My biggest concern is that I needed two small functions
which did *exactly* what some static functions in ri_triggers.c
were doing and couldn't see where the best place to share them from
was.  For the moment I just duplicated them, but my hope would be
that they could be put in a suitable location and called from both
places, rather than duplicating the 30-some lines of code.  The
function signatures are:

void quoteOneName(char *buffer, const char *name)
void quoteRelationName(char *buffer, Relation rel)

Comments in the patch describe the technique used for the
transactional refresh, but I'm not sure how easy it is to
understand the technique from the comments.  Here is a
demonstration of the basic technique, using a table to mock the
materialized view so it can be run directly.

---

--
-- Setup
--
drop table if exists n, nt, nd cascade;
drop table if exists nm;

create table n (id int not null primary key, val text);
insert into n values
  (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'),
  (6, null), (7, null), (8, null), (9, null);
-- We use a table to mock this materialized view definition:
--   create materialized view nm as select * from n;
create table nm as select * from n;
insert into n values (10, 'ten'), (11, null);
update n set val = 'zwei' where id = 2;
update n set val = null where id = 3;
update n set id = 44, val = 'forty-four' where id = 4;
update n set val = 'seven' where id = 7;
delete from n where id = 5;
delete from n where id = 8;

vacuum analyze;

--
-- Sample of internal processing for REFRESH MV CONCURRENTLY.
--
begin;
create temp table nt as select * from n;
analyze nt;
create temp table nd as
  SELECT x.ctid as tid, y
    FROM nm x
    FULL JOIN n y ON (y.id OPERATOR(pg_catalog.=) x.id)
    WHERE (y.*) IS DISTINCT FROM (x.*)
    ORDER BY tid;
analyze nd;

delete from nm where ctid in
  (select tid from nd
    where tid is not null and y is not distinct from null);
update nm x set id = (d.y).id, val = (d.y).val from nd d
  where d.tid is not null and x.ctid = d.tid;
insert into nm select (y).* from nd where tid is null;
commit;

--
-- Check that results match.
--
select * from n order by id;
select * from nm order by id;

---

I also tried a million-row materialized view with the patch to see
what the performance was like on a large table with just a few
changes.  I was surprised that a small change-set like this was
actually faster than replacing the heap, at least on my machine.
Obviously, when a larger number of rows are affected the
transactional CONCURRENTLY option will be slower, and this is not
intended in any way as a performance-enhancing feature; that was
just a happy surprise in testing.

---

-- drop from previous test
drop table if exists testv cascade;

-- create and populate permanent table
create table testv (id int primary key, val text);
insert into testv
  select n, cash_words((floor(random() * 1) / 100)::text::money)
  from (select generate_series(1, 200, 2)) s(n);
update testv
  set val = NULL
  where id = 547345;

create materialized view matv as select * from testv;
create unique index matv_id on matv (id);
vacuum analyze matv;

delete from testv where id = 16405;
insert into testv
  values (393466, cash_words((floor(random() * 1) / 100)::text::money));
update testv
  set val = cash_words((floor(random() * 1) / 100)::text::money)
  where id = 1947141;

refresh materialized view concurrently matv;

---

People may be surprised to see this using SPI even more than
ri_triggers.c does.  I think this is the safest and most
maintainable approach, although I welcome alternative suggestions.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company*** a/doc/src/sgml/mvcc.sgml
--- b/doc/src/sgml/mvcc.sgml
***
*** 928,935  ERROR:  could not serialize access due to read/write dependencies among transact
  
  
  
!  This lock mode is not automatically acquired on tables by any
!  PostgreSQL command.
  
 

--- 928,934 
  
  
  
!  Acquired by REFRESH MATERIALIZED VIEW CONCURRENTLY.
  
 

*** a/doc/src/sgml/ref/refresh_materialized_view.sgml
--- b/doc/src/sgml/ref/refresh_materialized_view.sgml
***
*** 21,

Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)

2013-06-14 Thread Tom Lane
Jeff Davis  writes:
> I have a question about the commit though: shouldn't both functions be
> static if they are in a .h file? Otherwise, it could lead to naming
> conflicts. I suppose it's wrong to include the implementation file
> twice, but it still might be confusing if someone tries. Two ideas that
> come to mind are:
>   * make both static and then have a trivial wrapper in checksum.c
>   * export one or both functions, but use #ifndef CHECKSUM_IMPL_H to
> prevent redefinition

Ah, you are right, I forgot the #ifndef CHECKSUM_IMPL_H dance.  Will fix
in a bit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_filedump 9.3: checksums (and a few other fixes)

2013-06-14 Thread Jeff Davis
On Thu, 2013-06-13 at 20:09 -0400, Tom Lane wrote:
> What I propose we do about this is reduce backend/storage/page/checksum.c
> to something like
> 
> #include "postgres.h"
> #include "storage/checksum.h"
> #include "storage/checksum_impl.h"
> 
> moving all the code currently in the file into a new .h file.  Then,
> any external programs such as pg_filedump can use the checksum code
> by including checksum_impl.h.  This is essentially the same thing we
> did with the CRC support functionality some time ago.

Thank you for taking care of that. After seeing that it needed to be in
a header file, I was going to try doing it all as macros.

I have a question about the commit though: shouldn't both functions be
static if they are in a .h file? Otherwise, it could lead to naming
conflicts. I suppose it's wrong to include the implementation file
twice, but it still might be confusing if someone tries. Two ideas that
come to mind are:
  * make both static and then have a trivial wrapper in checksum.c
  * export one or both functions, but use #ifndef CHECKSUM_IMPL_H to
prevent redefinition

> Also, we have the cut-point between checksum.c and bufpage.c at the
> wrong place.  IMO we should move PageCalcChecksum16 in toto into
> checksum.c (or really now into checksum_impl.h), because that and not
> just checksum_block() is the functionality that is wanted.

Agreed.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Hannu Krosing
On 06/14/2013 04:47 PM, Tom Lane wrote:
> Dean Rasheed  writes:
>> On 14 June 2013 14:14, Tom Lane  wrote:
>>> Personally I'd be a bit inclined to xor the per-row md5's rather than
>>> sum them, but that's a small matter.
>> But this would be a much riskier thing to do with a single column,
>> because if you updated multiple rows in the same way (e.g., UPDATE t
>> SET x='foo' WHERE x='bar') then xor'ing the md5's would cancel out if
>> there were an even number of matches.
> I was implicitly thinking that the sum would be a modulo sum so that the
> final result is still the size of an md5 signature.  If that's true,
> then leaking bits via carry out is just as bad as xor's deficiencies.
> Now, you could certainly make it a non-modulo sum and not lose any
> information to carries, if you're willing to do the arithmetic in
> NUMERIC and have a variable-width result.  Sounds a bit slow though.
What skytools/pgq/londiste uses for comparing tables on the master
and the slave is a query like this:

select sum(hashtext(t.*::text)) from t;

This is a non-modulo sum and does not use md5, but relies on
whatever the hashtext() du jour is :)

So it is not comparable to anything external (like the md5sum
compatible idea above) but is usually good enough for fast
checks of compatible tables.

As tables are unordered by definition anyway, this should be
good enough for most SQL.

The speed comes both from the fast(er) hashtext() function and
from avoiding the sort.
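
As an illustration, a minimal sketch of the same idea within one
database (master_t and copy_t are hypothetical tables with compatible
row types; on a real master/slave pair you would run the sum on each
side and compare the results):

select (select sum(hashtext(m.*::text)) from master_t m)
     = (select sum(hashtext(c.*::text)) from copy_t c) as contents_match;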

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Andres Freund
On 2013-06-14 15:49:31 +0100, Dean Rasheed wrote:
> On 14 June 2013 15:19, Stephen Frost  wrote:
> > * Andrew Dunstan (and...@dunslane.net) wrote:
> >> I'd rather go the other way, processing the records without having
> >> to process them otherwise at all. Turning things into text must slow
> >> things down, surely.
> >
> > That's certainly an interesting idea also..
> >
> 
> md5_agg(record) ?
> 
> Yes, I like it.

It's more complex than just memcmp()ing HeapTupleData though. At least
if the Datum contains varlena columns, there are so many different
representations (short, long, compressed, external, external compressed)
of the same data that an md5 without normalizing them wouldn't be very
interesting.
So you would at least need a normalizing version of
toast_flatten_tuple() that also deals with short/long varlenas. But even
after that, you would still need to deal with Datums that can have
different representation (like short numerics, old style hstore, ...).

It might be more realistic to use the binary output functions, but I am
not sure whether all of those are sufficiently reproducible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Dean Rasheed
On 14 June 2013 15:19, Stephen Frost  wrote:
> * Andrew Dunstan (and...@dunslane.net) wrote:
>> I'd rather go the other way, processing the records without having
>> to process them otherwise at all. Turning things into text must slow
>> things down, surely.
>
> That's certainly an interesting idea also..
>

md5_agg(record) ?

Yes, I like it.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Tom Lane
Dean Rasheed  writes:
> On 14 June 2013 14:14, Tom Lane  wrote:
>> Personally I'd be a bit inclined to xor the per-row md5's rather than
>> sum them, but that's a small matter.

> But this would be a much riskier thing to do with a single column,
> because if you updated multiple rows in the same way (e.g., UPDATE t
> SET x='foo' WHERE x='bar') then xor'ing the md5's would cancel out if
> there were an even number of matches.

I was implicitly thinking that the sum would be a modulo sum so that the
final result is still the size of an md5 signature.  If that's true,
then leaking bits via carry out is just as bad as xor's deficiencies.
Now, you could certainly make it a non-modulo sum and not lose any
information to carries, if you're willing to do the arithmetic in
NUMERIC and have a variable-width result.  Sounds a bit slow though.
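
A minimal sketch of that non-modulo variant, assuming a table foo: split
each 128-bit digest into two 64-bit halves and recombine them in NUMERIC.
(Note that bit(64)::bigint is signed, so this yields a deterministic
fingerprint rather than the literal 128-bit integer sum.)

select sum(('x' || left(md5(foo.*::text), 16))::bit(64)::bigint::numeric
             * 18446744073709551616   -- 2^64
         + ('x' || right(md5(foo.*::text), 16))::bit(64)::bigint::numeric)
from foo;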

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Tom Lane
Greg Stark  writes:
> On Fri, Jun 14, 2013 at 2:21 PM, Tom Lane  wrote:
>> So it's not that we actually need to log the individual hint bit
>> changes, it's that we need to WAL-log a full page image on the first
>> update after a checkpoint, so as to recover from torn-page cases.
>> Which one are we doing?

> WAL-logging a full page image after a checkpoint wouldn't actually be
> enough, since subsequent hint bits will dirty the page without
> WAL-logging anything, creating a new torn-page risk. FPIs are only
> useful if all the subsequent updates are WAL-logged.

No, there's no new torn page risk, because any crash recovery would
replay starting from the checkpoint.  You might lose the
subsequently-set hint bits, but that's okay.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Greg Stark
On Fri, Jun 14, 2013 at 2:21 PM, Tom Lane  wrote:
> So it's not that we actually need to log the individual hint bit
> changes, it's that we need to WAL-log a full page image on the first
> update after a checkpoint, so as to recover from torn-page cases.
> Which one are we doing?

WAL-logging a full page image after a checkpoint wouldn't actually be
enough, since subsequent hint bits will dirty the page without
WAL-logging anything, creating a new torn-page risk. FPIs are only
useful if all the subsequent updates are WAL-logged.




-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add visibility map information to pg_freespace.

2013-06-14 Thread Andres Freund
On 2013-06-14 10:22:19 -0400, Alvaro Herrera wrote:
> Kyotaro HORIGUCHI wrote:
> > Hello,
> > 
> > I've added visibility map information to pg_freespace for my
> > utility.
> 
> This makes sense to me.

+1

> I only lament the fact that this makes the
> module a misnomer.  Do we want to 1) rename the module (how
> inconvenient), 2) create a separate module for this (surely not
> warranted), or 3) accept it and move on?

3). All the others seem to inflict unnecessary pain for not all that
much gain.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Buildfarm client 4.11 released

2013-06-14 Thread Andrew Dunstan


Version 4.11 of the PostgreSQL Buildfarm client has been released. It 
can be downloaded from 



Changes since 4.10:

 * Turn down module cleanup verbosity.
 * Add check for rogue postmasters.
 * Add pseudo-branch targets HEAD_PLUS_LATEST and HEAD_PLUS_LATEST2.
 * Use Digest::SHA instead of Digest::SHA1.
 * Make directory handling more robust in git code.
 * Move web transaction into a module procedure.
 * Switch to using the porcelain format of git status.
 * Provide parameter for core file patterns.
 * Use a command file for gdb instead of the -ex option.


The web transaction and Digest::SHA changes have allowed the removal of 
a couple of long-standing uglinesses on the system. In almost all cases, 
the config parameter "aux_path" and the separate run_web_transaction.pl 
script are now redundant (the exception is older Msys systems).


Enjoy

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add visibility map information to pg_freespace.

2013-06-14 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:
> Hello,
> 
> I've added visibility map information to pg_freespace for my
> utility.

This makes sense to me.  I only lament the fact that this makes the
module a misnomer.  Do we want to 1) rename the module (how
inconvenient), 2) create a separate module for this (surely not
warranted), or 3) accept it and move on?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Dean Rasheed
On 14 June 2013 14:14, Tom Lane  wrote:
> Marko Kreen  writes:
>> On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed  
>> wrote:
>>> Attached is a patch implementing a new aggregate function md5_agg() to
>>> compute the aggregate MD5 sum across a number of rows.
>
>> It's more efficient to calculate per-row md5, and then sum() them.
>> This avoids the need for ORDER BY.
>
> Good point.  The aggregate md5 function also fails to distinguish the
> case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
> from the case where they contain 'xyz' followed by 'zyxyz'.
>

Well, if you aggregated foo.*::text as in my original example, then
the textual representation of the row would protect you from that. But
yes, if you were just doing it with a single text column that might be
a risk.


> Now, as against that, you lose any sensitivity to the ordering of the
> values.
>
> Personally I'd be a bit inclined to xor the per-row md5's rather than
> sum them, but that's a small matter.
>

But this would be a much riskier thing to do with a single column,
because if you updated multiple rows in the same way (e.g., UPDATE t
SET x='foo' WHERE x='bar') then xor'ing the md5's would cancel out if
there were an even number of matches.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Stephen Frost
* Andrew Dunstan (and...@dunslane.net) wrote:
> I'd rather go the other way, processing the records without having
> to process them otherwise at all. Turning things into text must slow
> things down, surely.

That's certainly an interesting idea also..

Thanks,

Stephen




Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-14 Thread Greg Smith

On 6/12/13 3:19 AM, Fabien COELHO wrote:

If you are still worried: if you run the very same command without
throttling and measure the same latency, does the same thing happen at
the end? My guess is that it should be "yes". If it is no, I'll try out
pgbench-tools.


It looks like it happens rarely for one client without the rate limit,
but that increases to every time for multiple clients with limiting in
place.  pgbench-tools just graphs the output from the latency log.
Here's a setup that runs the test I'm doing:


$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ pgbench -S -c 25 -T 30 -l pgbench && tail -n 40 pgbench_log*

Sometimes there are no slow entries, but I've seen this once so far:

0 21822 1801 0 1371217462 945264
1 21483 1796 0 1371217462 945300
8 20891 1931 0 1371217462 945335
14 20520 2084 0 1371217462 945374
15 20517 1991 0 1371217462 945410
16 20393 1928 0 1371217462 945444
17 20183 2000 0 1371217462 945479
18 20277 2209 0 1371217462 945514
23 20316 2114 0 1371217462 945549
22 20267 250128 0 1371217463 193656

The third column is the latency for that transaction.  Notice how it's a
steady ~2000 us except for the very last transaction, which takes
250,128 us.  That's the weird thing; these short SELECT statements
should never take that long.  It suggests there's something weird
happening with how the client exits, probably that its latency number is
being computed over more work than it should be.


Here's what a rate limited run looks like for me.  Note that I'm still
using the version I re-submitted since that's where I ran into this
issue; I haven't merged your changes to split the rate among each client
here--which means this is 400 TPS per client == 10000 TPS total:


$ pgbench -S -c 25 -T 30 -R 400 -l pgbench && tail -n 40 pgbench_log

7 12049 2070 0 1371217859 195994
22 12064 2228 0 1371217859 196115
18 11957 1570 0 1371217859 196243
23 12130 989 0 1371217859 196374
8 11922 1598 0 1371217859 196646
11 12229 4833 0 1371217859 196702
21 11981 1943 0 1371217859 196754
20 11930 1026 0 1371217859 196799
14 11990 13119 0 1371217859 208014
^^^ fast section
vvv delayed section
1 11982 91926 0 1371217859 287862
2 12033 116601 0 1371217859 308644
6 12195 115957 0 1371217859 308735
17 12130 114375 0 1371217859 308776
0 12026 115507 0 1371217859 308822
3 11948 118228 0 1371217859 308859
4 12061 113484 0 1371217859 308897
5 12110 113586 0 1371217859 308933
9 12032 117744 0 1371217859 308969
10 12045 114626 0 1371217859 308989
12 11953 113372 0 1371217859 309030
13 11883 114405 0 1371217859 309066
15 12018 116069 0 1371217859 309101
16 11890 115727 0 1371217859 309137
19 12140 114006 0 1371217859 309177
24 11884 115782 0 1371217859 309212

There's almost 80,000 usec of latency showing up between epoch time
1371217859.208014 and 1371217859.287862 here.  What's weird about it is
that the longer the test runs, the larger the gap is.  If collecting the 
latency data itself caused the problem, that would make sense, so maybe 
this is related to flushing that out to disk.


If you want to look just at the latency numbers without the other
columns in the way, you can use:


cat pgbench_log.* | awk '{print $3}'

That is how I was evaluating the smoothness of the rate limit, by
graphing those latency values.  pgbench-tools takes those and a derived
TPS/s number and plots them, which made it easier for me to spot this
weirdness.  But I've already moved on to analyzing the raw latency data
instead; I can see the issue without the graph once I've duplicated the
conditions.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Andres Freund
On 2013-06-14 16:58:38 +0300, Heikki Linnakangas wrote:
> On 14.06.2013 16:15, Andres Freund wrote:
> >On 2013-06-14 09:08:15 -0400, Tom Lane wrote:
> >>I just had my nose in the part of the checksum patch that tediously
> >>copies entire pages out of shared buffers to avoid possible instability
> >>of the hint bits while we checksum and write the page.
> >
> >I am really rather uncomfortable with that piece of code, and I hacked
> >it up after Jeff Janes had reported a bug there (The one aborting WAL
> >replay too early...). So I am very happy that you are looking at it.
> 
> Hmm. In XLogSaveBufferForHint():
> 
> > * Note that this only works for buffers that fit the standard page model,
> > * i.e. those for which buffer_std == true
> 
> The free-space-map uses non-standard pages, and MarkBufferDirtyHint(). Isn't
> that completely broken for the FSM? If I'm reading it correctly, what will
> happen is that replay will completely zero out all FSM pages that have been
> touched. All the FSM data is between pd_lower and pd_upper, which on
> standard pages is the "hole".

Jeff Davis has a patch pending
(1365493015.7580.3240.camel@sussancws0025) that passes the buffer_std
flag down to MarkBufferDirtyHint() for exactly that reason. I thought we
were on track committing that, but rereading the thread it doesn't look
that way.

Jeff, care to update that patch?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Heikki Linnakangas

On 14.06.2013 17:01, Andres Freund wrote:

At this point we have a page that has been modified without an FPI. But
it's not marked dirty, so it won't be written out without further
cause. Which might be fine since there's no cause to write out the page
and there probably won't be anyone doing that without logging an FPI
independently.
Can anybody see a scenario where this is actually dangerous?


The code also relies on that being safe during recovery:


 * If we're in recovery we cannot dirty a page because of a hint.
 * We can set the hint, just not dirty the page as a result so the
 * hint is lost when we evict the page or shutdown.
 *
 * See src/backend/storage/page/README for longer discussion.
 */
if (RecoveryInProgress())
    return;


I can't immediately see a problem with that.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Andres Freund
On 2013-06-14 09:21:52 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > On 14.06.2013 16:08, Tom Lane wrote:
> >> Refresh my memory as to why we need to WAL-log hints for checksumming?
> 
> > Torn pages:
> 
> So it's not that we actually need to log the individual hint bit
> changes, it's that we need to WAL-log a full page image on the first
> update after a checkpoint, so as to recover from torn-page cases.
> Which one are we doing?


From quickly looking at the code again I think the MarkBufferDirtyHint()
code makes at least one assumption that isn't correct in the fact of
checksums.

It tests for the need to dirty the page with:
if ((bufHdr->flags & (BM_DIRTY | BM_JUST_DIRTIED)) !=
    (BM_DIRTY | BM_JUST_DIRTIED))

*before* taking a lock. A comment explains why that is safe:

 * Since we make this test unlocked, there's a chance we might fail to
 * notice that the flags have just been cleared, and failed to reset
 * them, due to memory-ordering issues.

That's fine for the classical usecase without checksums but what about
the following scenario:

1) page is dirtied, FPI is logged
2) SetHintBits gets called on the same page, holding only a share lock
3) checkpointer/bgwriter/... writes out the page, clearing the dirty
   flag
4) checkpoint finishes, updates redo ptr
5) SetHintBits actually modifies the hint bits
6) SetHintBits calls MarkBufferDirtyHint which doesn't notice that the
   page isn't dirty anymore and thus doesn't check whether something
   needs to get logged.

At this point we have a page that has been modified without an FPI. But
it's not marked dirty, so it won't be written out without further
cause. Which might be fine since there's no cause to write out the page
and there probably won't be anyone doing that without logging an FPI
independently.
Can anybody see a scenario where this is actually dangerous?


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Andrew Dunstan


On 06/14/2013 09:40 AM, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Marko Kreen  writes:

On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed  wrote:

Attached is a patch implementing a new aggregate function md5_agg() to
compute the aggregate MD5 sum across a number of rows.

It's more efficient to calculate per-row md5, and then sum() them.
This avoids the need for ORDER BY.

Good point.  The aggregate md5 function also fails to distinguish the
case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
from the case where they contain 'xyz' followed by 'zyxyz'.

Now, as against that, you lose any sensitivity to the ordering of the
values.

Personally I'd be a bit inclined to xor the per-row md5's rather than
sum them, but that's a small matter.

Where I'd take this is actually in a completely different direction..
I'd like the aggregate to be able to match the results of running the
'md5sum' unix utility on a file that's been COPY'd out.  Yes, that means
we'd need a way to get back "what would this row look like if it was
sent through COPY with these parameters", but I've long wanted that
also.

No, no clue about how to put all that together.  Yes, having this would
be better than nothing, so I'm still for adding this even if we can't
make it match COPY output. :)






I'd rather go the other way, processing the records without having to 
process them otherwise at all. Turning things into text must slow things 
down, surely.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Heikki Linnakangas

On 14.06.2013 16:15, Andres Freund wrote:

On 2013-06-14 09:08:15 -0400, Tom Lane wrote:

I just had my nose in the part of the checksum patch that tediously
copies entire pages out of shared buffers to avoid possible instability
of the hint bits while we checksum and write the page.


I am really rather uncomfortable with that piece of code, and I hacked
it up after Jeff Janes had reported a bug there (The one aborting WAL
replay too early...). So I am very happy that you are looking at it.


Hmm. In XLogSaveBufferForHint():


 * Note that this only works for buffers that fit the standard page model,
 * i.e. those for which buffer_std == true


The free-space-map uses non-standard pages, and MarkBufferDirtyHint(). 
Isn't that completely broken for the FSM? If I'm reading it correctly, 
what will happen is that replay will completely zero out all FSM pages 
that have been touched. All the FSM data is between pd_lower and 
pd_upper, which on standard pages is the "hole".


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Marko Kreen  writes:
> > On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed  
> > wrote:
> >> Attached is a patch implementing a new aggregate function md5_agg() to
> >> compute the aggregate MD5 sum across a number of rows.
> 
> > It's more efficient to calculate per-row md5, and then sum() them.
> > This avoids the need for ORDER BY.
> 
> Good point.  The aggregate md5 function also fails to distinguish the
> case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
> from the case where they contain 'xyz' followed by 'zyxyz'.
> 
> Now, as against that, you lose any sensitivity to the ordering of the
> values.
> 
> Personally I'd be a bit inclined to xor the per-row md5's rather than
> sum them, but that's a small matter.

Where I'd take this is actually in a completely different direction..
I'd like the aggregate to be able to match the results of running the
'md5sum' unix utility on a file that's been COPY'd out.  Yes, that means
we'd need a way to get back "what would this row look like if it was
sent through COPY with these parameters", but I've long wanted that
also.

No, no clue about how to put all that together.  Yes, having this would
be better than nothing, so I'm still for adding this even if we can't
make it match COPY output. :)

Thanks,

Stephen




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Andres Freund
On 2013-06-14 09:21:52 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > On 14.06.2013 16:08, Tom Lane wrote:
> >> Refresh my memory as to why we need to WAL-log hints for checksumming?
> 
> > Torn pages:
> 
> So it's not that we actually need to log the individual hint bit
> changes, it's that we need to WAL-log a full page image on the first
> update after a checkpoint, so as to recover from torn-page cases.
> Which one are we doing?

MarkBufferDirtyHint() logs an FPI (just not via a BKP block) via
XLogSaveBufferForHint() iff XLogCheckBuffer() says we need to by
comparing GetRedoRecPtr() with the page's lsn.
Otherwise we don't do anything besides marking the buffer dirty.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Heikki Linnakangas

On 14.06.2013 16:21, Tom Lane wrote:

Heikki Linnakangas  writes:

On 14.06.2013 16:08, Tom Lane wrote:

Refresh my memory as to why we need to WAL-log hints for checksumming?



Torn pages:


So it's not that we actually need to log the individual hint bit
changes, it's that we need to WAL-log a full page image on the first
update after a checkpoint, so as to recover from torn-page cases.
Which one are we doing?


Correct. We're doing the latter, see XLogSaveBufferForHint().

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-14 Thread Andrew Dunstan


On 06/14/2013 08:35 AM, Peter Eisentraut wrote:

On 6/13/13 9:20 PM, amul sul wrote:

Agree, but only if we consider that these contrib modules are always going
to be deployed with PostgreSQL.
But what if the user is going to install such a module elsewhere, i.e. not
from the contrib directory of the pg source?

Why would anyone do that?




Maybe they wouldn't.

I do think we need to make sure that we have at least buildfarm coverage
of pgxs module building and testing. I have some coverage of a few
extensions I have written, which exercise that, so maybe that will
suffice. If not, maybe we need to have one module that only builds via
pgxs and is built after an install (i.e. not via the standard contrib
build).


I don't really like the directory layout we use for these modules 
anyway, so I'm not sure they constitute best practice for extension 
builders. Lately I have been using an extension skeleton that looks 
something like this:


License
Readme.md
META.json (for pgxn)
extension.control
Makefile
doc/extension.md (soft linked to ../Readme.md)
src/extension.c
sql/extension.sql
test/sql/extension.sql
test/expected/extension.out


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Andres Freund
On 2013-06-14 09:08:15 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > Well, time will tell I guess. The biggest overhead with the checksums is 
> > exactly the WAL-logging of hint bits.
> 
> Refresh my memory as to why we need to WAL-log hints for checksumming?
> I just had my nose in the part of the checksum patch that tediously
> copies entire pages out of shared buffers to avoid possible instability
> of the hint bits while we checksum and write the page.

I am really rather uncomfortable with that piece of code, and I hacked
it up after Jeff Janes had reported a bug there (The one aborting WAL
replay too early...). So I am very happy that you are looking at it.

Jeff Davis and I were talking at pgcon about whether the usage of
PGXACT->delayChkpt makes the whole thing sufficiently safe - we
couldn't find any real danger but...

> Given that we're
> paying that cost, I don't see why we'd need to do any extra WAL-logging
> (above and beyond the log-when-freeze cost that we have to pay already).
> But I've not absorbed any caffeine yet today, so maybe I'm just missing
> it.

The usual torn page spiel I think. If we crash while only one half of
the page made it to disk we would get spurious checksum failures from
there on.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 2:14 PM, Tom Lane  wrote:

> Marko Kreen  writes:
> > On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed 
> wrote:
> >> Attached is a patch implementing a new aggregate function md5_agg() to
> >> compute the aggregate MD5 sum across a number of rows.
>
> > It's more efficient to calculate per-row md5, and then sum() them.
> > This avoids the need for ORDER BY.
>
> Good point.  The aggregate md5 function also fails to distinguish the
> case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
> from the case where they contain 'xyz' followed by 'zyxyz'.
>
> Now, as against that, you lose any sensitivity to the ordering of the
> values.
>
> Personally I'd be a bit inclined to xor the per-row md5's rather than
> sum them, but that's a small matter.
>
> regards, tom lane
>
>
xor works, but only if each row is different (i.e. at the very least all
columns together make a unique key).






Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Tom Lane
Heikki Linnakangas  writes:
> On 14.06.2013 16:08, Tom Lane wrote:
>> Refresh my memory as to why we need to WAL-log hints for checksumming?

> Torn pages:

So it's not that we actually need to log the individual hint bit
changes, it's that we need to WAL-log a full page image on the first
update after a checkpoint, so as to recover from torn-page cases.
Which one are we doing?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Heikki Linnakangas

On 14.06.2013 16:08, Tom Lane wrote:

Heikki Linnakangas  writes:

Well, time will tell I guess. The biggest overhead with the checksums is
exactly the WAL-logging of hint bits.


Refresh my memory as to why we need to WAL-log hints for checksumming?


Torn pages:

1. Backend sets a hint bit, dirtying the buffer.
2. Checksum is calculated, and buffer is written out to disk.
3. Crash; only part of the page makes it to disk, i.e. the page is torn.

If the page is torn, the checksum won't match. Without checksums, a torn 
page is not a problem with hint bits, as a single bit can't be torn and 
the page is otherwise intact. But with checksums, it causes a checksum 
failure.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Amit Kapila
On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
> Hello,

> We have already started a discussion on pgsql-hackers for the problem
> of taking a fresh backup during the failback operation; here is the
> link for that:
>
> http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com
>
> Let me again summarize the problem we are trying to address.
>
> When the master fails, the last few WAL files may not reach the
> standby. But the master may have gone ahead and made changes to its
> local file system after flushing WAL to the local storage.  So the
> master contains some file system level changes that the standby does
> not have.  At this point, the data directory of the master is ahead of
> the standby's data directory.  Subsequently, the standby will be
> promoted as the new master.  Later, when the old master wants to be a
> standby of the new master, it can't just join the setup since there is
> an inconsistency between these two servers. We need to take a fresh
> backup from the new master.  This can happen in both synchronous and
> asynchronous replication.
>
> A fresh backup is also needed in case of a clean switch-over, because
> in the current HEAD the master does not wait for the standby to receive
> all the WAL up to the shutdown checkpoint record before shutting down
> the connection. Fujii Masao has already submitted a patch to handle the
> clean switch-over case, but the problem still remains for the failback
> case.
>
> The process of taking a fresh backup is very time consuming when
> databases are of very big sizes, say several TB's, and when the servers
> are connected over a relatively slower link.  This would break the
> service level agreement of the disaster recovery system.  So there is a
> need to improve the process of disaster recovery in PostgreSQL.  One
> way to achieve this is to maintain consistency between master and
> standby, which helps to avoid the need for a fresh backup.
>
> So our proposal on this problem is that we must ensure that the master
> does not make any file system level changes without confirming that the
> corresponding WAL record has been replicated to the standby.

How will you take care of the extra WAL on the old master during
recovery? If it replays WAL that has not reached the new master, it can
be a problem.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Tom Lane
Marko Kreen  writes:
> On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed  
> wrote:
>> Attached is a patch implementing a new aggregate function md5_agg() to
>> compute the aggregate MD5 sum across a number of rows.

> It's more efficient to calculate per-row md5, and then sum() them.
> This avoids the need for ORDER BY.

Good point.  The aggregate md5 function also fails to distinguish the
case where we have 'xyzzy' followed by 'xyz' in two adjacent rows
from the case where they contain 'xyz' followed by 'zyxyz'.

Now, as against that, you lose any sensitivity to the ordering of the
values.

Personally I'd be a bit inclined to xor the per-row md5's rather than
sum them, but that's a small matter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Tom Lane
Heikki Linnakangas  writes:
> Well, time will tell I guess. The biggest overhead with the checksums is 
> exactly the WAL-logging of hint bits.

Refresh my memory as to why we need to WAL-log hints for checksumming?
I just had my nose in the part of the checksum patch that tediously
copies entire pages out of shared buffers to avoid possible instability
of the hint bits while we checksum and write the page.  Given that we're
paying that cost, I don't see why we'd need to do any extra WAL-logging
(above and beyond the log-when-freeze cost that we have to pay already).
But I've not absorbed any caffeine yet today, so maybe I'm just missing
it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-14 Thread Amit Langote
On Fri, Jun 14, 2013 at 9:35 PM, Peter Eisentraut  wrote:
> On 6/13/13 9:20 PM, amul sul wrote:
>> Agree, but only if we consider that these contrib modules are always
>> going to be deployed with PostgreSQL.
>> But what if the user is going to install such a module elsewhere, i.e.
>> not from the contrib directory of the pg source?
>
> Why would anyone do that?

Is he probably saying "install such module *from* elsewhere"? Like
directly from the source directory of a module using something like
following:

cd /path/to/module-source
make USE_PGXS=1 PG_CONFIG=/path/to/pg_config
make USE_PGXS=1 PG_CONFIG=/path/to/pg_config install

When user does not work with pg source directly and does not have
postgresql-contrib installed?
Am I missing something here?

--
Amit Langote


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove useless USE_PGXS support in contrib

2013-06-14 Thread Peter Eisentraut
On 6/13/13 9:20 PM, amul sul wrote:
> Agree, but only if we consider that these contrib modules are always going
> to be deployed with PostgreSQL.
> But what if the user is going to install such a module elsewhere, i.e. not
> from the contrib directory of the pg source?

Why would anyone do that?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add transforms feature

2013-06-14 Thread Peter Eisentraut
On 6/14/13 3:46 AM, Cédric Villemain wrote:
> You kept PGXS style makefile...

I know, but that's a separate issue that hasn't been decided yet.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SPGist "triple parity" concept doesn't work

2013-06-14 Thread Teodor Sigaev



Anyway I now think that we might be better off with the other idea of
abandoning an insertion and retrying if we get a lock conflict.


done, look at the patch.

I was faced with the fact that my mail is considered spam by postgresql.org,
so I repeat some thoughts from the previous mail:


I considered the idea of forbidding placement of a child on the same page as
its parent, but this implementation a) could significantly increase the size
of the index, and b) doesn't solve Greg's point.


We definitely need a new idea for the locking protocol, and I'll return to
this problem in autumn (sorry, I haven't time in summer to do this research).


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


spgist_deadlock-1.patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Issue with PGC_BACKEND parameters

2013-06-14 Thread Amit Kapila
I had observed one problem with PGC_BACKEND parameters while testing patch
for ALTER SYSTEM command.

Problem statement: If I change PGC_BACKEND parameters directly in
postgresql.conf and then do pg_reload_conf() and reconnect, it will
still show the old value.
Detailed steps
1. Start server with default settings
2. Connect Client
3. show log_connections; -- it will show as off, this is correct.
4. Change log_connections in postgresql.conf to on
5. issue command select pg_reload_conf() in client (which is started in
step-2)
6. Connect a new client
7. show log_connections; -- it will show as off, this is "incorrect".

The problem is in step-7, it should show as on.

This problem occurs only on Windows.
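
Condensed into a psql session, the reproduction looks like this (a
sketch of steps 3-7 above; log_connections is just the example
parameter):

show log_connections;      -- off
-- edit postgresql.conf: log_connections = on
select pg_reload_conf();
-- connect a new client, then in that client:
show log_connections;      -- still off on Windows; on elsewhere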

The reason for this problem is that on Windows, when a new session is
started, the new backend loads the changed parameters from the
global/config_exec_params file. The flow is in
SubPostmasterMain()->read_nondefault_variables()->set_config_option().

In the code below from set_config_option(), changing a PGC_BACKEND
variable is not allowed, and the comment even says that only the
postmaster is allowed to change it and that the change will propagate to
subsequently started backends, but this is not true for Windows.

switch (record->context)
{
    ..
    ..
    case PGC_BACKEND:
        if (context == PGC_SIGHUP)
        {
            /*
             * If a PGC_BACKEND parameter is changed in the config file,
             * we want to accept the new value in the postmaster (whence
             * it will propagate to subsequently-started backends), but
             * ignore it in existing backends.  This is a tad klugy, but
             * necessary because we don't re-read the config file during
             * backend start.
             */
            if (IsUnderPostmaster)
                return -1;
        }
}


I think that to fix the issue we need to pass set_config_option() the
information about whether a PGC_BACKEND parameter is allowed to change.
One way is to add a new parameter.

Kindly let me know your suggestions. 

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 aggregate

2013-06-14 Thread Marko Kreen
On Thu, Jun 13, 2013 at 12:35 PM, Dean Rasheed  wrote:
> Attached is a patch implementing a new aggregate function md5_agg() to
> compute the aggregate MD5 sum across a number of rows. This is
> something I've wished for a number of times. I think the primary use
> case is to do a quick check that 2 tables, possibly on different
> servers, contain the same data, using a query like
>
>   SELECT md5_agg(foo.*::text) FROM (SELECT * FROM foo ORDER BY id) foo;
>
> or
>
>   SELECT md5_agg(foo.*::text ORDER BY id) FROM foo;
>
> these would be equivalent to
>
>   SELECT md5(string_agg(foo.*::text, '' ORDER BY id)) FROM foo;
>
> but without the excessive memory consumption for the intermediate
> concatenated string, and the resulting 1GB table size limit.

It's more efficient to calculate per-row md5, and then sum() them.
This avoids the need for ORDER BY.
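
For example, a minimal sketch of that, assuming the same table foo (each
md5 is truncated to 64 bits here purely to keep the hex-to-integer
conversion simple):

select sum(('x' || left(md5(foo.*::text), 16))::bit(64)::bigint) from foo;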

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Greg Stark
On Fri, Jun 14, 2013 at 12:20 PM, Heikki Linnakangas
 wrote:
> For pg_rewind, we wouldn't actually need a full-page image for hint bit
> updates, just a small record saying "hey, I touched this page". And you'd
> only need to write that the first time a page is touched after a checkpoint.

I would expect that to be about the same cost though. The latency for
the fsync on the wal record before being able to flush the buffer is
the biggest cost.


> The proposed patch is clearly not 9.3 material either. If anything, there's
> a much better chance that we could still sneak in a GUC to allow hint bits
> to be WAL-logged without checksums in 9.3. All the code is there, it'd just
> be a new GUC to control it separately from checksums.

On the other hand, if you're going to WAL-log the hint bits, why not
enable checksums?

Do we allow turning off checksums after a database is initdb'd? IIRC
we can't turn them on later, but I don't see why we couldn't turn them
off.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Heikki Linnakangas

On 14.06.2013 14:06, Pavan Deolasee wrote:

On Fri, Jun 14, 2013 at 4:12 PM, Heikki Linnakangas wrote:



  Robert Haas pointed out in that thread that it has a problem with hint
bits that are not WAL-logged,


I liked that tool a lot until Robert pointed out the above problem. I
thought this is a show stopper because I can't really see any way to
circumvent it unless we enable checksums or explicitly WAL log hint bits.


but it will still work if you also enable the new checksums feature, which
forces hint bit updates to be WAL-logged.


Are we expecting a lot of people to run their clusters with checksums on?
Sorry, I haven't followed the checksum discussions and don't know how much
overhead it causes. But if the general expectation is that checksums will
be turned on most often, I agree pg_rewind is probably good enough.


Well, time will tell I guess. The biggest overhead with the checksums is 
exactly the WAL-logging of hint bits.



Perhaps we could add a GUC to enable hint bits to be WAL-logged,
regardless of checksums, to make pg_rewind work.


Wouldn't that be too costly? I mean, in the worst case every hint bit on a
page may get updated separately. If each such update is WAL logged, we are
looking at a lot more unnecessary WAL traffic.


Yep, same as with checksums. I was not very enthusiastic about the 
checksums patch because of that, but a lot of people are willing to pay 
that price. Maybe we can figure out a way to reduce that cost in 9.4. 
It'd benefit the checksums greatly.


For pg_rewind, we wouldn't actually need a full-page image for hint bit 
updates, just a small record saying "hey, I touched this page". And 
you'd only need to write that the first time a page is touched after a 
checkpoint.



I think that's a more flexible approach to solve this problem. It doesn't
require an online feedback loop from the standby to master, for starters.


I agree. That's a big advantage of pg_rewind. Unfortunately, it can't work
with 9.3 and below because of the hint bits issue, otherwise it would have
been even more cool.


The proposed patch is clearly not 9.3 material either. If anything, 
there's a much better chance that we could still sneak in a GUC to allow 
hint bits to be WAL-logged without checksums in 9.3. All the code is 
there, it'd just be a new GUC to control it separately from checksums.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Pavan Deolasee
On Fri, Jun 14, 2013 at 2:51 PM, Benedikt Grundmann
<bgrundm...@janestreet.com> wrote:

>
> A alternative proposal (which will probably just reveal my lack of
> understanding about what is or isn't possible with WAL).  Provide a way to
> restart the master so that it rolls back the WAL changes that the slave
> hasn't seen.
>

WAL records in PostgreSQL can only be used for physical redo. They can not
be used for undo. So what you're suggesting is not possible though I am
sure a few other databases do that.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Pavan Deolasee
On Fri, Jun 14, 2013 at 4:12 PM, Heikki Linnakangas  wrote:

>  Robert Haas pointed out in that thread that it has a problem with hint
> bits that are not WAL-logged,
>

I liked that tool a lot until Robert pointed out the above problem. I
thought this is a show stopper because I can't really see any way to
circumvent it unless we enable checksums or explicitly WAL log hint bits.


> but it will still work if you also enable the new checksums feature, which
> forces hint bit updates to be WAL-logged.
>

Are we expecting a lot of people to run their clusters with checksums on?
Sorry, I haven't followed the checksum discussions and don't know how much
overhead it causes. But if the general expectation is that checksums will
be turned on most often, I agree pg_rewind is probably good enough.


> Perhaps we could add a GUC to enable hint bits to be WAL-logged,
> regardless of checksums, to make pg_rewind work.
>
>
Wouldn't that be too costly? I mean, in the worst case every hint bit on a
page may get updated separately. If each such update is WAL logged, we are
looking at a lot more unnecessary WAL traffic.


> I think that's a more flexible approach to solve this problem. It doesn't
> require an online feedback loop from the standby to master, for starters.
>
>
I agree. That's a big advantage of pg_rewind. Unfortunately, it can't work
with 9.3 and below because of the hint bits issue, otherwise it would have
been even more cool.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

