Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-14 Thread Thomas Munro
On Mon, Aug 14, 2017 at 12:32 PM, Andres Freund  wrote:
> Review for 0001:
>
> I think you made a few long lines even longer, like:
>
> @@ -1106,11 +1106,11 @@ pltcl_trigger_handler(PG_FUNCTION_ARGS, pltcl_call_state *call_state,
> Tcl_ListObjAppendElement(NULL, tcl_trigtup, Tcl_NewObj());
> for (i = 0; i < tupdesc->natts; i++)
> {
> -   if (tupdesc->attrs[i]->attisdropped)
> +   if (TupleDescAttr(tupdesc, i)->attisdropped)
> Tcl_ListObjAppendElement(NULL, tcl_trigtup, Tcl_NewObj());
> else
> Tcl_ListObjAppendElement(NULL, tcl_trigtup,
> -Tcl_NewStringObj(utf_e2u(NameStr(tupdesc->attrs[i]->attname)), -1));
> +Tcl_NewStringObj(utf_e2u(NameStr(TupleDescAttr(tupdesc, i)->attname)), -1));
>
>
> as it's not particularly pretty to access tupdesc->attrs[i] repeatedly,
> it'd be good if you instead had a local variable for the individual
> attribute.

Done.

> Similar:
> if (OidIsValid(get_base_element_type(TupleDescAttr(tupdesc, i)->atttypid)))
> sv = plperl_ref_from_pg_array(attr, TupleDescAttr(tupdesc, i)->atttypid);
> else if ((funcid = get_transform_fromsql(TupleDescAttr(tupdesc, i)->atttypid, current_call_data->prodesc->lang_oid, current_call_data->prodesc->trftypes)))
> sv = (SV *) DatumGetPointer(OidFunctionCall1(funcid, attr));

Done.

> @@ -150,7 +148,7 @@ ValuesNext(ValuesScanState *node)
>  */
> values[resind] = MakeExpandedObjectReadOnly(values[resind],
> isnull[resind],
> -   att[resind]->attlen);
> +   TupleDescAttr(slot->tts_tupleDescriptor, resind)->attlen);
>
> @@ -158,9 +158,9 @@ convert_tuples_by_position(TupleDesc indesc,
>  * must agree.
>  */
> if (attrMap[i] == 0 &&
> -   indesc->attrs[i]->attisdropped &&
> -   indesc->attrs[i]->attlen == outdesc->attrs[i]->attlen &&
> -   indesc->attrs[i]->attalign == outdesc->attrs[i]->attalign)
> +   TupleDescAttr(indesc, i)->attisdropped &&
> +   TupleDescAttr(indesc, i)->attlen == TupleDescAttr(outdesc, i)->attlen &&
> +   TupleDescAttr(indesc, i)->attalign == TupleDescAttr(outdesc, i)->attalign)
> continue;

Done.

> I think you get the drift, there's more.

Done in some more places too.

> Review for 0002:
>
> @@ -71,17 +71,17 @@ typedef struct tupleConstr
>  typedef struct tupleDesc
>  {
> int natts;  /* number of attributes in the tuple */
> -   Form_pg_attribute *attrs;
> -   /* attrs[N] is a pointer to the description of Attribute Number N+1 */
> TupleConstr *constr;/* constraints, or NULL if none */
> Oid tdtypeid;   /* composite type ID for tuple type */
> int32   tdtypmod;   /* typmod for tuple type */
> booltdhasoid;   /* tuple has oid attribute in its header */
> int tdrefcount; /* reference count, or -1 if not counting */
> +   /* attrs[N] is the description of Attribute Number N+1 */
> +   FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
>  } *TupleDesc;
>
> sorry if I'm beating on my hobby horse, but if we're re-ordering anyway,
> can you move TupleConstr to the second-to-last? That a) seems more
> consistent but b) (hobby horse, sorry) avoids unnecessary alignment
> padding.

Done.

> @@ -734,13 +708,13 @@ BuildDescForRelation(List *schema)
> /* Override TupleDescInitEntry's settings as requested */
> TupleDescInitEntryCollation(desc, attnum, attcollation);
> if (entry->storage)
> -   desc->attrs[attnum - 1]->attstorage = entry->storage;
> +   desc->attrs[attnum - 1].attstorage = entry->storage;
>
> /* Fill in additional stuff not handled by TupleDescInitEntry */
> -   desc->attrs[attnum - 1]->attnotnull = 

Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-08-14 Thread Peter Geoghegan
On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja  wrote:
> Attached is a patch for $SUBJECT.  It might still be a bit rough around the
> edges and probably light on docs and testing, but I thought I'd post it
> anyway so I won't forget.

Is it possible for ON CONFLICT DO SELECT to raise a cardinality
violation error? Why or why not?
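
For reference, a sketch of the scenario the question is about (table and
values made up): with DO UPDATE, a single command proposing two rows that
conflict with the same existing row raises a cardinality violation, and
the question is whether DO SELECT [FOR ...] can get into the same state.

=# CREATE TABLE t (k int PRIMARY KEY, v int);
=# INSERT INTO t VALUES (1, 0);
=# INSERT INTO t VALUES (1, 1), (1, 2)
   ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v;
ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time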



-- 
Peter Geoghegan




Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Michael Paquier
On Tue, Aug 15, 2017 at 12:45 PM, Alvaro Herrera
 wrote:
> I'm thinking that this data is useful to analyze as a stream of related
> events, rather than as individual data points.  Grepping logs in order to
> extract the numbers is lame and slow.  If you additionally have to mix
> data that comes from vacuumdb with the autovacuum data from the server
> log, that's much worse.  Maybe you remember this thread?
> https://www.postgresql.org/message-id/flat/509300F7.5000803%402ndQuadrant.com
> I'm thinking it would be damn convenient to have both user-invoked
> VACUUM and autovacuum emit some sort of event data which is saved
> somewhere for later analysis.

Ah... A tracker of activity history. With this kind of thing there needs
to be a careful design to control data retention, as events keep piling
up. Yeah, that would be useful and would save the CPU spent grepping
dedicated logs. And we could just have an API to a history table to which
is sent a timestamp, an event name, and an object like a JSON blob or a
text array. The stats collector could clean up the oldest records, either
by maintaining a counter for the number of records it should keep around,
defined by a GUC, or by a duration allowing retention of history for a
period, say the last X days of events.
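
A minimal sketch of the kind of history table described above; every name
and column here is illustrative only, not an actual proposal:

=# CREATE TABLE maintenance_history (
       event_time timestamptz NOT NULL DEFAULT now(),
       event_name text NOT NULL,
       payload    jsonb
   );
=# INSERT INTO maintenance_history (event_name, payload)
   VALUES ('vacuum', '{"relation": "public.hoge", "removed_tuples": 0}');

Retention along the lines described above could then be a periodic
DELETE ... WHERE event_time < now() - interval '30 days', whether driven
by a GUC or by the stats collector.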

For vacuum logs, it would be possible to get things done by saving this
information into a private memory area, and then offering a hook to let
callers do what they want with this data... Not extensible, and ugly, but
that would do the job. Still, I don't like that.
-- 
Michael




Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Aug 15, 2017 at 11:14 AM, Alvaro Herrera
>  wrote:
> >> In vacuum_rel()@vacuum.c, there are a couple of logs that could be
> >> improved as well with the schema name.
> >
> > I agree that there's a lot of room for improvement there.  If I'm
> > allowed some scope creep, I'd say that gathering detailed vacuum info
> > from both autovacuum and user-invoked vacuum in a central place would be
> > very welcome.
> 
> Hm. I am not sure what you have in mind here.

I'm thinking that this data is useful to analyze as a stream of related
events, rather than as individual data points.  Grepping logs in order to
extract the numbers is lame and slow.  If you additionally have to mix
data that comes from vacuumdb with the autovacuum data from the server
log, that's much worse.  Maybe you remember this thread?
https://www.postgresql.org/message-id/flat/509300F7.5000803%402ndQuadrant.com
I'm thinking it would be damn convenient to have both user-invoked
VACUUM and autovacuum emit some sort of event data which is saved
somewhere for later analysis.

I *did* mention this was scope creep ;-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 12:40 AM, Rushabh Lathia
 wrote:
> On Fri, Aug 11, 2017 at 10:50 PM, Robert Haas  wrote:
>> On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia
>>  wrote:
>> > Please find attach patch with the changes.
>>
>> I found the way that you had the logic structured in flagInhTables()
>> to be quite hard to follow, so I rewrote it in the attached version.
>> This version also has a bunch of minor cosmetic changes.  Barring
>> objections, I'll commit it once the tree opens for v11 development.
>
> Thanks Robert.
>
> Patch changes looks good to me.

Thanks.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Michael Paquier
On Tue, Aug 15, 2017 at 3:56 AM, Andres Freund  wrote:
> I think there are some possibilities to close the gap here. We could
> e.g. have .delete_on_crash marker files that get installed
> when creating a new persistent relfilenode. If we set up things so they
> get deleted post commit, but inside the critical section, we could rely
> on them being present in case of crash, but consistently removed during
> WAL replay. At the end of recovery, iterate over the whole datadir and
> nuke all relations with marker files present.

I agree that an approach like that has value for the problem defined
in this thread.

> I first thought that'd cost an additional fsync per relation
> created. But I think we actually can delay that to a pre-commit phase,
> if we have XLOG_SMGR_CREATE create those markers via a flag, and fsync
> them just before checkpoint (via the usual delayed fsync mechanism).
> That'd still require an XLogFlush(), but that seems hard to avoid unless
> we just don't create relations on FS level until buffers are
> evicted and/or BufferSync().

Yeah, that should work as well.
-- 
Michael




Re: [HACKERS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-14 Thread Noah Misch
On Fri, Aug 11, 2017 at 11:59:14AM -0400, Tom Lane wrote:
> "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)"  
> writes:
> > ERROR:  XX000: unrecognized node type: 90
> > LOCATION:  ExecReScan, execAmi.c:284
> 
> (gdb) p (NodeTag) 90
> $1 = T_GatherMergeState
> 
> So, apparently somebody wrote ExecReScanGatherMerge, but never bothered
> to plug it into ExecReScan.  From which we may draw depressing conclusions
> about how much it's been tested.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-08-14 Thread Peter Eisentraut
On 3/11/17 07:06, Pavel Stehule wrote:
> I am sending an updated version with the sort direction separated into
> a special variable.

This patch also needs a rebase.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-08-14 Thread Peter Eisentraut
On 3/15/17 11:56, David Steele wrote:
>> This patch has been moved to CF 2017-07.
> 
> I did not manage to move this patch when I said I had.  It is now moved.

Unsurprisingly, this patch needs a major rebase.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Craig Ringer
On 15 August 2017 at 10:16, Michael Paquier 
wrote:

> On Tue, Aug 15, 2017 at 11:10 AM, Craig Ringer wrote:
> > Ooh, this finally gives us a path toward case-insensitive default
> database
> > collation via CLDR caseLevel.
> >
> > http://userguide.icu-project.org/collation
> > http://www.unicode.org/reports/tr35/tr35-collation.html#Algorithm_Case
> >
> > That *definitely* should be documented and exposed by initdb.
>
> The addition of an interface to initdb smells like an item for v11~,
> but the documentation for 10 could be improved in this sense?
>

Yeah, not suggesting changing it for Pg10, way too late.

It's also not enough for a case-insensitive DB by itself, since we still do
binary compares for equality. There'd need to be deeper surgery to make it
work. So I'm getting prematurely excited here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Michael Paquier
On Tue, Aug 15, 2017 at 11:14 AM, Alvaro Herrera
 wrote:
>> In vacuum_rel()@vacuum.c, there are a couple of logs that could be
>> improved as well with the schema name.
>
> I agree that there's a lot of room for improvement there.  If I'm
> allowed some scope creep, I'd say that gathering detailed vacuum info
> from both autovacuum and user-invoked vacuum in a central place would be
> very welcome.

Hm. I am not sure what you have in mind here. The existing logs are
spread all over the place, and it is useful to have VERBOSE information
while things are being run, not just once a run has finished (if what
you mean here is to store vacuum verbose-related stats for each
relation in memory and then emit them all at once; or we could log this
info and also store it).
-- 
Michael




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Michael Paquier
On Tue, Aug 15, 2017 at 11:10 AM, Craig Ringer  wrote:
> Ooh, this finally gives us a path toward case-insensitive default database
> collation via CLDR caseLevel.
>
> http://userguide.icu-project.org/collation
> http://www.unicode.org/reports/tr35/tr35-collation.html#Algorithm_Case
>
> That *definitely* should be documented and exposed by initdb.

The addition of an interface to initdb smells like an item for v11~,
but the documentation for 10 could be improved in this sense?
-- 
Michael




Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Aug 15, 2017 at 10:27 AM, Masahiko Sawada wrote:
> > Currently vacuum verbose outputs vacuum logs as follows. The first log
> > message INFO: vacuuming "public.hoge" writes the relation name with the
> > schema name, but subsequent vacuum logs output only the relation name
> > without the schema name. I've encountered a situation where there are
> > tables with the same name in different schemas, and the concurrent
> > vacuum logs made it hard for me to distinguish the tables. Is there any
> > reason why we don't write an explicit name in vacuum verbose logs? If
> > not, can we add schema names to make them clearer?
> 
> That's definitely a good idea. lazy_vacuum_rel() uses in one place
> dbname.schname.relname for autovacuum. This is an inconsistent bit,
> but it's not really worth changing, and there is always
> log_line_prefix = '%d'. 

Worth keeping in mind that INFO messages do not normally go to the
server log, but rather only to the client.  If it were a problem at the
server side, you could also suggest adding %p to the log line prefix to
disambiguate.  Maybe the scenario where this is a real problem is
vacuumdb -j ...
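
For example, a sketch of that server-side disambiguation (the exact
prefix format is just illustrative):

=# ALTER SYSTEM SET log_line_prefix = '%m [%p] %d ';
=# SELECT pg_reload_conf();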

> In vacuum_rel()@vacuum.c, there are a couple of logs that could be
> improved as well with the schema name.

I agree that there's a lot of room for improvement there.  If I'm
allowed some scope creep, I'd say that gathering detailed vacuum info
from both autovacuum and user-invoked vacuum in a central place would be
very welcome.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Craig Ringer
On 10 August 2017 at 06:49, Peter Geoghegan  wrote:

> There are actually very many customizations to collations that are
> possible beyond what the "stock" ICU collations provide (whatever
> "stock" means). Some of these are really cool, and I can imagine use
> cases where they are very compelling that have nothing to do with
> internationalization (such customizations are how we should eventually
> implement case-insensitive collations, once the infrastructure for
> doing that without breaking hashing is in place).
>
> I'd like to give a demo on what is already possible, but not currently
> documented. I didn't see anyone else comment on this, including Peter
> E (maybe I missed that?). We should improve the documentation in this
> area, to get this into the hands of users.
>
> Say we're unhappy that numbers come first, which we see here:
>
>
Ooh, this finally gives us a path toward case-insensitive default database
collation via CLDR caseLevel.

http://userguide.icu-project.org/collation

http://www.unicode.org/reports/tr35/tr35-collation.html#Algorithm_Case

That *definitely* should be documented and exposed by initdb.
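
As a taste of what the existing machinery can already express, here are
two illustrative (untested-here) examples; the locale keywords come from
the CLDR/ICU references above:

=# -- sort digits after Latin letters instead of before them
=# CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
=# -- secondary strength: an ordering that ignores case differences
=# CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2');

As noted elsewhere in the thread, these affect ordering only; equality
comparisons still use binary compare.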


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-08-14 Thread Craig Ringer
On 22 March 2017 at 01:17, Robert Haas  wrote:

> On Sun, Mar 12, 2017 at 10:20 PM, Thomas Munro
>  wrote:
> > Maybe someone can think of a clever way for an extension to insert a
> > wait for a user-supplied LSN *before* acquiring a snapshot so it can
> > work for the higher levels, or maybe the hooks should go into core
> > PostgreSQL so that the extension can exist as an external project not
> > requiring a patched PostgreSQL installation, or maybe this should be
> > done with new core syntax that extends transaction commands.  Do other
> > people have views on this?
>
> IMHO, trying to do this using a function-based interface is a really
> bad idea for exactly the reasons you mention.  I don't see why we'd
> resist the idea of core syntax here; transactions are a core part of
> PostgreSQL.
>
> There is, of course, the question of whether making LSNs such a
> user-visible thing is a good idea in the first place, but that's a
> separate question from issue of what syntax for such a thing is best.


(I know this is old, but):

That ship sailed a long time ago, unfortunately; they're all over
pg_stat_replication and pg_replication_slots and so on. They're already
routinely used for monitoring replication lag in bytes, waiting for a peer
to catch up, etc.
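
e.g. the routine lag-in-bytes query (PG10 function and column names; a
sketch):

=# SELECT application_name,
          pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
   FROM pg_stat_replication;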

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-14 Thread Craig Ringer
On 15 August 2017 at 09:11, Moon Insung wrote:

> Dear Craig Ringer
>
> First, thank you for implementing the necessary function.
>
> but, I have some questions.
>
> question 1) vacuum freeze hint bits
> If we run a vacuum freeze, bits in the infomask will be 0x0300.
> in this case, if the infomask value is output by the code you modified,
> HEAP_XMIN_COMMITTED(0x0100), HEAP_XMIN_INVALID(0x0200), and
> HEAP_XMIN_FROZEN(0x0300) are all output as hint bits.
>
> is it normal to output these values?
>
> if we look at the htup_details.h code,
>
> #define HeapTupleHeaderXminInvalid(tup) \
> ( \
>   ((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
> HEAP_XMIN_INVALID \
> )
> #define HeapTupleHeaderSetXminCommitted(tup) \
> ( \
>   AssertMacro(!HeapTupleHeaderXminInvalid(tup)), \
>   ((tup)->t_infomask |= HEAP_XMIN_COMMITTED) \
> )
>
> HEAP_XMIN_INVALID and HEAP_XMIN_COMMITTED cannot be written simultaneously.

The bits are set, those macros just test to exclude the special meaning of
both bits being set at once to mean "frozen".

I was reluctant to filter out HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID
when we detect that it's frozen, because that could well be misleading
when debugging.

If you think that is useful, then I suggest you add an option so that
when it's outputting the interpreted mask, not the raw mask, it
suppresses output of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID if
HEAP_XMIN_FROZEN is set.
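
For instance, the combined bits can be inspected directly (a sketch;
'hoge' is a stand-in table name, and 768 is 0x0300, i.e.
HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID):

=# SELECT lp, t_infomask, (t_infomask & 768) = 768 AS xmin_frozen
   FROM heap_page_items(get_raw_page('hoge', 0));

Printing all three names for such a tuple is a faithful rendering of the
raw bits; the question is only whether interpreted output should collapse
them.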

> question 2) xmax lock hint bits
> similar to the vacuum freeze question..
> Assume that the infomask has a bit of 0x0050.
>
> In this case, if we run the code that you modified,
> HEAP_XMAX_KEYSHR_LOCK(0x0010), HEAP_XMAX_EXCL_LOCK(0x0040), and
> HEAP_XMAX_IS_LOCKED_ONLY are the three hint bits output.
>
> if we look at the htup_details.h code,
>
> #define HEAP_XMAX_IS_SHR_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_SHR_LOCK)
> #define HEAP_XMAX_IS_EXCL_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_EXCL_LOCK)
> #define HEAP_XMAX_IS_KEYSHR_LOCKED(infomask) \
>   (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_KEYSHR_LOCK)
>
> It is divided into two hint bits.
> so I think this part needs a fix.

It's the same issue as above, with the same answer IMO.

If we're showing the raw mask bits we should show the raw mask bits only.

But if we're showing combined bits and masks too, I guess we should filter
out the raw bits when matched by some mask.

I'm not entirely convinced by that, since I think hiding information could
create more confusion than it fixes. I welcome others' views here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Michael Paquier
On Tue, Aug 15, 2017 at 10:27 AM, Masahiko Sawada wrote:
> Currently vacuum verbose outputs vacuum logs as follows. The first log
> message INFO: vacuuming "public.hoge" writes the relation name with the
> schema name, but subsequent vacuum logs output only the relation name
> without the schema name. I've encountered a situation where there are
> tables with the same name in different schemas, and the concurrent
> vacuum logs made it hard for me to distinguish the tables. Is there any
> reason why we don't write an explicit name in vacuum verbose logs? If
> not, can we add schema names to make them clearer?

That's definitely a good idea. lazy_vacuum_rel() uses in one place
dbname.schname.relname for autovacuum. This is an inconsistent bit,
but it's not really worth changing, and there is always
log_line_prefix = '%d'. In vacuum_rel()@vacuum.c, there are a couple
of logs that could be improved as well with the schema name.
-- 
Michael




Re: [HACKERS] measuring the impact of increasing WAL segment size

2017-08-14 Thread Andres Freund
Hi,

Thanks for running this!

On 2017-08-15 03:27:00 +0200, Tomas Vondra wrote:
> Granted - this chart does not show latency, so it's not a complete
> picture.

That'd be quite useful to see here, too.


> Also, if you care about raw OLTP performance you're probably already running
> on flash, where this does not seem to be an issue. It's also not an issue if
> you have RAID controller with write cache, which can absorb those writes.
> And of course, those machines have reasonable dirty_background_bytes values
> (like 64MB or less).

The problem is that dirty_background_bytes = 64MB is *not* actually a
generally reasonable config, because it makes temp-table, disk-sort,
etc. operations flush way too aggressively.


> b) The "flushing enabled" case seems to be much more sensitive to WAL
> segment size increases. It seems the throughput drops a bit (by 10-20%), for
> some segment sizes, and then recovers. The behavior seems to be smooth (not
> just a sudden drop for one segment size) but the value varies depending on
> the scale, test type (tpc-b /simple-update).

That's interesting.  I presume you've not tested with separate data /
xlog disks?


Greetings,

Andres Freund




[HACKERS] Explicit relation name in VACUUM VERBOSE log

2017-08-14 Thread Masahiko Sawada
Hi all,

Currently vacuum verbose outputs vacuum logs as follows. The first log
message INFO: vacuuming "public.hoge" writes the relation name with the
schema name, but subsequent vacuum logs output only the relation name
without the schema name. I've encountered a situation where there are
tables with the same name in different schemas, and the concurrent vacuum
logs made it hard for me to distinguish the tables. Is there any reason
why we don't write an explicit name in vacuum verbose logs? If not, can
we add schema names to make them clearer?

=# vacuum verbose hoge;
INFO:  vacuuming "public.hoge"
INFO:  index "hoge_idx" now contains 10 row versions in 276 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "hoge": found 0 removable, 108 nonremovable row versions in 1
out of 443 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 559
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

A patch is attached; I will add it to the commit fest.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


vacuum_more_explicit_relname.patch
Description: Binary data



[HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-08-14 Thread Marko Tiikkaja
Hi,

Attached is a patch for $SUBJECT.  It might still be a bit rough around the
edges and probably light on docs and testing, but I thought I'd post it
anyway so I won't forget.


.m


insert_conflict_select_v1.patch
Description: Binary data



Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-08-14 Thread Moon Insung
Dear Craig Ringer

First, thank you for implementing the necessary function.

but, I have some questions.

question 1) vacuum freeze hint bits
If we run a vacuum freeze, bits in the infomask will be 0x0300.
in this case, if the infomask value is output by the code you modified,
HEAP_XMIN_COMMITTED(0x0100), HEAP_XMIN_INVALID(0x0200), and
HEAP_XMIN_FROZEN(0x0300) are all output as hint bits.

is it normal to output these values?

if we look at the htup_details.h code,

#define HeapTupleHeaderXminInvalid(tup) \
( \
  ((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
HEAP_XMIN_INVALID \
)
#define HeapTupleHeaderSetXminCommitted(tup) \
( \
  AssertMacro(!HeapTupleHeaderXminInvalid(tup)), \
  ((tup)->t_infomask |= HEAP_XMIN_COMMITTED) \
)

HEAP_XMIN_INVALID and HEAP_XMIN_COMMITTED cannot be written simultaneously.

So I think that for the value 0x0300, only HEAP_XMIN_COMMITTED and
HEAP_XMIN_FROZEN need to be output.

question 2) xmax lock hint bits
similar to the vacuum freeze question..
Assume that the infomask has a bit of 0x0050.

In this case, if we run the code that you modified,
HEAP_XMAX_KEYSHR_LOCK(0x0010), HEAP_XMAX_EXCL_LOCK(0x0040), and
HEAP_XMAX_IS_LOCKED_ONLY are the three hint bits output.

if we look at the htup_details.h code,

#define HEAP_XMAX_IS_SHR_LOCKED(infomask) \
  (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_SHR_LOCK)
#define HEAP_XMAX_IS_EXCL_LOCKED(infomask) \
  (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_EXCL_LOCK)
#define HEAP_XMAX_IS_KEYSHR_LOCKED(infomask) \
  (((infomask) & HEAP_LOCK_MASK) == HEAP_XMAX_KEYSHR_LOCK)

It is divided into two hint bits.
so I think this part needs a fix.

My opinion may be wrong, so please check one more time.

Regards.
Moon

 

From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
Sent: Thursday, July 20, 2017 8:53 PM
To: Ashutosh Sharma
Cc: PostgreSQL Hackers; Julien Rouhaud; Pavan Deolasee; Álvaro Herrera; Peter Eisentraut; Masahiko Sawada; Abhijit Menon-Sen; Peter Geoghegan
Subject: Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

On 20 Jul. 2017 19:09, "Ashutosh Sharma" wrote:

> I had a quick look into this patch and also tested it, and the
> following are my observations.

Thanks very much.

I'll expand the tests to cover various normal and nonsensical masks and
combinations and fix the identified issues.

This was a quick morning's work in amongst other things, so I'm not
surprised I missed a few details. The check is appreciated.


Re: [HACKERS] Fix a typo in sequence.c

2017-08-14 Thread Masahiko Sawada
On Tue, Aug 15, 2017 at 2:53 AM, Peter Eisentraut
 wrote:
> On 8/14/17 08:32, Masahiko Sawada wrote:
>> While reading source code, I found a typo in sequence.c file. Attached
>> patch for fixing it.
>>
>> s/localy/locally/g
>
> Fixed.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Aug 14, 2017 at 1:12 PM, Alvaro Herrera
>  wrote:
> > Yeah, the problem that lwlocks aren't released is because the launcher
> > is not in a transaction at that point, so AbortCurrentTransaction()
> > doesn't release locks like it normally would.  The simplest fix (in the
> > attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
> > block, though I wonder if there should be some other cleanup functions
> > called from there, or whether perhaps it'd be a better strategy to have
> > the launcher run in a transaction at all times.
> 
> Well, if you're going to do aborts outside of a transaction, just
> adding an LWLockReleaseAll() isn't really sufficient.  You need to
> look at something like CheckpointerMain() and figure out which of
> those push-ups are needed here as well.  Probably at least
> ConditionVariableCancelSleep(), pgstat_report_wait_end(),
> AbortBufferIO(), and UnlockBuffers() -- quite possibly some of those
> other AtEOXact calls as well.

Agreed.  I think a saner answer is to create a single function that does
it all, and use it in all places that need it, rather than keep adding
more copies of the same thing.  Attached revised version of patch does
things that way; I think it's good cleanup.  (I had to make the
ResourceOwnerRelease call be conditional on there being a resowner;
seems okay to me.)

I put the new cleanup routine in xact.c, which is not exactly the
perfect place (had to add access/xact.h to a couple of files), but the
fact that the new routine is a cut-down version of another routine in
xact.c makes it clear to me that that's where it belongs.  I first put
it in bootstrap, alongside AuxiliaryProcessMain, but after a while that
seemed wrong.

> > The other problem is that there's no attempt to handle a failed DSA
> > creation/attachment.  The second patch just adds a PG_TRY block that
> > sets a flag not to try the DSA calls again if the first one fails.  It
> > throws a single ERROR line, then autovacuum continues without
> > workitem support.
> 
> Yeah, and the other question -- which Thomas asked before you
> originally committed, and which I just now asked again -- is
> "Why in the world are you using DSA for this at all?".  There are
> serious problems with that which both he and I have pointed out, and
> you haven't explained why it's a good idea at any point, AFAICT.

The main reason is that I envision that the workitem stuff will be used
for other things in the future than just brin summarization, and it
seemed a lame idea to just use a fixed-size memory area in the standard
autovacuum shared memory area.  I think unbounded growth is of course
going to be bad.  The current coding doesn't allow for any growth beyond
the initial fixed size, but it's easier to extend the system from the
current point rather than wholly changing shared memory usage pattern
while at it.

I thought I *had* responded to Thomas in that thread, BTW.

> Among those problems:
> 
> 1. It doesn't work if dynamic_shared_memory_type=none.  That's OK for
> an optional subsystem, but autovacuum is not very optional.

Autovacuum as a whole continues to work if there's no dynamic shared
memory; it's just the workitems stuff that stops working if there's no
DSA.  (After fixing the bug that makes it crash in the case of
dynamic_shared_memory_type=none, of course).

> 2. It allows unbounded bloat if there's no limit on the number of work
> items, and is pointless if there is, since you could then just use the
> main shared memory segment.

Yeah, there are probably better strategies than just growing the memory
area every time another entry is needed.  Work-items as a whole need a
lot more development from the current point.

> I really think you should respond to those concerns, not just push a
> minimal fix.

We'll just continue to develop things from the current point.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7a6c80ffb7c610640d9cede8535ceb79a55ddb8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 14 Aug 2017 13:54:57 -0300
Subject: [PATCH v2 1/2] Release lwlocks in autovacuum launcher error handling
 path

For regular processes, lwlock release is handled via
AbortCurrentTransaction(), which autovacuum already does.  However, the
autovacuum launcher sometimes obtains locks outside of any transaction,
in which case AbortCurrentTransaction is a no-op.  Continuing after
error recovery would block if we tried to obtain an lwlock that we
failed to release.

Reported-by: Robert Haas
Discussion: 
https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/access/transam/xact.c | 35 +++
 src/backend/bootstrap/bootstrap.c |  4 +---
 src/backend/postmaster/autovacuum.c   |  7 ++-
 src/backend/postmaster/bgwriter.c | 21 

Re: [HACKERS] postgres_fdw: evaluate placeholdervars on remote server

2017-08-14 Thread Peter Eisentraut
On 4/3/17 22:00, Etsuro Fujita wrote:
> On 2017/04/04 3:21, Andres Freund wrote:
>> On 2017-02-28 21:45:22 +0900, Etsuro Fujita wrote:
>>> Here is a patch for $subject.
>>
>> This is a nontrivial patch, submitted just before the start of the last
>> CF for postgres 10.  Therefore I think we should move this to the next
>> CF.
> 
> Honestly, I'm not satisfied with this patch and I think it would need 
> more work.  Moved to the next CF.

This patch needs to be rebased for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Statement-level rollback

2017-08-14 Thread Peter Eisentraut
On 2/28/17 02:39, Tsunakawa, Takayuki wrote:
> The code for stored functions is not written yet, but I'd like your feedback 
> for the specification and design based on the current patch.  I'll add this 
> patch to CommitFest 2017-3.

This patch needs to be rebased for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-08-14 Thread Peter Eisentraut
On 3/9/17 07:49, Ivan Kartyshov wrote:
> Here I attached the rebased patch waitlsn_10dev_v3 (core feature).
> I will leave the choice of implementation (core/contrib) to the
> discretion of the community.

This patch is registered in the upcoming commit fest, but it needs to be
rebased.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Avoid GatherMerge crash when there are no workers.

2017-08-14 Thread Andres Freund
Hi,

On 2017-04-03 12:56:36 +0530, Rushabh Lathia wrote:
> On my local environment I was getting coverage for heap_compare_slots
> with the existing test. But it seems that, due to the low number of
> records fetched, only the leader got involved in fetching tuples in
> the shared report.
> 
> I modified the current test coverage for Gather Merge to add more rows
> to be fetched by the GatherMerge node, to make sure that it covers
> heap_compare_slots. Also added a test for zero workers.
> 
> PFA patch as well as LCOV report for nodeGatherMerge.c.

Pushed, thanks!

- Andres




Re: [HACKERS] pg_stat_wal_write statistics view

2017-08-14 Thread Peter Eisentraut
On 3/29/17 22:10, Haribabu Kommi wrote:
> Updated patch to use a shared counter instead of adding pg_stat_ calls
> to send the statistics from each background process/worker.

Your patch needs to be rebased and the OID assignments updated.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-08-14 Thread Peter Eisentraut
On 1/24/17 02:58, Kyotaro HORIGUCHI wrote:
>> BTW, if you set a slightly larger
>> context size on the patch you might be able to avoid rebases; right
>> now the patch doesn't include enough context to uniquely identify the
>> chunks against cacheinfo[].
> git format-patch -U5 fuses all hunks on cacheinfo[] together. I'm
> not sure that such a hunk can avoid rebases. Is this what you
> suggested? -U4 added an identifiable forward context line for
> some elements so the attached patch is made with four context
> lines.

This patch needs another rebase for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-08-14 Thread Peter Eisentraut
On 4/4/17 01:06, Haribabu Kommi wrote:
> Both pg_dump and pg_upgrade tests pass. Updated patch attached;
> I will add this patch to the next commitfest.

This patch needs to be rebased for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench more operators & functions

2017-08-14 Thread Peter Eisentraut
On 5/24/17 03:14, Fabien COELHO wrote:
> I've improved it in attached v11:
>   - add a link to the CASE full documentation
>   - add an example expression with CASE ...

This patch needs (at least) a rebase for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files

2017-08-14 Thread Peter Eisentraut
On 3/18/17 12:51, Pavel Stehule wrote:
> I rewrote these patches - they allow binary export/import from psql and
> the code is very simple. The size of the patch is bigger due to
> including a 4KB binary file (8KB in hex format).

This patch needs (at least) a rebase for the upcoming commit fest.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add Roman numeral conversion to to_number

2017-08-14 Thread Merlin Moncure
On Mon, Aug 14, 2017 at 2:48 PM, Peter Eisentraut
 wrote:
> On 8/3/17 13:45, Robert Haas wrote:
>> On Thu, Aug 3, 2017 at 9:25 AM, Oliver Ford  wrote:
>>> Adds to the to_number() function the ability to convert Roman numerals
>>> to a number. This feature is on the formatting.c TODO list. It is not
>>> currently implemented in either Oracle, MSSQL or MySQL so gives
>>> PostgreSQL an edge :-)
>> I kind of put my head in my hands when I saw this.  I'm not really
>> sure it's worth complicating the code for something that has so little
>> practical utility, but maybe other people will feel differently.
>
> I can't get excited about it.  to_number() and such usually mirror the
> Oracle implementation, so having something that is explicitly not in
> Oracle goes a bit against its mission.
>
> One of the more interesting features of to_number/to_char is that it has
> a bunch of facilities for formatting decimal points, leading/trailing
> zeros, filling in spaces and signs, and so on.  None of that applies
> naturally to Roman numerals, so there isn't a strong case for including
> that into these functions, when a separate function or module could do.

Well, doesn't that also apply to scientific notation ()?

'RN' is documented as an accepted formatting string, and nowhere does it
mention that it only works for output.  So we ought to allow for it to
be fixed, or at least document that it does not work for input.  It's
nothing but a curio obviously, but it's kind of cool IMO.
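
To illustrate the asymmetry on a stock (pre-patch) server; to_char's RN
output is fixed-width padded, hence the trim, and the to_number behavior
shown is the gap being discussed:

=# SELECT trim(to_char(485, 'RN'));    -- 'CDLXXXV': to_char handles RN
=# SELECT to_number('CDLXXXV', 'RN');  -- RN is not implemented for input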

merlin




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-08-14 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Aug 14, 2017 at 2:09 PM, Mark Rofail  wrote:
>> I think we should cast the operands in the RI queries fired as follows
>> 1. we get the array type from the right operand
>> 2. compare the two array type and see which type is more "general" (as to
>> which should be cast to which, int2 should be cast to int4, since casting
>> int4 to int2 could lead to data loss). This can be done by seeing which Oid
>> is larger numerically since, coincidentally, they are declared in this way
>> in pg_type.h.

> I'm not sure numerical comparison of Oids is a good idea.

I absolutely, positively guarantee that a patch written that way will be
rejected.

> Should we instead use logic similar to select_common_type() and underlying
> functions?

Right.  What we typically do in cases like this is check to see if there
is an implicit coercion available in one direction but not the other.
I don't know if you can use select_common_type() directly, but it would
be worth looking at.

Also, given that the context here is RI constraints, what you're really
worried about is whether the referenced column's uniqueness constraint
is associated with compatible operators, so looking into its operator
class for relevant operators might be the right way to think about it.
I wrote something just very recently that touches on that ... ah,
here it is:
https://www.postgresql.org/message-id/13220.1502376...@sss.pgh.pa.us

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-08-14 Thread Tom Lane
I wrote:
> Sandeep Thakkar  writes:
>> We built the sources with this patch and were able to create the plperl
>> extension on Windows 32bit and 64bit.

> Excellent, thanks for testing.  I'll finish up the configure-script part
> and push this shortly.

So the early returns from the buildfarm are that this broke baiji,
although a couple of other Windows critters seem to be OK with it.

This presumably means that baiji's version of perl was built with
_USE_32BIT_TIME_T, but $Config{ccflags} isn't admitting to that.
I wonder what Perl version that is exactly, and what it reports for
$Config{ccflags}, and whether there is some other place that we
ought to be looking for the info.

regards, tom lane




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-08-14 Thread Alexander Korotkov
On Mon, Aug 14, 2017 at 2:09 PM, Mark Rofail wrote:

> On Tue, Aug 8, 2017 at 3:24 PM, Alexander Korotkov wrote:
>> On Tue, Aug 8, 2017 at 4:12 PM, Mark Rofail wrote:
>>> On Tue, Aug 8, 2017 at 2:25 PM, Alexander Korotkov wrote:
>>>> GROUP BY would also use default btree/hash opclass for element type.
>>>> It doesn't differ from DISTINCT from that point.
>>> Then there's no going around this limitation,
>> That seems like this.
>
> Since for now, the limitation
>
>> ✗ presupposes that count(distinct y) has exactly the same notion of
>> equality that the PK unique index has. In reality, count(distinct) will
>> fall back to the default btree opclass for the array element type.
>
> is unavoidable.
>
> I started to look at the next one on the list.
>
>> ✗ coercion is unsupported. i.e. a numeric can't reference int8
>
> The limitation in short:
>
> #= CREATE TABLE PKTABLEFORARRAY ( ptest1 int4 PRIMARY KEY, ptest2 text );
> #= CREATE TABLE FKTABLEFORARRAY ( ftest1 int2[], FOREIGN KEY (EACH ELEMENT
> OF ftest1) REFERENCES PKTABLEFORARRAY, ftest2 int );
>
> This should be accepted, but it produces the following error:
> operator does not exist: integer[] @> smallint
>
> The algorithm I propose:
> I don't think it's easy to modify the @>> operator as we discussed here.
>
> I think we should cast the operands in the RI queries fired as follows:
> 1. we get the array type from the right operand
> 2. compare the two array types and see which type is more "general" (as to
> which should be cast to which; int2 should be cast to int4, since casting
> int4 to int2 could lead to data loss). This can be done by seeing which Oid
> is larger numerically since, coincidentally, they are declared in this way
> in pg_type.h.

I'm not sure numerical comparison of Oids is a good idea.  AFAIK, any
regularity of Oid assignment is coincidence...  Also, consider
user-defined data types: their Oids depend on the order of their
creation.  Should we instead use logic similar to select_common_type()
and its underlying functions?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Foreign tables privileges not shown in information_schema.table_privileges

2017-08-14 Thread Tom Lane
Peter Eisentraut  writes:
> - Materialized views not included.  I think that is an intentional
> omission.  It's valid to reconsider, but it would have to be a separate
> discussion.

Yes.  The problem is that matviews are not in the SQL standard, so
what are you going to show in tables.table_type?  Do they even belong
there, rather than under "views"?

Our approach to date has been that objects that are outside the scope of
what can be shown standards-compliantly should just be omitted from the
information_schema views.  Thus for example exclusion constraints are
omitted.  They're certainly a type of constraint, but we can't wedge them
into the information_schema view of things without having not-per-spec
output of some sort.  I think the same policy must apply to matviews.
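
A quick way to see that policy in action (a sketch; object names made up):

=# CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x;
=# SELECT count(*) FROM information_schema.tables WHERE table_name = 'mv';
 count
-------
     0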

It's not entirely clear to me that it was a good idea for 262e821d
to expose partitioned tables in information_schema.  By doing that,
you're essentially arguing that there is no reason for an application
to know the difference between a plain table and a partitioned one.
Maybe that's true, but it's not incontrovertible.

regards, tom lane




Re: [HACKERS] Add Roman numeral conversion to to_number

2017-08-14 Thread Peter Eisentraut
On 8/3/17 13:45, Robert Haas wrote:
> On Thu, Aug 3, 2017 at 9:25 AM, Oliver Ford  wrote:
>> Adds to the to_number() function the ability to convert Roman numerals
>> to a number. This feature is on the formatting.c TODO list. It is not
>> currently implemented in either Oracle, MSSQL or MySQL so gives
>> PostgreSQL an edge :-)
> I kind of put my head in my hands when I saw this.  I'm not really
> sure it's worth complicating the code for something that has so little
> practical utility, but maybe other people will feel differently.

I can't get excited about it.  to_number() and such usually mirror the
Oracle implementation, so having something that is explicitly not in
Oracle goes a bit against its mission.

One of the more interesting features of to_number/to_char is that it has
a bunch of facilities for formatting decimal points, leading/trailing
zeros, filling in spaces and signs, and so on.  None of that applies
naturally to Roman numerals, so there isn't a strong case for including
that into these functions, when a separate function or module could do.

There are probably a bunch of Perl or Python modules that can be
employed for this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Foreign tables privileges not shown in information_schema.table_privileges

2017-08-14 Thread Peter Eisentraut
On 8/11/17 04:52, Ashutosh Bapat wrote:
>  On Thu, Aug 10, 2017 at 6:30 PM, Nicolas Thauvin
>  wrote:
>> Hello,
>>
>> The information_schema.table_privileges view filters on regular tables
>> and views. Foreign tables are not shown in this view but they are in
>> other views of the information_schema like tables or column_privileges.
>>
>> Is it intentional? A patch is attached if not.
> 
> The line was first added by 596652d6 and updated by 262e821d to
> include partitioned tables. Looks like we have forgotten to add the
> table kinds introduced in between, i.e. foreign tables and materialized
> views. column_privileges doesn't have materialized views. The attached
> patch adds materialized views to the column_privileges view along with
> your changes.

I see several neighboring issues here:

- Foreign tables privileges not shown in
information_schema.table_privileges -- That is an omission that should
be fixed.

- information_schema.tables shows table_type 'FOREIGN TABLE', but it
should be 'FOREIGN' per SQL standard.

- Materialized views not included.  I think that is an intentional
omission.  It's valid to reconsider, but it would have to be a separate
discussion.

I think I would fix #1 and #2 with back patches but no catversion change.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Mon, Aug 14, 2017 at 8:40 PM, Tom Lane  wrote:

>
>
> It would be possible to have orphaned non-temp tables if you'd suffered
> a crash during the transaction that created those tables.  Ordinarily
> a newly-created table file wouldn't be that large, but if your workflow
> created tables and shoved boatloads of data into them in the same
> transaction, it's not so hard to see this becoming an issue.
>

I think the working theory is that these were very likely a number of
very large (multi-hundred-GB) materialised views.

>
> The core problem with zapping non-temp table files is that you can't
> do that unless you're sure you have consistent, up-to-date pg_class
> data that nobody else is busy adding to.  It's hard to see an external
> application being able to do that safely.  You certainly can't do it
> at the point in the postmaster startup cycle where we currently do
> the other things --- for those, we rely only on filesystem naming
> conventions to identify what to zap.


Yeah, that occurred to me. At this point I would settle for something I
could run with Postgres in single-user mode, although that is very far
from ideal. So what I wonder is whether at least a short-term solution
might be a utility that starts Postgres in single-user mode, where we
insist that PostgreSQL is otherwise not running beforehand.

I am certainly not feeling qualified at present to attempt more advanced
solutions, but that much I might be able to do.

>
> regards, tom lane
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 1:12 PM, Alvaro Herrera
 wrote:
> Yeah, the problem that lwlocks aren't released is because the launcher
> is not in a transaction at that point, so AbortCurrentTransaction()
> doesn't release locks like it normally would.  The simplest fix (in the
> attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
> block, though I wonder if there should be some other cleanup functions
> called from there, or whether perhaps it'd be a better strategy to have
> the launcher run in a transaction at all times.

Well, if you're going to do aborts outside of a transaction, just
adding an LWLockReleaseAll() isn't really sufficient.  You need to
look at something like CheckpointerMain() and figure out which of
those push-ups are needed here as well.  Probably at least
ConditionVariableCancelSleep(), pgstat_report_wait_end(),
AbortBufferIO(), and UnlockBuffers() -- quite possibly some of those
other AtEOXact calls as well.

> The other problem is that there's no attempt to handle a failed DSA
> creation/attachment.  The second patch just adds a PG_TRY block that
> sets a flag not to try the DSA calls again if the first one fails.  It
> throws a single ERROR line, then autovacuum continues without workitem
> support.

Yeah, and the other question -- which Thomas asked before you
originally committed, and which I just now asked again -- is
"Why in the world are you using DSA for this at all?".  There are
serious problems with that which both he and I have pointed out, and
you haven't explained why it's a good idea at any point, AFAICT.
Among those problems:

1. It doesn't work if dynamic_shared_memory_type=none.  That's OK for
an optional subsystem, but autovacuum is not very optional.

and

2. It allows unbounded bloat if there's no limit on the number of work
items, and is pointless if there is one, since you could then just use
the main shared memory segment.

I really think you should respond to those concerns, not just push a
minimal fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Andres Freund
On 2017-08-14 14:40:46 -0400, Tom Lane wrote:
> The core problem with zapping non-temp table files is that you can't
> do that unless you're sure you have consistent, up-to-date pg_class
> data that nobody else is busy adding to.  It's hard to see an external
> application being able to do that safely.  You certainly can't do it
> at the point in the postmaster startup cycle where we currently do
> the other things --- for those, we rely only on filesystem naming
> conventions to identify what to zap.

I think there are some possibilities to close the gap here. We could
e.g. have .delete_on_crash marker files that get installed
when creating a new persistent relfilenode. If we set up things so they
get deleted post commit, but inside the critical section, we could rely
on them being present in case of crash, but consistently removed during
WAL replay. At the end of recovery, iterate over the whole datadir and
nuke all relations with marker files present.
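
A minimal sketch of the creation side, with an invented helper name and
PG10-era file APIs, just to pin down the protocol (the marker must be
durable before the relation file can be trusted; it is removed again
post-commit, inside the critical section):

	/* hypothetical helper, called when creating a persistent relfilenode */
	static void
	create_delete_on_crash_marker(const char *relpath)
	{
		char		markerpath[MAXPGPATH];
		int			fd;

		snprintf(markerpath, sizeof(markerpath), "%s.delete_on_crash", relpath);

		fd = OpenTransientFile(markerpath,
							   O_CREAT | O_EXCL | O_WRONLY | PG_BINARY,
							   S_IRUSR | S_IWUSR);
		if (fd < 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not create marker file \"%s\": %m",
							markerpath)));

		/* as discussed below, this fsync could be delayed to pre-commit */
		if (pg_fsync(fd) != 0)
			ereport(ERROR,
					(errcode_for_file_access(),
					 errmsg("could not fsync marker file \"%s\": %m",
							markerpath)));

		CloseTransientFile(fd);
	}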

I first thought that'd cost an additional fsync per relation
created. But I think we actually can delay that to a pre-commit phase,
if we have XLOG_SMGR_CREATE create those markers via a flag, and fsync
them just before checkpoint (via the usual delayed fsync mechanism).
That'd still require an XLogFlush(), but that seems hard to avoid unless
we just don't create relations on FS level until buffers are
evicted and/or BufferSync().


Alternatively we could do something without marker files, with some
added complexity: Keep track of all "uncommitted new files" in memory,
and log them every checkpoint. Commit/abort records clear elements of
that list. Since we always start replay at the beginning of a
checkpoint, we'd always reach a moment with such an up-to-date list of
pending-action files before reaching end-of-recovery. At end-of-recovery
we can delete all unconfirmed files.  To avoid out-of-memory due to too
many tracked relations, we'd possibly still have to have marker files...
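
The bookkeeping for this variant could be as small as the following
hedged sketch (all names invented): an in-memory array whose surviving
entries, at end of recovery, name exactly the orphaned files.

	typedef struct PendingNewFile
	{
		RelFileNode rnode;			/* file whose creator hasn't resolved */
		TransactionId creating_xid; /* transaction that created it */
	} PendingNewFile;

	/*
	 * Logged wholesale in every checkpoint; entries are dropped when the
	 * corresponding commit/abort record is written (or replayed).  Kept
	 * bounded, which is where the fallback marker files would come in.
	 */
	static PendingNewFile pendingNewFiles[MAX_PENDING_NEW_FILES];
	static int	numPendingNewFiles = 0;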

Regards,

Andres




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Tom Lane
Chris Travers  writes:
> On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund  wrote:
>> I think the fix here is to call RemovePgTempFiles() during
>> crash-restarts, instead of just full starts. The previously stated need
>> to be able to inspect temp files after a crash can be less impactfully
>> fulfilled with restart_after_crash = false.

> But that only clears temp files, right?
> I am less concerned about the temp files because a restart clears them.

It will clear temp files, and also temp tables.

> The bigger issue I see is with the orphaned base files.

It would be possible to have orphaned non-temp tables if you'd suffered
a crash during the transaction that created those tables.  Ordinarily
a newly-created table file wouldn't be that large, but if your workflow
created tables and shoved boatloads of data into them in the same
transaction, it's not so hard to see this becoming an issue.

The core problem with zapping non-temp table files is that you can't
do that unless you're sure you have consistent, up-to-date pg_class
data that nobody else is busy adding to.  It's hard to see an external
application being able to do that safely.  You certainly can't do it
at the point in the postmaster startup cycle where we currently do
the other things --- for those, we rely only on filesystem naming
conventions to identify what to zap.

regards, tom lane




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-08-14 14:12:22 +0200, Chris Travers wrote:
> > Problem:
> > The system this came up on is PostgreSQL 9.6.3 and has had repeated
> trouble
> > with disk space.  Querying pg_database_size, as well as du on the
> > subdirectory of base/ show total usage to be around 3.8TB.  Summing up
> the
> > size of the relations in pg_class though shows around 2.1TB.
> >
> > Initial troubleshooting found around 150 GB of space in pg_temp which had
> > never been cleared and was at least several days old.  Restarting the
> > server cleared these up.
> >
> > Poking around the base/[oid] directory, I found a large number of files
> > which did not correspond with a pg_class entry.  One of the apparent
> > relations was nearly 1TB in size.
> >
> > What I think happened:
> > I think various pg_temp/* and orphaned relation files (In base/[oid])
> were
> > created when PostgreSQL crashed due to running out of space in various
> > operations including creating materialised views.
> >
> > So my question is if there is a way we can safely clean these up on
> server
> > restart?  If not does it make sense to try to create a utility that can
> > connect to PostgreSQL, seek out valid files, and delete the rest?
>
> I think the fix here is to call RemovePgTempFiles() during
> crash-restarts, instead of just full starts. The previously stated need
> to be able to inspect temp files after a crash can be less impactfully
> fulfilled with restart_after_crash = false.

But that only clears temp files, right?

I am less concerned about the temp files because a restart clears them.

The bigger issue I see is with the orphaned base files.  It looks like
files in base/[oid] don't get cleaned up either, if I read my output
correctly, which would explain why we saw 1.7TB of discrepancy between
the relations and the database size.  Safety-wise, the best way out of
that seems to be a dump/restore, but doing that with a 2.1TB database is
annoying.


> Greetings,
>
> Andres Freund
>



-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread David Fetter
On Sun, Aug 13, 2017 at 05:56:56PM -0700, Andres Freund wrote:
> Hi,
> 
> Since we're getting a bit into the weeds of a different topic, and since
> I think it's an interesting feature, I'm detaching this into a separate
> thread.
> 
> On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> > >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> > >>> I think it'd be pretty interesting to look at replacing parts of the
> > >>> stats collector machinery with something DHT-based.
> > > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund  
> > > wrote:
> > >> That seems to involve a lot more than this though, given that currently
> > >> the stats collector data doesn't entirely have to be in memory. I've
> > >> seen sites with a lot of databases with quite some per-database stats
> > >> data. Don't think we can just require that to be in memory :(
> >
> > Robert Haas  writes:
> > > Hmm.  I'm not sure it wouldn't end up being *less* memory.  Don't we
> > > end up caching 1 copy of it per backend, at least for the database to
> > > which that backend is connected?  Accessing a shared copy would avoid
> > > that sort of thing.
> >
> > Yeah ... the collector itself has got all that in memory anyway.
> > We do need to think about synchronization issues if we make that
> > memory globally available, but I find it hard to see how that would
> > lead to more memory consumption overall than what happens now.
> 
> You both are obviously right.  Another point of potential concern could
> be that we'd pretty much fully rely on dsm/dht's being available, for
> the server to function correctly. Are we ok with that? Right now
> postgres still works perfectly well, leaving parallelism aside, with
> dynamic_shared_memory_type = none.
> 
> 
> What are your thoughts about how to actually implement this? It seems
> we'd have to do something like:
> 
> 1) Keep the current per-backend & per-transaction state in each
>backend. That allows both to throw away the information and avoids
>increasing contention quite noticeably.
> 
> 2) Some plain shared memory with metadata.  A set of shared hashtables
>for per database, per relation contents.
> 
> 3) Individual database/relation entries are either individual atomics
>(we don't rely on consistency anyway), or seqcount (like
>st_changecount) based.
> 
> 4) Instead of sending stats at transaction end, copy them into a
>"pending" entry.  Nontransactional contents can be moved to
>the pending entry more frequently.
> 
> 5) Occasionally, try to flush the pending array into the global hash.
>The lookup in the table would be protected by something
>LWLockConditionalAcquire() based, to avoid blocking - don't want to
>introduce chokepoints due to commonly used tables and such.  Updating
>the actual stats can happen without the partition locks being held.
> 
> I think there's two other relevant points here:
> 
> a) It'd be quite useful to avoid needing a whole cluster's stats in
>memory. Even if $subject would save memory, I'm hesitant committing
>to something requiring all stats to be in memory forever. As a first
>step it seems reasonable to e.g. not require state for all databases
>to be in memory. The first time per-database stats are required, it
>could be "paged in". Could even be more aggressive and do that on a
>per-table level and just have smaller placeholder entries for
>non-accessed tables, but that seems more work.
> 
>    On the other hand, autovacuum is likely going to make that approach
>    useless anyway, given it's probably going to access otherwise unneeded
>    stats regularly.
> 
> b) I think our tendency to dump all stats whenever we crash isn't really
>tenable, given how autovacuum etc are tied to them. We should think
>about ways to avoid that if we're going to do a major rewrite of the
>stats stuff, which this certainly sounds like.
> 
> 
> If there weren't HS to worry about, these two points kinda sound like
> the data should be persisted into an actual table, rather than some
> weird other storage format. But HS seems to make that untenable.
> 
> Greetings,
> 
> Andres Freund
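
To make step 5 of the quoted design concrete, a hedged sketch with
invented names (PgStatSharedEntry, shared_stats_hash, partition_lock);
the essential shape is the LWLockConditionalAcquire() call, so a
contended flush is simply retried later instead of blocking:

	/* invented shared-memory entry: one per (database, relation) */
	typedef struct PgStatSharedEntry
	{
		Oid			dboid;
		Oid			reloid;
		pg_atomic_uint64 tuples_inserted;	/* step 3: a plain atomic */
	} PgStatSharedEntry;

	static HTAB	   *shared_stats_hash;	/* step 2: shared hashtable */
	static LWLock  *partition_lock;		/* protects hash structure only */

	/* step 5: flush one backend-local pending counter, never blocking */
	static bool
	flush_pending_entry(Oid dboid, Oid reloid, uint64 pending_inserts)
	{
		PgStatSharedEntry *entry;
		Oid			key[2] = {dboid, reloid};

		if (!LWLockConditionalAcquire(partition_lock, LW_SHARED))
			return false;		/* contended; keep the pending entry */

		entry = hash_search(shared_stats_hash, key, HASH_FIND, NULL);
		LWLockRelease(partition_lock);

		if (entry == NULL)
			return false;

		/* the update itself happens without the partition lock held */
		pg_atomic_fetch_add_u64(&entry->tuples_inserted, pending_inserts);
		return true;
	}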

As I recall, David Gould (hi!) had run across a case where there were
thousands of tables and the stats file became a pretty serious
bottleneck.  There might even be a design or even code to address it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Re: [COMMITTERS] pgsql: Don't force-assign transaction id when exporting a snapshot.

2017-08-14 Thread Andres Freund
On 2017-08-14 13:55:29 -0400, Peter Eisentraut wrote:
> On 8/12/17 07:32, Petr Jelinek wrote:
> > This commit has a side effect: it makes it possible to export
> > snapshots on standbys.  This makes it possible to do pg_dump -j on a
> > standby with a consistent snapshot.  Here is a one-line patch (+ doc
> > update) which allows doing that when pg_dumping from PG10+.
> > 
> > I also wonder if it should be mentioned in the release notes.  If the
> > attached patch makes it into PG10 it would be a no-brainer to mention
> > it as a feature under the pg_dump section, but exporting snapshots
> > alone I am not sure about.
> 
> Any other opinions on this patch?

I'd be oh so very slightly inclined to push this, including the modified
pg_dump check.  We're going to be on the hook for supporting snapshots on
standbys anyway, unless we put in additional unnecessary error checks.

- Andres




Re: [HACKERS] Re: [COMMITTERS] pgsql: Don't force-assign transaction id when exporting a snapshot.

2017-08-14 Thread Peter Eisentraut
On 8/12/17 07:32, Petr Jelinek wrote:
> This commit has a side effect: it makes it possible to export
> snapshots on standbys.  This makes it possible to do pg_dump -j on a
> standby with a consistent snapshot.  Here is a one-line patch (+ doc
> update) which allows doing that when pg_dumping from PG10+.
> 
> I also wonder if it should be mentioned in the release notes.  If the
> attached patch makes it into PG10 it would be a no-brainer to mention
> it as a feature under the pg_dump section, but exporting snapshots
> alone I am not sure about.

Any other opinions on this patch?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fix a typo in sequence.c

2017-08-14 Thread Peter Eisentraut
On 8/14/17 08:32, Masahiko Sawada wrote:
> While reading source code, I found a typo in sequence.c file. Attached
> patch for fixing it.
> 
> s/localy/locally/g

Fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-14 Thread Fabien COELHO


Hello,


I think we can use it like --custom-initialize="create_table, vacuum",
which is similar to how we specify a connection option to psql, for
example.


Even if it is allowed, do not advertise it. Or use space as a separator 
and not comma. ISTM that with psql connections space is the higher level 
separator, not an optional thing, and comma is used for lower level 
splitting: "host=foo port=5432,5433 ..."



But it will be unnecessary if we have the one letter version.


Sure.


I'm also wondering whether using a list is a good option, because it implies
a large parse function, list management and so on. With the one letter
version, it could be just a string to be scanned char by char for
operations.


I basically agree with the one-letter version.  But I'm concerned that
it'll confuse users if we add more initialization steps to the pgbench
initialization.  If we add many different initialization steps, it makes
the pgbench command hard to read, and users might have to remember these
options.


I think that if we get to the point where there are so many initialization
steps that people get confused, then adding long names will not be an issue:-)


In the meantime it only needs 5 values.


Maybe there could be short-hands for usual setups, eg "default" for "tdpv"
or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...


If the --custom-initialize option requires the -i option to be set,
"pgbench -i" means the initialization with full steps and "pgbench -i
--custom-initialize=..." means the initialization with custom
operation steps.


Sure. It does not preclude the default to have a name.


Remove the "no-primary-keys" from the long option array as it has
disappeared. You might consider make "custom-initialize" take the 'I' short
option.

Ranting unrelated to this patch: the automatic aid type switching based on
scale is a bad idea (tm), because when benchmarking it means that changing
the scale also changes the schema, and you really do not need that.  ISTM
that it should always use INT8.


Hmm, I think it's a valid point.  Should we allow users to specify
something like the above in the custom initialization feature as well?


I would be in favor of having an option for a TPC-B-conforming schema,
which would include that, but which would also change the default balance
type, which is not large enough per spec.  Maybe it could be "T".


--
Fabien.
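
To make the char-by-char idea above concrete, a hedged sketch (nobody's
actual patch; the per-step functions are invented names): each letter of
the option string dispatches one initialization step, so no separator
parsing or list management is needed.

	static void
	runInitSteps(const char *initialize_steps)
	{
		const char *step;

		for (step = initialize_steps; *step != '\0'; step++)
		{
			switch (*step)
			{
				case 't':
					createTables();			/* t: create tables */
					break;
				case 'd':
					loadData();				/* d: load data */
					break;
				case 'p':
					createPrimaryKeys();	/* p: create primary keys */
					break;
				case 'f':
					createForeignKeys();	/* f: create foreign keys */
					break;
				case 'v':
					vacuumTables();			/* v: vacuum */
					break;
				default:
					fprintf(stderr,
							"unrecognized initialization step \"%c\"\n",
							*step);
					exit(1);
			}
		}
	}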




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Tom Lane
Robert Haas  writes:
> I think it would be a bad idea to remove both
> dynamic_shared_memory_type=none and dynamic_shared_memory_type=mmap.
> If you do that, then somebody who doesn't have root and whose system
> configuration is messed up can't start PostgreSQL at all.

I'd be happy to leave the mmap option in place if it were getting
tested regularly.  As things stand, I'm doubtful that someone who
was stuck in this hypothetical bad place could rely on it to work
at all.

regards, tom lane




[HACKERS] Re: [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-14 Thread Mengxing Liu
In the last week, I tried these two ideas.

> -Original Messages-
> From: "Alvaro Herrera" 
> Sent Time: 2017-08-08 01:51:52 (Tuesday)
> * I wonder if a completely different approach to the finished xact
>   maintenance problem would be helpful.  For instance, in
>   ReleasePredicateLocks we call ClearOldPredicateLocks()
>   unconditionally, and there we grab the SerializableFinishedListLock
>   lock unconditionally for the whole duration of that routine, but
>   perhaps it would be better to skip the whole ClearOld stuff if the
>   SerializableFinishedListLock is not available?  Find out some way to
>   ensure that the cleanup is always called later on, but maybe skipping
>   it once in a while improves overall performance.
> 

I implemented the idea this way: use LWLockConditionalAcquire instead of
LWLockAcquire, and if the lock is held by someone else, just return.  That
is OK because the routine is used to clear old predicate locks, and it
doesn't matter who does the job.
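
In code form, the experiment amounts to this hedged sketch (not the
committed predicate.c, though SerializableFinishedListLock is the real
lock name):

	static void
	ClearOldPredicateLocks(void)
	{
		/*
		 * Previously an unconditional LWLockAcquire().  Now, if someone
		 * else holds the lock, skip the cleanup entirely; the holder or a
		 * later caller will do the job instead.
		 */
		if (!LWLockConditionalAcquire(SerializableFinishedListLock,
									  LW_EXCLUSIVE))
			return;

		/* ... existing scan of the finished-transaction list unchanged ... */

		LWLockRelease(SerializableFinishedListLock);
	}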

But we both ignored one thing: this idea doesn't speed up the releasing
operation itself; it just avoids some processes waiting for the lock.  If
the function ClearOldPredicateLocks is the bottleneck, skipping doesn't
help anymore.  I attached the result of the evaluation.  This idea (the
conditional lock) has the worst performance.

>
> * the whole predicate.c stuff is written using SHM_QUEUE.  I suppose
>   SHM_QUEUE works just fine, but predicate.c was being written at about
>   the same time (or a bit earlier) than the newer ilist.c interface was
>   being created, which I think had more optimization work thrown in.
>   Maybe it would be good for predicate.c to ditch use of SHM_QUEUE and
>   use ilist.c interfaces instead?  We could even think about being less
>   strict about holding exclusive lock on SerializableFinished for the
>   duration of ClearOldPredicateLocks, i.e. use only a share lock and
>   only exchange for exclusive if a list modification is needed.
> 

I used the doubly linked list in ilist.c, but it didn't improve the
performance.  I did a micro-benchmark comparing the performance of
SHM_QUEUE and ilist.c and didn't find any difference, so the result is
reasonable.
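
That null result is unsurprising, since the two interfaces are
near-identical intrusive doubly-linked lists; a hedged side-by-side
sketch (the ilist names finished_xacts and finished_node are invented,
the SHM_QUEUE names are the real ones from predicate.c):

	/* SHM_QUEUE, as predicate.c does today */
	SHMQueueInsertBefore(FinishedSerializableTransactions,
						 &sxact->finishedLink);

	/* the ilist.c equivalent */
	dlist_push_tail(&finished_xacts, &sxact->finished_node);

Either way an insertion is just a couple of pointer swaps on an embedded
node, with no allocation, so there is little for a micro-benchmark to
distinguish.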

But there are voices in favor of getting rid of SHM_QUEUE, because it does
not make sense to have two implementations of the same thing.  What's your
opinion?
 
> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sincerely


Mengxing Liu







Re: [HACKERS] pl/perl extension fails on Windows

2017-08-14 Thread Tom Lane
Christoph Berg  writes:
> HEAD as of 5a5c2feca still has the same problem on kfreebsd. Is there
> anything I could dump so we understand the problem better?

Yeah, I did not expect that 5a5c2feca would change anything on
non-Windows.

What we need to do is verify that PL/Perl's idea of
sizeof(PerlInterpreter) is different from Perl's own idea, and then
find out why --- ie, just which fields have different size/alignment
in the two compiles.

You mentioned upthread that configure shows this:

> checking for CFLAGS recommended by Perl... -D_REENTRANT -D_GNU_SOURCE
> -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
> checking for CFLAGS to compile embedded Perl... -DDEBIAN

If the source of the problem is the same mechanism as it was for the
other platforms, then presumably the issue is that we need one or more
of
-D_REENTRANT
-D_GNU_SOURCE
-D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64
to be defined while building PL/Perl.  Now, it couldn't be -D_GNU_SOURCE
that's at issue, because we turn that on in src/template/linux:

# Force _GNU_SOURCE on; plperl is broken with Perl 5.8.0 otherwise
CPPFLAGS="$CPPFLAGS -D_GNU_SOURCE"

(That ancient comment is pretty interesting in this connection, isn't it.)

And I'd have thought that _LARGEFILE_SOURCE=1 and _FILE_OFFSET_BITS=64
were the default behavior on any modern platform anyway, but maybe
kfreebsd is weird about that.  Anyway, you could try sticking combinations
of these symbols into perl_embed_ccflags in src/Makefile.global and
rebuilding PL/Perl to see if the problem goes away; if that works it would
give us a leg up on where the problem is.

regards, tom lane




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Andres Freund
On 2017-08-14 13:06:48 -0400, Robert Haas wrote:
> On Mon, Aug 14, 2017 at 1:02 PM, Andres Freund  wrote:
> > I previously thought that an option to occasionally WAL log the stats
> > file would be useful (e.g. just before a checkpoint). That'd make them
> > persistent, and available on the standby. But that'd still require
> > somehow dealing with stats being produced on the standby - I presume
> > we'd need multiple stats files and provide functions for merging them.
> 
> Well, if you think the stats files might be too big to fit in memory,
> then you're probably imagining tens of gigabytes of data, and you're
> not going to want to write a WAL record that size, either.

Well, that's why I'm thinking of it having to be an option.  You
presumably don't want it on relatively idle servers, but if you have big
databases where unnecessary vacuums are bad...

I think again, this all'd be better if we'd figure out a way to make
this use a proper table... We kind of have figured out how to store data
in a queryable manner, with buffering etc. Kind of.  Wonder if there's
some chance of a hack where we have a shared table with a key like
(dboid, reloid, inreplication). If we were to create those (once for
replication, once for not) at table creation, we could
heap_inplace_update them even on a standby...

- Andres




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Alvaro Herrera
Robert Haas wrote:

> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added.  When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down.  That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
> 
> AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
> /* make sure it doesn't go away even if we do */
> dsa_pin(AutoVacuumDSA);
> dsa_pin_mapping(AutoVacuumDSA);
> 
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping.  There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse is happening - perhaps the ERROR isn't releasing
> AutovacuumLock.

Yeah, the problem that lwlocks aren't released is because the launcher
is not in a transaction at that point, so AbortCurrentTransaction()
doesn't release locks like it normally would.  The simplest fix (in the
attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
block, though I wonder if there should be some other cleanup functions
called from there, or whether perhaps it'd be a better strategy to have
the launcher run in a transaction at all times.

The other problem is that there's no attempt to handle a failed DSA
creation/attachment.  The second patch just adds a PG_TRY block that
sets a flag not to try the DSA calls again if the first one fails.  It
throws a single ERROR line, then autovacuum continues without workitem
support.

I intend to give these patches further thought before pushing anything,
will update this thread no later than tomorrow 19:00 UTC-0300.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d761ef2c2fef336fae908d802237715ab38cdf2c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 14 Aug 2017 13:54:57 -0300
Subject: [PATCH 1/2] Release lwlocks in autovacuum launcher error handling
 path

For regular processes, lwlock release is handled via
AbortCurrentTransaction(), which autovacuum already does.  However, the
autovacuum launcher sometimes obtains locks outside of any transaction,
in which case AbortCurrentTransaction is a no-op.  Continuing after
error recovery would block if we tried to obtain an lwlock that we
failed to release.

Reported-by: Robert Haas
Discussion: 
https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/postmaster/autovacuum.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 00b1e823af..ed1a288475 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -525,6 +525,12 @@ AutoVacLauncherMain(int argc, char *argv[])
 		AbortCurrentTransaction();
 
 		/*
+		 * Release lwlocks.  Normally done as part of AbortCurrentTransaction,
+		 * but launcher is not in a transaction at all times.
+		 */
+		LWLockReleaseAll();
+
+		/*
 		 * Now return to normal top-level context and clear ErrorContext for
 		 * next time.
 		 */
-- 
2.11.0

From a56869c1da23f56d747d2d05695879d6659cb8ba Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 14 Aug 2017 13:57:50 -0300
Subject: [PATCH 2/2] Don't repeatedly try to attach to shared memory

If the first attempt fails, give up on it.

Reported-by: Robert Haas
Discussion: 
https://postgr.es/m/ca+tgmobqvbz4k_+rsmim9herkpy3vs5xnbkl95gsenwijzp...@mail.gmail.com
---
 src/backend/postmaster/autovacuum.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ed1a288475..ac47b1ce1b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -255,6 +255,7 @@ typedef enum
  * av_startingWorker pointer to WorkerInfo currently being started (cleared by
  *					the worker itself as soon as it's up and running)
  * av_dsa_handle	handle for allocatable shared memory
 + * av_dsa_failed	already tried to allocate shared memory and failed
  *
  * This struct is protected by AutovacuumLock, except for av_signal and parts
  * of the worker list (see 

Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 1:02 PM, Andres Freund  wrote:
> I previously thought that an option to occasionally WAL log the stats
> file would be useful (e.g. just before a checkpoint). That'd make them
> persistent, and available on the standby. But that'd still require
> somehow dealing with stats being produced on the standby - I presume
> we'd need multiple stats files and provide functions for merging them.

Well, if you think the stats files might be too big to fit in memory,
then you're probably imagining tens of gigabytes of data, and you're
not going to want to write a WAL record that size, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Andres Freund
On 2017-08-14 18:57:39 +0200, Magnus Hagander wrote:
> On Mon, Aug 14, 2017 at 5:46 PM, Robert Haas  wrote:
> 
> > On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
> > >> b) I think our tendency to dump all stats whenever we crash isn't really
> > >>tenable, given how autovacuum etc are tied to them.
> > >
> > > Eh, maybe.  I don't think crashes are really so common on production
> > > systems.  As developers we have an inflated view of their frequency ;-)
> >
> 
> From a stats perspective I think the crash problem is the small problem.
> The big problem is we nuke them on replication promotion and we nuke them
> on PITR. If we solve those two, I'm pretty sure we would also solve the
> on-crash case more or less in the same way.

I don't think they're that rare, but the others are problems
too. Unfortunately I think changing that is a bit more complicated,
given that some stats are actually updated on standbys.

I previously thought that an option to occasionally WAL log the stats
file would be useful (e.g. just before a checkpoint). That'd make them
persistent, and available on the standby. But that'd still require
somehow dealing with stats being produced on the standby - I presume
we'd need multiple stats files and provide functions for merging them.

It'd be good if we somehow could figure out a way to WAL log the stats
data in a way that's consistent, i.e. doesn't contain data about dumped
relations. But I don't quite see how...


> > Without taking a position on the point under debate, AFAIK it wouldn't
> > be technically complex either under our current architecture or the
> > proposed new one to dump out a new permanent stats file every 10
> > minutes or so.  So if there is an issue here I think it might not be
> > very hard to fix, whatever else we do.
> >
> 
> I started working on that patch at some point, I think I have a rotting
> branch somewhere. That part was indeed fairly easy.
> 
> I got stalled when I feature-crept myself by realizing I wanted it
> snapshotted to WAL so it could fix the PITR and replication issues. And
> then properly bogged down when I realized that on the standby I'd want
> *both* the stats from the standby (while it's running) and the stats from
> the master (after switchover).

Hah, indeed.



Greetings,

Andres Freund




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 12:36 PM, Andres Freund  wrote:
>> If so, why isn't choose_dsm_implementation() trying it; and if not,
>> why are we carrying it?
>
> I think the idea was that there might be platforms that require it, but
> ...

Right.  So, for example, POSIX shared memory will fail on Linux if
/dev/shm is inaccessible, and I've seen such systems in the wild.
System V shared memory will fail if the kernel limits are too small.
On *BSD, which lacks POSIX shared memory altogether, whether you can
start PostgreSQL with the default configuration settings depends
entirely on how the System V shared memory limits are configured.
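
For reference, initdb's probe is short; a hedged, condensed sketch of
choose_dsm_implementation() (from memory of src/bin/initdb/initdb.c,
with the retry-on-name-collision loop elided): it probes POSIX shared
memory and otherwise falls back without probing, never considering mmap.

	static char *
	choose_dsm_implementation(void)
	{
	#ifdef HAVE_SHM_OPEN
		char		name[64];
		int			fd;

		/* probe POSIX shm by creating and unlinking a random segment */
		snprintf(name, sizeof(name), "/PostgreSQL.%u", (unsigned) random());
		if ((fd = shm_open(name, O_CREAT | O_RDWR | O_EXCL, 0600)) != -1)
		{
			close(fd);
			shm_unlink(name);
			return "posix";
		}
	#endif

	#ifdef WIN32
		return "windows";
	#else
		return "sysv";			/* fallback, not actually probed */
	#endif
	}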

I think it would be a bad idea to remove both
dynamic_shared_memory_type=none and dynamic_shared_memory_type=mmap.
If you do that, then somebody who doesn't have root and whose system
configuration is messed up can't start PostgreSQL at all.  While I
understand that catering to rarely-used options has some cost, I don't
think those costs are exorbitant.  And, I think history shows that
minimizing dependencies on operating-system settings is a win.  Commit
b0fc0df9364d2d2d17c0162cf3b8b59f6cb09f67 may not be my
most-appreciated commit ever, but it undoubtedly had the highest
appreciation-to-effort ratio.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Magnus Hagander
On Mon, Aug 14, 2017 at 5:46 PM, Robert Haas  wrote:

> On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
> >> b) I think our tendency to dump all stats whenever we crash isn't really
> >>tenable, given how autovacuum etc are tied to them.
> >
> > Eh, maybe.  I don't think crashes are really so common on production
> > systems.  As developers we have an inflated view of their frequency ;-)
>

From a stats perspective I think the crash problem is the small problem.
The big problem is that we nuke them on replication promotion and we nuke
them on PITR. If we solve those two, I'm pretty sure we would also solve
the on-crash case more or less in the same way.



> Without taking a position on the point under debate, AFAIK it wouldn't
> be technically complex either under our current architecture or the
> proposed new one to dump out a new permanent stats file every 10
> minutes or so.  So if there is an issue here I think it might not be
> very hard to fix, whatever else we do.
>

I started working on that patch at some point, I think I have a rotting
branch somewhere. That part was indeed fairly easy.

I got stalled when I feature-crept myself by realizing I wanted it
snapshotted to WAL so it could fix the PITR and replication issues. And
then properly bogged down when I realized that on the standby I'd want
*both* the stats from the standby (while it's running) and the stats from
the master (after switchover).


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] pl/perl extension fails on Windows

2017-08-14 Thread Christoph Berg
Re: Sandeep Thakkar 2017-08-08 

> Hi
> 
> An update from beta3 build: We are no longer seeing this issue (handshake
> failure) on Windows 64bit, but on Windows 32bit it still persists.

HEAD as of 5a5c2feca still has the same problem on kfreebsd. Is there
anything I could dump so we understand the problem better?

Christoph




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Peter Geoghegan
On Mon, Aug 14, 2017 at 9:15 AM, Peter Eisentraut
 wrote:
> I'm having trouble finding some concrete documentation for this.  The TR
> 35 link you showed documents the key words and values, BCP 47 documents
> the syntax, but nothing puts it all together in a form consumable by
> users.  The ICU documentation still mainly focuses on the "old"
> @keyword=value syntax.  I guess we'll have to write our own for now.

There is an unusual style to the standards that apply here. It's
incredibly detailed, and the options are very powerful, but it's in an
unfamiliar language. ICU just considers itself a consumer of the CLDR
locale stuff, which is a broad standard.

We don't have to write comprehensive documentation of these
kn/kb/ka/kh options that I pointed out exist. I think it would be nice
to cover a few interesting cases, and link to the BCP 47 Unicode
extension (TR 35) stuff.

Here is a list of scripts, that are all reorderable with this TR 35
stuff (varies somewhat based on CLDR/ICU version):

http://unicode.org/iso15924/iso15924-codes.html

Here is a CLDR specific XML specification of the variant keywords (can
be mapped to specific ICU version easily):

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml

> Given that we cannot reasonably preload all these new variants that you
> demonstrated, I think it would make sense to drop all the keyword
> variants from the preloaded set.

Cool. While I am of course in favor of this, I actually understand
very well why you had initdb add them. I think that removing them
creates a discoverability problem that cannot easily be fixed through
documentation. ISTM that we ought to also add an SQL-callable function
that lists the most common keyword variants. Some of those are
specific to one or two locales, such as traditional Spanish, or the
alternative sort orders for Han characters.

What do you think of that idea?

I guess an alternative idea is to just link to that XML document
(collation.xml), which exactly specifies the variants. Users can get
the "co" variants there. Should be for the most part obvious which one
is interesting to which locale, since there are not that many "co"
variants to choose from, and users will probably know what to look for
if they look at all.

-- 
Peter Geoghegan




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Andres Freund
On 2017-08-14 12:28:39 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Mon, Aug 14, 2017 at 12:16 PM, Tom Lane  wrote:
> >> Just FYI, the only values being reported by buildfarm animals are
> >> "posix", "sysv", and "windows".  So while mmap may be a thing,
> >> it's an untested thing.
> 
> > I'm pretty sure I dev-tested it before committing anything, but,
> > certainly, having ongoing BF coverage woudn't be a bad thing.
> 
> Looking closer, the reason those are the only reported values is
> that those are the only possible results from initdb's
> choose_dsm_implementation().  So the real question here is whether
> "mmap" should be considered to dominate "sysv" if it's available.

No, mmap isn't a good option - it's file-backed mmap, rather than
anonymous mmap. To my knowledge there's no good portable way to use
anonymous mmap to share memory across processes unless it's established
before a fork().


> If so, why isn't choose_dsm_implementation() trying it; and if not,
> why are we carrying it?

I think the idea was that there might be platforms that require it, but
...


Greetings,

Andres Freund




Re: [HACKERS] FYI: branch for v11 devel is planned for next week

2017-08-14 Thread Tom Lane
I wrote:
> The release team discussed this a couple weeks ago, but I don't
> think anybody mentioned it on -hackers: v10 seems to be in good
> enough shape that it's okay to make the REL_10_STABLE branch soon,
> and open HEAD for v11 development.

> Last year we branched on Aug 15, and we should be able to keep
> to the same schedule.  Barring objections or bad bugs showing up
> soon, I'll make the branch early next week.

If no objections, I'll make the v10 branch later today, say around
2100 UTC.

regards, tom lane




Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Andres Freund
Hi,

On 2017-08-14 14:12:22 +0200, Chris Travers wrote:
> Problem:
> The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
> with disk space.  Querying pg_database_size, as well as du on the
> subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
> size of the relations in pg_class though shows around 2.1TB.
> 
> Initial troubleshooting found around 150 GB of space in pg_temp which had
> never been cleared and was at least several days old.  Restarting the
> server cleared these up.
> 
> Poking around the base/[oid] directory, I found a large number of files
> which did not correspond with a pg_class entry.  One of the apparent
> relations was nearly 1TB in size.
> 
> What I think happened:
> I think various pg_temp/* and orphaned relation files (In base/[oid]) were
> created when PostgreSQL crashed due to running out of space in various
> operations including creating materialised views.
> 
> So my question is if there is a way we can safely clean these up on server
> restart?  If not does it make sense to try to create a utility that can
> connect to PostgreSQL, seek out valid files, and delete the rest?

I think the fix here is to call RemovePgTempFiles() during
crash-restarts, instead of just full starts. The previously stated need
to be able to inspect temp files after a crash can be less impactfully
fulfilled with restart_after_crash = false.

Greetings,

Andres Freund




Re: [HACKERS] Secondary index access optimizations

2017-08-14 Thread Konstantin Knizhnik



On 14.08.2017 12:37, Konstantin Knizhnik wrote:

Hi hackers,

I am trying to compare different ways of optimizing work with huge 
append-only tables in PostgreSQL, where the primary key is something 
like a timestamp and queries usually access the most recent data using 
some secondary keys. The size of a secondary index is one of the most 
critical factors limiting insert/search performance. Since data is 
inserted in timestamp-ascending order, access to the primary key is 
well localized and the accessed pages are present in memory. But if we 
create a secondary key over the whole table, then accessing it will 
require random reads from the disk and significantly decrease 
performance.


There are two well-known solutions to the problem:
1. Table partitioning
2. Partial indexes

These are the approaches I want to compare. First of all I want to check 
whether the optimizer is able to generate an efficient query execution 
plan covering different time intervals.

Unfortunately, in both cases the generated plan is not optimal.

1. Table partitioning:

create table base (k integer primary key, v integer);
create table part1 (check (k between 1 and 10000)) inherits (base);
create table part2 (check (k between 10001 and 20000)) inherits (base);
create index pi1 on part1(v);
create index pi2 on part2(v);
insert into part1 values (generate_series(1,10000), random());
insert into part2 values (generate_series(10001,20000), random());
explain select * from base where k between 1 and 20000 and v = 100;
                                QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..15.65 rows=3 width=8)
   ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
         Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
   ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)
         Index Cond: (v = 100)
         Filter: ((k >= 1) AND (k <= 20000))
   ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)
         Index Cond: (v = 100)
         Filter: ((k >= 1) AND (k <= 20000))

Questions:
- Is there some way to avoid the sequential scan of the parent table? 
Yes, it is empty and so the sequential scan will not take much time, 
but ... it still requires some additional actions and so increases 
query execution time.
- Why does the index scan of the partition indexes include the filter 
condition, when the check constraint guarantees that all records of the 
partition match the search predicate?



2. Partial indexes:

create table t (k integer primary key, v integer);
insert into t values (generate_series(1,20000), random());
create index i1 on t(v) where k between 1 and 10000;
create index i2 on t(v) where k between 10001 and 20000;
postgres=# explain select * from t where k between 1 and 10000 and v = 100;

                         QUERY PLAN
-------------------------------------------------------------
 Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
   Index Cond: (v = 100)
(2 rows)


Here we get a perfect plan. Let's try to extend the search interval:


postgres=# explain select * from t where k between 1 and 20000 and v = 100;

                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.29..760.43 rows=1 width=8)
   Index Cond: ((k >= 1) AND (k <= 20000))
   Filter: (v = 100)
(3 rows)

Unfortunately in this case Postgres is not able to apply partial indexes.
And this is what I expected to get:

postgres=# explain select * from t where k between 1 and 10000 and v = 100
union all select * from t where k between 10001 and 20000 and v = 100;

                              QUERY PLAN
----------------------------------------------------------------------
 Append  (cost=0.29..14.58 rows=2 width=8)
   ->  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
         Index Cond: (v = 100)
   ->  Index Scan using i2 on t t_1  (cost=0.29..7.28 rows=1 width=8)
         Index Cond: (v = 100)


I wonder if there are any fundamental problems with supporting these two 
things in the optimizer:
1. Remove a search condition on the primary key if it is fully satisfied 
by the derived table's check constraint.
2. Append index scans of several partial indexes if the specified 
interval is covered by their conditions.


I wonder if someone familiar with this part of the optimizer can easily 
fix it. Otherwise I am going to spend some time on solving these 
problems (if the community thinks that such optimizations will be 
useful).
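
The shape of optimization 1, as a hedged sketch (a fragment with invented
variable names, which would live in or near
relation_excluded_by_constraints() in plancat.c next to the existing
predicate_refuted_by() test, reusing its rel and safe_constraints
locals): drop every baserestrictinfo clause that the table's CHECK
constraints already imply, so it never reappears as a filter in the scan.

	List	   *kept = NIL;
	ListCell   *lc;

	foreach(lc, rel->baserestrictinfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

		/* provably true for every row of this child: drop the qual */
		if (predicate_implied_by(list_make1(rinfo->clause),
								 safe_constraints, false))
			continue;

		kept = lappend(kept, rinfo);
	}
	rel->baserestrictinfo = kept;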




Replying to myself: the following small patch removes redundant checks 
from index scans for derived tables:



diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 939045d..1f7c9cf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
 	if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo, false))
 		return true;
 
+	/*
+ 

Re: [HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
On Aug 14, 2017 14:12, "Chris Travers"  wrote:

Hi all;

I am trying to track down a problem we are seeing that looks very similar
to bug #12050, and would certainly consider trying to contribute a fix if
we agree on one.  (I am not sure we can, so absent that, the next question
is whether it makes sense to create a utility to fix the problem when it
comes up so that a dump/restore is not needed).

The system:
PostgreSQL 9.6.3
Gentoo Linux.

Problem:
The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
with disk space.  Querying pg_database_size, as well as du on the
subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
size of the relations in pg_class though shows around 2.1TB.

Initial troubleshooting found around 150 GB of space in pg_temp which had
never been cleared and was at least several days old.  Restarting the
server cleared these up.

Poking around the base/[oid] directory, I found a large number of files
which did not correspond with a pg_class entry.  One of the apparent
relations was nearly 1TB in size.

What I think happened:
I think various pg_temp/* and orphaned relation files (In base/[oid]) were
created when PostgreSQL crashed due to running out of space in various
operations including creating materialised views.

So my question is if there is a way we can safely clean these up on server
restart?  If not does it make sense to try to create a utility that can
connect to PostgreSQL, seek out valid files, and delete the rest?


OK, I have identified one case where the symptoms I am seeing can be
reproduced.  I am currently working on a Mac, so there may be quirks in my
repro.  However:

When the WAL writer runs out of disk space, no cleanup is done.

So I will be looking at possible solutions next.


-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 14, 2017 at 12:16 PM, Tom Lane  wrote:
>> Just FYI, the only values being reported by buildfarm animals are
>> "posix", "sysv", and "windows".  So while mmap may be a thing,
>> it's an untested thing.

> I'm pretty sure I dev-tested it before committing anything, but,
> certainly, having ongoing BF coverage wouldn't be a bad thing.

Looking closer, the reason those are the only reported values is
that those are the only possible results from initdb's
choose_dsm_implementation().  So the real question here is whether
"mmap" should be considered to dominate "sysv" if it's available.
If so, why isn't choose_dsm_implementation() trying it; and if not,
why are we carrying it?

regards, tom lane




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Andres Freund
On 2017-08-14 12:21:30 -0400, Robert Haas wrote:
> >> ... If somebody has a system where no other form of shared
> >> memory works, dynamic_shared_memory_type = mmap is still a thing, so
> >> the use case for "none" seems very thin indeed.  I'd vote for just
> >> ripping it out in v11.
> >
> > Just FYI, the only values being reported by buildfarm animals are
> > "posix", "sysv", and "windows".  So while mmap may be a thing,
> > it's an untested thing.
> 
> I'm pretty sure I dev-tested it before committing anything, but,
> certainly, having ongoing BF coverage woudn't be a bad thing.

Are there any platforms that require it?  I thought posix, sysv and
windows are sufficient?  If we're going to start relying more on dsms,
we probably don't want to use mmap anyway...

- Andres




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-14 Thread Peter Eisentraut
On 8/13/17 15:39, Noah Misch wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think there are up to three separate issues in play:

- what to do about some preloaded collations disappearing between versions

- whether to preload keyword variants

- whether to canonicalize some things during CREATE COLLATION

I responded to all these subplots now, but the discussion is ongoing.  I
will set the next check-in to Thursday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 12:16 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
>>> In principle we could keep the existing mechanism as a fallback.
>>> Whether that's worth the trouble is debatable.  The current code
>>> in initdb believes that every platform has some type of DSM support
>>> (see choose_dsm_implementation).  Nobody's complained about that,
>>> and it certainly works on every buildfarm animal.  So for all we know,
>>> dynamic_shared_memory_type = none is broken already.
>
>> Actually, now that you mention it, I think it *is* broken already, or
>> more to the point, if you configure that value, the autovacuum
>> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
>> added.  When I just tested it, the AV launcher somehow ended up
>> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
>> down.
>
> Hmm, shouldn't that be an open item for v10?

I already added it.

>> ... If somebody has a system where no other form of shared
>> memory works, dynamic_shared_memory_type = mmap is still a thing, so
>> the use case for "none" seems very thin indeed.  I'd vote for just
>> ripping it out in v11.
>
> Just FYI, the only values being reported by buildfarm animals are
> "posix", "sysv", and "windows".  So while mmap may be a thing,
> it's an untested thing.

I'm pretty sure I dev-tested it before committing anything, but,
certainly, having ongoing BF coverage wouldn't be a bad thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-08-14 Thread Robert Haas
On Mon, Aug 14, 2017 at 7:51 AM, Jeevan Ladhe
 wrote:
> I think even with this change there will be one level of indentation
> needed for throwing the error, as the error is to be thrown only if
> there exists a default partition.

That's true, but we don't need two levels.

>> > 0007:
>> > This patch introduces code to check if the scanning of default partition
>> > child
>> > can be skipped if its constraints are proven.
>>
>> If I understand correctly, this is actually a completely separate
>> feature not intrinsically related to default partitioning.
>
> I don't see this as a new feature, since scanning the default partition
> will be introduced by this series of patches only, and rather than a
> feature this can be classified as a completeness of default skip
> validation logic. Your thoughts?

Currently, when a partitioned table is attached, we check whether all
the scans can be skipped, but not whether scans on some partitions can
be skipped.  So there are two separate things:

1. When we introduce default partitioning, we need to scan the default
partition either when (a) any partition is attached or (b) any
partition is created.

2. In any situation where scans are needed (scanning the partition
when it's attached, scanning the default partition when some other
partition is attached, scanning the default when a new partition is
created), we can run predicate_implied_by for each partition to see
whether the scan of that partition can be skipped.

Those two changes are independent. We could do (1) without doing (2)
or (2) without doing (1) or we could do both.  So they are separate
features.
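
A hedged sketch of point 2 (the wrapper name is invented;
predicate_implied_by() is the real entry point in
src/backend/optimizer/util/predtest.c):

	static bool
	can_skip_validation_scan(List *newConstraint, List *existingConstraints)
	{
		/*
		 * True only when the implication is provable; false merely means
		 * "must scan", never "constraint violated".  The third argument
		 * selects CHECK-constraint-style NULL semantics -- passing true
		 * here is this sketch's assumption; see predtest.c for details.
		 */
		return predicate_implied_by(newConstraint, existingConstraints, true);
	}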

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Tom Lane
Robert Haas  writes:
> On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
>> In principle we could keep the existing mechanism as a fallback.
>> Whether that's worth the trouble is debatable.  The current code
>> in initdb believes that every platform has some type of DSM support
>> (see choose_dsm_implementation).  Nobody's complained about that,
>> and it certainly works on every buildfarm animal.  So for all we know,
>> dynamic_shared_memory_type = none is broken already.

> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added.  When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down.

Hmm, shouldn't that be an open item for v10?

> ... If somebody has a system where no other form of shared
> memory works, dynamic_shared_memory_type = mmap is still a thing, so
> the use case for "none" seems very thin indeed.  I'd vote for just
> ripping it out in v11.

Just FYI, the only values being reported by buildfarm animals are
"posix", "sysv", and "windows".  So while mmap may be a thing,
it's an untested thing.

 dsm_type  | critters_reporting 
-----------+--------------------
 posix     |                 69
 sysv      |                 13
 windows\r |                 10
(3 rows)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Peter Eisentraut
On 8/9/17 18:49, Peter Geoghegan wrote:
> There are actually very many customizations to collations that are
> possible beyond what the "stock" ICU collations provide (whatever
> "stock" means).

This is very nice indeed, and I was not aware that this was possible
with what we already have in place.

I'm having trouble finding some concrete documentation for this.  The TR
35 link you showed documents the keywords and values, BCP 47 documents
the syntax, but nothing puts it all together in a form consumable by
users.  The ICU documentation still mainly focuses on the "old"
@keyword=value syntax.  I guess we'll have to write our own for now.
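
For example, here is a minimal sketch of the kind of customization
involved (collation names are hypothetical; keyword support depends on
the ICU version in use):

create collation "de-phonebook" (provider = icu, locale = 'de-u-co-phonebk');
create collation "num-first" (provider = icu, locale = 'en-u-kn-true');
-- with numeric ordering, 'a2' sorts before 'a10'
select x from (values ('a2'), ('a10')) t(x) order by x collate "num-first";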

Given that we cannot reasonably preload all these new variants that you
demonstrated, I think it would make sense to drop all the keyword
variants from the preloaded set.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Alvaro Herrera
Robert Haas wrote:

> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added.  When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down.  That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
> 
> AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
> /* make sure it doesn't go away even if we do */
> dsa_pin(AutoVacuumDSA);
> dsa_pin_mapping(AutoVacuumDSA);
> 
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping.  There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse happening - perhaps the ERROR isn't releasing
> AutovacuumLock.

Looking into this now.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Andres Freund
Hi,

On 2017-08-14 11:46:10 -0400, Robert Haas wrote:
> On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> You both are obviously right.  Another point of potential concern could
> >> be that we'd pretty much fully rely on dsm/dht's being available, for
> >> the server to function correctly. Are we ok with that? Right now
> >> postgres still works perfectly well, leaving parallelism aside, with
> >> dynamic_shared_memory_type = none.
> >
> > In principle we could keep the existing mechanism as a fallback.
> > Whether that's worth the trouble is debatable.  The current code
> > in initdb believes that every platform has some type of DSM support
> > (see choose_dsm_implementation).  Nobody's complained about that,
> > and it certainly works on every buildfarm animal.  So for all we know,
> > dynamic_shared_memory_type = none is broken already.
> 
> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added.  When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down.  That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
> 
> AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
> /* make sure it doesn't go away even if we do */
> dsa_pin(AutoVacuumDSA);
> dsa_pin_mapping(AutoVacuumDSA);
> 
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping.  There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse happening - perhaps the ERROR isn't releasing
> AutovacuumLock.  Anyway, this code has multiple error handling defects
> and IMHO it's pretty questionable whether DSA should have been used
> here at all.  Allowing the number of autovacuum work items to grow
> without bound would not be a good design - and if we've got a bound
> somewhere, why not just put this in the main shared memory segment?

CCing Alvaro.  This seems like it should be an open item.


> Leaving all that aside, when the DSM facility was introduced in 9.4, I
> was afraid that it would break things for people and that they'd
> demand a way to turn it off, and I included "none" as an option for
> that purpose.  Well, it did break a few things initially but those
> seem to have mostly been fixed during the 9.4 cycle itself.  I'm not
> aware of any subsequent problem reports caused by having DSM enabled
> (pointers welcome!) and given that we now have parallel query relying
> on DSM, people are much less likely to want to just give up on having
> DSM available.  If somebody has a system where no other form of shared
> memory works, dynamic_shared_memory_type = mmap is still a thing, so
> the use case for "none" seems very thin indeed.  I'd vote for just
> ripping it out in v11.

It's possibly still useful for debugging - we'll continue not to be able
to rely on allocations to succeed...


> >> a) It'd be quite useful to avoid needing a whole cluster's stats in
> >>memory. Even if $subject would save memory, I'm hesitant committing
> >>to something requiring all stats to be in memory forever. As a first
> >>step it seems reasonable to e.g. not require state for all databases
> >>to be in memory. The first time per-database stats are required, it
> >>could be "paged in". Could even be more aggressive and do that on a
> >>per-table level and just have smaller placeholder entries for
> >>non-accessed tables, but that seems more work.
> >
> > Huh?  As long as we don't lock the shared memory into RAM, which on most
> > platforms we couldn't do without being root anyway, ISTM the kernel should
> > do just fine at paging out little-used areas of the stats.  Let's not
> > reinvent that wheel until there's demonstrable proof of need.
> 
> I think some systems aren't able to page out data stored in shared
> memory segments, and it would probably be pretty awful for performance
> if they did (there's a reason large sorts need to switch to using temp
> files ... and the same rationale seems to apply here).

Right.


> That having been said, I don't see a need to change everything at
> once.  If we moved the stats collector data from frequently-rewritten
> files to shared memory, we'd already be saving a lot of I/O and
> possibly memory utilization.

Right #7


> >> b) I think our tendency to dump all stats whenever we crash isn't really
> >>tenable, given how autovacuum etc are tied to them.

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-14 Thread Robert Haas
On Sun, Aug 13, 2017 at 11:05 AM, Sokolov Yura
 wrote:
> BTW, ad-hoc hash tables already exist: at least recourse owner uses one.

Yeah.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql 10: hash indexes testing

2017-08-14 Thread AP
On Fri, Aug 11, 2017 at 07:33:51AM +0530, Amit Kapila wrote:
> On Thu, Aug 10, 2017 at 4:11 PM, AP  wrote:
> > mdstash=# select * from pgstathashindex('link_datum_id_idx');
> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
> > (1 row)
> 
> The free_percent calculation seems to be wrong.  Can you please once
> try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf?  I
> feel this should be fixed by that commit.

Sorry I couldn't get around to helping you debug this myself. Work got annoying. :/

That said, I think that this is the first time that I've seen the value be
under 100:

mdstash=# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       4 |     22957200 |        9272698 |          283 |      2208624 | 8448300552 |          0 | 39.8146658879555
(1 row)

Time: 2882974.635 ms (48:02.975)

The index is still functioning, too, with far more data than I've ever had
in the table in the past and well beyond the point where it would previously
die. Happy days. :)

AP


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-14 Thread Robert Haas
On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> You both are obviously right.  Another point of potential concern could
>> be that we'd pretty much fully rely on dsm/dht's being available, for
>> the server to function correctly. Are we ok with that? Right now
>> postgres still works perfectly well, leaving parallelism aside, with
>> dynamic_shared_memory_type = none.
>
> In principle we could keep the existing mechanism as a fallback.
> Whether that's worth the trouble is debatable.  The current code
> in initdb believes that every platform has some type of DSM support
> (see choose_dsm_implementation).  Nobody's complained about that,
> and it certainly works on every buildfarm animal.  So for all we know,
> dynamic_shared_memory_type = none is broken already.

Actually, now that you mention it, I think it *is* broken already, or
more to the point, if you configure that value, the autovacuum
launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
added.  When I just tested it, the AV launcher somehow ended up
waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
down.  That's actually not really entirely the fault of
dynamic_shared_memory_type = none, though, because the code in
autovacuum.c does this:

AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
/* make sure it doesn't go away even if we do */
dsa_pin(AutoVacuumDSA);
dsa_pin_mapping(AutoVacuumDSA);

Now, that's actually really broken because if dsa_create() throws an
error of any kind, you're going to have already assigned the value to
AutoVacuumDSA, but you will not have pinned the DSA or the DSA
mapping.  There's evidently some additional bug here because I'd sorta
expect this code to just go into an infinite loop in this case,
failing over and over trying to reattach the segment, but evidently
something even worse happening - perhaps the ERROR isn't releasing
AutovacuumLock.  Anyway, this code has multiple error handling defects
and IMHO it's pretty questionable whether DSA should have been used
here at all.  Allowing the number of autovacuum work items to grow
without bound would not be a good design - and if we've got a bound
somewhere, why not just put this in the main shared memory segment?

Leaving all that aside, when the DSM facility was introduced in 9.4, I
was afraid that it would break things for people and that they'd
demand a way to turn it off, and I included "none" as an option for
that purpose.  Well, it did break a few things initially but those
seem to have mostly been fixed during the 9.4 cycle itself.  I'm not
aware of any subsequent problem reports caused by having DSM enabled
(pointers welcome!) and given that we now have parallel query relying
on DSM, people are much less likely to want to just give up on having
DSM available.  If somebody has a system where no other form of shared
memory works, dynamic_shared_memory_type = mmap is still a thing, so
the use case for "none" seems very thin indeed.  I'd vote for just
ripping it out in v11.

>> a) It'd be quite useful to avoid needing a whole cluster's stats in
>>memory. Even if $subject would save memory, I'm hesitant committing
>>to something requiring all stats to be in memory forever. As a first
>>step it seems reasonable to e.g. not require state for all databases
>>to be in memory. The first time per-database stats are required, it
>>could be "paged in". Could even be more aggressive and do that on a
>>per-table level and just have smaller placeholder entries for
>>non-accessed tables, but that seems more work.
>
> Huh?  As long as we don't lock the shared memory into RAM, which on most
> platforms we couldn't do without being root anyway, ISTM the kernel should
> do just fine at paging out little-used areas of the stats.  Let's not
> reinvent that wheel until there's demonstrable proof of need.

I think some systems aren't able to page out data stored in shared
memory segments, and it would probably be pretty awful for performance
if they did (there's a reason large sorts need to switch to using temp
files ... and the same rationale seems to apply here).  That having
been said, I don't see a need to change everything at once.  If we
moved the stats collector data from frequently-rewritten files to
shared memory, we'd already be saving a lot of I/O and possibly memory
utilization.  If somebody then wanted to refine that further by
spilling rarely used data out to disk and reloading it on demand, that
could be done as a follow-on patch.  But I think that would only be
needed for people with *really* large numbers of tables, and it would
only help them if most of those tables were barely ever touched.

>> b) I think our tendency to dump all stats whenever we crash isn't really
>>tenable, given how autovacuum etc are tied to them.
>
> Eh, maybe.  I don't think crashes are 

Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-14 Thread Peter Eisentraut
On 8/7/17 21:00, Peter Geoghegan wrote:
> Actually, it's *impossible* for ICU to fail to accept any string as a
> valid locale within CREATE COLLATION, because CollationCreate() simply
> doesn't sanitize ICU names. It doesn't do something like call
> get_icu_language_tag(), unlike initdb (within
> pg_import_system_collations()).
> 
> If I add such a test to CollationCreate(), it does a reasonable job of
> sanitizing, while preserving the spirit of the BCP 47 language tag
> format by not assuming that the user didn't specify a brand new locale
> that it hasn't heard of.

I'm not sure what you are proposing here.  Convert the input to CREATE
COLLATION to a BCP 47 language tag?
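
For reference, a sketch of the behavior being discussed (hypothetical
collation name; v10 assumed): because CollationCreate() does not
sanitize ICU locale strings, something like

create collation bogus (provider = icu, locale = 'xx-no-such-locale');

succeeds today, with ICU presumably falling back to the root locale
rather than rejecting the unknown tag.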

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-08-14 Thread Stephen Frost
Robert, Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Sun, Aug 6, 2017 at 3:22 AM, Robert Haas  wrote:
> > All of (1)-(3) are legitimate user choices, although not everyone will
> > make them.  (4) is unfortunately the procedure recommended by our
> > documentation, which is where the problem comes in.  I think it's
> > pretty lame that the documentation recommends ignoring the return
> > value of pg_stop_backup(); ISTM that it would be very wise to
> > recommend cross-checking the return value against the WAL files you're
> > keeping for the backup.  Even if we thought the waiting logic was
> > working perfectly in every case, having a cross-check on something as
> > important as backup integrity seems like an awfully good idea.
> 
> I would go a little bit further and say that this is mandatory, as
> the minimum recovery point that needs to be reached is the LSN
> returned by pg_stop_backup(). For backups taken from primaries, this
> is a larger deal because XLOG_BACKUP_END may not be present if the
> last WAL segment switched is not in the archive. For backups taken
> from standbys, the thing is more tricky as the control file should be
> backed up last. I would think that the best thing we could do is to
> extend pg_stop_backup a bit more so as it returns the control file to
> write in the backup using a bytea to hold the data for example.

Indeed, getting this all correct isn't trivial, and it's unfortunate
that our documentation in this area is lacking.  Further, having things
not actually work as the documentation claims isn't exactly helping.

> > I think the root of this problem is that commit
> > 7117685461af50f50c03f43e6a622284c8d54694 did only a pretty perfunctory
> > update of the documentation, as the commit message itself admits:
> >
> > Only reference documentation is included. The main section on
> > backup still needs
> > to be rewritten to cover this, but since that is already scheduled
> > for a separate
> > large rewrite, it's not included in this patch.
> >
> > But it doesn't look like that ever really got done.
> > https://www.postgresql.org/docs/10/static/continuous-archiving.html#backup-lowlevel-base-backup
> > really doesn't talk at all about standbys or differences in required
> > procedures between masters and standbys.  It makes statements that are
> > flagrantly ridiculous in the case of a standby, like claiming that
> > pg_start_backup() always performs a checkpoint and that pg_stop_backup
> > "terminates the backup mode and performs an automatic switch to the
> > next WAL segment."  Well, obviously not.
> 
> Yep.

Indeed, that was something that I had also heard was being worked on,
but it's really unfortunate that it didn't happen.

> > And at least to me, that's the real bug here.  Somebody needs to go
> > through and fix this documentation properly.
> 
> So, Magnus? :)

I continue to be of the opinion that rewriting the documentation in
this case to match what we actually do is pretty unfriendly to users who
have built tools using the documentation at the time.  Evidently that
ship has sailed, but we didn't help things here by only changing how
PG10 works and not also at least updating the documentation to be
accurate.

I've poked Magnus... more directly regarding this and offered to have
someone help with the development of better documentation in this area.
Hopefully we can get the docs in the back-branches fixed for the next
round of minor releases, and call out those updates in the release
notes, at least.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-14 Thread Tom Lane
Peter Eisentraut  writes:
> It appears there is a patch that fixes the issue now.  Let's wait until
> Wednesday to see if that ends up being successful.

Jobin reported success with add-connected-event-2.patch --- are you
expecting more testing to happen?

I did instrument the loop in libpqrcv_connect and verified that the
process latch is set the first time through.  I haven't traced down
exactly why it's set at that point, but that confirms why Andres'
initial patch didn't work and the new one did.

I think we could commit add-connected-event-2.patch and call this
issue resolved.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-08-14 Thread Tom Lane
Sandeep Thakkar  writes:
> On Thu, Aug 10, 2017 at 9:04 PM, Tom Lane  wrote:
>> Got it.  So in short, it seems like the attached patch ought to fix it
>> for MSVC builds.  (We'd also need to teach PGAC_CHECK_PERL_EMBED_CCFLAGS
>> to let _USE_32BIT_TIME_T through on Windows, but let's confirm the theory
>> first.)

> We built the sources with this patch and were able to create the plperl
> extension on Windows 32bit and 64bit.

Excellent, thanks for testing.  I'll finish up the configure-script part
and push this shortly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-14 Thread Peter Eisentraut
On 8/13/17 15:46, Noah Misch wrote:
> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-08-14 20:00 UTC, I will transfer this item to release management team
> ownership without further notice.

It appears there is a patch that fixes the issue now.  Let's wait until
Wednesday to see if that ends up being successful.

Otherwise, the plan forward would be to decorate the change in the
original commit with an #ifndef WIN32.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix a typo in sequence.c

2017-08-14 Thread Masahiko Sawada
Hi,

While reading the source code, I found a typo in sequence.c.  Attached
is a patch fixing it.

s/localy/locally/g

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_sequence_c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Orphaned files in base/[oid]

2017-08-14 Thread Chris Travers
Hi all;

I am trying to track down a problem we are seeing that looks very similar
to bug #12050, and would certainly consider trying to contribute a fix if
we agree on one.  (I am not sure we can, so absent that, the next question
is whether it makes sense to create a utility to fix the problem when it
comes up so that a dump/restore is not needed).

The system:
PostgreSQL 9.6.3
Gentoo Linux.

Problem:
The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
with disk space.  Querying pg_database_size, as well as du on the
subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
size of the relations in pg_class though shows around 2.1TB.

Initial troubleshooting found around 150 GB of space in pg_temp which had
never been cleared and was at least several days old.  Restarting the
server cleared these up.

Poking around the base/[oid] directory, I found a large number of files
which did not correspond with a pg_class entry.  One of the apparent
relations was nearly 1TB in size.

What I think happened:
I think various pg_temp/* and orphaned relation files (in base/[oid]) were
created when PostgreSQL crashed due to running out of space in various
operations including creating materialised views.

So my question is whether there is a way to safely clean these up on
server restart.  If not, does it make sense to create a utility that
connects to PostgreSQL, identifies the valid files, and deletes the
rest?
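
For anyone wanting to look for the same symptom, here is a hedged,
read-only diagnostic sketch (9.6, run as superuser; it only checks the
first segment of main forks, and any hit should be treated as a
candidate for investigation, never deleted blindly):

select file
  from pg_ls_dir('base/' ||
                 (select oid from pg_database
                   where datname = current_database())::text) as file
 where file ~ '^[0-9]+$'
   and file::oid not in (select pg_relation_filenode(oid)
                           from pg_class
                          where pg_relation_filenode(oid) is not null);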

-- 
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: [HACKERS] Adding support for Default partition in partitioning

2017-08-14 Thread Jeevan Ladhe
Hi Robert,


> 0005:
> > Extend default partitioning support to allow addition of new partitions.
>
> +   if (spec->is_default)
> +   {
> +   /* Default partition cannot be added if there already
> exists one. */
> +   if (partdesc->nparts > 0 &&
> partition_bound_has_default(boundinfo))
> +   {
> +   with = boundinfo->default_index;
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> +errmsg("partition \"%s\"
> conflicts with existing default partition \"%s\"",
> +   relname,
> get_rel_name(partdesc->oids[with])),
> +parser_errposition(pstate,
> spec->location)));
> +   }
> +
> +   return;
> +   }
>
> I generally think it's good to structure the code so as to minimize
> the indentation level.  In this case, if you did if (partdesc->nparts
> == 0 || !partition_bound_has_default(boundinfo)) return; first, then
> the rest of it could be one level less indented.  Also, perhaps it
> would be clearer to test boundinfo == NULL rather than
> partdesc->nparts == 0, assuming they are equivalent.


I think even with this change there will be one level of indentation
needed for throwing the error, as the error is to be thrown only if
there exists a default partition.


>
>
-* We must also lock the default partition, for the same
> reasons explained
> -* in heap_drop_with_catalog().
> +* We must lock the default partition, for the same reasons
> explained in
> +* DefineRelation().
>
> I don't really see the point of this change.  Whichever earlier patch
> adds this code could include or omit the word "also" as appropriate,
> and then this patch wouldn't need to change it.
>
>
Actually the change was made because of the difference in the function name.
I will remove ‘also’ from the first patch itself.


> > 0007:
> > This patch introduces code to check if the scanning of the default
> > partition child can be skipped if its constraints are proven.
>
> If I understand correctly, this is actually a completely separate
> feature not intrinsically related to default partitioning.


I don't see this as a new feature, since scanning the default partition
will be introduced by this series of patches only, and rather than a
feature this can be classified as completing the default partition's
skip-validation logic.  Your thoughts?

Regards,
Jeevan Ladhe


Re: [HACKERS] parallelize queries containing initplans

2017-08-14 Thread Amit Kapila
On Sun, Aug 13, 2017 at 6:49 PM, Haribabu Kommi
 wrote:
> On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila 
> wrote:
>>
>
> Thanks for the updated patch. Patch looks fine. I just have some
> minor comments.
>
> + * ExecEvalParamExecParams
> + *
> + * Execute the subplan stored in PARAM_EXEC initplans params, if not
> executed
> + * till now.
> + */
> +void
> +ExecEvalParamExecParams(Bitmapset *params, EState *estate)
>
> I feel it would be better to explain when this function executes the
> sub plans that have not been executed yet.  In what scenario does that
> happen?
>

It just means that it will execute the same initplan (subplan) just
once in the master backend even if it is used in multiple places.  This
is the same kind of usage as we have in ExecEvalParamExec.  You can find
its usage with a query like

explain analyse select sum(t1.i) from t1, t2 where t1.j=t2.j and t1.k
= (select count(k) from t3) and t1.k=t2.k;

Ensure you insert some rows in t1 and t2 that match the count from t3.
If you are using the schema and data given in Kuntal's script in the email
above, then you need to insert something like
t1(900,900,900);t2(900,900,900);

It generates plan like below:

postgres=# explain analyse select sum(t1.i) from t1, t2 where
t1.j=t2.j and t1.k = (select count(k) from t3) and t1.k=t2.k;
QUERY PLAN
---
 Aggregate  (cost=29.65..29.66 rows=1 width=8) (actual
time=22572.521..22572.521 rows=1 loops=1)
   InitPlan 1 (returns $0)
 ->  Finalize Aggregate  (cost=9.70..9.71 rows=1 width=8) (actual
time=4345.110..4345.111 rows=1 loops=1)
   ->  Gather  (cost=9.69..9.70 rows=2 width=8) (actual
time=4285.019..4345.098 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=9.69..9.70 rows=1
width=8) (actual time=0.154..0.155 rows=1 loops=3)
   ->  Parallel Seq Scan on t3  (cost=0.00..8.75
rows=375 width=4) (actual time=0.011..0.090 rows=300 loops=3)
   ->  Nested Loop  (cost=0.00..19.93 rows=1 width=4) (actual
time=22499.918..22572.512 rows=1 loops=1)
 Join Filter: (t1.j = t2.j)
 ->  Gather  (cost=0.00..9.67 rows=2 width=12) (actual
time=10521.356..10521.363 rows=1 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..9.67 rows=1
width=12) (actual time=0.506..0.507 rows=0 loops=3)
 Filter: (k = $0)
 Rows Removed by Filter: 299
 ->  Materialize  (cost=0.00..10.21 rows=2 width=8) (actual
time=11978.557..12051.142 rows=1 loops=1)
   ->  Gather  (cost=0.00..10.20 rows=2 width=8) (actual
time=11978.530..12051.113 rows=1 loops=1)
 Workers Planned: 2
 Params Evaluated: $0
 Workers Launched: 2
 ->  Parallel Seq Scan on t2  (cost=0.00..10.20
rows=1 width=8) (actual time=0.067..0.067 rows=0 loops=3)
   Filter: (k = $0)
   Rows Removed by Filter: 333
 Planning time: 15103.237 ms
 Execution time: 22574.703 ms
(27 rows)

You can notice that the initplan is used at multiple nodes, but it will
be evaluated just once.  If you want, I can add a sentence in the
comments, but I think this is somewhat obvious and the same use case
already exists.  Let me know if you still think the comments need to be
expanded.

>
> + if (params == NULL)
> + nparams = 0;
> + else
> + nparams = bms_num_members(params);
>
> bms_num_members returns 0 if params is NULL.
> Is the specific NULL check kept for a performance benefit, or should it
> just be removed?  Anything is fine for me.
>

I don't see any performance benefit here, so removed the if check.

>
> + if (IsA(plan, Gather))
> + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> initSetParam);
> + else
> + ((GatherMerge *) plan)->initParam =
> bms_intersect(plan->lefttree->extParam, initSetParam);
>
>
> I think the above code is to find out the common parameters that are
> present in the external and out params.
>

Here, we want to save all the initplan params that can be used below
the gather node.  extParam contains the set of all external PARAM_EXEC
params that can be used below gather node and we just want initplan
params out of those.

> It may be better to explain the logic in the comments.
>

I have kept comments atop the function set_param_references to explain
the context, but now I have added a few more in the code as you
suggested.  See if that suffices.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_initplan_v7.patch
Description: Binary data


Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-14 Thread Michael Paquier
On Mon, Aug 14, 2017 at 6:46 PM, Aleksander Alekseev
 wrote:
> Hi Michael,
>
>> It's been one month since I have done some serious development with
>> Archlinux (I was abroad and away from the laptop dedicated to that),
>> and surprise, I can see failures in the PG regression tests, like the
>> following short extract (result compared to expected/xml.out):
>
> I can confirm that I see the same errors on Arch Linux with latest
> updates when PostgreSQL is compiled with --with-libxml and/or
> --with-libxslt. I submitted a few details on bugs.archlinux.org [1]
> since probably not all Arch Linux maintainers know how to reproduce an
> issue.
>
> [1]: https://bugs.archlinux.org/task/55134

Thanks for adding the details directly.  Downgrading the hard way,
using the past packages of libxml2 in Arch's archives [1], is what I am
doing now.  ArchLinux is arguably in the wrong for shipping a package
with a behavior change.  Let's also wait for the libxml2 folks to see
what they have to say on the matter...  The next release of libxml2
would hurt Postgres if it were released today.

[1]: https://archive.archlinux.org/packages/l/libxml2/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-14 Thread Aleksander Alekseev
Hi Michael,

> I can confirm that I see the same errors on Arch Linux with latest
> updates when PostgreSQL is compiled with --with-libxml and/or
> --with-libxslt. I submitted a few details on bugs.archlinux.org [1]
> since probably not all Arch Linux maintainers know how to reproduce an
> issue.
> 
> [1]: https://bugs.archlinux.org/task/55134

TWIMC I've also described a workaround there.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-14 Thread Aleksander Alekseev
Hi Michael,

> It's been one month since I have done some serious development with
> Archlinux (I was abroad and away from the laptop dedicated to that),
> and surprise, I can see failures in the PG regression tests, like the
> following short extract (result compared to expected/xml.out):

I can confirm that I see the same errors on Arch Linux with latest
updates when PostgreSQL is compiled with --with-libxml and/or
--with-libxslt. I submitted a few details on bugs.archlinux.org [1]
since probably not all Arch Linux maintainers know how to reproduce an
issue.

[1]: https://bugs.archlinux.org/task/55134

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] pl/perl extension fails on Windows

2017-08-14 Thread Sandeep Thakkar
Hi

On Thu, Aug 10, 2017 at 9:04 PM, Tom Lane  wrote:

> Ashutosh Sharma  writes:
> > On Thu, Aug 10, 2017 at 8:06 PM, Tom Lane  wrote:
> >> Yeah ... however, if that's there, then there's something wrong with
> >> Ashutosh's explanation, because that means we *are* building with
> >> _USE_32BIT_TIME_T in 32-bit builds.  It's just getting there in a
> >> roundabout way.  (Or, alternatively, this code is somehow not doing
> >> anything at all.)
>
> > I am extremely sorry if i have communicated the things wrongly, what i
> > meant was we are always considering _USE_32BIT_TIME_T flag to build
> > plperl module on Windows 32-bit platform but unfortunately that is not
> > being considered/defined in perl code in case we use VC++ compiler
> > version greater than 8.0. and that's the reason for the binary
> > mismatch error on 32 bit platform.
>
> Got it.  So in short, it seems like the attached patch ought to fix it
> for MSVC builds.  (We'd also need to teach PGAC_CHECK_PERL_EMBED_CCFLAGS
> to let _USE_32BIT_TIME_T through on Windows, but let's confirm the theory
> first.)
>
We built the sources with this patch and were able to create the plperl
extension on Windows 32bit and 64bit.


> regards, tom lane
>
>


-- 
Sandeep Thakkar
EDB


[HACKERS] Secondary index access optimizations

2017-08-14 Thread Konstantin Knizhnik

Hi hackers,

I am trying to compare different ways of optimizing work with huge
append-only tables in PostgreSQL, where the primary key is something
like a timestamp and queries usually access the most recent data using
some secondary keys.  The size of the secondary index is one of the
most critical factors limiting insert/search performance.  Since data
is inserted in timestamp-ascending order, access to the primary key is
well localized and the accessed pages are present in memory.  But if we
create a secondary index for the whole table, then access to it will
require random reads from the disk and significantly decrease
performance.


There are two well known solutions of the problem:
1. Table partitioning
2. Partial indexes

These are the approaches I want to compare.  First of all, I want to
check whether the optimizer is able to generate an efficient query
execution plan covering different time intervals.

Unfortunately in both cases generated plan is not optimal.

1. Table partitioning:

create table base (k integer primary key, v integer);
create table part1 (check (k between 1 and 10000)) inherits (base);
create table part2 (check (k between 10001 and 20000)) inherits (base);
create index pi1 on part1(v);
create index pi2 on part2(v);
insert into part1 values (generate_series(1,10000), random());
insert into part2 values (generate_series(10001,20000), random());
explain select * from base where k between 1 and 20000 and v = 100;
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..15.65 rows=3 width=8)
   ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
         Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
   ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)
         Index Cond: (v = 100)
         Filter: ((k >= 1) AND (k <= 20000))
   ->  Index Scan using pi2 on part2  (cost=0.29..7.34 rows=1 width=8)
         Index Cond: (v = 100)
         Filter: ((k >= 1) AND (k <= 20000))

Questions:
- Is there some way to avoid the sequential scan of the parent table?
Yes, it is empty and so the sequential scan will not take much time,
but it still requires some additional work and so increases query
execution time.
- Why does the index scan of the partition indexes include the filter
condition when the check constraint guarantees that all records of the
partition match the search predicate?



2. Partial indexes:

create table t (k integer primary key, v integer);
insert into t values (generate_series(1,20000), random());
create index i1 on t(v) where k between 1 and 10000;
create index i2 on t(v) where k between 10001 and 20000;
postgres=# explain select * from t where k between 1 and 10000 and v = 100;
                         QUERY PLAN
-------------------------------------------------------------
 Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
   Index Cond: (v = 100)
(2 rows)


Here we get the perfect plan.  Let's try to extend the search interval:


postgres=# explain select * from t where k between 1 and 20000 and v = 100;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.29..760.43 rows=1 width=8)
   Index Cond: ((k >= 1) AND (k <= 20000))
   Filter: (v = 100)
(3 rows)

Unfortunately in this case Postgres is not able to apply partial indexes.
And this is what I expected to get:

postgres=# explain select * from t where k between 1 and 10000 and v = 100
union all select * from t where k between 10001 and 20000 and v = 100;

                           QUERY PLAN
---------------------------------------------------------------------
 Append  (cost=0.29..14.58 rows=2 width=8)
   ->  Index Scan using i1 on t  (cost=0.29..7.28 rows=1 width=8)
         Index Cond: (v = 100)
   ->  Index Scan using i2 on t t_1  (cost=0.29..7.28 rows=1 width=8)
         Index Cond: (v = 100)


I wonder if there are any fundamental problems with supporting these
two things in the optimizer:
1. Removing the search condition on the primary key if it is fully
satisfied by the derived table's check constraint.
2. Appending index scans of several partial indexes if the specified
interval is covered by their conditions.


I wonder if someone familiar with this part of the optimizer can easily
fix it.  Otherwise I am going to spend some time on solving these
problems (if the community thinks such optimizations would be useful).


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-14 Thread Masahiko Sawada
On Tue, Aug 8, 2017 at 10:50 PM, Fabien COELHO  wrote:
>
> Hello Masahiko-san,
>
> My 0.02€ about the patch & feature, which only reflect my point of view:

Thank you for reviewing the patch!

> Please add a number to patches to avoid ambiguities. This patch was the
> second sent on the thread.
>
> There is no need to have both custom_init & init functions. I'll suggest to
> remove function "init", rename "custom_init" as "init", and make the option
> default to what is appropriate so that it initialize the schema as
> expected, and there is only one initialization mechanism.
>
> I would suggest to make initialization sub options (no-vacuum,
> foreign-key...) rely on updating the initialization operations instead of
> being maintained separately. Currently "--no-vacuum --custom-init=vacuum"
> would skip vacuum, which is quite debatable...
>
> I'm not sure of the "custom-initialize" option name, but why not. I suggest
> to remove "is_initialize_suite", and make custom-initialize require -i as
> seems logical and upward compatible.
>
> ISTM that there should be short names for the phases. Maybe using only one
> letter could simplify the code significantly, dropping commas and list, eg:
> "t" for "create_table", "d" for "load_data", "p" for "create_pkey", "f" for
> "create_fkey", "v" for "vacuum".
>
> I do not think that allowing a space in the list is a shell-wise idea.

I think we can use it like --custom-initialize="create_table, vacuum",
which is similar to how we specify connection options to psql, for
example.  But it will be unnecessary if we have the one letter version.

> I'm also wondering whether using a list is a good option, because it implies
> a large parse function, list management and so on. With the one letter
> version, it could be just a string to be scanned char by char for
> operations.

I basically agree with the one letter version.  But I'm concerned that
it will confuse users if we add more initialization steps to pgbench
later: more steps would make the pgbench command hard to read, and
users would have to remember the letter for each option.

>
> Otherwise, at least allow short two letter codes (eg: ct ld pk fk va), in
> order to avoid writing a very long thing on the command line, which is quite
> a pain:
>
>   sh> pgbench
> --custom-initialize=create_table,load_data,primary_key,foreign_key,vacuum
> ...
>
> vs
>
>   sh> pgbench -i -I tdpfv ...
>
> Maybe there could be short-hands for usual setups, eg "default" for "tdpv"
> or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...

If the --custom-initialize option requires the -i option to be set,
"pgbench -i" means initialization with the full steps and "pgbench -i
--custom-initialize=..." means initialization with custom steps.

> Typo in doc "initailize" -> "initialize". Option values should be presented
> in their logical execution order, i.e. put vacuum at the end.
>
> Typo in help "initilize" -> "initialize". I would suggest to drop the space
> in the
> option value in the presentation so that quotes are not needed.
>
> Remove the "no-primary-keys" from the long option array as it has
> disappeared. You might consider make "custom-initialize" take the 'I' short
> option.
>
> Ranting unrelated to this patch: the automatic aid type switching based on
> scale is a bad idea (tm), because when trying to benchmark it means that
> changing the scale also changes the schema, and you really do not need that.
> ISTM that it should always use INT8.

Hmm, I think it's a valid point.  Should we allow users to specify
something like the above in the custom initialization feature as well?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-14 Thread Augustine, Jobin
Thank you Tom Lane,
This patch fixes the problem.

With this patch, streaming replication started working (replication to
Windows)

(Tested for Linux to Windows replication)

-Jobin.

On Mon, Aug 14, 2017 at 2:17 AM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2017-08-11 18:11:03 +0530, Augustine, Jobin wrote:
> >> Appears that patch is not helping.
>
> > That's too bad. Any chance you could install
> > https://docs.microsoft.com/en-us/sysinternals/downloads/procmon and
> > activate monitoring just for that phase? I think it can export to a text
> > file...
>
> It strikes me that maybe the misuse of io_flag could be contributing
> to this: if the walreceiver process's latch were set, we'd end up calling
> PQconnectPoll before the socket had necessarily come ready, which would
> produce the described symptom.  That's grasping at straws admittedly,
> because I'm not sure why the walreceiver process's latch would be set
> at this point; but it seems like we ought to test a version of the patch
> that we believe correct before deciding that we still have a problem.
>
> To move things along, here's a corrected patch --- Jobin, please test.
>
> regards, tom lane
>
>


-- 

*Jobin Augustine*
Architect : Production Database Operations


*OpenSCG*


*phone : +91 9989932600*


Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-14 Thread Magnus Hagander
On Sun, Aug 13, 2017 at 11:43 PM, Robert Haas  wrote:

> On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
> >> I'd vote for including this in v10.  There doesn't seem to be any
> >> downside to this: it's a no brainer to avoid our exploding hash table
> >> case when we can see it coming.
> >
> > Anybody else want to vote that way?  For myself it's getting a bit late
> > in the beta process to be including inessential changes, but I'm willing
> > to push it to v10 not just v11 if there's multiple people speaking for
> > that.
>
> I'd vote for waiting until v11.  I think it's too late to be doing
> things that might change good plans into bad ones or vice versa;
> that's a recipe for having to put out 10.1 and 10.2 a little quicker
> than I'd like.
>

+1 for waiting until v11 with it.

We have plenty of other things that could end up needing a quick
post-release release, and those are things that have received at least
some testing...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-14 Thread Arthur Zakirov
Hello,

On Sat, Aug 12, 2017 at 08:46:38PM +0900, Michael Paquier wrote:
> I can notice as well that no buildfarm machines are running ArchLinux
> now, so that's one reason why this got undetected until now. My own
> machines running Archlinux ARM have been unplugged for a full month,
> and I can see the failure there. They are not able to report back to
> the buildfarm, but that's a separate issue, visibly that's an access
> permission.
> 
> I have not investigated much this problem yet, but has somebody else
> seen those diffs?
> 

I have libxml2 2.9.4+96+gfb56f80e-1 on my Archlinux.  All regression
tests passed without any failures.  I ran the check and installcheck
commands.

Of course my environment does not completely match yours, which could
be why we see different results.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-08-14 Thread Michael Paquier
On Sun, Aug 6, 2017 at 3:22 AM, Robert Haas  wrote:
> All of (1)-(3) are legitimate user choices, although not everyone will
> make them.  (4) is unfortunately the procedure recommended by our
> documentation, which is where the problem comes in.  I think it's
> pretty lame that the documentation recommends ignoring the return
> value of pg_stop_backup(); ISTM that it would be very wise to
> recommend cross-checking the return value against the WAL files you're
> keeping for the backup.  Even if we thought the waiting logic was
> working perfectly in every case, having a cross-check on something as
> important as backup integrity seems like an awfully good idea.

I would go a little bit further and say that this is mandatory, as
the minimum recovery point that needs to be reached is the LSN
returned by pg_stop_backup(). For backups taken from primaries, this
is a larger deal because XLOG_BACKUP_END may not be present if the
last WAL segment switched is not in the archive. For backups taken
from standbys, the thing is more tricky as the control file should be
backed up last.  I would think that the best thing we could do is to
extend pg_stop_backup a bit more so that it returns the control file to
write into the backup, using a bytea to hold the data, for example.
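
As a concrete sketch of that kind of cross-check (v10 non-exclusive
syntax assumed), a backup tool can capture the stop LSN and verify that
WAL up through it is in the set of files it keeps:

select lsn from pg_stop_backup(false, true);
-- on a primary, pg_walfile_name(lsn) translates this into the last
-- required segment name; it cannot be run during recovery, so for
-- standby backups the segment name must be computed client-side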

> I think the root of this problem is that commit
> 7117685461af50f50c03f43e6a622284c8d54694 did only a pretty perfunctory
> update of the documentation, as the commit message itself admits:
>
> Only reference documentation is included. The main section on
> backup still needs
> to be rewritten to cover this, but since that is already scheduled
> for a separate
> large rewrite, it's not included in this patch.
>
> But it doesn't look like that ever really got done.
> https://www.postgresql.org/docs/10/static/continuous-archiving.html#backup-lowlevel-base-backup
> really doesn't talk at all about standbys or differences in required
> procedures between masters and standbys.  It makes statements that are
> flagrantly ridiculous in the case of a standby, like claiming that
> pg_start_backup() always performs a checkpoint and that pg_stop_backup
> "terminates the backup mode and performs an automatic switch to the
> next WAL segment."  Well, obviously not.

Yep.

> And at least to me, that's the real bug here.  Somebody needs to go
> through and fix this documentation properly.

So, Magnus? :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regressions failures with libxml2 on ArchLinux

2017-08-14 Thread Michael Paquier
On Sat, Aug 12, 2017 at 8:46 PM, Michael Paquier
 wrote:
> I have not investigated much this problem yet, but has somebody else
> seen those diffs?

Coming back to that... Downgrading to 2.9.4+4+g3169602-1 puts things
back the way they should be.  Even downgrading to 2.9.4+91+g872fea94-1
did not help much, and 2.9.4+96+gfb56f80e-1 is the newest version.

I have logged an issue to archlinux website to let them know about the
regression, because they are shipping a broken package:
https://bugs.archlinux.org/task/55134

Using this information I have done as well some bisecting of libxml2,
to finish with the following commit as culprit:
commit 46dc989080d5d6b7854de8fb3cb3de55ecbf0621
Author: Nick Wellnhofer 
Date:   Thu Jun 8 02:24:56 2017 +0200

Don't switch encoding for internal parameter entities

This is only needed for external entities. Trying to switch the encoding
for internal entities could also cause a memory leak in recovery mode.

So I have logged as well a bug for the upstream folks:
https://bugzilla.gnome.org/show_bug.cgi?id=786267

Please note that I have not yet concluded whether PostgreSQL is to
blame in this case for its use of the libxml2 APIs...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers