Re: [HACKERS] Parallel Hash take II
On 2017-11-27 22:25:12 +1300, Thomas Munro wrote:
> On Thu, Nov 23, 2017 at 12:36 AM, Thomas Munro wrote:
> > Here's a new patch set with responses to the last batch of review comments.

Looking at 0004-Add-shared-tuplestores.patch

Comments:
- I'd rename mutex to lock. Seems quite possible that we end up with shared lockers too.
- Expand the "Simple mechanism for sharing tuples between backends." intro comment - that doesn't really seem like a meaningful description of the mechanism. Should probably mention that it's similar to tuplestores etc...
- I'm still concerned about the chunking mechanism. How about this sketch of an alternative solution? Chunks are always the same length. To avoid having to read the length from disk while holding a lock, introduce continuation chunks which store the amount of space needed by the overlarge tuple at the start. The reading process stays largely the same, except that if a backend reads a chunk that's a continuation, it reads the length of the continuation and skips ahead. That could mean that more than one backend reads continuation chunks, but the window is small and there's normally not going to be that many huge tuples (otherwise things are slow anyway).
- Why are we using a separate hardcoded 32 for sts names? Why not just go for NAMEDATALEN or such?
- I'd replace most of the "should"s in comments with "need".

Greetings,

Andres Freund
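The continuation-chunk sketch above can be modeled in a few lines. This is a toy illustration only — names, header layout, and sizes are invented here, not the actual SharedTuplestore code: fixed-size chunks carry a flag byte plus a "remaining bytes" field, and a reader that lands on a continuation chunk computes how far to skip without re-reading lengths while holding the lock.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sizes: a flag byte plus a 4-byte remaining-length field. */
enum { CHUNK_SIZE = 32, HDR = 5, PAYLOAD = CHUNK_SIZE - HDR };
enum { ORDINARY = 0, CONTINUATION = 1 };

/* Write one tuple starting at chunk index 'at'; returns chunks used.
 * The first chunk of a tuple is ORDINARY; overflow chunks are marked
 * CONTINUATION and store the bytes still remaining (incl. their own). */
static int
write_tuple(uint8_t *file, int at, const uint8_t *data, uint32_t len)
{
    int nchunks = 0;
    uint32_t done = 0;

    do
    {
        uint8_t *chunk = file + (size_t) (at + nchunks) * CHUNK_SIZE;
        uint32_t remaining = len - done;
        uint32_t fit = remaining < PAYLOAD ? remaining : PAYLOAD;

        chunk[0] = (nchunks == 0) ? ORDINARY : CONTINUATION;
        memcpy(chunk + 1, &remaining, sizeof(remaining));
        memcpy(chunk + HDR, data + done, fit);
        done += fit;
        nchunks++;
    } while (done < len);

    return nchunks;
}

/* If positioned on a continuation chunk, skip ahead to the next tuple
 * start using only the length stored in the chunk itself. */
static int
skip_to_next_tuple(const uint8_t *file, int at)
{
    const uint8_t *chunk = file + (size_t) at * CHUNK_SIZE;
    uint32_t remaining;

    if (chunk[0] == ORDINARY)
        return at;              /* already at a tuple start */
    memcpy(&remaining, chunk + 1, sizeof(remaining));
    return at + (int) ((remaining + PAYLOAD - 1) / PAYLOAD);
}
```

The point of the scheme is visible in `skip_to_next_tuple`: the skip distance is derived from data already in the chunk, so no extra disk read happens under the lock.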
Re: [HACKERS] Parallel Hash take II
Hi,

On 2017-11-27 22:25:12 +1300, Thomas Munro wrote:

Pushed 0003-Add-infrastructure-for-sharing-temporary-files-betwe.patch after minor adjustments (some after conversing with you). Changes:
- Changed an impossible elog() into an Assert().
- Changed SharedFileSet->counter, and the backend static variable, to uint32. It's not impossible for a 32bit int to overflow over the course of a few weeks, and a) imo we shouldn't unnecessarily rely on signed overflow being defined, b) a negative number would look weird, even if well defined (-fwrapv et al).
- Added a small comment about the arbitrariness of the 8 in Oid tablespaces[8].
- pgindent'ed

Questions:
- Right now RemovePgTempFilesInDir() will recurse into appropriately named directories, and when it recurses it doesn't require the same name pattern checks. I think that's good, but I think it'd be prudent to be a bit more paranoid and prevent recursing into symlinked subdirectories.
- As we don't clean temp files after crash-restarts, it isn't impossible to have a bunch of crash-restarts and end up with pids *and* per-pid shared file set counters reused. Which'd lead to conflicts. Do we care?

Greetings,

Andres Freund
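Point a) above is worth making concrete: unsigned arithmetic wraps modulo 2^32 by definition, while the same increment past INT32_MAX on a signed int is undefined behaviour unless -fwrapv is in effect — and even then yields an odd-looking negative value in a file name. A minimal illustration (not the actual SharedFileSet code; the function name is made up):

```c
#include <stdint.h>

/* A uint32 counter that overflows simply continues from 0 -- this is
 * well-defined in C, unlike signed overflow. */
static uint32_t
next_fileset_number(uint32_t counter)
{
    return counter + 1;     /* wraps cleanly to 0 at UINT32_MAX */
}
```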
Re: [HACKERS] pow support for pgbench
Hello Robert,

> The fact that the return type is not consistently of one type bothers me. I'm not sure pgbench's expression language is a good place for runtime polymorphism -- SQL doesn't work that way.

Sure. Pg has a NUMERIC adaptive-precision version, which is cheating, because it can return kind of an "int" or a "float", depending on whether there are digits after the decimal point or not. Pgbench does not have support for NUMERIC, just INT & DOUBLE, so the current version is an approximation of that. Now it is always possible to just do a DOUBLE version, but this won't match SQL behavior either.

> + /*
> +  * pow() for integer values with exp >= 0. Matches SQL pow() behaviour
> +  */
>
> What's the name of the backend function whose behavior this matches?

POW(numeric,numeric) -> numeric, which matches "numeric_power".

-- Fabien.
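For readers following along, the kind of integer pow() under discussion can be sketched with exponentiation by squaring. This is illustrative only — the actual pgbench patch may well differ, and overflow checking is deliberately omitted here:

```c
#include <stdint.h>

/* Integer pow() for exp >= 0 via exponentiation by squaring.
 * Hypothetical sketch; not the patch's actual implementation. */
static int64_t
ipow(int64_t base, int64_t exp)
{
    int64_t result = 1;

    while (exp > 0)
    {
        if (exp & 1)            /* odd exponent: fold in one factor */
            result *= base;
        base *= base;           /* square the base each round */
        exp >>= 1;
    }
    return result;
}
```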
Re: Protect syscache from bloating with negative cache entries
On 2017-12-01 17:03:28 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-12-01 16:40:23 -0500, Tom Lane wrote:
> >> I have no faith in either of these proposals, because they both assume that the problem only arises over the course of many minutes. In the recent complaint about pg_dump causing relcache bloat, it probably does not take nearly that long for the bloat to occur.
>
> > To me that's a bit of a different problem than what I was discussing here. It also actually doesn't seem that hard - if your caches are growing fast, you'll continually get hash-resizing of the various caches. Adding cache-pruning to the resizing code doesn't seem hard, and wouldn't add meaningful overhead.
>
> That's an interesting way to think about it, as well, though I'm not sure it's quite that simple. If you tie this to cache resizing then the cache will have to grow up to the newly increased size before you'll prune it again. That doesn't sound like it will lead to nice steady-state behavior.

Yea, it's not perfect - but if we do pruning both at resize *and* on regular intervals, like once-a-minute as I was suggesting, I don't think it's that bad. The steady state won't be reached within seconds, true, but the negative consequences of only attempting to shrink the cache upon resizing - when the cache size is growing fast anyway - don't seem that large. I don't think we need to be super accurate here, there just needs to be *some* backpressure. I've had cases in the past where just occasionally blasting the cache away would've been good enough.

Greetings,

Andres Freund
Re: Protect syscache from bloating with negative cache entries
Andres Freund writes:
> On 2017-12-01 16:40:23 -0500, Tom Lane wrote:
>> I have no faith in either of these proposals, because they both assume that the problem only arises over the course of many minutes. In the recent complaint about pg_dump causing relcache bloat, it probably does not take nearly that long for the bloat to occur.

> To me that's a bit of a different problem than what I was discussing here. It also actually doesn't seem that hard - if your caches are growing fast, you'll continually get hash-resizing of the various caches. Adding cache-pruning to the resizing code doesn't seem hard, and wouldn't add meaningful overhead.

That's an interesting way to think about it, as well, though I'm not sure it's quite that simple. If you tie this to cache resizing then the cache will have to grow up to the newly increased size before you'll prune it again. That doesn't sound like it will lead to nice steady-state behavior.

regards, tom lane
Re: [HACKERS] proposal: psql command \graw
Pavel Stehule writes:
> 2017-12-01 16:36 GMT+01:00 Robert Haas :
>> I vote to reject this patch. It doesn't do anything that you can't already do; it just adds some syntactic sugar. And that syntactic sugar saves only a handful of keystrokes. If you want unaligned, tuples-only mode, you can set it in 5 keystrokes:

> When you don't want to see column names, then it is true.

If this is "\a\t except we aren't removing column headers", then it seems like "\graw" is a remarkably misleading name for the functionality. What I'd been assuming, from reading no more than the patch subject line, was that it was something like "dump the raw field data with no headers, footers, or separators", with a claimed use-case involving passing bytea data to some consumer or other.

FWIW, I concur with Robert's assessment, both that the use-case is pretty thin and that we do not want to get into the business of encouraging a boatload of \gxxx variants, especially not ones with randomly chosen names.

> For some usage - like printing via gnuplot - you would want to see column names, and it cannot be done with current settings.

That seems like an argument for new pset option(s) to suppress footers but not headers. It's not an argument for invoking such options through a \g variant; that's totally non-orthogonal.

regards, tom lane
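For the gnuplot use case being discussed, one can already get fairly close with existing psql settings — headers kept, footer suppressed, tab-separated output. This is a sketch using standard psql meta-commands; whether it fully replaces the proposed \graw depends on the exact output wanted:

```psql
-- headers but no footer, tab-separated -- roughly the desired \graw shape
\pset format unaligned
\pset fieldsep '\t'
\pset footer off
SELECT x, sin(x) FROM generate_series(0, 6) AS t(x) \g /tmp/data.tsv
```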
Re: [HACKERS] Custom compression methods
On Fri, 1 Dec 2017 16:38:42 -0300 Alvaro Herrera wrote:
> To me it makes sense to say "let's create this method which is for data compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by either "let's use this new compression method for the type tsvector" (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET COMPRESSION hyperz).

Hi, I think if CREATE ACCESS METHOD can be used for compression, then it could be nicer than CREATE COMPRESSION METHOD. I just don't know whether compression fits as an access method or not. Anyway it's easy to change the syntax, and I don't mind doing it if it's necessary for the patch to be committed.

-- Regards, Ildus Kurbangaliev
Re: [HACKERS] Custom compression methods
On 2017-12-01 16:14:58 -0500, Robert Haas wrote:
> Honestly, if we can give everybody a 4% space reduction by switching to lz4, I think that's totally worth doing -- but let's not make people choose it, let's make it the default going forward, and keep pglz support around so we don't break pg_upgrade compatibility (and so people can continue to choose it if for some reason it works better in their use case). That kind of improvement is nothing special in a specific workload, but TOAST is a pretty general-purpose mechanism. I have become, through a few bitter experiences, a strong believer in the value of trying to reduce our on-disk footprint, and knocking 4% off the size of every TOAST table in the world does not sound worthless to me -- even though context-aware compression can doubtless do a lot better.

+1. It's also a lot faster, and I've seen way too many workloads with 50%+ time spent in pglz.

Greetings,

Andres Freund
Re: Protect syscache from bloating with negative cache entries
On 2017-12-01 16:40:23 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-12-01 16:20:44 -0500, Robert Haas wrote:
> >> Well, yeah, that would be insane. But I think even something very rough could work well enough. I think our goal should be to eliminate cache entries that have gone unused for many *minutes*, and there's no urgency about getting it to any sort of exact value. For non-idle backends, using the most recent statement start time as a proxy would probably be plenty good enough. Idle backends might need a bit more thought.
>
> > Our timer framework is flexible enough that we can install a once-a-minute timer without much overhead. That timer could increment a 'cache generation' integer. Upon cache access we write the current generation into relcache / syscache (and potentially also plancache?) entries. Not entirely free, but cheap enough. In those once-a-minute passes entries that haven't been touched in X cycles get pruned.
>
> I have no faith in either of these proposals, because they both assume that the problem only arises over the course of many minutes. In the recent complaint about pg_dump causing relcache bloat, it probably does not take nearly that long for the bloat to occur.

To me that's a bit of a different problem than what I was discussing here. It also actually doesn't seem that hard - if your caches are growing fast, you'll continually get hash-resizing of the various caches. Adding cache-pruning to the resizing code doesn't seem hard, and wouldn't add meaningful overhead.

Greetings,

Andres Freund
Re: Protect syscache from bloating with negative cache entries
On 2017-12-01 16:20:44 -0500, Robert Haas wrote:
> Well, yeah, that would be insane. But I think even something very rough could work well enough. I think our goal should be to eliminate cache entries that have gone unused for many *minutes*, and there's no urgency about getting it to any sort of exact value. For non-idle backends, using the most recent statement start time as a proxy would probably be plenty good enough. Idle backends might need a bit more thought.

Our timer framework is flexible enough that we can install a once-a-minute timer without much overhead. That timer could increment a 'cache generation' integer. Upon cache access we write the current generation into relcache / syscache (and potentially also plancache?) entries. Not entirely free, but cheap enough. In those once-a-minute passes entries that haven't been touched in X cycles get pruned.

Greetings,

Andres Freund
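The generation scheme sketched above is simple enough to model in a few lines. All names here are illustrative, not actual PostgreSQL code: a timer bumps a global generation, every cache access stamps the entry, and a periodic sweep evicts entries untouched for some number of generations. Note the unsigned subtraction, which keeps the age computation correct even across counter wraparound.

```c
#include <stdint.h>
#include <stdbool.h>

#define PRUNE_AFTER 5                   /* generations, i.e. ~minutes */

typedef struct CacheEntry
{
    bool     valid;
    uint32_t last_used_gen;
} CacheEntry;

static uint32_t cache_generation = 0;

static void
timer_tick(void)
{
    cache_generation++;                 /* fired by the once-a-minute timer */
}

static void
touch_entry(CacheEntry *entry)
{
    entry->last_used_gen = cache_generation;    /* stamped on cache access */
}

/* Run from the timer pass (or, per the follow-up discussion, also when
 * the cache's hash table resizes).  Returns the number of evictions. */
static int
prune_cache(CacheEntry *entries, int n)
{
    int evicted = 0;

    for (int i = 0; i < n; i++)
    {
        if (entries[i].valid &&
            cache_generation - entries[i].last_used_gen >= PRUNE_AFTER)
        {
            entries[i].valid = false;
            evicted++;
        }
    }
    return evicted;
}
```

The per-access cost is a single store, which is why the proposal calls it "not entirely free, but cheap enough".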
Re: [HACKERS] Custom compression methods
On 12/01/2017 08:38 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> On 11/30/2017 09:51 PM, Alvaro Herrera wrote:
>>> Just passing by, but wouldn't this fit in the ACCESS METHOD group of commands? So this could be simplified down to CREATE ACCESS METHOD ts1 TYPE COMPRESSION; we have that for indexes and there are patches flying for heap storage, sequences, etc.
>>
>> I think that would conflate two very different concepts. In my mind, access methods define how rows are stored.
>
> In mine, they define how things are accessed (i.e. more general than what you're thinking). We *currently* use them to store rows [in indexes], but there is no reason why we couldn't expand that.

Not sure I follow. My argument was not as much about whether the rows are stored as rows or in some other (columnar) format, but that access methods deal with "tuples" (i.e. rows in the "logical" sense). I assume that even if we end up implementing other access method types, they will still be tuple-based.

OTOH compression methods (at least as introduced by this patch) operate on individual values, and have very little to do with access to the value (in a sense it's a transparent thing).

> So we group access methods in "types"; the current type we have is for indexes, and methods in that type define how indexes are accessed. This new type would indicate how values would be compressed. I disagree that there is no parallel there.
>
> I'm trying to avoid pointless proliferation of narrowly defined DDL commands.

Of course, the opposite case is using the same DDL for very different concepts (although I understand you don't see it that way). But in fairness, I don't really care if we call this COMPRESSION METHOD or ACCESS METHOD or DARTH VADER ...

> Furthermore, the "TYPE" in CREATE COMPRESSION METHOD was meant to restrict the compression algorithm to a particular data type (so, if it relies on tsvector, you can't apply it to text columns).
> > Yes, of course. I'm saying that the "datatype" property of a > compression access method would be declared somewhere else, not in the > TYPE clause of the CREATE ACCESS METHOD command. Perhaps it makes sense > to declare that a certain compression access method is good only for a > certain data type, and then you can put that in the options clause, > "CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector)". > But many compression access methods would be general in nature and so > could be used for many datatypes (say, snappy). > > To me it makes sense to say "let's create this method which is for data > compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by > either "let's use this new compression method for the type tsvector" > (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new > compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET > COMPRESSION hyperz). > The WITH syntax does not seem particularly pretty to me, TBH. I'd be much happier with "TYPE tsvector" and leaving WITH for the options specific to each compression method. FWIW I think syntax is the least critical part of this patch. It's ~1% of the patch, and the gram.y additions are rather trivial. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
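Collecting the syntax variants from this subthread in one place may help. All of this is proposed syntax under discussion, not anything PostgreSQL accepts; "hyperz" and the table/column names are hypothetical:

```sql
-- Alvaro's proposal: compression as a new access method type ...
CREATE ACCESS METHOD hyperz TYPE COMPRESSION;

-- ... attached either to a type or to a column:
ALTER TYPE tsvector SET COMPRESSION hyperz;
ALTER TABLE t ALTER COLUMN tc SET COMPRESSION hyperz;

-- ... with any datatype restriction expressed in the options clause:
CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector);

-- The patch's original spelling, where TYPE restricts the datatype:
CREATE COMPRESSION METHOD hyperz TYPE tsvector;
```

The disagreement is thus not about capability but about where the datatype restriction lives (the TYPE clause vs. the WITH options) and whether a dedicated DDL command is warranted.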
Re: BUG #14941: Vacuum crashes
On 12/1/17, 3:04 PM, "Robert Haas" wrote:
> On Fri, Dec 1, 2017 at 1:53 PM, Bossart, Nathan wrote:
>> There is already an internal flag called VACOPT_NOWAIT that gives autovacuum this ability in certain cases (introduced in the aforementioned commit), and I recently brought up this potential use-case as justification for a patch [0]. I'd be happy to submit a patch for providing VACOPT_NOWAIT to users if others think it's something we should do.
>
> Seems entirely reasonable to me, provided that we only update the extensible-options syntax:
>
> VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
>
> I don't want to add any more options to the older syntax:
>
> VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]

Right. This seems like the right path forward to me, as the VACUUM documentation states that the unparenthesized syntax is deprecated, and the DISABLE_PAGE_SKIPPING option was not added to the old syntax, either.

> I am slightly confused as to how we got on to this topic since the subject is "Vacuum crashes", but perhaps Lyes was simply speaking imprecisely.

I'm hoping Lyes chimes in soon to clarify if I am interpreting the original report incorrectly.

Nathan
Re: [HACKERS] Custom compression methods
On Fri, Dec 1, 2017 at 4:06 PM, Tomas Vondra wrote:
>>> I agree with these thoughts in general, but I'm not quite sure what is your conclusion regarding the patch.
>>
>> I have not reached one. Sometimes I like to discuss problems before deciding what I think. :-)
>
> That's lame! Let's make decisions without discussion ;-)

Oh, right. What was I thinking?

>> It does seem to me that the patch may be aiming at a relatively narrow target in a fairly large problem space, but I don't know whether to label that as short-sightedness or prudent incrementalism.
>
> I don't know either. I don't think people will start switching their text columns to lz4 just because they can, or because they get 4% space reduction compared to pglz.

Honestly, if we can give everybody a 4% space reduction by switching to lz4, I think that's totally worth doing -- but let's not make people choose it, let's make it the default going forward, and keep pglz support around so we don't break pg_upgrade compatibility (and so people can continue to choose it if for some reason it works better in their use case). That kind of improvement is nothing special in a specific workload, but TOAST is a pretty general-purpose mechanism. I have become, through a few bitter experiences, a strong believer in the value of trying to reduce our on-disk footprint, and knocking 4% off the size of every TOAST table in the world does not sound worthless to me -- even though context-aware compression can doubtless do a lot better.

> But the ability to build per-column dictionaries seems quite powerful, I guess. And I don't think that can be easily built directly into JSONB, because we don't have a way to provide information about the column (i.e. how would you fetch the correct dictionary?).

That's definitely a problem, but I think we should mull it over a bit more before giving up.
I have a few thoughts, but the part of my life that doesn't happen on the PostgreSQL mailing list precludes expounding on them right this minute. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Doc tweak for huge_pages?
On Sat, Dec 2, 2017 at 4:08 AM, Peter Eisentraut wrote:
> On 11/30/17 23:35, Thomas Munro wrote:
>> On Fri, Dec 1, 2017 at 5:04 PM, Justin Pryzby wrote:
>>> On Fri, Dec 01, 2017 at 04:01:24PM +1300, Thomas Munro wrote:
>>>> Hi hackers, The manual implies that only Linux can use huge pages. That is not true: FreeBSD, Illumos and probably others support larger page sizes using transparent page coalescing algorithms. On my FreeBSD box procstat -v often shows PostgreSQL shared buffers in "S"-flagged memory. I think we should adjust the manual to make clear that it's the *explicit request for huge pages* that is supported only on Linux (and hopefully soon Windows). Am I being too pedantic?
>>>
>>> I suggest to remove "other" and include Linux in the enumeration, since it also supports "transparent" hugepages.
>>
>> Hmm. Yeah, it does, but apparently it's not so transparent. So if we mention that we'd better indicate in the same paragraph that you probably don't actually want to use it. How about the attached?
>
> Part of the confusion is that the huge_pages setting is only for shared memory, whereas the kernel settings affect all memory.

Right. And more specifically, just the main shared memory area, not DSM segments. Updated to make this point.

(I have wondered whether DSM segments should respect this GUC: it seems plausible that they should when the size is a multiple of the huge page size, so that very large DSA areas finish up mostly backed by huge pages, so that very large shared hash tables would benefit from lower TLB miss rates. I have only read in an academic paper that this is a good idea, I haven't investigated whether that would really help us in practice, and the first problem is that Linux shm_open doesn't support huge pages anyway, so you'd need one of the other DSM implementation options, which are currently non-default.)

> Is the same true for the proposed Windows patch?

Yes.
It adds a flag to the request for the main shared memory area (after jumping through various permissions hoops). -- Thomas Munro http://www.enterprisedb.com huge-pages-doc-tweak-v3.patch Description: Binary data
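The "explicit request for huge pages" being distinguished from transparent coalescing boils down to a flag on the shared memory mapping. A rough sketch of the huge_pages=try style of behavior on Linux — illustrative only, not PostgreSQL's actual sysv_shmem.c code:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for huge pages outright (Linux MAP_HUGETLB); fall back
 * to normal pages when huge pages are unavailable or unsupported. */
static void *
map_shared_memory(size_t size)
{
    void *ptr = MAP_FAILED;

#ifdef MAP_HUGETLB
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (ptr == MAP_FAILED)      /* e.g. no huge pages reserved by the OS */
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return ptr;
}
```

On FreeBSD or Illumos the second, plain mapping is the only one available, and the kernel's transparent coalescing may still promote it to larger pages — which is exactly the distinction the doc tweak is trying to draw.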
Re: [HACKERS] Custom compression methods
On Fri, Dec 1, 2017 at 2:38 PM, Alvaro Herrera wrote:
> In mine, they define how things are accessed (i.e. more general than what you're thinking). We *currently* use them to store rows [in indexes], but there is no reason why we couldn't expand that.
>
> So we group access methods in "types"; the current type we have is for indexes, and methods in that type define how indexes are accessed. This new type would indicate how values would be compressed. I disagree that there is no parallel there.

+1.

> I'm trying to avoid pointless proliferation of narrowly defined DDL commands.

I also think that's an important goal.

> Yes, of course. I'm saying that the "datatype" property of a compression access method would be declared somewhere else, not in the TYPE clause of the CREATE ACCESS METHOD command. Perhaps it makes sense to declare that a certain compression access method is good only for a certain data type, and then you can put that in the options clause, "CREATE ACCESS METHOD hyperz TYPE COMPRESSION WITH (type = tsvector)". But many compression access methods would be general in nature and so could be used for many datatypes (say, snappy).
>
> To me it makes sense to say "let's create this method which is for data compression" (CREATE ACCESS METHOD hyperz TYPE COMPRESSION) followed by either "let's use this new compression method for the type tsvector" (ALTER TYPE tsvector SET COMPRESSION hyperz) or "let's use this new compression method for the column tc" (ALTER TABLE ALTER COLUMN tc SET COMPRESSION hyperz).

+1 to this, too.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods
On 12/01/2017 08:20 PM, Robert Haas wrote:
> On Fri, Dec 1, 2017 at 10:18 AM, Tomas Vondra wrote:
>> It has very little impact on this patch, as it has nothing to do with columnar storage. That is, each value is compressed independently.
>
> I understand that this patch is not about columnar storage, but I think the idea that we may want to operate on the compressed data directly is not only applicable to that case.

Yeah. To clarify, my point was that column stores benefit from compressing many values at once, and then operating on this compressed vector. That is not what this patch is doing (or can do), of course.

But I certainly do agree that if the compression can be integrated into the data type, allowing processing on the compressed representation, then that will beat whatever this patch is doing, of course ...

>> I agree with these thoughts in general, but I'm not quite sure what is your conclusion regarding the patch.
>
> I have not reached one. Sometimes I like to discuss problems before deciding what I think. :-)

That's lame! Let's make decisions without discussion ;-)

> It does seem to me that the patch may be aiming at a relatively narrow target in a fairly large problem space, but I don't know whether to label that as short-sightedness or prudent incrementalism.

I don't know either. I don't think people will start switching their text columns to lz4 just because they can, or because they get 4% space reduction compared to pglz.

But the ability to build per-column dictionaries seems quite powerful, I guess. And I don't think that can be easily built directly into JSONB, because we don't have a way to provide information about the column (i.e. how would you fetch the correct dictionary?).

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #14941: Vacuum crashes
On Fri, Dec 1, 2017 at 1:53 PM, Bossart, Nathan wrote:
> There is already an internal flag called VACOPT_NOWAIT that gives autovacuum this ability in certain cases (introduced in the aforementioned commit), and I recently brought up this potential use-case as justification for a patch [0]. I'd be happy to submit a patch for providing VACOPT_NOWAIT to users if others think it's something we should do.

Seems entirely reasonable to me, provided that we only update the extensible-options syntax:

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

I don't want to add any more options to the older syntax:

VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]

I am slightly confused as to how we got on to this topic since the subject is "Vacuum crashes", but perhaps Lyes was simply speaking imprecisely.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods
On 12/01/2017 08:48 PM, Alvaro Herrera wrote:
> Ildus Kurbangaliev wrote:
>> If the table is big, decompression could take an eternity. That's why I decided only to disable it, and the data could be decompressed using compression options.
>>
>> My idea was to keep compression options forever, since there will not be many of them in one database. Still that requires that the extension is not removed.
>>
>> I will try to find a way to recompress data first in case it moves to another table.
>
> I think what you should do is add a dependency between a column that compresses using a method, and that method. So the method cannot be dropped and leave compressed data behind. Since the method is part of the extension, the extension cannot be dropped either. If you ALTER the column so that it uses another compression method, then the table is rewritten and the dependency is removed; once you do that for all the columns that use the compression method, the compression method can be dropped.

+1 to doing the rewrite, just like for other similar ALTER TABLE commands.

> Maybe our dependency code needs to be extended in order to support this. I think the current logic would drop the column if you were to do "DROP COMPRESSION .. CASCADE", but I'm not sure we'd see that as a feature. I'd rather have DROP COMPRESSION always fail instead until no columns use it. Let's hear others' opinions on this bit though.

Why should this behave differently compared to data types? Seems quite against POLA, if you ask me ...

If you want to remove the compression, you can do the SET NOT COMPRESSED (or whatever syntax we end up using), and then DROP COMPRESSION METHOD.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Dec 1, 2017 at 1:28 PM, Robert Haas wrote:
> [ lots of minor comments ]

When I took a break from sitting at the computer, I realized that I think this has a more serious problem: won't it permanently leak reference counts if someone hits ^C or an error occurs while the lock is held? I think it will -- it probably needs to do cleanup at the places where we do LWLockReleaseAll() that includes decrementing the shared refcount if necessary, rather than doing cleanup at the places we release heavyweight locks. I might be wrong about the details here -- this is off the top of my head.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Transaction control in procedures
On Fri, Dec 1, 2017 at 2:48 PM, Peter Eisentraut wrote:
> Here is a new patch, now on top of master. The main changes are that a lot of documentation has been added.

This feature doesn't have many tests. I think it should have a lot more of them. It's tinkering with the transaction control machinery of the system in a fairly fundamental way, and that could break things. I suggest, in particular, testing how it interacts with resources such as cursors and prepared statements. For example, what happens if you commit or roll back inside a cursor-for loop (given that the cursor is not holdable)? There are several kinds of cursor loops in PostgreSQL, plus there are cursors, prepared statements, and portals that can be created using SQL commands or protocol messages. I suspect that there are quite a few tricky interactions there.

Other things to think about:
- COMMIT or ROLLBACK inside a PLpgsql block with an attached EXCEPTION block, or when an SQL SAVEPOINT has been established previously.
- COMMIT or ROLLBACK inside a procedure with a SET clause attached, and/or while SET LOCAL is in effect either at the inner or outer level.
- COMMIT or ROLLBACK with open large objects.
- COMMIT inside a procedure fails because of a serialization failure, deferred constraint, etc.

In some cases, there are not only questions of correctness (it shouldn't crash/give wrong answers) but also definitional questions (what exactly should happen in that case, anyway?).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
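To make the cursor-for-loop case concrete, this is the kind of test being suggested, written against the patch's proposed PL/pgSQL COMMIT (syntax per the v3 patch in this thread, not released PostgreSQL; the table and procedure names are made up):

```sql
CREATE TABLE target (i int);

CREATE PROCEDURE batch_load() LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    -- The FOR loop is driven by an implicit, non-holdable cursor ...
    FOR r IN SELECT i FROM generate_series(1, 1000) AS s(i) LOOP
        INSERT INTO target VALUES (r.i);
        IF r.i % 100 = 0 THEN
            COMMIT;   -- ... so what happens to that cursor here?
        END IF;
    END LOOP;
END;
$$;

CALL batch_load();
```

The definitional question is whether the loop's cursor survives the intermediate COMMIT (and if so, how), or whether this should raise an error.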
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
On Fri, Dec 1, 2017 at 11:29 AM, Bossart, Nathan wrote:
> Thanks for the review, Robert. I've attached a new version that addresses your feedback.

Thanks. I think this looks fine now, except that (1) it needs a pgindent run and (2) I vote for putting the test case back. Michael thought the test case was too much because this is so obscure, but I think that's exactly why it needs a test case. Otherwise, somebody a few years from now may not even be able to figure out how to hit this message, and if it gets broken, we won't know. This code seems to be fairly easy to break in subtle ways, so I think more test coverage is good.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Transaction control in procedures
On 11/14/17 18:38, Peter Eisentraut wrote:
> On 10/31/17 15:38, Peter Eisentraut wrote:
>> Here is a patch that implements transaction control in PL/Python procedures. (This patch goes on top of "SQL procedures" patch v1.)
>
> Here is an updated patch, now on top of "SQL procedures" v2.

Here is a new patch, now on top of master. The main changes are that a lot of documentation has been added.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 1a5e5fde2c0da663cc010b3e19418d0b2141304b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 1 Dec 2017 14:40:29 -0500
Subject: [PATCH v3] Transaction control in PL procedures

In each of the supplied procedural languages (PL/pgSQL, PL/Perl, PL/Python, PL/Tcl), add language-specific commit and rollback functions/commands to control transactions in procedures in that language. Add similar underlying functions to SPI. Some additional cleanup so that transaction commit or abort doesn't blow away data structures still used by the procedure call. Add execution context tracking to CALL and DO statements so that transaction control commands can only be issued in top-level procedure and block calls, not function calls or other procedure or block calls.
---
 doc/src/sgml/plperl.sgml | 49 +
 doc/src/sgml/plpgsql.sgml | 86 +
 doc/src/sgml/plpython.sgml | 34
 doc/src/sgml/pltcl.sgml | 37
 doc/src/sgml/spi.sgml | 219 ++
 src/backend/commands/functioncmds.c | 30 ++-
 src/backend/executor/spi.c | 92 +++--
 src/backend/tcop/utility.c | 4 +-
 src/backend/utils/mmgr/portalmem.c | 39 ++--
 src/include/commands/defrem.h | 4 +-
 src/include/executor/spi.h | 5 +
 src/include/executor/spi_priv.h | 4 +
 src/include/nodes/nodes.h | 3 +-
 src/include/nodes/parsenodes.h | 7 +
 src/pl/plperl/GNUmakefile | 2 +-
 src/pl/plperl/SPI.xs | 12 ++
 src/pl/plperl/expected/plperl_transaction.out | 94 ++
 src/pl/plperl/plperl.c | 6 +
 src/pl/plperl/sql/plperl_transaction.sql | 88 +
 src/pl/plpgsql/src/pl_exec.c | 48 -
 src/pl/plpgsql/src/pl_funcs.c | 44 +
 src/pl/plpgsql/src/pl_gram.y | 34
 src/pl/plpgsql/src/pl_handler.c | 8 +
 src/pl/plpgsql/src/pl_scanner.c | 2 +
 src/pl/plpgsql/src/plpgsql.h | 22 ++-
 src/pl/plpython/Makefile | 1 +
 src/pl/plpython/expected/plpython_test.out | 4 +-
 src/pl/plpython/expected/plpython_transaction.out | 96 ++
 src/pl/plpython/plpy_main.c | 7 +-
 src/pl/plpython/plpy_plpymodule.c | 28 +++
 src/pl/plpython/sql/plpython_transaction.sql | 80
 src/pl/tcl/Makefile | 2 +-
 src/pl/tcl/expected/pltcl_transaction.out | 63 +++
 src/pl/tcl/pltcl.c | 44 +
 src/pl/tcl/sql/pltcl_transaction.sql | 60 ++
 src/test/regress/expected/plpgsql.out | 110 +++
 src/test/regress/sql/plpgsql.sql | 95 ++
 37 files changed, 1461 insertions(+), 102 deletions(-)
 create mode 100644 src/pl/plperl/expected/plperl_transaction.out
 create mode 100644 src/pl/plperl/sql/plperl_transaction.sql
 create mode 100644 src/pl/plpython/expected/plpython_transaction.out
 create mode 100644 src/pl/plpython/sql/plpython_transaction.sql
 create mode 100644 src/pl/tcl/expected/pltcl_transaction.out
 create mode 100644 src/pl/tcl/sql/pltcl_transaction.sql

diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml
index 100162dead..82ddf26606 100644
--- a/doc/src/sgml/plperl.sgml
+++
b/doc/src/sgml/plperl.sgml @@ -661,6 +661,55 @@ Database Access from PL/Perl + + + + spi_commit() + + spi_commit + in PL/Perl + + + + spi_rollback() + + spi_rollback + in PL/Perl + + + + + Commit or roll back the current transaction. This can only be called + in a procedure or anonymous code block called from the top level. + (Note that it is not possible to run the SQL + commands COMMIT or ROLLBACK + via spi_exec_query or similar. It has to be done + using these functions.) After a transaction is ended, a new + transaction is automatically started, so there is no separate function + for that. + +
Re: [HACKERS] Custom compression methods
On Fri, Dec 1, 2017 at 10:18 AM, Tomas Vondra wrote:
> It has very little impact on this patch, as it has nothing to do with
> columnar storage. That is, each value is compressed independently.

I understand that this patch is not about columnar storage, but I think the idea that we may want to operate on the compressed data directly is not only applicable to that case.

> I agree with these thoughts in general, but I'm not quite sure what is
> your conclusion regarding the patch.

I have not reached one. Sometimes I like to discuss problems before deciding what I think. :-) It does seem to me that the patch may be aiming at a relatively narrow target in a fairly large problem space, but I don't know whether to label that as short-sightedness or prudent incrementalism.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Would a BGW need shmem_access or database_connection to enumerate databases?
On Fri, Dec 1, 2017 at 10:04 AM, Chapman Flack wrote:
> Can I call RegisterDynamicBackgroundWorker when not in the postmaster,
> but also not in a "regular backend", but rather another BGW?

I believe that doing it from another BGW works fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Partition-wise aggregation/grouping
On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat wrote:
> This code creates plans where there are multiple Gather nodes under an Append
> node.

We should avoid that. Starting and stopping workers is inefficient, and precludes things like turning the Append into a Parallel Append.

> AFAIU, the workers assigned to one gather node can be reused until that
> Gather node finishes. Having multiple Gather nodes under an Append mean that
> every worker will be idle from the time that worker finishes the work till the
> last worker finishes the work.

No, workers will exit as soon as they finish. They don't hang around idle.

> index b422050..1941468 100644
> --- a/src/tools/pgindent/typedefs.list
> +++ b/src/tools/pgindent/typedefs.list
> @@ -2345,6 +2345,7 @@ UnlistenStmt
>  UnresolvedTup
>  UnresolvedTupData
>  UpdateStmt
> +UpperPathExtraData
>  UpperRelationKind
>  UpperUniquePath
>  UserAuth
>
> Do we commit this file as part of the feature?

Andres and I regularly commit such changes; Tom rejects them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Runtime Partition Pruning
On Fri, Dec 1, 2017 at 6:20 AM, Beena Emerson wrote:
> David Q1:
> postgres=# explain analyse execute ab_q1 (3,3); --const
>                          QUERY PLAN
> -----------------------------------------------------------
>  Append (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
>    -> Seq Scan on ab_a3_b3 (cost=0.00..43.90 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
>          Filter: ((a = 3) AND (b = 3))
>  Planning time: 0.588 ms
>  Execution time: 0.043 ms
> (5 rows)

I think the EXPLAIN ANALYZE output should show something attached to the Append node so that we can tell that partition pruning is in use. I'm not sure if that is as simple as "Run-Time Partition Pruning: Yes" or if we can give a few more useful details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pow support for pgbench
On Fri, Dec 1, 2017 at 4:57 AM, Raúl Marín Rodríguez wrote:
> I've rebased the patch so it can be applied cleanly on top of current
> master.

Please add the new function into the documentation table in alphabetical order.

The fact that the return type is not consistently of one type bothers me. I'm not sure pgbench's expression language is a good place for runtime polymorphism -- SQL doesn't work that way.

+ /*
+  * pow() for integer values with exp >= 0. Matches SQL pow() behaviour
+  */

What's the name of the backend function whose behavior this matches?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
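[Editorial aside: the integer branch under discussion -- pow() for integer values with exp >= 0 -- is conventionally implemented with exponentiation by squaring. This is a minimal sketch, not the patch's actual code; in particular overflow handling, which the real patch would have to define, is deliberately omitted.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integer pow() for exp >= 0 via exponentiation by squaring.
 * Runs in O(log exp) multiplications.  Overflow is not handled.
 */
static int64_t
ipow(int64_t base, int exp)
{
    int64_t result = 1;

    assert(exp >= 0);
    while (exp > 0)
    {
        if (exp & 1)            /* multiply in the current bit's power */
            result *= base;
        base *= base;           /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
```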
Re: Use get_greatest_modulus() in partition_bounds_equal()
On Fri, Dec 1, 2017 at 5:13 AM, Ashutosh Bapat wrote:
> partition_bounds_equal() accesses the last datum from the given
> partition bounds directly to compare their greatest moduli. The rest of
> the code uses get_greatest_modulus() to get the greatest modulus from
> a partition bound. partition_bounds_equal() should do the same
> for the sake of consistency, and to make sure that the code remains sane
> if we change the way we store the greatest modulus in
> PartitionBoundInfoData in the future.

Makes sense. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: BUG #14941: Vacuum crashes
On 12/1/17, 11:16 AM, "Tomas Vondra" wrote:
>> The behavior I would like to see is that the vacuum ignores this table and
>> moves to another instead of being blocked.
>
> I believe autovacuum should not block waiting for a heavy-weight lock on
> a table since this commit that went into 9.1:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=32896c40ca766146312b28a5a0eb3f66ca0300ed
>
> So I'm wondering what problem you're running into.

It sounds like Lyes is doing a periodic database-wide VACUUM that is getting blocked on certain relations that are already locked (perhaps because autovacuum is already working on it). IIUC, the ask is to provide a way to skip vacuuming relations that cannot be immediately locked. There is already an internal flag called VACOPT_NOWAIT that gives autovacuum this ability in certain cases (introduced in the aforementioned commit), and I recently brought up this potential use-case as justification for a patch [0]. I'd be happy to submit a patch for providing VACOPT_NOWAIT to users if others think it's something we should do.

As a side note, this seems more like a feature request than a bug report, so I'm adding pgsql-hackers@ here, too.

Nathan

[0] https://www.postgresql.org/message-id/875815E8-7A99-4488-AA07-F254C404E2CF%40amazon.com
Re: Unclear regression test for postgres_fdw
On Fri, Dec 1, 2017 at 4:01 AM, Antonin Houska wrote:
> I see no other problems here.

Committed, thanks for the report and review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Transform for pl/perl
A few very minor comments while quickly paging through:

1. non-ASCII tests are going to cause problems on one platform or another. Please don't include that one.

2. error messages
   a) please use ereport() not elog()
   b) conform to style guidelines: errmsg() starts with lowercase; others are complete phrases (start with uppercase, end with period)
   c) replace "The type you was trying to transform can't be represented in JSONB" maybe with errmsg("could not transform to type \"%s\"", "jsonb"), errdetail("The type you are trying to transform can't be represented in JSON")
   d) same errmsg() for the other error; figure out a suitable errdetail.

3. whitespace: don't add newlines to while, DirectFunctionCallN, pnstrdup.

4. the "relocatability" test seems pointless to me.

5. "#undef _" looks bogus. Don't do it.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Dec 1, 2017 at 10:14 AM, Masahiko Sawada wrote:
> The patched is 10 times faster than current HEAD.

Nifty.

The first hunk in monitoring.sgml looks unnecessary. The second hunk breaks the formatting of the documentation; you need to adjust the "morerows" value from 9 to 8 here:

Lock

And similarly make this one 18:

IPC

+* Relation extension locks. The relation extension lock manager is
+specialized in relation extensions. In PostgreSQL 11 relation extension
+lock has been moved out of regular lock. It's similar to regular locks
+but doesn't have full dead lock detection, group locking and multiple
+lock modes. When conflicts occur, lock waits are implemented using
+condition variables.

Higher up, it says that "Postgres uses four types of interprocess locks", but because you added this, it's now a list of five items. I suggest moving the section on relation locks to the end and rewriting the text here as follows: Only one process can extend a relation at a time; we use a specialized lock manager for this purpose, which is much simpler than the regular lock manager. It is similar to the lightweight lock mechanism, but is even simpler because there is only one lock mode and only one lock can be taken at a time. A process holding a relation extension lock is interruptible, unlike a process holding an LWLock.

+#define RelExtLockTargetTagToIndex(relextlock_tag) \
+(tag_hash((const void *) relextlock_tag, sizeof(RelExtLockTag)) \
+% N_RELEXTLOCK_ENTS)

How about using a static inline function for this?

+#define SET_RELEXTLOCK_TAG(locktag, d, r) \
+((locktag).dbid = (d), \
+ (locktag).relid = (r))

How about getting rid of this and just doing the assignments instead?

+#define RELEXT_VAL_LOCK ((uint32) ((1 << 25)))
+#define RELEXT_LOCK_MASK((uint32) ((1 << 25)))

It seems confusing to have two macros for the same value and an almost-interchangeable purpose. Maybe just call it RELEXT_LOCK_BIT?

+RelationExtensionLockWaiterCount(Relation relation)

Hmm.
This is sort of problematic, because with the new design we have no guarantee that the return value is actually accurate. I don't think that's a functional problem, but the optics aren't great.

+if (held_relextlock.nLocks > 0)
+{
+RelExtLockRelease(held_relextlock.relid, true);
+}

Excess braces.

+int
+RelExtLockHoldingLockCount(void)
+{
+return held_relextlock.nLocks;
+}

Maybe IsAnyRelationExtensionLockHeld(), returning bool?

+/* If the lock is held by me, no need to wait */

If we already hold the lock, no need to wait.

+ * Luckily if we're trying to acquire the same lock as what we
+ * had held just before, we don't need to get the entry from the
+ * array by hashing.

We're not trying to acquire a lock here. "If the last relation extension lock we touched is the same one for which we now need to wait, we can use our cached pointer to the lock instead of recomputing it."

+registered_wait_list = true;

Isn't it really registered_wait_count? The only list here is encapsulated in the CV.

+/* Before retuning, decrement the wait count if we had been waiting */

retuning -> returning, but I'd rewrite this as "Release any wait count we hold."

+ * Acquire the relation extension lock. If we're trying to acquire the same
+ * lock as what already held, we just increment nLock locally and return
+ * without touching the RelExtLock array.

"Acquire a relation extension lock." I think you can forget the rest of this; it duplicates comments in the function body.

+ * Since we don't support dead lock detection for relation extension
+ * lock and don't control the order of lock acquisition, it cannot not
+ * happen that trying to take a new lock while holding an another lock.

Since we don't do deadlock detection, caller must not try to take a new relation extension lock while already holding them.
+if (relid == held_relextlock.relid)
+{
+held_relextlock.nLocks++;
+return true;
+}
+else
+elog(ERROR,
+ "cannot acquire relation extension locks for multiple relations at the same");

I'd prefer if (relid != held_relextlock.relid) elog(ERROR, ...) to save a level of indentation for the rest.

+ * If we're trying to acquire the same lock as what we just released
+ * we don't need to get the entry from the array by hashing. we expect
+ * to happen this case because it's a common case in acquisition of
+ * relation extension locks.

"If the last relation extension lock we touched is the same one which we now need to acquire, we can use our cached pointer to the lock instead of recomputing it. This is likely to be a common case in practice."

+/* Could not got the lock, return iff in conditional locking */

"locking conditionally"

+ConditionVariableSleep(&(relextlock->cv),
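[Editorial aside: as a sketch of the static-inline suggestion above, this is roughly what the RelExtLockTargetTagToIndex() macro could look like as an inline function. The hash function here is a stand-in for the backend's tag_hash(), and N_RELEXTLOCK_ENTS, Oid, and RelExtLockTag are simplified re-declarations for illustration only.]

```c
#include <assert.h>
#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024

typedef uint32_t Oid;            /* stand-in for the backend's Oid */

typedef struct RelExtLockTag
{
    Oid dbid;                    /* database OID, so OIDs can collide across DBs */
    Oid relid;                   /* relation OID */
} RelExtLockTag;

/*
 * Stand-in for tag_hash(); any reasonable mixing function works here,
 * since two tags sharing a slot is only a (false-contention) performance
 * issue, never a correctness issue.
 */
static inline uint32_t
relext_hash(const RelExtLockTag *tag)
{
    uint64_t h = ((uint64_t) tag->dbid << 32) | tag->relid;

    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    return (uint32_t) h;
}

/* Inline-function form of the macro: map a tag onto the fixed array. */
static inline uint32_t
RelExtLockTargetTagToIndex(const RelExtLockTag *tag)
{
    return relext_hash(tag) % N_RELEXTLOCK_ENTS;
}
```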
Re: Transform for pl/perl
Robert Haas writes:
> On Fri, Dec 1, 2017 at 12:30 AM, Michael Paquier wrote:
>> Patch moved to CF 2018-01. Perhaps a committer will look at it at some point.
>
> FWIW, although I like the idea, I'm not going to work on the patch. I
> like Perl, but I neither know its internals nor use plperl. I hope
> someone else will be interested.

I've been assuming (and I imagine other committers think likewise) that Peter will pick this up at some point, since the whole transform feature was his work to begin with. If he doesn't want to touch it, maybe he should say so explicitly so that other people will feel free to take it.

regards, tom lane
Re: [HACKERS] INSERT ON CONFLICT and partitioned tables
On Fri, Dec 1, 2017 at 2:44 AM, Amit Langote wrote:
> I forgot to consider the fact that mtstate could be NULL in
> ExecSetupPartitionTupleRouting(), so would result in dereferencing NULL
> pointer when called from CopyFrom(), which is fixed in the attached updated
> patch.

a ? b : false can more simply be spelled a && b. Committed after changing it like that, fixing the broken documentation build, and making minor edits to the comments and documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Issues with logical replication
On 30/11/17 11:48, Simon Riggs wrote:
> On 30 November 2017 at 11:30, Petr Jelinek wrote:
>> On 30/11/17 00:47, Andres Freund wrote:
>>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live with
>>>> it, but the problematic logic happens inside the XactLockTableInsert() and
>>>> SnapBuildWaitSnapshot() has no way of detecting the situation short of
>>>> reimplementing the XactLockTableInsert() instead of calling it.
>>>
>>> Right. But we fairly trivially can change that. I'm remarking on it
>>> because other people's, not yours, suggestions aimed at making this
>>> bulletproof. I just wanted to make clear that I don't think that's
>>> necessary at all.
>>
>> Okay, then I guess we are in agreement. I can confirm that the attached
>> fixes the issue in my tests. Using SubTransGetTopmostTransaction()
>> instead of SubTransGetParent() means 3 more ifs in terms of extra CPU
>> cost for other callers. I don't think it's worth worrying about given we
>> are waiting for a heavyweight lock, but if we did we can just inline the
>> code directly into SnapBuildWaitSnapshot().
>
> This will still fail an Assert in TransactionIdIsInProgress() when
> snapshots are overflowed.

Hmm, which one, why? I see 2 Asserts there. One is:

> Assert(nxids == 0);

which is inside the RecoveryInProgress() branch; surely on standbys there will still be no PGXACTs with assigned xids, so that should be fine. The other one is:

> Assert(TransactionIdIsValid(topxid));

which should again be fine: the toplevel xid of a toplevel xid is the same xid, which is a valid one. So I think we should be fine there.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Fri, Dec 1, 2017 at 1:36 AM, Rajkumar Raghuwanshi wrote:
> On Tue, Oct 31, 2017 at 2:45 PM, Robert Haas wrote:
>> OK, committed. This is a good example of how having good code
>> coverage doesn't necessarily mean you've found all the bugs. :-)
>
> As of now partition_join.sql is not having test cases covering cases
> where partition table have default partition, attaching a small test
> case patch to cover those.

That's not that small, and to me it looks like overkill.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] postgres_fdw super user checks
On Fri, Dec 1, 2017 at 12:31 AM, Michael Paquier wrote:
> I am moving this patch to next CF 2018-01.

There now seems to be a consensus for superuser -> superuser_arg rather than what Jeff did originally; that approach has 4 votes and nothing else has more than 1. So, here's a patch that does it that way.

I tried to see if some documentation update was needed, but I think the documentation already reflects the proposed new behavior. It says: Only superusers may connect to foreign servers without password authentication, so always specify the password option for user mappings belonging to non-superusers. Currently, however, that's not accurate. Right now you need to specify the password option for user mappings that will be *used by* non-superusers, not user mappings *belonging to* non-superusers. So this patch is, I think, just making the actual behavior match the documented behavior.

Not sure if anyone has any other suggestions here. I think this is definitely a master-only change; should we try to insert some kind of warning into the back-branch docs? I definitely think this should be called out in the v11 release notes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

postgres-fdw-superuser.patch
Description: Binary data
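[Editorial aside: a hedged sketch of the superuser -> superuser_arg change being voted on -- the password requirement keys off the *owner* of the user mapping rather than the current user. The UserMapping/Oid types and the superuser_arg() behavior below are simplified stand-ins for illustration, not the real postgres_fdw code.]

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;                   /* stand-in for the backend's Oid */

typedef struct UserMapping
{
    Oid userid;                             /* role the mapping belongs to */
} UserMapping;

#define BOOTSTRAP_SUPERUSERID 10            /* assumption: only OID 10 is a superuser */

/* Stand-in for the backend's superuser_arg(): is the *given* role a superuser? */
static bool
superuser_arg(Oid roleid)
{
    return roleid == BOOTSTRAP_SUPERUSERID;
}

/*
 * Require a password unless the mapping's owner is a superuser.  The old
 * behavior effectively checked superuser() -- i.e. the *current* user --
 * instead of the mapping's owner.
 */
static bool
password_required(const UserMapping *um, bool mapping_has_password)
{
    if (superuser_arg(um->userid))
        return false;                       /* superuser-owned mappings may omit it */
    return !mapping_has_password;
}
```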
Re: Transform for pl/perl
On Fri, Dec 1, 2017 at 12:30 AM, Michael Paquier wrote:
> On Tue, Nov 28, 2017 at 5:14 PM, Aleksander Alekseev wrote:
>> The new status of this patch is: Ready for Committer
>
> Patch moved to CF 2018-01. Perhaps a committer will look at it at some point.

FWIW, although I like the idea, I'm not going to work on the patch. I like Perl, but I neither know its internals nor use plperl. I hope someone else will be interested.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SQL procedures
On 2 December 2017 at 01:31, Peter Eisentraut wrote:
> On 11/30/17 15:50, Thomas Munro wrote:
>> postgres=# \df
>>                        List of functions
>>  Schema | Name | Result data type | Argument data types | Type
>> --------+------+------------------+---------------------+------
>>  public | bar  | integer          | i integer           | func
>>  public | foo  |                  | i integer           | proc
>> (2 rows)
>>
>> Should this now be called a "List of routines"?
>
> Maybe, but I hesitate to go around and change all mentions of "function"
> like that. That might just confuse people.

Agreed.

\dfn shows functions only, so we might want to consider having

\df say "List of functions and procedures"
\dfn say "List of functions"

and a new option to list procedures only

\dfp say "List of procedures"

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Constraint exclusion for partitioned tables
On Fri, Dec 1, 2017 at 12:21 AM, Michael Paquier wrote:
> On Wed, Sep 13, 2017 at 4:07 PM, Ashutosh Bapat wrote:
>> For a partitioned table, this patch saves the time to run constraint
>> exclusion on all the partitions if constraint exclusion succeeds on
>> the partitioned table. If constraint exclusion fails, we have wasted
>> CPU cycles on one run of constraint exclusion. The difference between
>> the time spent in the two scenarios increases with the number of
>> partitions. Practically, users will have a handful of partitions rather
>> than a couple, and thus the overhead of running constraint exclusion on
>> the partitioned table would be justified given the time it will save
>> when CE succeeds.
>
> Moved patch to next CF.

Committed after adding a comment. Generally, code changes should be accompanied by comment updates.

I tested this and found out that this is quite useful for cases where multiple levels of partitioning are in use. Consider creating 100 partitions like this:

#!/usr/bin/perl
use strict;
use warnings;
print "create table foo (a int, b int, c text) partition by list (a);\n";
for $a (1..10)
{
    print "create table foo$a partition of foo for values in ($a) partition by list (b);\n";
    for $b (1..10)
    {
        print "create table foo${a}_$b partition of foo$a for values in ($b);\n";
    }
}

Then consider this query: select * from foo where a = 5;

Without this patch, we have to reject 90 leaf partitions individually, but with the patch, we can reject the intermediate partitioned tables; each time we do, it substitutes for rejecting 10 children individually. This seems to me to be a case that is quite likely to come up in the real world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: psql command \graw
2017-12-01 16:36 GMT+01:00 Robert Haas:
> On Fri, Dec 1, 2017 at 12:16 AM, Michael Paquier wrote:
>> On Sat, Nov 11, 2017 at 12:57 AM, Pavel Stehule wrote:
>>> 2017-11-10 16:38 GMT+01:00 Fabien COELHO:
>>>> So I switched the patch to "ready for committer".
>>>
>>> Thank you very much
>>
>> Patch moved to CF 2018-01 with same status: ready for committer.
>
> I vote to reject this patch. It doesn't do anything that you can't
> already do; it just adds some syntactic sugar. And that syntactic
> sugar saves only a handful of keystrokes. If you want unaligned,
> tuples-only mode, you can set it in 5 keystrokes:
>
> rhaas=# \a\t
> Output format is unaligned.
> Tuples only is on.

That is true when you don't want to see column names. For some usages - like plotting via gnuplot - you would want to see column names, and that cannot be done with the current settings.

> If you use this command, it takes 4 keystrokes; instead of ending your
> command with a semicolon (1 character) you end it with \graw (5
> characters).
>
> Now, granted, \graw lets you set those options for a single command
> rather than persistently, but I'm just not very interested in having a
> bunch of \g options that enable various combinations of
> options. Soon we'll have a thicket of \g variants that force
> whichever combinations of options particular developers like to use,
> and if \graw is any indication, the \g variant won't
> necessarily look anything like the normal way of setting those
> options. And that's not easy to fix, either: \graw could be spelled
> \gat since it forces \a on and \t on, but somebody's bound to
> eventually propose a variant that sets an option that has no
> single-character shorthand.
>
> I'm not going to be bitterly unhappy if somebody else decides to
> commit this, but to me it looks like it gains us very little.

This feature is marginal, but I don't think it is without value. It works well with gnuplot - and I hope that not only MacOS users will have a terminal with graphics capabilities - and then with this feature one can emulate a notebook interface.

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
builddoc.pl for Windows outdated
src/tools/msvc/builddoc.pl and the associated documentation (in install-windows.sgml) are quite outdated, even for PG10. If someone has the required knowledge, please consider supplying an update.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] proposal: psql command \graw
On Fri, Dec 1, 2017 at 12:16 AM, Michael Paquier wrote:
> On Sat, Nov 11, 2017 at 12:57 AM, Pavel Stehule wrote:
>> 2017-11-10 16:38 GMT+01:00 Fabien COELHO:
>>> So I switched the patch to "ready for committer".
>>
>> Thank you very much
>
> Patch moved to CF 2018-01 with same status: ready for committer.

I vote to reject this patch. It doesn't do anything that you can't already do; it just adds some syntactic sugar. And that syntactic sugar saves only a handful of keystrokes. If you want unaligned, tuples-only mode, you can set it in 5 keystrokes:

rhaas=# \a\t
Output format is unaligned.
Tuples only is on.

If you use this command, it takes 4 keystrokes; instead of ending your command with a semicolon (1 character) you end it with \graw (5 characters).

Now, granted, \graw lets you set those options for a single command rather than persistently, but I'm just not very interested in having a bunch of \g options that enable various combinations of options. Soon we'll have a thicket of \g variants that force whichever combinations of options particular developers like to use, and if \graw is any indication, the \g variant won't necessarily look anything like the normal way of setting those options. And that's not easy to fix, either: \graw could be spelled \gat since it forces \a on and \t on, but somebody's bound to eventually propose a variant that sets an option that has no single-character shorthand.

I'm not going to be bitterly unhappy if somebody else decides to commit this, but to me it looks like it gains us very little.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods
On 12/01/2017 03:23 PM, Robert Haas wrote:
> On Thu, Nov 30, 2017 at 2:47 PM, Tomas Vondra wrote:
>> OK. I think it's a nice use case (and nice gains on the compression
>> ratio), demonstrating the datatype-aware compression. The question is
>> why shouldn't this be built into the datatypes directly?
>
> Tomas, thanks for running benchmarks of this. I was surprised to see
> how little improvement there was from other modern compression
> methods, although lz4 did appear to be a modest win on both size and
> speed. But I share your intuition that a lot of the interesting work
> is in datatype-specific compression algorithms. I have noticed in a
> number of papers that I've read that teaching other parts of the
> system to operate directly on the compressed data, especially for
> column stores, is a critical performance optimization; of course, that
> only makes sense if the compression is datatype-specific. I don't
> know exactly what that means for the design of this patch, though.

It has very little impact on this patch, as it has nothing to do with columnar storage. That is, each value is compressed independently.

Column stores exploit the fact that they get a vector of values, compressed in some data-aware way. E.g. some form of RLE or dictionary compression, which allows them to evaluate expressions on the compressed vector. But that's irrelevant here, we only get row-by-row execution.

Note: The idea to build a dictionary for the whole jsonb column (which this patch should allow) does not make it "columnar compression" in the "column store" way. The executor will still get the decompressed value.

> As a general point, no matter which way you go, you have to somehow
> deal with on-disk compatibility. If you want to build in compression
> to the datatype itself, you need to find at least one bit someplace to
> mark the fact that you applied built-in compression.
> If you want to
> build it in as a separate facility, you need to denote the compression
> used someplace else. I haven't looked at how this patch does it, but
> the proposal in the past has been to add a value to vartag_external.

AFAICS the patch does that by setting a bit in the varlena header, and then adding the OID of the compression method after the varlena header. So you get (varlena header + OID + data).

This has good and bad consequences.

Good: It's transparent for the datatype, so it does not have to worry about the custom compression at all (and it may change arbitrarily).

Bad: It's transparent for the datatype, so it can't operate directly on the compressed representation.

I don't think this is an argument against the patch, though. If the datatype can support intelligent compression (and execution without decompression), it has to be done in the datatype anyway.

> One nice thing about the latter method is that it can be used for any
> data type generically, regardless of how much bit-space is available
> in the data type representation itself. It's realistically hard to
> think of a data type that has no bit space available anywhere but is
> still subject to data-type specific compression; bytea definitionally
> has no bit space but also can't benefit from special-purpose
> compression, whereas even something like text could be handled by
> starting the varlena with a NUL byte to indicate compressed data
> following. However, you'd have to come up with a different trick for
> each data type. Piggybacking on the TOAST machinery avoids that. It
> also implies that we only try to compress values that are "big", which
> is probably desirable if we're talking about a kind of compression
> that makes comprehending the value slower. Not all types of
> compression do, cf. commit 145343534c153d1e6c3cff1fa1855787684d9a38,
> and for those that don't it probably makes more sense to just build it
> into the data type.
> All of that is a somewhat separate question from whether we should
> have CREATE / DROP COMPRESSION, though (or Alvaro's proposal of using
> the ACCESS METHOD stuff instead). Even if we agree that piggybacking
> on TOAST is a good way to implement pluggable compression methods, it
> doesn't follow that the compression method is something that should be
> attached to the datatype from the outside; it could be built into it
> in a deep way. For example, "packed" varlenas (1-byte header) are a
> form of compression, and the default functions for detoasting always
> produce unpacked values, but the operators for the text data type
> know how to operate on the packed representation. That's sort of a
> trivial example, but it might well be that there are other cases where
> we can do something similar. Maybe jsonb, for example, can compress
> data in such a way that some of the jsonb functions can operate
> directly on the compressed representation -- perhaps the number of
> keys is easily visible, for example, or maybe more. In this view of
> the world, each data type
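[Editorial aside: to make the layout discussed above concrete -- (varlena header + compression-method OID + data) -- here is a hedged sketch as a plain struct. The real varlena representation in postgres.h is accessed through bit-twiddling macros, not a struct like this; the names and the "one spare bit" flag are illustrative assumptions only.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;                   /* stand-in for the backend's Oid */

/*
 * Illustrative shape of a custom-compressed datum: the length word has
 * one spare bit repurposed to say "custom compression follows", and the
 * compression method's OID sits between the header and the payload.
 */
typedef struct CustomCompressedVarlena
{
    uint32_t va_header;                 /* length word; spare bit flags custom compression */
    Oid      cmoid;                     /* OID of the compression method used */
    char     data[];                    /* compressed payload follows */
} CustomCompressedVarlena;

/* The cost of the scheme is a fixed per-value overhead for the OID. */
enum { CUSTOM_COMPRESSION_OVERHEAD = offsetof(CustomCompressedVarlena, data) };
```

One design consequence visible even in the sketch: the overhead is paid per value, which is why tying this to TOAST (i.e. only to "big" values) keeps it cheap.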
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Fri, Dec 1, 2017 at 10:26 AM, Masahiko Sawada wrote: > On Fri, Dec 1, 2017 at 3:04 AM, Robert Haas wrote: >> On Thu, Nov 30, 2017 at 6:20 AM, Masahiko Sawada >> wrote: This code ignores the existence of multiple databases; RELEXTLOCK contains a relid, but no database OID. That's easy enough to fix, but it actually causes no problem unless, by bad luck, you have two relations with the same OID in different databases that are both being rapidly extended at the same time -- and even then, it's only a performance problem, not a correctness problem. In fact, I wonder if we shouldn't go further: instead of creating these RELEXTLOCK structures dynamically, let's just have a fixed number of them, say 1024. When we get a request to take a lock, hash the (database, relid) pair and take the result modulo 1024; lock the RELEXTLOCK at that offset in the array. >>> >>> Attached the latest patch incorporating comments except for the fix of >>> the memory size for relext lock. >> >> It doesn't do anything about the comment of mine quoted above. > > Sorry I'd missed the comment. > >> Since it's only possible to hold one relation extension lock at a time, we >> don't really need the hash table here at all. We can just have an >> array of 1024 or so locks and map every (database, relid) pair on to one of >> them by hashing. The worst thing we'll get is some false contention, >> but that doesn't seem awful, and it would permit considerable further >> simplification of this code -- and maybe make it faster in the >> process, because we'd no longer need the hash table, or the pin count, >> or the extra LWLocks that protect the hash table. All we would have >> is atomic operations manipulating the lock state, which seems like it >> would be quite a lot faster and simpler. > > Agreed. With this change, we will have an array of the struct that has > lock state and cv. The lock state has the wait count as well as the > status of the lock. 
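The fixed-array mapping proposed above can be sketched roughly as follows. The hash and struct here are invented for illustration (PostgreSQL itself would use hash_any on the lock tag and a pg_atomic_uint32 for the state; the real patch's layout may differ):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-size table: false sharing between unrelated relations is possible
 * but only costs performance, never correctness. */
#define N_RELEXTLOCK_ENTS 1024

typedef struct RelExtLock
{
    uint32_t state;             /* would be a pg_atomic_uint32 for real */
} RelExtLock;

static RelExtLock RelExtLockArray[N_RELEXTLOCK_ENTS];

/* Toy hash combine of (database OID, relation OID). */
static uint32_t
relextlock_index(uint32_t dbid, uint32_t relid)
{
    uint32_t h = dbid * 0x9e3779b9u;

    h ^= relid + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h % N_RELEXTLOCK_ENTS;
}

/* Every (dbid, relid) pair deterministically maps to one slot, so no
 * dynamic allocation, hash table, or protective LWLocks are needed. */
static RelExtLock *
relextlock_slot(uint32_t dbid, uint32_t relid)
{
    return &RelExtLockArray[relextlock_index(dbid, relid)];
}
```

Two different relations may hash to the same slot and contend falsely, which is the "worst thing" the quoted text accepts as a deliberate trade for simplicity.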
> >> BTW, I think RelExtLockReleaseAll is broken because it shouldn't >> HOLD_INTERRUPTS(); I also think it's kind of silly to loop here when >> we know we can only hold one lock. Maybe RelExtLockRelease can take >> bool force and do if (force) held_relextlock.nLocks = 0; else >> held_relextlock.nLocks--. Or, better yet, have the caller adjust that >> value and then only call RelExtLockRelease() if we needed to release >> the lock in shared memory. That avoids needless branching. > > Agreed. I'd vote for the latter. > >> On a >> related note, is there any point in having both held_relextlock.nLocks >> and num_held_relextlocks? > > num_held_relextlocks is actually unnecessary, will be removed. > >> I think RelationExtensionLock should be a new type of IPC wait event, >> rather than a whole new category. > > Hmm, I thought the IPC wait event types seem related to events where > a process communicates with other processes for the same purpose, for example > parallel query, synchronous replication, etc. On the other hand, the relation > extension locks are one kind of lock mechanism. That's why I added > a new category. But maybe it can fit into the IPC wait event category. > Attached updated patch. I've done a performance measurement again on the same configuration as before since the acquiring/releasing procedures have been changed. 
- PATCHED -
tps = 162.579320 (excluding connections establishing)
tps = 162.144352 (excluding connections establishing)
tps = 160.659403 (excluding connections establishing)
tps = 161.213995 (excluding connections establishing)
tps = 164.560460 (excluding connections establishing)

- HEAD -
tps = 157.738645 (excluding connections establishing)
tps = 146.178575 (excluding connections establishing)
tps = 143.788961 (excluding connections establishing)
tps = 144.886594 (excluding connections establishing)
tps = 145.496337 (excluding connections establishing)

* micro-benchmark
PATCHED = 1.61757e+07 (cycles/sec)
HEAD = 1.48685e+06 (cycles/sec)

The patched version is about 10 times faster than current HEAD.

Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8d461c8..7aa7981 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
       Heavyweight locks, also known as lock manager locks or simply locks,
       primarily protect SQL-visible objects such as tables. However,
       they are also used to ensure mutual exclusion for certain internal
-      operations such as relation extension. wait_event will
-      identify the type of lock awaited.
+      operations such as waiting for a transaction to finish.
+      wait_event will identify the type of lock awaited.
@@
Re: [HACKERS] SQL procedures
On Fri, Dec 1, 2017 at 9:31 AM, Peter Eisentraut wrote:
> On 11/30/17 15:50, Thomas Munro wrote:
>> postgres=# \df
>>                           List of functions
>>  Schema | Name | Result data type | Argument data types | Type
>> --------+------+------------------+---------------------+------
>>  public | bar  | integer          | i integer           | func
>>  public | foo  |                  | i integer           | proc
>> (2 rows)
>>
>> Should this now be called a "List of routines"?
>
> Maybe, but I hesitate to go around and change all mentions of "function"
> like that. That might just confuse people.

Yeah, this is not unlike the problems we have deciding whether to say "relation" or "table". It's a problem that comes up when most foos are bars but there are multiple types of exotic foo that are not bars. That's pretty much the case here -- most functions are probably just functions, but a few might be procedures or aggregates. I think leaving this and similar cases as "functions" is fine. I wonder whether it was really necessary for the SQL standards committee (or Oracle) to invent both procedures and functions to represent very similar things, but they did. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Doc tweak for huge_pages?
On 11/30/17 23:35, Thomas Munro wrote: > On Fri, Dec 1, 2017 at 5:04 PM, Justin Pryzby wrote: >> On Fri, Dec 01, 2017 at 04:01:24PM +1300, Thomas Munro wrote: >>> Hi hackers, >>> >>> The manual implies that only Linux can use huge pages. That is not >>> true: FreeBSD, Illumos and probably others support larger page sizes >>> using transparent page coalescing algorithms. On my FreeBSD box >>> procstat -v often shows PostgreSQL shared buffers in "S"-flagged >>> memory. I think we should adjust the manual to make clear that it's >>> the *explicit request for huge pages* that is supported only on Linux >>> (and hopefully soon Windows). Am I being too pedantic? >> >> I suggest to remove "other" and include Linux in the enumeration, since it >> also >> supports "transparent" hugepages. > > Hmm. Yeah, it does, but apparently it's not so transparent. So if we > mention that we'd better indicate in the same paragraph that you > probably don't actually want to use it. How about the attached? Part of the confusion is that the huge_pages setting is only for shared memory, whereas the kernel settings affect all memory. Is the same true for the proposed Windows patch? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Use of uninitialized variables in ExecFindPartition() for parent partition without leaves (HEAD only)
On Thu, Nov 30, 2017 at 11:54 PM, Amit Langote wrote: >> I have added a CF entry for this thread by the way >> (https://commitfest.postgresql.org/16/1392/), and marked the thing as >> ready for committer as we agree about the fix. Let's track properly >> this issue until it gets committed. > > Yeah, thanks. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Would a BGW need shmem_access or database_connection to enumerate databases?
On 11/29/2017 05:48 PM, Chapman Flack wrote: > I'm thinking of writing a background worker that will enumerate > the databases present, and spin off, for each one, another BGW > that will establish a connection and do stuff. Can I even do this? "Unlike RegisterBackgroundWorker, which can only be called from within the postmaster, RegisterDynamicBackgroundWorker must be called from a regular backend." Can I call RegisterDynamicBackgroundWorker when not in the postmaster, but also not in a "regular backend", but rather another BGW? Put another way, can a BGW be regarded as a "regular backend" for the purpose of this rule? -Chap
Re: Allowing SSL connection of v11 client to v10 server with SCRAM channel binding
On 11/30/17 00:36, Michael Paquier wrote: > On Wed, Nov 29, 2017 at 1:04 AM, Peter Eisentraut > wrote: >> On 11/22/17 21:08, Michael Paquier wrote: >>> Yes, agreed. This patch looks good to me. In fe-auth-scram.c, it would >>> be also nice to add a comment to keep in sync the logics in >>> build_client_first_message() and build_client_final_message() which >>> assign the cbind flag value. >> >> Could you clarify what comment you would like to have added or changed? > > Sure. Here is with the attached patch what I have in mind. The way > cbind-flag is assigned in the client-first message should be kept > in-sync with the way the client-final message builds the binding data > in c=. It could be possible to add more sanity-checks based on > assertions by keeping track of the cbind-flag assigned in the > client-first message as your upthread patch is doing in the backend > code, but I see a simple comment as a sufficient reminder. Committed with that comment, thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Thu, Nov 30, 2017 at 11:39 PM, David Rowley wrote: > I feel like we could do better here with little extra effort. The > DETACH index feature does not really seem required for this patch. Because of the dump/restore considerations mentioned in http://postgr.es/m/ca+tgmobuhghg9v8saswkhbbfywg5a0qb+jgt0uovq5ycbdu...@mail.gmail.com I believe we need a way to create the index on the parent without immediately triggering index builds on the children, plus a way to create an index on a child after-the-fact and attach it to the parent. Detach isn't strictly required, but why support attach and not detach? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
On Sun, Nov 26, 2017 at 3:15 AM, Amit Kapila wrote: > Yeah and I think something like that can happen after your patch > because now the memory for tuples returned via TupleQueueReaderNext > will be allocated in ExecutorState and that can last for long. I > think it is better to free memory, but we can leave it as well if you > don't feel it important. In any case, I have written a patch, see if > you think it makes sense. Well, I don't really know. My intuition is that in most cases after ExecShutdownGatherMergeWorkers() we will very shortly thereafter call ExecutorEnd() and everything will go away. Maybe that's wrong, but Tom put that call where it is in 2d44c58c79aeef2d376be0141057afbb9ec6b5bc, and he could have put it inside ExecShutdownGatherMergeWorkers() instead. Now maybe he didn't consider that approach, but Tom is usually smart about stuff like that... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling
On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera wrote: > On a *very* quick look, please use an enum to return from NextCopyFrom > rather than 'int'. The chunks that change bool to int are very > odd-looking. This would move the comment that explains the value from > copy.c to copy.h, obviously. Also, you seem to be using non-ASCII dashes > in the descriptions of those values; please don't. I will fix it, thank you. > > Or maybe I misunderstood the patch completely. > I hope so. Here are my thoughts on how it all works; please correct me where I am wrong: 1) First, I have simply changed the ereport level to WARNING for specific validations (extra or missing columns, etc.) if the IGNORE_ERRORS option is used. All these checks are inside NextCopyFrom. Thus, this patch performs here pretty much the same as before, except that it is possible to skip bad lines, and this part should be safe as well. 2) About PG_TRY/CATCH. I use it to catch only one specific function call inside NextCopyFrom--InputFunctionCall--which is used just to parse a datatype value from the input string. I have no idea how WAL write or trigger errors could get here. All of this is done before actually forming a tuple, putting it into the heap, firing insert-related triggers, etc. I am not trying to catch all errors during row processing, only input data errors. So why is it unsafe? Best, Alexey
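The narrow error-trap scope described above can be illustrated outside PostgreSQL with setjmp/longjmp standing in for ereport/PG_TRY. All names here are made up; the point is only that the guard covers just the input-parsing call, so a malformed field skips the line instead of aborting the whole load:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf parse_error;     /* stands in for PostgreSQL's error stack */

/* Parse an integer field; "throws" on malformed input, the way an input
 * function would ereport(ERROR, ...). */
static long
parse_int_or_throw(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        longjmp(parse_error, 1);
    return v;
}

/* The guard covers only the parse call, mirroring a PG_TRY wrapped
 * around just InputFunctionCall.  Returns 1 on success, 0 if the
 * line is skipped. */
static int
copy_one_field(const char *s, long *out)
{
    if (setjmp(parse_error) != 0)
        return 0;               /* bad input: skip, keep copying */
    *out = parse_int_or_throw(s);
    return 1;
}
```

Because nothing else (tuple formation, heap insertion, triggers, WAL) runs inside the guard, only parse errors can be swallowed -- which is the safety argument being made in the email.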
Re: [HACKERS] SQL procedures
On 11/30/17 15:50, Thomas Munro wrote:
> postgres=# \df
>                           List of functions
>  Schema | Name | Result data type | Argument data types | Type
> --------+------+------------------+---------------------+------
>  public | bar  | integer          | i integer           | func
>  public | foo  |                  | i integer           | proc
> (2 rows)
>
> Should this now be called a "List of routines"?

Maybe, but I hesitate to go around and change all mentions of "function" like that. That might just confuse people. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_basebackup --progress output for batch execution
On 11/21/17 05:16, Martín Marqués wrote: > El 21/11/17 a las 04:56, Arthur Zakirov escribió: >> On Mon, Nov 20, 2017 at 04:45:48PM -0300, Martín Marqués wrote: >>> New version of patch, without the --batch-mode option and using isatty() >>> >> >> Great! >> >>> + fprintf(stderr, "waiting for checkpoint"); >>> + if (isatty(fileno(stderr))) >>> + fprintf(stderr, "\n"); >>> + else >>> + fprintf(stderr, "\r"); >> >> Here the condition should be inverted I think. The next condition should be >> used: >> >>> if (!isatty(fileno(stderr))) >>> ... >> >> Otherwise pg_basebackup will insert '\n' in terminal instead '\r'. > > Ups! Attached the corrected version.:) > > Nice catch. I completely missed that. Thanks. Committed. I switched the if logic around even more, so that it is ! if (isatty(fileno(stderr))) ! fprintf(stderr, "\r"); ! else ! fprintf(stderr, "\n"); instead of ! if (!isatty(fileno(stderr))) ! fprintf(stderr, "\n"); ! else ! fprintf(stderr, "\r"); -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
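A minimal standalone sketch of the committed behavior (an assumed shape, not the actual pg_basebackup code): on a terminal the progress line is overwritten in place with a carriage return, while redirected output gets one line per update:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Separator choice isolated in one place so it is easy to test. */
static char
progress_separator(int is_terminal)
{
    /* terminal: overwrite the line in place; pipe/file: one line each */
    return is_terminal ? '\r' : '\n';
}

/* Hypothetical report function in the spirit of the committed logic. */
static void
progress_report(const char *msg)
{
    fprintf(stderr, "%s%c", msg, progress_separator(isatty(fileno(stderr))));
}
```

Putting the positive `isatty()` case first, as Peter's committed version does, keeps the condition free of the easy-to-miss `!` that caused the bug in the first review round.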
Re: [HACKERS] Custom compression methods
On Thu, Nov 30, 2017 at 2:47 PM, Tomas Vondra wrote: > OK. I think it's a nice use case (and nice gains on the compression > ratio), demonstrating the datatype-aware compression. The question is > why shouldn't this be built into the datatypes directly? Tomas, thanks for running benchmarks of this. I was surprised to see how little improvement there was from other modern compression methods, although lz4 did appear to be a modest win on both size and speed. But I share your intuition that a lot of the interesting work is in datatype-specific compression algorithms. I have noticed in a number of papers that I've read that teaching other parts of the system to operate directly on the compressed data, especially for column stores, is a critical performance optimization; of course, that only makes sense if the compression is datatype-specific. I don't know exactly what that means for the design of this patch, though. As a general point, no matter which way you go, you have to somehow deal with on-disk compatibility. If you want to build in compression to the datatype itself, you need to find at least one bit someplace to mark the fact that you applied built-in compression. If you want to build it in as a separate facility, you need to denote the compression used someplace else. I haven't looked at how this patch does it, but the proposal in the past has been to add a value to vartag_external. One nice thing about the latter method is that it can be used for any data type generically, regardless of how much bit-space is available in the data type representation itself. It's realistically hard to think of a data-type that has no bit space available anywhere but is still subject to data-type specific compression; bytea definitionally has no bit space but also can't benefit from special-purpose compression, whereas even something like text could be handled by starting the varlena with a NUL byte to indicate compressed data following. 
However, you'd have to come up with a different trick for each data type. Piggybacking on the TOAST machinery avoids that. It also implies that we only try to compress values that are "big", which is probably desirable if we're talking about a kind of compression that makes comprehending the value slower. Not all types of compression do, cf. commit 145343534c153d1e6c3cff1fa1855787684d9a38, and for those that don't it probably makes more sense to just build it into the data type. All of that is a somewhat separate question from whether we should have CREATE / DROP COMPRESSION, though (or Alvaro's proposal of using the ACCESS METHOD stuff instead). Even if we agree that piggybacking on TOAST is a good way to implement pluggable compression methods, it doesn't follow that the compression method is something that should be attached to the datatype from the outside; it could be built into it in a deep way. For example, "packed" varlenas (1-byte header) are a form of compression, and the default functions for detoasting always produced unpacked values, but the operators for the text data type know how to operate on the packed representation. That's sort of a trivial example, but it might well be that there are other cases where we can do something similar. Maybe jsonb, for example, can compress data in such a way that some of the jsonb functions can operate directly on the compressed representation -- perhaps the number of keys is easily visible, for example, or maybe more. In this view of the world, each data type should get to define its own compression method (or methods) but they are hard-wired into the datatype and you can't add more later, or if you do, you lose the advantages of the hard-wired stuff. BTW, another related concept that comes up a lot in discussions of this area is that we could do a lot better compression of columns if we had some place to store a per-column dictionary. I don't really know how to make that work. 
We could have a catalog someplace that stores an opaque blob for each column configured to use a compression method, and let the compression method store whatever it likes in there. That's probably fine if you are compressing the whole table at once and the blob is static thereafter. But if you want to update that blob as you see new column values there seem to be almost insurmountable problems. To be clear, I'm not trying to load this patch down with a requirement to solve every problem in the universe. On the other hand, I think it would be easy to beat a patch like this into shape in a fairly mechanical way and then commit-and-forget. That might be leaving a lot of money on the table; I'm glad you are thinking about the bigger picture and hope that my thoughts here somehow contribute. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
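A toy static version of the per-column dictionary idea, just to make the trade-off concrete (the names and values are invented): a stored cell is either a small token into a fixed dictionary or a literal. The hard part Robert describes -- evolving the dictionary while old tuples still reference it -- is exactly what this static sketch sidesteps:

```c
#include <assert.h>
#include <string.h>

/* Invented static dictionary of frequent column values. */
static const char *column_dict[] = { "active", "inactive", "pending" };
#define DICT_SIZE ((int) (sizeof(column_dict) / sizeof(column_dict[0])))

/* Returns a token index for dictionary hits, or -1 meaning "store the
 * value as a literal". */
static int
dict_encode(const char *value)
{
    for (int i = 0; i < DICT_SIZE; i++)
        if (strcmp(column_dict[i], value) == 0)
            return i;
    return -1;
}

/* Decoding only needs the (immutable) dictionary.  If the dictionary
 * could change after tokens were written, every old token would have to
 * keep decoding to the same value forever -- the "insurmountable"
 * update problem mentioned in the thread. */
static const char *
dict_decode(int token)
{
    return (token >= 0 && token < DICT_SIZE) ? column_dict[token] : NULL;
}
```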
Re: [HACKERS] Custom compression methods
On 11/30/2017 09:51 PM, Alvaro Herrera wrote: > Tomas Vondra wrote: > >> On 11/30/2017 04:20 PM, Ildus Kurbangaliev wrote: > >>> CREATE COMPRESSION METHOD ts1 FOR tsvector HANDLER >>> tsvector_compression_handler; >> >> Understood. Good to know you've considered it, and I agree it doesn't >> need to be there from the start (which makes the patch simpler). > > Just passing by, but wouldn't this fit in the ACCESS METHOD group of > commands? So this could be simplified down to > CREATE ACCESS METHOD ts1 TYPE COMPRESSION > we have that for indexes and there are patches flying for heap storage, > sequences, etc. I think that's simpler than trying to invent all new > commands here. Then (in a future patch) you can use ALTER TYPE to > define compression for that type, or even add a column-level option to > reference a specific compression method. > I think that would conflate two very different concepts. In my mind, access methods define how rows are stored. Compression methods are an orthogonal concept, e.g. you can compress a value (using a custom compression algorithm) and store it in an index (using whatever access method it's using). So not only do access methods operate on rows (while compression operates on varlena values), but you can combine those two things together. I don't see how you could do that if both are defined as "access methods" ... Furthermore, the "FOR type" in CREATE COMPRESSION METHOD was meant to restrict the compression algorithm to a particular data type (so, if it relies on tsvector, you can't apply it to text columns). Which is very different from "TYPE COMPRESSION" in CREATE ACCESS METHOD. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Additional logging for VACUUM and ANALYZE
On Thu, Nov 30, 2017 at 10:04 PM, Michael Paquier wrote: > On Wed, Nov 8, 2017 at 12:54 AM, Fabrízio de Royes Mello > wrote: >> Great. Changed status to ready for committer. > > The patch still applies, but no committer seems to be interested in the > topic, so moved to next CF. The general idea of this patch seems OK to me.

+    rel_lock = true;

I think it would look nicer to initialize this when you declare the variable, instead of having a separate line of code for that purpose.

+    if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+        elevel = LOG;
+    else if (!IsAutoVacuumWorkerProcess())
+        elevel = WARNING;
+    else
+        return;

This can be rewritten with only one call to IsAutoVacuumWorkerProcess() by reversing the order of the branches.

+    PopActiveSnapshot();
+    CommitTransactionCommand();
+    return false;

vacuum_rel already has too many copies of this logic -- can we try to avoid duplicating it into two more places? It seems like you could easily do that like this:

    int elevel = 0;

    if (relation != NULL)
    {
        /* maybe set elevel */
    }

    if (elevel != 0)
    {
        if (!rel_lock)
            /* emit message */
        else
            /* emit other message */
    }

This wouldn't be the first bit of code to assume that elevel == 0 can be used as a sentinel value meaning "none", so I think it's OK to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
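Robert's suggested control flow, turned into a compilable sketch. The level values mirror elog.h's ordering but are otherwise illustrative, and choose_elevel is an invented name; elevel == 0 is the "no message" sentinel:

```c
#include <assert.h>

/* 0 doubles as a "no message" sentinel, as suggested in the review. */
enum
{
    NO_MESSAGE = 0,
    LOG_LEVEL = 15,
    WARNING_LEVEL = 19
};

/* Single test of the (hypothetical) autovacuum flag, with the branches
 * reversed relative to the patch so the flag is consulted only once. */
static int
choose_elevel(int is_autovacuum_worker, int log_min_duration)
{
    if (!is_autovacuum_worker)
        return WARNING_LEVEL;
    if (log_min_duration >= 0)
        return LOG_LEVEL;
    return NO_MESSAGE;
}
```

The caller then emits its "could not lock" or "skipping" message in one place whenever the returned level is nonzero, instead of duplicating the snapshot-pop/commit/return sequence at each early exit.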
Re: [HACKERS] pgbench more operators & functions
Here is a v13. No code changes, but TAP tests added to maintain pgbench coverage to green. Here is a v14, which is just a rebase after the documentation xml-ization. Regenerated v15 that applies cleanly on head. No changes.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 94b495e..cac257b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -904,14 +904,32 @@ pgbench options d
 Sets variable varname to a value calculated from expression.
- The expression may contain integer constants such as 5432,
+ The expression may contain the NULL constant,
+ boolean constants TRUE and FALSE,
+ integer constants such as 5432,
 double constants such as 3.14159,
 references to variables :variablename,
- unary operators (+, -) and binary operators
- (+, -, *, /, %) with their usual precedence and associativity,
- function calls, and
- parentheses.
+ operators
+ with their usual SQL precedence and associativity,
+ function calls,
+ SQL CASE generic conditional
+ expressions and parentheses.
+
+ Functions and most operators return NULL on
+ NULL input.
+
+ For conditional purposes, non zero numerical values are
+ TRUE, zero numerical values and NULL
+ are FALSE.
+
+ When no final ELSE clause is provided to a
+ CASE, the default value is NULL.
@@ -920,6 +938,7 @@ pgbench options d
 \set ntellers 10 * :scale
 \set aid (1021 * random(1, 10 * :scale)) % \
 (10 * :scale) + 1
+\set divx CASE WHEN :x <> 0 THEN :y/:x ELSE NULL END
@@ -996,6 +1015,177 @@ pgbench options d
+
+ Built-In Operators
+
+ The arithmetic, bitwise, comparison and logical operators listed in
+ are built into pgbench
+ and may be used in expressions appearing in
+ \set. 
+
+ pgbench Operators by increasing precedence
+
+  Operator                     Description                    Example      Result
+
+  OR                           logical or                     5 or 0       TRUE
+  AND                          logical and                    3 and 0      FALSE
+  NOT                          logical not                    not false    TRUE
+  IS [NOT] (NULL|TRUE|FALSE)   value tests                    1 is null    FALSE
+  ISNULL|NOTNULL               null tests                     1 notnull    TRUE
+  =                            is equal                       5 = 4        FALSE
+  <>                           is not equal                   5 <> 4       TRUE
+  !=                           is not equal                   5 != 5       FALSE
+  <                            lower than                     5 < 4        FALSE
+  <=                           lower or equal                 5 <= 4       FALSE
+  >                            greater than                   5 > 4        TRUE
+  >=                           greater or equal               5 >= 4       TRUE
+  |                            integer bitwise OR             1 | 2        3
+  #                            integer bitwise XOR            1 # 3        2
+  &                            integer bitwise AND            1 & 3        1
+  ~                            integer bitwise NOT            ~ 1          -2
+  <<                           integer bitwise shift left     1 << 2       4
+  >>                           integer bitwise shift right    8 >> 2       2
+  +                            addition                       5 + 4        9
+  -                            subtraction                    3 - 2.0      1.0
+  *                            multiplication                 5 * 4        20
+  /                            division (integer truncates the results)   5 / 3   1
+  %                            modulo                         3 % 2        1
+  -                            opposite                       - 2.0        -2.0
+
+ Built-In Functions
@@ -1042,6 +1232,13 @@ pgbench options d
 5432.0
+  exp(x)                double    exponential          exp(1.0)                 2.718281828459045
+
+  greatest(a [, ... ] ) double if any a is double, else integer    largest value among arguments
@@ -1063,6 +1260,20 @@ pgbench options d
 2.1
+  ln(x)                 double    natural logarithm    ln(2.718281828459045)    1.0
+
+  mod(i, j)             integer   modulo               mod(54, 32)              22
+
+  pi()                  double    value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9b..770be98 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -19,13 +19,17 @@
 PgBenchExpr *expr_parse_result;
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
+static PgBenchExpr *make_null_constant(void);
+static PgBenchExpr *make_boolean_constant(bool bval);
+static PgBenchExpr
Re: [HACKERS] Partition-wise aggregation/grouping
Continuing with review of 0007.

+    /* Copy input rel's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;

Isn't this done in fetch_upper_rel()? Why do we need it here? There's also a similar hunk in create_grouping_paths() which doesn't look appropriate. I guess you need relids in grouped_rel->relids for FDW. There are two ways to do this: 1. set grouped_rel->relids for the parent upper rel as well, but then we should pass relids to fetch_upper_rel() instead of setting those later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids to all_baserels if upper_rel->relids is NULL, and don't set relids for a parent upper rel. I am fine with either of those.

+    /* partial phase */
+    get_agg_clause_costs(root, (Node *) partial_target->exprs,
+                         AGGSPLIT_INITIAL_SERIAL,
+                         _partial_costs);

IIUC, the costs for evaluating aggregates would not change per child. They won't be different for the parent and any of the children. So, we should be able to calculate those once, save them in "extra" and use them repeatedly.

+    if (can_sort)
+    {
+        Path *path = cheapest_path;
+
+        if (!(pathkeys_contained_in(root->group_pathkeys,
+                                    path->pathkeys)))
[ .. clipped patch .. ]
+                     NIL,
+                     dNumGroups));
+    }

We create two kinds of paths: partial paths for parallel query, and partial aggregation paths when group keys do not have partition keys. The comments and code use "partial" to mean both of those things, which is rather confusing. Maybe we should use the term "partial aggregation" explicitly wherever it means that, in comments and in variable names. I still feel that create_grouping_paths() and create_child_grouping_paths() have a lot of common code. While I can see that there are some pockets in create_grouping_paths() which are not required in create_child_grouping_paths() and vice-versa, maybe we should create only one function and move those pockets under "if (child)" or "if (parent)" kind of conditions. It will be a maintenance burden to keep these two functions in sync in future. 
If we do not keep them in sync, that will introduce bugs.

+/*
+ * create_partition_agg_paths
+ *
+ * Creates append path for all the partitions and adds into the grouped rel.

I think you want to write "Creates an append path containing paths from all the child grouped rels and adds it into the given parent grouped rel".

+ * For partial mode we need to add a finalize agg node over append path before
+ * adding a path to grouped rel.
+ */
+void
+create_partition_agg_paths(PlannerInfo *root,
+                           RelOptInfo *grouped_rel,
+                           List *subpaths,
+                           List *partial_subpaths,

Why do we have these two as separate arguments? I don't see any call to create_partition_agg_paths() through add_append_path() passing both of them non-NULL simultaneously. Maybe you want to use a single list of subpaths and another boolean indicating whether it's a list of partial paths or regular paths.

+/* For non-partial path, just create a append path and we are done. */

This is the kind of confusion I am talking about above. Here you have mentioned "non-partial path", which may mean a regular path, but what you actually mean by that term is a path representing partial aggregates.

+/*
+ * Partial partition-wise grouping paths. Need to create final
+ * aggregation path over append relation.
+ *
+ * If there are partial subpaths, then we need to add gather path before we
+ * append these subpaths.

More confusion here.

+ */
+if (partial_subpaths)
+{
+    ListCell *lc;
+
+    Assert(subpaths == NIL);
+
+    foreach(lc, partial_subpaths)
+    {
+        Path *path = lfirst(lc);
+        double total_groups = path->rows * path->parallel_workers;
+
+        /* Create gather paths for partial subpaths */
+        Path *gpath = (Path *) create_gather_path(root, grouped_rel, path,
+                                                  path->pathtarget, NULL,
+                                                  _groups);
+
+        subpaths = lappend(subpaths, gpath);

Using the argument variable is confusing, and that's why you need two different List variables. Instead you could probably have another variable, local to this function, to hold the gather subpaths. 
AFAIU, the Gather paths that this code creates have their parent set to the parent grouped rel. That's not correct. These partial paths come from children of the grouped rel, and each Gather is producing rows corresponding to one child of the grouped rel. So a Gather path's parent should be set to the corresponding child and not the parent grouped rel. This code creates plans where there are multiple Gather nodes under an
Re: [HACKERS] Runtime Partition Pruning
Hello,

PFA the updated patch rebased over Amit's v13 patches [1], part of which is committed. This also fixes a few bugs I found. The InitPlans require execPlan, which is not set during ExecInitAppend, and so the evaluation of extern quals is moved from ExecInitAppend to ExecAppend. This changes the output of explain, but only the correct partition(s) are scanned.

David Q1:
postgres=# explain analyse execute ab_q1 (3,3); --const
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Append  (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
   ->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
         Filter: ((a = 3) AND (b = 3))
 Planning time: 0.588 ms
 Execution time: 0.043 ms
(5 rows)

postgres=# explain analyse execute ab_q1 (3,3); --Param; only the ab_a3_b3 plan is executed
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Append  (cost=0.00..395.10 rows=9 width=8) (actual time=0.119..0.119 rows=0 loops=1)
   ->  Seq Scan on ab_a1_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
         Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2))
 Planning time: 0.828 ms
 Execution time: 0.234 ms
(21 rows)

David Q1
postgres=# explain analyse execute ab_q1 (4); --Const
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Append  (cost=0.00..49.55 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Seq Scan on ab_a4  (cost=0.00..49.55 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: ((a >= 4) AND (a <= 5) AND (a = 4))
 Planning time: 0.501 ms
 Execution time: 0.039 ms
(5 rows)

postgres=# explain analyse execute ab_q1 (4); --Param
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Append  (cost=0.00..99.10 rows=2 width=8) (actual time=0.063..0.063 rows=0 loops=1)
   ->  Seq Scan on ab_a4  (cost=0.00..49.55 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
   ->  Seq Scan on ab_a5  (cost=0.00..49.55 rows=1 width=8) (never executed)
         Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
 Planning time: 0.563 ms
 Execution time: 0.111 ms

I am still working on the patch to add more comments and regression tests, but comments on the code are welcome.

[1] https://www.postgresql.org/message-id/df609168-b7fd-4c0b-e9b2-6e398d411e27%40lab.ntt.co.jp

--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-Implement-runtime-partiton-pruning_v4.patch
Description: Binary data
Re: [HACKERS] pow support for pgbench
Hi Fabien,

> The idea is that it would be relative to the "more functions and operators"
> patch, but I guess this is too much for people checking, so I'm fine with
> having it with the current base.

I tried applying the last "more functions and operators" patch
(pgbench-more-ops-funcs-14.patch) but it also stopped applying cleanly, so I
decided to go for master to avoid having too many issues. Let me know if you
rework that patch and I'll be happy to rebase mine on top.

--
Raúl Marín Rodríguez
carto.com
Use get_greatest_modulus() in partition_bounds_equal()
Hi all,

partition_bounds_equal() accesses the last datum from the given partition
bounds directly to compare their greatest moduli. The rest of the code uses
get_greatest_modulus() to get the greatest modulus from a partition bound.
partition_bounds_equal() should do the same for the sake of consistency, and
to make sure that the code remains sane if we change the way we store the
greatest modulus in PartitionBoundInfoData in the future.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

From 6aba3444b75f3420af52f602d7a2f53da8da0847 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat
Date: Fri, 1 Dec 2017 15:36:56 +0530
Subject: [PATCH] Use get_greatest_modulus() in partition_bounds_equal()

partition_bounds_equal() accesses the last datum from the given
partition bounds directly to compare their greatest moduli. The rest of
the code uses get_greatest_modulus() to get the greatest modulus from a
partition bound. The above comparison should do the same for the sake of
consistency, and to make sure that the code remains sane if we change
the way we store the greatest modulus in PartitionBoundInfoData in the
future.

Ashutosh Bapat.
---
 src/backend/catalog/partition.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2bf8117..dd4a8d3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -751,15 +751,13 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
 
 	if (b1->strategy == PARTITION_STRATEGY_HASH)
 	{
-		int			greatest_modulus;
+		int			greatest_modulus = get_greatest_modulus(b1);
 
 		/*
 		 * If two hash partitioned tables have different greatest moduli,
-		 * their partition schemes don't match. For hash partitioned table,
-		 * the greatest modulus is given by the last datum and number of
-		 * partitions is given by ndatums.
+		 * their partition schemes don't match.
 		 */
-		if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
+		if (greatest_modulus != get_greatest_modulus(b2))
 			return false;
 
 		/*
@@ -773,7 +771,6 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
 		 * their indexes array will be same. So, it suffices to compare
 		 * indexes array.
 		 */
-		greatest_modulus = get_greatest_modulus(b1);
 		for (i = 0; i < greatest_modulus; i++)
 			if (b1->indexes[i] != b2->indexes[i])
 				return false;
-- 
1.7.9.5
Re: [HACKERS] pow support for pgbench
Hi,

I've rebased the patch so it can be applied cleanly on top of current master.

On Fri, Dec 1, 2017 at 3:55 AM, Michael Paquier wrote:
> On Tue, Nov 7, 2017 at 1:34 AM, Fabien COELHO wrote:
> > Let's now hope that a committer gets around to consider these patch some
> > day.
>
> Which is not the case yet, so moved to CF 2018-01. Please note that
> the patch proposed does not apply anymore, so its status is changed to
> "waiting on author" for a rebase.
> --
> Michael

--
Raúl Marín Rodríguez
carto.com

From b12a7a37b1af1ede1aa9d5d88d0918808c54e19f Mon Sep 17 00:00:00 2001
From: Raul Marin
Date: Fri, 1 Dec 2017 10:49:17 +0100
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml                |  7 ++++
 src/bin/pgbench/exprparse.y                  |  3 ++
 src/bin/pgbench/pgbench.c                    | 62 ++++++++++++++++++++
 src/bin/pgbench/pgbench.h                    |  3 +-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 16 ++++++-
 5 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 94b495e606..769e604dd6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1099,6 +1099,13 @@ pgbench options d
        sqrt(2.0)
        1.414213562
 
+       pow(x, y)
+       integer if x and y are integers and y >= 0, else double
+       Numeric exponentiation
+       pow(2.0, 10)
+       1024.0
+
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index b3a2d9bfd3..4db3b215ab 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -191,6 +191,9 @@ static const struct
 	{
 		"random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index bd96eae5e6..5a5197cf63 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -746,6 +746,27 @@ getPoissonRand(TState *thread, int64 center)
 	return (int64) (-log(uniform) * ((double) center) + 0.5);
 }
 
+/*
+ * pow() for integer values with exp >= 0. Matches SQL pow() behaviour
+ */
+static int64
+ipow(int64 base, int64 exp)
+{
+	int64		result = 1;
+
+	Assert(exp >= 0);
+
+	while (exp)
+	{
+		if (exp & 1)
+			result *= base;
+		exp >>= 1;
+		base *= base;
+	}
+
+	return result;
+}
+
 /*
  * Initialize the given SimpleStats struct to all zeroes
  */
@@ -1673,6 +1694,47 @@ evalFunc(TState *thread, CState *st,
 				return true;
 			}
 
+		case PGBENCH_POW:
+			{
+				PgBenchValue *lval = &vargs[0];
+				PgBenchValue *rval = &vargs[1];
+
+				Assert(nargs == 2);
+
+				/*
+				 * If both operands are int and exp >= 0 use
+				 * the ipow() function, else use pow()
+				 */
+				if (lval->type == PGBT_INT &&
+					rval->type == PGBT_INT)
+				{
+
+					int64		li,
+								ri;
+
+					if (!coerceToInt(lval, &li) ||
+						!coerceToInt(rval, &ri))
+						return false;
+
+					if (ri >= 0)
+						setIntValue(retval, ipow(li, ri));
+					else
+						setDoubleValue(retval, pow(li, ri));
+				}
+				else
+				{
+					double		ld,
+								rd;
+
+					if (!coerceToDouble(lval, &ld) ||
+						!coerceToDouble(rval, &rd))
+						return false;
+
+					setDoubleValue(retval, pow(ld, rd));
+				}
+				return true;
+			}
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index fd428af274..e0132b5fcf 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -75,7 +75,8 @@ typedef enum PgBenchFunction
 	PGBENCH_SQRT,
 	PGBENCH_RANDOM,
 	PGBENCH_RANDOM_GAUSSIAN,
-	PGBENCH_RANDOM_EXPONENTIAL
+	PGBENCH_RANDOM_EXPONENTIAL,
+	PGBENCH_POW
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index c095881312..fcb30cdde5 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -231,7 +231,14 @@ pgbench(
 		qr{command=18.: double 18\b},
 		qr{command=19.: double 19\b},
 		qr{command=20.: double 20\b},
-		qr{command=21.: int 9223372036854775807\b}, ],
+		qr{command=21.: int 9223372036854775807\b},
+		qr{command=23.: int -27\b},
+		qr{command=24.: double 1024\b},
+		qr{command=25.: int 1\b},
+		qr{command=26.: double 1\b},
+		qr{command=27.: double -0.125\b},
+		qr{command=28.: double -0.125\b},
+	],
 	'pgbench expressions',
 	{ '001_pgbench_expressions' => q{-- integer functions
 \set i1 debug(random(1, 100))
@@ -261,6 +268,13 @@ pgbench(
 \set maxint debug(:minint - 1)
 -- reset a variable
 \set i1 0
+--- pow() operator
+\set poweri debug(pow(-3,3))
+\set powerd debug(pow(2.0,10))
+\set poweriz debug(pow(0,0))
+\set powerdz debug(pow(0.0,0.0))
+\set powernegi debug(pow(-2,-3))
+\set
Re: Unclear regression test for postgres_fdw
Jeevan Chalke wrote:

> On Thu, Nov 30, 2017 at 3:44 PM, Jeevan Chalke wrote:
> > On Thu, Nov 30, 2017 at 1:36 AM, Antonin Houska wrote:
> > > The following test
> > >
> > > -- Input relation to aggregate push down hook is not safe to pushdown and thus
> > > -- the aggregate cannot be pushed down to foreign server.
> > > explain (verbose, costs off)
> > > select count(t1.c3) from ft1 t1, ft1 t2 where t1.c1 = postgres_fdw_abs(t1.c2);
> > >
> > > produces the following plan
> > >
> > >                                    QUERY PLAN
> > > ----------------------------------------------------------------------------------
> > >  Aggregate
> > >    Output: count(t1.c3)
> > >    ->  Nested Loop
> > >          Output: t1.c3
> > >          ->  Foreign Scan on public.ft1 t2
> > >                Remote SQL: SELECT NULL FROM "S 1"."T 1"
> > >          ->  Materialize
> > >                Output: t1.c3
> > >                ->  Foreign Scan on public.ft1 t1
> > >                      Output: t1.c3
> > >                      Remote SQL: SELECT c3 FROM "S 1"."T 1" WHERE (("C 1" = public.postgres_fdw_abs(c2)))
> > >
> > > which is not major problem as such, but gdb shows that the comment
> > > "aggregate cannot be pushed" is not correct. In fact,
> > > postgresGetForeignUpperPaths() *does* create the upper path.
> > >
> > > The reason that UPPERREL_GROUP_AGG is eventually not used seems to be that
> > > postgresGetForeignJoinPaths() -> add_foreign_grouping_paths() ->
> > > estimate_path_cost_size() estimates the join cost in rather generic way.
> > > While the remote server can push the join clause down to the inner
> > > relation of NL, the postgres_fdw cost computation assumes that the join
> > > clause is applied to each pair of output and input tuple.
> > >
> > > I don't think that the postgres_fdw's estimate can be fixed easily, but
> > > if the impact of "shipability" on (not) using the upper relation should
> > > be tested, we need a different test.
> >
> > Oops. My bad.
> > Agree with your analysis.
> > Will send a patch fixing this testcase.
>
> Attached patch to fix the test case. In new test case I am using a JOIN
> query where JOIN condition is not safe to push down and hence the JOIN
> itself is unsafe.
I've just verified that postgresGetForeignUpperPaths() does return here:

	/*
	 * If input rel is not safe to pushdown, then simply return as we cannot
	 * perform any post-join operations on the foreign server.
	 */
	if (!input_rel->fdw_private ||
		!((PgFdwRelationInfo *) input_rel->fdw_private)->pushdown_safe)
		return;

I see no other problems here. You may need to add the patch to the next CF so
it does not get lost.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] [PATCH] Incremental sort
Alexander Korotkov wrote:

> On Wed, Nov 22, 2017 at 1:22 PM, Antonin Houska wrote:
> > Alexander Korotkov wrote:
> > > Antonin Houska wrote:
> > > > * ExecIncrementalSort()
> > > >
> > > > ** if (node->tuplesortstate == NULL)
> > > >
> > > > If both branches contain the expression
> > > >
> > > > node->groupsCount++;
> > > >
> > > > I suggest it to be moved outside the "if" construct.
> > >
> > > Done.
> >
> > One more comment on this: I wonder if the field isn't incremented too
> > early. It seems to me that the value can end up non-zero if the input set
> > happens to be empty (not sure if it can happen in practice).
>
> That happens in practice. On empty input set, incremental sort would count
> exactly one group.
>
> # create table t (x int, y int);
> CREATE TABLE
> # create index t_x_idx on t (x);
> CREATE INDEX
> # set enable_seqscan = off;
> SET
> # explain (analyze, buffers) select * from t order by x, y;
>                                                      QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------
>  Incremental Sort  (cost=0.74..161.14 rows=2260 width=8) (actual time=0.024..0.024 rows=0 loops=1)
>    Sort Key: x, y
>    Presorted Key: x
>    Sort Method: quicksort  Memory: 25kB
>    Sort Groups: 1
>    Buffers: shared hit=1
>    ->  Index Scan using t_x_idx on t  (cost=0.15..78.06 rows=2260 width=8) (actual time=0.011..0.011 rows=0 loops=1)
>          Buffers: shared hit=1
>  Planning time: 0.088 ms
>  Execution time: 0.066 ms
> (10 rows)
>
> But from the perspective of how the code works, it's really 1 group. Tuple
> sort was defined, inserted no tuples, then sorted and got no tuples out of
> there. So, I'm not sure if it's really incorrect...

I expected the number of groups that actually appear in the output; you
consider it the number of groups started. I can't find a similar case
elsewhere in the code (e.g. the Agg node does not report this kind of
information), so I have no clue. Someone else will have to decide.

> > But there is IncrementalSort node on the remote side.
> > Let's see what happens.
> Idea of "CROSS JOIN, not pushed down" test is that cross join with ORDER BY
> LIMIT is not beneficial to push down, because LIMIT is not pushed down and
> remote side wouldn't be able to use top-N heapsort. But if remote side has
> incremental sort then it can be used, and fetching first 110 rows is cheap.
> Let's see plan of original "CROSS JOIN, not pushed down" test with
> incremental sort.
>
> # EXPLAIN (ANALYZE, VERBOSE) SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2
> t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;

ok, understood, thanks. Perhaps it's worth a comment in the test script.

I'm afraid I still see a problem. The diff removes a query that (although a
bit different from the one above) lets the CROSS JOIN be pushed down and does
introduce the IncrementalSort in the remote database. This query is replaced
with one that does not allow for the join push down.

*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*************** SELECT t1.c1 FROM ft1 t1 WHERE NOT EXIST
*** 510,517 ****
  SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
! SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  -- different server, not pushed down. No result expected.
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
--- 510,517 ----
  SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2.c2) ORDER BY t1.c1 OFFSET 100 LIMIT 10;
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
! SELECT t1.c3, t2.c3 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c3, t2.c3 OFFSET 100 LIMIT 10;
  -- different server, not pushed down. No result expected.
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft5 t1 JOIN ft6 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;

Shouldn't the test contain *both* cases?

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at