Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-24 Thread Michael Paquier
On Mon, Sep 25, 2017 at 2:26 PM, Amit Kapila  wrote:
> I understand that, and I think there are always multiple ways to write
> the same information.  It might be better to pass this patch series to
> the committer if you don't see any mistake, since he might change some
> comments before committing anyway.

Yeah, agreed. I don't mind doing that as well honestly.
-- 
Michael




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-24 Thread Michael Paquier
On Sat, Sep 23, 2017 at 12:56 AM, Bossart, Nathan  wrote:
> On 9/21/17, 9:55 PM, "Michael Paquier"  wrote:
>> I still think that ExecVacuum() should pass a list of VacuumRelation
>> objects to vacuum(), and get_rel_oids() should take a list as input
>> and return a completed list. This way the decision about doing
>> everything in the same transaction is made once, in vacuum().
>> And actually, if several relations are specified with VACUUM, your patch
>> would not use one transaction per table, as use_own_xacts would be set
>> to false. I think that Tom meant that relations whose processing has
>> finished have to be committed immediately. With your patch, the commit
>> happens only once all relations have been processed.
>
> Sorry, I must have misunderstood.  I've attached an updated patch that
> looks more like what you described.  I also cleaned up the test cases
> a bit.
>
> IIUC the only time use_own_xacts will be false is when we are only
> doing ANALYZE and at least one of the following is true:
>
> 1. We are in a transaction block.
> 2. We are processing only one relation.

Yes.

> From the code, it appears that vacuum_rel() always starts and commits a
> new transaction for each relation:
>
>  * vacuum_rel expects to be entered with no transaction active; it will
>  * start and commit its own transaction.  But we are called by an SQL

Yes.

> So, by ensuring that get_rel_oids() returns a list whenever multiple
> tables are specified, we are making sure that commands like
>
> ANALYZE table1, table2, table3;
>
> create transactions for each processed relation (as long as they are
> not inside a transaction block).

Yes.

> I suppose the alternative would be
> to call vacuum() for each relation and to remove the restriction that
> we must be processing more than one relation for use_own_xacts to be
> true.

The main point of my comment is that, like ExecVacuum(), vacuum()
should be a high-level function where it is decided whether multiple
transactions should be used or not. By calling vacuum() multiple times
you break this promise. vacuum_rel() should be the one working with
individual transactions.

Here is the diff between v19 and v21 that matters here:
/* Now go through the common routine */
-   if (vacstmt->rels == NIL)
-   vacuum(vacstmt->options, NULL, , NULL, isTopLevel);
-   else
-   {
-   ListCell *lc;
-   foreach(lc, vacstmt->rels)
-   vacuum(vacstmt->options, lfirst_node(VacuumRelation, lc),
-  , NULL, isTopLevel);
-   }
+   vacuum(vacstmt->options, vacstmt->rels, , NULL, isTopLevel);
If you do so, an ANALYZE with multiple relations would end up using
the same transaction for all relations. I think that we had better be
consistent with VACUUM when not using an outer transaction, so that
tables are analyzed and committed one by one. This does not happen
here: a single transaction is used when processing a list of
non-partitioned tables.

On Sun, Sep 24, 2017 at 4:37 AM, Bossart, Nathan  wrote:
> Here is a version of the patch without the switch to AutovacMemCxt in
> autovacuum_do_vac_analyze(), which should no longer be necessary after
> 335f3d04.

Thanks for the new version.

+   if (!IsAutoVacuumWorkerProcess())
+   ereport(WARNING,
+ (errmsg("skipping \"%s\" --- relation no longer exists",
+ relation->relname)));
I like the use of WARNING here, but we could also use LOG, to be
consistent with the case where a lock acquisition is skipped.

+* going to commit this transaction and begin a new one between now
+* and then.
+*/
+   relid = RangeVarGetRelid(relinfo->relation, NoLock, false);
We will likely have to wait until the matters discussed in
https://www.postgresql.org/message-id/25023.1506107...@sss.pgh.pa.us
are settled.

+VACUUM FULL vactst, vactst, vactst, vactst;
This is actually a waste of cycles.

I don't think I have many other comments about this patch.
-- 
Michael




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-24 Thread Amit Kapila
On Mon, Sep 25, 2017 at 10:13 AM, Michael Paquier wrote:
> On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila  wrote:
>> Added and updated the comments for both btree and hash index patches.
>
> I don't have real complaints about this patch, this looks fine to me.
>
> +* Currently, the advantage of setting pd_lower is in limited cases like
> +* during wal_consistency_checking or while logging for unlogged relation
> +* as for all other purposes, we initialize the metapage.  Note, it also
> +* helps in page masking by allowing to mask unused space.
> I would have reworked this comment a bit, say like that:
> Setting pd_lower is useful for two cases which make use of WAL
> compressibility even if the meta page is initialized at replay:
> - Logging of init forks for unlogged relations.
> - wal_consistency_checking logs extra full-page writes, and this
> allows masking of the unused space of the page.
>
> Now I often get complaints that I suck at this exercise ;)
>

I understand that, and I think there are always multiple ways to write
the same information.  It might be better to pass this patch series to
the committer if you don't see any mistake, since he might change some
comments before committing anyway.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Rethinking autovacuum.c memory handling

2017-09-24 Thread Michael Paquier
On Sun, Sep 24, 2017 at 2:28 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> I have spent some time looking at your patch and testing it. This
>> looks sane. A small comment that I have would be to add an assertion
>> at the top of perform_work_item to be sure that it is called in the
>> memory context of AutovacMemCxt.
>
> Done like that, thanks for reviewing!

Thanks for considering my idea.
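
For the archives, the suggested check boils down to a one-line assertion
at the top of the function, along these lines (a sketch only; the function
signature and the AutovacMemCxt pointer are the names used in autovacuum.c):

static void
perform_work_item(AutoVacuumWorkItem *workitem)
{
	/* We expect to be entered in the AutovacMemCxt memory context. */
	Assert(CurrentMemoryContext == AutovacMemCxt);

	/* ... rest of the function unchanged ... */
}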
-- 
Michael




Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-24 Thread Haribabu Kommi
On Mon, Sep 25, 2017 at 4:39 AM, Tomas Vondra wrote:

>
>
> On 09/06/2017 09:45 AM, Haribabu Kommi wrote:
> >
> >
> > On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra wrote:
> >
> > On 7/25/17 12:55 AM, Tom Lane wrote:
> >
> > Tomas Vondra writes:
> >
> > It seems to me that VACUUM and ANALYZE somewhat disagree on what
> > exactly reltuples means. VACUUM seems to be thinking that reltuples
> > = live + dead while ANALYZE apparently believes that reltuples = live
> >
> > The question is - which of the reltuples definitions is the right
> > one? I've always assumed that "reltuples = live + dead" but perhaps
> > not?
> >
> >
> > I think the planner basically assumes that reltuples is the live
> > tuple count, so maybe we'd better change VACUUM to get in step.
> >
> >
> > Attached is a patch that (I think) does just that. The disagreement
> > was caused by VACUUM treating recently dead tuples as live, while
> > ANALYZE treats both of those as dead.
> >
> > At first I was worried that this will negatively affect plans in the
> > long-running transaction, as it will get underestimates (due to
> > reltuples not including rows it can see). But that's a problem we
> > already have anyway, you just need to run ANALYZE in the other session.
> >
> >
> > Thanks for the patch.
> > From the mail, I understand that this patch tries to improve the
> > reltuples value update in the catalog table by the vacuum command
> > to consider the proper visible tuples similar like analyze command.
> >
> > -num_tuples);
> > +num_tuples - nkeep);
> >
> > With the above correction, there is a problem in reporting the number
> > of live tuples to the stats.
> >
> > postgres=# select reltuples, n_live_tup, n_dead_tup
> >   from pg_stat_user_tables join pg_class using (relname)
> >  where relname = 't';
> >  reltuples | n_live_tup | n_dead_tup
> > -----------+------------+------------
> >     899818 |     799636 |     100182
> > (1 row)
> >
> >
> > The live tuples data value is again decremented with dead tuples
> > value before sending them to stats in function lazy_vacuum_rel(),
> >
> > /* report results to the stats collector, too */
> > new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
> >
> > The fix needs a correction here also. Or change the correction in
> > lazy_vacuum_rel() function itself before updating catalog table similar
> > like stats.
> >
>
> Ah, haven't noticed that for some reason - you're right, we estimate the
> reltuples based on (num_tuples - nkeep), so it doesn't make much sense
> to subtract nkeep again. Attached is v2 of the fix.
>
> I've removed the subtraction from lazy_vacuum_rel(), leaving just
>
> new_live_tuples = new_rel_tuples;
>
> and now it behaves as expected (no second subtraction). That means we
> can get rid of new_live_tuples altogether (and the protection against
> negative values), and use new_rel_tuples directly.
>
> Which pretty much means that in this case
>
> (pg_class.reltuples == pg_stat_user_tables.n_live_tup)
>
> but I guess that's fine, based on the initial discussion in this thread.


The changes are fine, and proper live tuple counts are now reported in
both pg_class and the stats. The other issue, where repeated vacuum
operations lead to a decreasing number of live tuples, is not related
to this patch and can be handled separately.

I have changed the patch status to Ready for Committer.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-24 Thread Michael Paquier
On Sun, Sep 24, 2017 at 2:25 PM, Amit Kapila  wrote:
> Added and updated the comments for both btree and hash index patches.

I don't have real complaints about this patch, this looks fine to me.

+* Currently, the advantage of setting pd_lower is in limited cases like
+* during wal_consistency_checking or while logging for unlogged relation
+* as for all other purposes, we initialize the metapage.  Note, it also
+* helps in page masking by allowing to mask unused space.
I would have reworked this comment a bit, say like that:
Setting pd_lower is useful for two cases which make use of WAL
compressibility even if the meta page is initialized at replay:
- Logging of init forks for unlogged relations.
- wal_consistency_checking logs extra full-page writes, and this
allows masking of the unused space of the page.
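
For what it's worth, the code change that this comment describes is
essentially the following (a sketch using the GIN macro names; the btree
and hash patches do the same thing for their own meta page structs):

#include "postgres.h"
#include "access/gin_private.h"
#include "storage/bufpage.h"

static void
set_meta_pd_lower(Page page)
{
	GinMetaPageData *metad = GinPageGetMeta(page);

	/*
	 * Set pd_lower just past the metadata, so that the unused space can
	 * be removed from full-page images and masked by page masking.
	 */
	((PageHeader) page)->pd_lower =
		((char *) metad + sizeof(GinMetaPageData)) - (char *) page;
}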

Now I often get complaints that I suck at this exercise ;)
-- 
Michael




Re: [HACKERS] visual studio 2017 build support

2017-09-24 Thread Haribabu Kommi
On Fri, Sep 22, 2017 at 10:40 PM, Andrew Dunstan <andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 09/21/2017 08:16 PM, Haribabu Kommi wrote:
> >
> >
> > I was about to commit this after a good bit of testing when I
> > noticed this:
> >
> > +   Building with Visual Studio 2017 is supported
> > +   down to Windows 7 SP1 and Windows Server 2012 R2.
> >
> > I was able to build on Windows Server 2008 without a problem, so I'm
> > curious why we are saying it's not supported.
> >
> >
> > Thanks for the review.
> >
> > From the Visual Studio system requirements [1], in the section on
> > supported operating systems, Windows 7 SP1 and Windows Server 2012 R2
> > are mentioned and nothing is said about 2008; for that reason I wrote
> > that it is supported down to those operating systems. As I don't have
> > a Windows Server 2008 system available, I didn't verify it.
> >
> > The Visual Studio 2017 product itself is not documented as supporting
> > Windows Server 2008; can we go ahead and mention it in our documentation?
> >
> > [1] -
> > https://www.visualstudio.com/en-us/productinfo/vs2017-system-requirements-vs
> >
> >
>
> That page also says:
>
>
> Microsoft Visual Studio Build Tools 2017
>
> Also installs on Windows Server 2008 R2 SP1
>
>
> So I'm inclined to adjust the documentation accordingly.


Thanks for pointing that out; I missed checking the Build Tools support
section. I have attached the updated patch with the documentation changed
to also include the 2008 R2 SP1 operating system.

Regards,
Hari Babu
Fujitsu Australia


0001-Support-of-PostgreSQL-build-with-visual-studio-2017_v3.patch
Description: Binary data



Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-24 Thread Amit Kapila
On Fri, Sep 22, 2017 at 11:56 PM, Robert Haas  wrote:
> On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma  
> wrote:
>> I have added a note for handling of logged and unlogged tables in
>> README file and also corrected the header comment for
>> hashbucketcleanup(). Please find the attached 0003*.patch having these
>> changes. Thanks.
>
> I committed 0001 and 0002 with some additional edits as
> 7c75ef571579a3ad7a1d3ee909f11dba5e0b9440.
>

I have noticed a typo in that commit (in the README); a patch for the
same is attached.

>  I also rebased 0003 and
> edited it a bit; see attached hash-cleanup-changes.patch.
>
> I'm not entirely sold on 0003.  An alternative would be to rip the lsn
> stuff back out of HashScanPosData, and I think we ought to consider
> that.  Basically, 0003 is betting that getting rid of the
> lock-chaining in hash index vacuum is more valuable than being able to
> kill dead items more aggressively.  I bet that's a bad bet.
>
> In the case of btree indexes, since
> 2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning
> allows most btree index scans to avoid holding buffer pins when the
> scan is suspended, but we gain no such advantage here.  We always have
> to hold a pin on the primary bucket page anyway, so even with this
> patch cleanup is going to block when it hits a bucket containing a
> suspended scan.  0003 helps if (a) the relation is permanent, (b) the
> bucket has overflow pages, and (c) the scan is moving faster than
> vacuum and can overtake it instead of waiting.  But that doesn't seem
> like it will happen very often at all, whereas the LSN check will
> probably fail frequently and cause us to skip cleanup that we could
> usefully have done.  So I propose the attached hashscan-no-lsn.patch
> as an alternative.
>

I think your proposal makes sense.  Your patch looks good, but you
might want to tweak the comment atop _hash_kill_items ("However,
having pin on the overflow page doesn't guarantee that vacuum won't
delete any items.").  That part of the comment was written to indicate
that we have to check the LSN in this function unconditionally.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


typo_hash_readme_v1.patch
Description: Binary data



[HACKERS] Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)

2017-09-24 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 11:34 AM, Peter Eisentraut wrote:
> After reviewing this thread and testing around a bit, I think the code
> is mostly fine as it is, but the documentation is lacking.  Hence,
> attached is a patch to expand the documentation quite a bit, especially
> to document in more detail what ICU locale strings are accepted.

Attached is my counterproposal. This patch has us always make sure
that collcollate is in BCP 47 format, even on ICU versions prior to
ICU 54.

Other improvements/bug fixes:

* Sanitizes locale names within CREATE COLLATION. This has been tested
on ICU 42 (earliest supported version) and ICU 55.

* Enforces a NAMEDATALEN restriction for collcollate during CREATE
COLLATION, forbidding collcollate truncation. This is useful because
truncating can allow subtly wrong answers later on.

* Adds DEBUG1 message with ICU display name, so we can satisfy
ourselves that we're getting the expected behavior, to some degree.

I used this to confirm that we get consistent behavior between ICU 42
and 55 for CREATE COLLATION. On ICU 42, keyword collation attributes
(e.g., the emoji keyword, numeric ordering for natural sort order)
still don't work, just as before, but the locale string is still
considered valid. (This is because ucol_setAttribute() is supposed to
be used there).

* Documents the aforementioned keyword collation attribute restriction
on ICU versions before ICU 54. This was needed anyway. We only claim
for Postgres collations what the ICU docs claim for ICU collators,
even though there is reason to believe that some ICU versions before
ICU 54 actually can do better.

* When using ICU 4.2, the examples in the docs (variant collations
like German Phonebook order) now actually work.

The examples are completely broken right now IMV, because the user has
to know that they are on ICU 4.2, which only accepts the old style
locale strings as things stand. And, they'd have no obvious indication
that things were broken without this patch, because there would have
been no sanitization or other feedback.

* Creates root collation as root-x-icu (collcollate "root"), not
und-x-icu. "und" means undefined language.

* Moves all knowledge of ICU version issues to within a few
pg_locale.c routines, leaving the code reasonably well encapsulated.

* Does an encoding conversion when getting a display name for the
initdb collation comment. This needs to be ascii-safe due to the
initdb/template0 DB encoding restriction, but I suspect that the way
we do it now is subtly wrong. This does imply that SQL_ASCII databases
will never get ICU pg_collation entries that come with comments added
by initdb, but such databases were already unable to use the
collations anyway, so this is no loss at all. (SQL_ASCII is mentioned
here as an example of a database collation that ICU doesn't support,
all of which are affected in this way.)

I decided to implement CREATE COLLATION sanitization based on whether
or not a display string can be generated (if not, or if it's empty, we
reject). This seems like about the right level of strictness to me,
because we're still very forgiving. I admit that that's a little bit
arbitrary, but it does seem to be a good match for Postgres; it is
forgiving of things that could plausibly make sense on another ICU
version to some user at some time, but rejects most things that are
inherently wrong, IMHO. You can still ask for Japanese as spoken in
Madagascar, or even specify a language that ICU has never heard of,
and there is no error. It catches syntax errors only. See the slightly
expanded tests for details. I'm very open to negotiating the exact
details of how we sanitize, but any level of sanitization will be at
least a little bit arbitrary (including what we have right now, which
is no sanitization).

Aside from the specific problems for Postgres that I've noted, which the
patch prevents or fixes, there is another reason to do this. The old
style locale name format is officially deprecated by ICU, which makes
it seem like we should never expose it to users in the first place.
Per ICU docs:

"Starting with ICU 54, the following naming scheme and its API
functions are deprecated. Use ucol_open() with language tag collation
keywords instead (see Collation API Details)" [1]

[1] 
http://userguide.icu-project.org/collation/concepts#TOC-Collator-naming-scheme
-- 
Peter Geoghegan
From 989bc2f877aa01af4ac140a43a5cebcffe8b3ec9 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 21 Sep 2017 17:34:13 -0700
Subject: [PATCH] Consistently canonicalize ICU collations' collcollate as BCP
 47.

Previously, on versions of ICU prior to ICU 54 collcollate was stored in
the legacy locale format at initdb time.  We now always use the BCP 47
format there.  To make that work, collcollate is now converted from BCP
47 format to the old locale tag format on-the-fly as an ICU collator is
initially opened within a backend (though only on versions of ICU 

Re: [HACKERS] What's with all the fflush(stderr) calls in pg_standby.c?

2017-09-24 Thread Andres Freund
On 2017-09-25 10:01:35 +0900, Michael Paquier wrote:
> On Mon, Sep 25, 2017 at 9:45 AM, Robert Haas  wrote:
> > On a related note, the idea of removing pg_standby altogether has been
> > proposed a few times.

Including recently by me:
http://archives.postgresql.org/message-id/20170913064824.rqflkadxwpboabgw%40alap3.anarazel.de

> > Apparently there are a few things that it still
> > does better than standby_mode, but nobody seems in a hurry to do
> > anything about that.  Still, I'd be against spending a lot of time
> > trying to improve a tool that has mostly outlived its usefulness - we
> > ought to be trying to enhance the in-core facilities instead.
> 
> +1.

It's also pretty crummy code that has no test coverage. I'd just remove
it.

- Andres




Re: [HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-24 Thread Michael Paquier
On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane  wrote:
> Somebody inserted this into vacuum.c's get_rel_oids():
>
> tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
> if (!HeapTupleIsValid(tuple))
> elog(ERROR, "cache lookup failed for relation %u", relid);
>
> apparently without having read the very verbose comment two lines above,
> which points out that we're not taking any lock on the target relation.
> So, if that relation is concurrently being dropped, you're likely to
> get "cache lookup failed for relation " rather than anything more
> user-friendly.

This has been overlooked during the reviews of 3c3bb99, by multiple
people including me. elog() errors, such as cache lookup failures,
should never be something users can face.

> A minimum-change fix would be to replace the elog() with an ereport
> that produces the same "relation does not exist" error you'd have
> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
> a few microseconds earlier.  But that feels like its's band-aiding
> around the problem.

Yeah, that would not be right. It is still a cache lookup error in the end.

> What I'm wondering about is changing the RangeVarGetRelid call to take
> ShareUpdateExclusiveLock rather than no lock.  That would protect the
> syscache lookup, and it would also make the find_all_inheritors call
> a lot more meaningful.
>
> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
> as soon as we close the caller's transaction, and then we'd acquire
> the same or stronger lock inside vacuum_rel().  So that seems fine.
> If we're doing an ANALYZE, then the lock would continue to be held
> and analyze_rel would merely be acquiring it an extra time, so we'd
> actually be removing a race-condition failure scenario for ANALYZE.
> This would mean a few more cycles in lock management, but since this
> only applies to a manual VACUUM or ANALYZE that specifies a table
> name, I'm not too concerned about that.

I think that I am +1 on that. Testing such a change, I am not seeing
anything wrong either. The call to find_all_inheritors() should also use
ShareUpdateExclusiveLock, and the lock level used with RangeVarGetRelid()
needs to be reworked accordingly.
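
In other words, the lookups in get_rel_oids() would become something like
this (a sketch of the idea only, with placeholder variable names; vacrel
stands for the RangeVar of the relation being looked up):

	Oid			relid;
	List	   *oid_list;

	/* take ShareUpdateExclusiveLock while resolving the relation name */
	relid = RangeVarGetRelid(vacrel, ShareUpdateExclusiveLock, false);

	/* keep the same lock level while expanding the partition tree */
	oid_list = find_all_inheritors(relid, ShareUpdateExclusiveLock, NULL);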

Attached is a proposal of patch.

> Thoughts?

As long as I don't forget... Another thing currently on HEAD and
REL_10_STABLE is that OIDs of partitioned tables are used, but the
RangeVar of the parent is used for error reports. This leads to
incorrect reports if a partition goes away in the middle of autovacuum,
as only information about the parent is reported to the user. I am not
saying that this needs to be fixed in REL_10_STABLE at this stage
though as this would require some refactoring similar to what the
patch adding support for VACUUM with multiple relations does. But I
digress here.
-- 
Michael


vacuum-partition-locks.patch
Description: Binary data



Re: [HACKERS] What's with all the fflush(stderr) calls in pg_standby.c?

2017-09-24 Thread Michael Paquier
On Mon, Sep 25, 2017 at 9:45 AM, Robert Haas  wrote:
> On a related note, the idea of removing pg_standby altogether has been
> proposed a few times.  Apparently there are a few things that it still
> does better than standby_mode, but nobody seems in a hurry to do
> anything about that.  Still, I'd be against spending a lot of time
> trying to improve a tool that has mostly outlived its usefulness - we
> ought to be trying to enhance the in-core facilities instead.

+1.
-- 
Michael




Re: [HACKERS] Fix number skipping in to_number

2017-09-24 Thread Nathan Wagner
On Thu, Aug 17, 2017 at 12:33:02PM +0100, Oliver Ford wrote:
 
> Ok I've made that change in the attached v3. I'm not sure as I'm on
> en_US.UTF-8 locale too. Maybe something Windows specific?

This patch applies against master (8485a25a), compiles, and
passes a make check.

I tested both on my mac laptop, and my linux server.

If we want this patch, I'd say it's ready for committer.  We may want
(and I can't believe I'm saying this) more discussion as to exactly what
the strategy for to_number() (and friends) is.  Do we want to duplicate
Oracle's functionality, or do we want a similar function to do similar
things, without necessarily having a goal of identical behavior to
Oracle?

For myself, I pretty much never use the to_date, to_number, or
to_timestamp functions except when porting Oracle code.  I do use the
to_char functions on occasion.  If strftime were available, I probably
wouldn't use them.

I would commit this patch and update the TODO with a goal of making
to_number as Oracle compatible as is reasonable.

-- 
nw




Re: [HACKERS] What's with all the fflush(stderr) calls in pg_standby.c?

2017-09-24 Thread Robert Haas
On Sun, Sep 24, 2017 at 11:51 AM, Tom Lane  wrote:
> This looks like cargo-cult programming to me.  stderr is almost
> always line-buffered, making these fflush'es pointless.  If it's
> not line-buffered, that's probably because it's going to a
> noninteractive destination for which this wouldn't matter.
> Moreover, none of our other programs do this.

On a related note, the idea of removing pg_standby altogether has been
proposed a few times.  Apparently there are a few things that it still
does better than standby_mode, but nobody seems in a hurry to do
anything about that.  Still, I'd be against spending a lot of time
trying to improve a tool that has mostly outlived its usefulness - we
ought to be trying to enhance the in-core facilities instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-24 Thread Andrew Dunstan


On 09/24/2017 07:06 PM, Tom Lane wrote:
>
> So I think we should just stop with the blacklist test for v10,
> and then see if we still get complaints (and exactly what they're
> about) so that we can judge how much more work the problem deserves.
> It's still ahead of where we were in previous releases, and ahead of
> where we'd be if we end up reverting the patch altogether.
>
>


That's pretty much what I was saying.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-24 Thread Masahiko Sawada
On Sat, Sep 23, 2017 at 4:06 AM, Peter Eisentraut wrote:
> On 9/21/17 04:43, Masahiko Sawada wrote:
>>> Once we got this patch, DROP SUBSCRIPTION is not transactional either
>>> if dropping a replication slot or if the subscription got disabled in
>>> a transaction block. But we disallow to do DROP SUBSCRIPTION in a
>>> transaction block only in the former case. In the latter case, we
>>> adopted such non-transactional behaviour. Since these behaviours would
>>> be complex for users I attached the documentation patch explaining it.
>>>
>> Hmm, isn't it necessary to take care of and mention this kind of
>> inconsistent behavior in the docs?
>
> I have added documentation that certain forms of CREATE/DROP
> SUBSCRIPTION cannot be run inside a transaction block, which we have
> done for other such commands.

Thank you!

> I don't think we need to go into further detail.  We don't guarantee
> continuous connections.  If a worker is stopped and restarted in the
> background as an implementation detail, then the user is not impacted.

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-24 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/24/2017 04:37 PM, Tom Lane wrote:
>> What we still need to debate is whether to remove the heuristic
>> type-is-from-same-transaction test, making the user-visible behavior
>> simply "you must commit an ALTER TYPE ADD VALUE before you can use the
>> new value".  I'm kind of inclined to do so; the fuzzy (and inadequately
>> documented) behavior we'll have if we keep it doesn't seem very nice to
>> me.

> I'd rather not. The failure cases are going to be vanishingly small, I
> suspect, and we've already discussed how we might improve that test. If
> you want to put some weasel words in the docs that might be ok.

I'm unconvinced.  We get enough complaints about heuristic behaviors
we have elsewhere.  Also, if we ship it like this, we're going to
have backward compatibility concerns if we try to change the behavior
later.  Now admittedly, the next step forward might well be an exact
solution which would necessarily take every case the heuristic allows
--- but I don't want to box us into having to support exactly the
cases the heuristic would allow.  And I don't want to have to
document which those are, either.

Basically, I don't think anyone's shown an important use case that
wouldn't be covered by "committed or not blacklisted".  That fixes
the original complaint that you couldn't do ALTER ADD VALUE in a
transaction block at all, and with or without the heuristic test,
you can't use the added value without committing.  The case not
covered is where an enum type is built with multiple commands in a
single transaction --- which might be of value, but since it doesn't
work for every such case, we don't know if the heuristic is really
going to provide useful value-add or not.

So I think we should just stop with the blacklist test for v10,
and then see if we still get complaints (and exactly what they're
about) so that we can judge how much more work the problem deserves.
It's still ahead of where we were in previous releases, and ahead of
where we'd be if we end up reverting the patch altogether.

Or in short: having been burned by this heuristic already, I want
it out of there.

regards, tom lane




Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-24 Thread Thomas Munro
On Thu, Aug 17, 2017 at 11:39 AM, Thomas Munro wrote:
> On Thu, Jun 29, 2017 at 12:24 PM, Thomas Munro wrote:
>> fallocate-v5.patch
>
> Added to commitfest so we don't lose track of this.

Rebased due to collision with recent configure.in adjustments.  I also
wrote a commit message and retested with create-dsm-test.patch (from
upthread).

So, do we want this patch?  It's annoying to expend cycles doing this,
but it only really hurts if you allocate a lot of DSM space that you
never actually use.  If that ever becomes a serious problem, perhaps
that'll be a sign that we should be reusing the space between queries
anyway?
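
For clarity, the extra cycles amount to a single posix_fallocate() call on
the segment's file descriptor once it has been sized, along these lines (a
sketch only; the helper name is made up, and the real change lives in the
DSM POSIX implementation):

#include <errno.h>
#include <fcntl.h>

static int
reserve_dsm_backing_store(int fd, size_t request_size)
{
	int		rc;

	/* posix_fallocate() returns an errno value instead of setting errno */
	do
	{
		rc = posix_fallocate(fd, 0, (off_t) request_size);
	} while (rc == EINTR);

	return rc;					/* 0 on success */
}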

-- 
Thomas Munro
http://www.enterprisedb.com


fallocate-v6.patch
Description: Binary data



Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-24 Thread Andrew Dunstan


On 09/24/2017 04:37 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> OK, here's the finished patch. It has a pretty small footprint all
>> things considered, and I think it guarantees that nothing that could be
>> done in this area in 9.6 will be forbidden. That's probably enough to
>> get us to 10 without having to revert the whole thing, ISTM, and we can
>> leave any further refinement to the next release.
> I think this could do with some more work on the comments and test cases,
> but it's basically sound.
>
> What we still need to debate is whether to remove the heuristic
> type-is-from-same-transaction test, making the user-visible behavior
> simply "you must commit an ALTER TYPE ADD VALUE before you can use the
> new value".  I'm kind of inclined to do so; the fuzzy (and inadequately
> documented) behavior we'll have if we keep it doesn't seem very nice to
> me.
>
>   



I'd rather not. The failure cases are going to be vanishingly small, I
suspect, and we've already discussed how we might improve that test. If
you want to put some weasel words in the docs that might be ok.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-24 Thread Simon Riggs
On 24 September 2017 at 21:32, Tomas Vondra wrote:

> Attached is an updated version of the patch, tweaking the comments.

That looks good, thanks. Marking Ready for Committer to give notice
before commit.

-- 
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-24 Thread Tom Lane
Andrew Dunstan  writes:
> OK, here's the finished patch. It has a pretty small footprint all
> things considered, and I think it guarantees that nothing that could be
> done in this area in 9.6 will be forbidden. That's probably enough to
> get us to 10 without having to revert the whole thing, ISTM, and we can
> leave any further refinement to the next release.

I think this could do with some more work on the comments and test cases,
but it's basically sound.

What we still need to debate is whether to remove the heuristic
type-is-from-same-transaction test, making the user-visible behavior
simply "you must commit an ALTER TYPE ADD VALUE before you can use the
new value".  I'm kind of inclined to do so; the fuzzy (and inadequately
documented) behavior we'll have if we keep it doesn't seem very nice to
me.

regards, tom lane




[HACKERS] comments improvements

2017-09-24 Thread Erik Rijkers

comments improvements

--- src/backend/optimizer/prep/prepunion.c.orig	2017-09-24 17:40:34.888790877 +0200
+++ src/backend/optimizer/prep/prepunion.c	2017-09-24 17:41:39.796748743 +0200
@@ -2413,7 +2413,7 @@
  * 		Find AppendRelInfo structures for all relations specified by relids.
  *
  * The AppendRelInfos are returned in an array, which can be pfree'd by the
- * caller. *nappinfos is set to the the number of entries in the array.
+ * caller. *nappinfos is set to the number of entries in the array.
  */
 AppendRelInfo **
 find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
--- src/test/regress/sql/triggers.sql.orig	2017-09-24 17:40:45.760783805 +0200
+++ src/test/regress/sql/triggers.sql	2017-09-24 17:41:33.448752854 +0200
@@ -1409,7 +1409,7 @@
 --
 -- Verify behavior of statement triggers on partition hierarchy with
 -- transition tables.  Tuples should appear to each trigger in the
--- format of the the relation the trigger is attached to.
+-- format of the relation the trigger is attached to.
 --
 
 -- set up a partition hierarchy with some different TupleDescriptors



Re: [HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-09-24 Thread Tomas Vondra
Hi,

Attached is an updated version of the patch, tweaking the comments.

1) I've added a section at the end of src/backend/utils/mmgr/README,
briefly explaining the alternative memory allocators we have. I don't
think we should get into too much low-level detail here, that's more
appropriate for the .c file for each context.

2) I've slightly reworded a paragraph in generation.c describing what
use cases are suitable for the memory context. It used to say:

   This memory context is based on the assumption that the allocated
   chunks have similar lifespan, i.e. that chunks allocated close from
   each other (by time) will also be freed in close proximity, and
   mostly in the same order. This is typical for various queue-like use
   cases, i.e. when tuples are constructed, processed and then thrown
   away.

and now it says:

   This memory context is based on the assumption that the chunks are
   freed roughly in the same order as they were allocated (FIFO), or in
   groups with similar lifespan (generations - hence the name of the
   context). This is typical for various queue-like use cases, i.e. when
   tuples are constructed, processed and then thrown away.

3) I've also added a brief note into reorderbuffer.c mentioning that it
uses SlabContext and GenerationContext. As I explained, I don't think we
should include details about how we tested the patch or whatever here.
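
As a small usage illustration of the pattern described in (2): the new
context is created and used like any other memory context, with the caller
freeing chunks in roughly the order they were allocated (a sketch; the
function and the block-size constant are the ones used in the attached
patch):

#include "postgres.h"
#include "utils/memutils.h"

static void
generation_context_example(void)
{
	MemoryContext	tup_context;
	char		   *chunk;

	tup_context = GenerationContextCreate(CurrentMemoryContext,
										  "Tuples",
										  SLAB_LARGE_BLOCK_SIZE);

	/* chunks with similar lifespan, freed roughly in allocation order */
	chunk = MemoryContextAlloc(tup_context, 1024);
	/* ... fill and process the tuple data ... */
	pfree(chunk);		/* a block is recycled once all its chunks are freed */

	MemoryContextDelete(tup_context);
}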

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 25806c68a05287f3294f2d8508bd45599232f67b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 24 Sep 2017 22:19:17 +0200
Subject: [PATCH] Generational memory allocator

This memory context is based on the assumption that the allocated chunks
have similar lifespan, i.e. that chunks allocated close from each other
(by time) will also be freed in close proximity, and mostly in the same
order. This is typical for various queue-like use cases, i.e. when
tuples are constructed, processed and then thrown away.

The memory context uses a very simple approach to free space management.
Instead of a complex global freelist, each block tracks a number
of allocated and freed chunks. The space released by freed chunks is not
reused, and once all chunks are freed (i.e. when nallocated == nfreed),
the whole block is thrown away. When the allocated chunks have similar
lifespan, this works very well and is extremely cheap.
---
 src/backend/replication/logical/reorderbuffer.c |  80 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/README   |  23 +
 src/backend/utils/mmgr/generation.c | 768 
 src/include/nodes/memnodes.h|   4 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   5 +
 8 files changed, 819 insertions(+), 79 deletions(-)
 create mode 100644 src/backend/utils/mmgr/generation.c

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..dc0ad5b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -43,6 +43,12 @@
  *	  transaction there will be no other data carrying records between a row's
  *	  toast chunks and the row data itself. See ReorderBufferToast* for
  *	  details.
+ *
+ *	  ReorderBuffer uses two special memory context types - SlabContext for
+ *	  allocations of fixed-length structures (changes and transactions), and
+ *	  GenerationContext for the variable-length transaction data (allocated
+ *	  and freed in groups with similar lifespan).
+ *
  * -
  */
 #include "postgres.h"
@@ -150,15 +156,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;	/* ~8MB */
-
 /* ---
  * primary reorderbuffer support routines
  * ---
@@ -248,6 +245,10 @@ ReorderBufferAllocate(void)
 			SLAB_DEFAULT_BLOCK_SIZE,
 			sizeof(ReorderBufferTXN));
 
+	buffer->tup_context = GenerationContextCreate(new_ctx,
+		   "Tuples",
+		   SLAB_LARGE_BLOCK_SIZE);
+
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
 	hash_ctl.hcxt = buffer->context;
@@ -258,15 +259,12 @@ ReorderBufferAllocate(void)
 	

Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-24 Thread Andrew Dunstan


On 09/23/2017 06:06 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> OK, I think I'm convinced. Here is the WIP code I put together for the
>> blacklist. I'm was looking for a place to put the init call, but since
>> it's possibly not going anywhere I stopped :-) . My initial thought
>> about substransactions was that we should ignore them for this purpose
>> (That's why I used TopTransactionContext for the table).
> For the blacklist, I agree we could just ignore subtransactions: all
> subtransaction levels are equally uncommitted for this purpose, and
> leaving entries from failed subtransactions in place seems like a
> non-issue, since they'd never be referenced again.  (Well, barring OID
> wraparound and an enum-value-OID collision while the transaction runs,
> but I think we can ignore that as having probability epsilon.)
>
> But you need to actually put the table in TopTransactionContext, not
> CurTransactionContext ;-).  Also, I don't think you need an init call
> so much as an end-of-transaction cleanup call.  Maybe call it
> AtEOXactEnum(), for consistency with other functions called in the
> same area.
>
>> w.r.t. table size - how large? I confess I haven't seen any systems with
>> more than a few hundred enum types. But even a million or two shouldn't
>> consume a huge amount of memory, should it?
> Dynahash tables are self-expanding, so I don't see a need to stress about
> that too much.  Anything in 10-100 seems reasonable for initial size.
>



OK, here's the finished patch. It has a pretty small footprint all
things considered, and I think it guarantees that nothing that could be
done in this area in 9.6 will be forbidden. That's probably enough to
get us to 10 without having to revert the whole thing, ISTM, and we can
leave any further refinement to the next release.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 93dca7a..1d6f774 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -32,6 +32,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_enum.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
 #include "commands/tablecmds.h"
@@ -2126,6 +2127,7 @@ CommitTransaction(void)
 	smgrDoPendingDeletes(true);
 
 	AtCommit_Notify();
+	AtEOXact_Enum();
 	AtEOXact_GUC(true, 1);
 	AtEOXact_SPI(true);
 	AtEOXact_on_commit_actions(true);
@@ -2405,6 +2407,7 @@ PrepareTransaction(void)
 
 	/* PREPARE acts the same as COMMIT as far as GUC is concerned */
 	AtEOXact_GUC(true, 1);
+	AtEOXact_Enum();
 	AtEOXact_SPI(true);
 	AtEOXact_on_commit_actions(true);
 	AtEOXact_Namespace(true, false);
@@ -2606,6 +2609,7 @@ AbortTransaction(void)
 			 false, true);
 		smgrDoPendingDeletes(false);
 
+		AtEOXact_Enum();
 		AtEOXact_GUC(false, 1);
 		AtEOXact_SPI(false);
 		AtEOXact_on_commit_actions(false);
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
index fe61d4d..3056f68 100644
--- a/src/backend/catalog/pg_enum.c
+++ b/src/backend/catalog/pg_enum.c
@@ -28,6 +28,8 @@
 #include "utils/builtins.h"
 #include "utils/catcache.h"
 #include "utils/fmgroids.h"
+#include "utils/hsearch.h"
+#include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
 
@@ -38,6 +40,8 @@ Oid			binary_upgrade_next_pg_enum_oid = InvalidOid;
 static void RenumberEnumType(Relation pg_enum, HeapTuple *existing, int nelems);
 static int	sort_order_cmp(const void *p1, const void *p2);
 
+/* hash table of values added in the current transaction by AddEnumLabel */
+static HTAB *enum_blacklist = NULL;
 
 /*
  * EnumValuesCreate
@@ -460,8 +464,49 @@ restart:
 	heap_freetuple(enum_tup);
 
 	heap_close(pg_enum, RowExclusiveLock);
+
+	/* Set up the blacklist hash if required */
+	if (enum_blacklist == NULL)
+	{
+		HASHCTL hash_ctl;
+		memset(&hash_ctl, 0, sizeof(hash_ctl));
+		hash_ctl.keysize = sizeof(Oid);
+		hash_ctl.entrysize = sizeof(Oid);
+		hash_ctl.hcxt = TopTransactionContext;
+		enum_blacklist = hash_create("Enum blacklist for current transaction",
+		   32,
+		   &hash_ctl,
+		   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	}
+
+	/* Add the new value to the blacklist */
+	(void) hash_search(enum_blacklist, &newOid, HASH_ENTER, NULL);
 }
 
+/* Test if the enum is on the blacklist */
+bool
+EnumBlacklisted(Oid enum_id)
+{
+	bool found;
+
+	if (enum_blacklist == NULL)
+		return false;
+
+	(void) hash_search(enum_blacklist, &enum_id, HASH_FIND, &found);
+	return found;
+}
+
+/*
+ * Clean up the blacklist hash at the end of the transaction. The memory will
+ * have been deallocated, so all we need to do is set the pointer back to
+ * NULL for the next transaction.
+ */
+void
+AtEOXact_Enum(void)
+{
+	enum_blacklist = NULL;
+}
 
 /*
  * RenameEnumLabel
diff --git 

Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-24 Thread Tomas Vondra
Hi,

Apologies, I forgot to respond to the second part of your message.

On 09/06/2017 09:45 AM, Haribabu Kommi wrote:
>
> While testing this patch, I found another problem that is not related to
> this patch. When the vacuum command is executed multiple times on
> a table with no dead rows, the reltuples value slowly decreases.
> 
> postgres=# select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname = 't';
>  reltuples | n_live_tup | n_dead_tup 
> -----------+------------+------------
>     899674 |     899674 |          0
> (1 row)
> 
> postgres=# vacuum t;
> VACUUM
> postgres=# select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname = 't';
>  reltuples | n_live_tup | n_dead_tup 
> -----------+------------+------------
>     899622 |     899622 |          0
> (1 row)
> 
> postgres=# vacuum t;
> VACUUM
> postgres=# select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname = 't';
>  reltuples | n_live_tup | n_dead_tup 
> -----------+------------+------------
>     899570 |     899570 |          0
> (1 row)
> 
> 
> In lazy_scan_heap() function, we force to scan the last page of the 
> relation to avoid the access exclusive lock in lazy_truncate_heap if
> there are tuples in the last page. Because of this reason, the 
> scanned_pages value will never be 0, so the vac_estimate_reltuples 
> function will estimate the tuples based on the number of tuples from
> the last page of the relation. This estimation is leading to reduce
> the number of reltuples.
> 

Hmmm, that's annoying. Perhaps we should not update the values in this
case, then? I mean, if we only scan the last page, how reliable are the
derived values?

For the record - AFAICS this issue is unrelated to the patch
(i.e. it's not introduced by it).

> I am wondering whether this problem really happens in real-world
> scenarios, to justify producing a fix?
> 

Not sure.

As each vacuum run decrements the value only a little bit, you'd have to
run vacuum many times to actually be bitten by it. For people relying
on autovacuum that won't happen, as it only runs on tables with a
certain number of dead tuples.

So you'd have to be running VACUUM in a loop or something (but not
VACUUM ANALYZE, because that works fine) from a script, or something
like that.

That being said, fixing a bug is always a good thing I guess.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-24 Thread Tomas Vondra


On 09/06/2017 09:45 AM, Haribabu Kommi wrote:
> 
> 
> On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra wrote:
> 
> On 7/25/17 12:55 AM, Tom Lane wrote:
> 
> Tomas Vondra writes:
> 
> It seems to me that VACUUM and ANALYZE somewhat disagree on what
> exactly reltuples means. VACUUM seems to be thinking that reltuples
> = live + dead while ANALYZE apparently believes that reltuples = live
> 
> The question is - which of the reltuples definitions is the right
> one? I've always assumed that "reltuples = live + dead" but perhaps
> not?
> 
> 
> I think the planner basically assumes that reltuples is the live
> tuple count, so maybe we'd better change VACUUM to get in step.
> 
> 
> Attached is a patch that (I think) does just that. The disagreement
> was caused by VACUUM treating recently dead tuples as live, while
> ANALYZE treats both of those as dead.
> 
> At first I was worried that this will negatively affect plans in the
> long-running transaction, as it will get underestimates (due to
> reltuples not including rows it can see). But that's a problem we
> already have anyway, you just need to run ANALYZE in the other session.
> 
> 
> Thanks for the patch.
> From the mail, I understand that this patch tries to improve the
> reltuples value update in the catalog table by the vacuum command
> to consider the proper visible tuples similar like analyze command.
> 
> -num_tuples);
> +num_tuples - nkeep);
> 
> With the above correction, there is a problem in reporting the number
> of live tuples to the stats.
> 
> postgres=# select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname = 't';
>  reltuples | n_live_tup | n_dead_tup 
> -----------+------------+------------
>     899818 |     799636 |     100182
> (1 row)
> 
> 
> The live tuples data value is again decremented with dead tuples
> value before sending them to stats in function lazy_vacuum_rel(),
> 
> /* report results to the stats collector, too */
> new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
> 
> The fix needs a correction here also. Or change the correction in 
> lazy_vacuum_rel() function itself before updating catalog table similar
> like stats.
> 

Ah, I hadn't noticed that for some reason - you're right, we estimate the
reltuples based on (num_tuples - nkeep), so it doesn't make much sense
to subtract nkeep again. Attached is v2 of the fix.

I've removed the subtraction from lazy_vacuum_rel(), leaving just

new_live_tuples = new_rel_tuples;

and now it behaves as expected (no second subtraction). That means we
can get rid of new_live_tuples altogether (and the protection against
negative values), and use new_rel_tuples directly.

Which pretty much means that in this case

(pg_class.reltuples == pg_stat_user_tables.n_live_tup)

but I guess that's fine, based on the initial discussion in this thread.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..1886f0d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -198,7 +198,6 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	BlockNumber new_rel_pages;
 	double		new_rel_tuples;
 	BlockNumber new_rel_allvisible;
-	double		new_live_tuples;
 	TransactionId new_frozen_xid;
 	MultiXactId new_min_multi;
 
@@ -335,13 +334,9 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		false);
 
 	/* report results to the stats collector, too */
-	new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
-	if (new_live_tuples < 0)
-		new_live_tuples = 0;	/* just in case */
-
 	pgstat_report_vacuum(RelationGetRelid(onerel),
 		 onerel->rd_rel->relisshared,
-		 new_live_tuples,
+		 new_rel_tuples,
 		 vacrelstats->new_dead_tuples);
 	pgstat_progress_end_command();
 
@@ -1267,7 +1262,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
 		 nblocks,
 		 vacrelstats->tupcount_pages,
-		 num_tuples);
+		 num_tuples - nkeep);
 
 	/*
 	 * Release any remaining pin on visibility map page.



Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-24 Thread Peter Geoghegan
On Sun, Sep 24, 2017 at 5:04 AM, Peter Eisentraut wrote:
> On 9/22/17 12:25, Peter Geoghegan wrote:
>> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut wrote:
>>> I agree.  The attached patch should do it.
>>
>> I see one small issue here: You'll now need to set ssup->comparator
>> back to NULL before you return early in the Windows' libc case. That
>> way, a shim comparator (that goes through bttextcmp(), in the case of
>> text) will be installed within FinishSortSupportFunction(). Other than
>> that, looks good to me.
>
> committed accordingly

Thanks.

I am currently working on a patch for the issues with ICU collcollate
stability as I see them. I should be able to post something later
today or tomorrow. I would appreciate it if you held off on committing
anything there until you've considered what I'll propose.

-- 
Peter Geoghegan




[HACKERS] logical replication and statistics

2017-09-24 Thread Pavel Stehule
Hi

I did a trivial example of logical replication (one table, one publication,
one subscription).
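
For reference, a minimal sketch of that kind of setup - object names other
than test_sub are illustrative and the connection string is a placeholder -
would look roughly like this:

  -- publisher (requires wal_level = logical)
  CREATE TABLE repl_test (id int PRIMARY KEY, payload text);
  CREATE PUBLICATION test_pub FOR TABLE repl_test;

  -- subscriber (the same table must already exist there)
  CREATE TABLE repl_test (id int PRIMARY KEY, payload text);
  CREATE SUBSCRIPTION test_sub
      CONNECTION 'host=master dbname=postgres'
      PUBLICATION test_pub;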

I am a little bit surprised that after some work - the replication is working,
but the statistics are empty:

#master
postgres=# select * from pg_stat_replication ;
(0 rows)

#slave
postgres=# select * from pg_stat_subscription ;
-[ RECORD 1 ]-+-
subid | 16472
subname   | test_sub
pid   |
relid |
received_lsn  |
last_msg_send_time|
last_msg_receipt_time |
latest_end_lsn|
latest_end_time   |

Should something be enabled?

Regards

Pavel


[HACKERS] What's with all the fflush(stderr) calls in pg_standby.c?

2017-09-24 Thread Tom Lane
This looks like cargo-cult programming to me.  stderr is almost
always line-buffered, making these fflush'es pointless.  If it's
not line-buffered, that's probably because it's going to a
noninteractive destination for which this wouldn't matter.
Moreover, none of our other programs do this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Simon Riggs
On 24 September 2017 at 15:15, Craig Ringer  wrote:
> On 23 September 2017 at 06:28, Gregory Brail  wrote:
>
>>
>> Would the community support the development of another plugin that is
>> distributed as part of "contrib" that addresses these issues?
>
>
> Petr Jelinek and I tried just that with pglogical. Our submission was
> knocked back with the complaint that there was no in-core user of the code,
> and it couldn't be evaluated usefully without an in-core consumer/receiver.
>
> It's possible we'd make more progress if we tried again now, since we could
> probably write a test suite using the TAP test framework and a small
> src/test/modules consumer. But now we'd probably instead get blocked with
> the complaint that the output plugin used for logical replication should be
> sufficient for any reasonable need. I anticipate that we'd have some
> disagreements about what a reasonable need is, but ... *shrug*.
>
> I personally think we _should_ have such a thing, and that it should be
> separate to the logical replication plugin to allow us to evolve that
> without worrying about out of core dependencies etc.

We plan to submit the next evolution of the code in 2018, in time for
the early cycle of PG12.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2017-09-24 Thread Fabien COELHO




Hello Tom,

> Well, I think it's mostly about valgrind making everything really slow. Since
> we have seen some passes from skink recently, perhaps there was also a
> component of more-load-on-the-machine-than-usual.  But in the end this is
> just evidence for my point that regression tests have to be very much not
> timing-sensitive.  We run them under all kinds of out-of-the-ordinary stress.


Attached is an attempt at improving the situation:

(1) comments are added to explain the whys: the goal is to provide
coverage for pgbench's time-related features... while still not being
time-sensitive, which is a challenge.

(2) the test now only expects "progress: \d" in the output, so it is enough
that one progress report is shown, whenever it is shown.


(3) if the test is detected to have gone AWOL, detailed log checks are
coldly skipped.

This would have passed on "skink" under the special conditions it encountered.

I cannot guarantee that it would pass under all circumstances, though.

If it still encounters a failure, ISTM that it should only be a missing 
"progress:" in the output, which has not been encountered so far.


If it occurs, a few options would remain, none of them very convincing:

 - give the test some more time, eg 3 seconds (hmmm... could still fail
   after any time...)

 - drop the "progress:" expectation (hmmm... but then it does not check
   anything).

 - make the "progress:" output check conditional to the running time
   (hmmm... it would require changing the command_checks_all function,
and there is still a chance that the bench was stuck for 2 seconds
then given time to realize that it has to stop right now...)

 - use an even simpler transaction, eg "SELECT;" which is less likely to
   get stuck (but still could get stuck...).

For the randomness-related test (--sampling-rate), we could add a feature to
pgbench which allows controlling the random seed, so that the number of samples
could be known in advance for testing purposes.


--
Fabien.

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 11bc0fe..5043504 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -4,13 +4,14 @@ use warnings;
 use PostgresNode;
 use TestLib;
 use Test::More;
+use Time::HiRes qw{time};
 
 # start a pgbench specific server
 my $node = get_new_node('main');
 $node->init;
 $node->start;
 
-# invoke pgbench
+# invoke pgbench, return elapsed time
 sub pgbench
 {
 	my ($opts, $stat, $out, $err, $name, $files) = @_;
@@ -32,10 +33,13 @@ sub pgbench
 			append_to_file($filename, $$files{$fn});
 		}
 	}
+
+	my $t0 = time();
 	$node->command_checks_all(\@cmd, $stat, $out, $err, $name);
 
 	# cleanup?
 	#unlink @filenames or die "cannot unlink files (@filenames): $!";
+	return time() - $t0;
 }
 
 # Test concurrent insertion into table with UNIQUE oid column.  DDL expects
@@ -445,14 +449,53 @@ sub check_pgbench_logs
 	ok(unlink(@logs), "remove log files");
 }
 
-# with sampling rate
+# note: --progress-timestamp is not tested
+
+# The point of this test is coverage of various time-related features
+# (-T, -P, --aggregate-interval, --rate, --latency-limit...), so it is
+# somewhat time sensitive.
+# The checks performed are quite loose so as to still pass even under
+# degraded (high load, slow host, valgrind run) testing conditions.
+# It might still fail if not even one progress report is
+# shown over the 2 seconds...
+my $elapsed = pgbench(
+	'-T 2 -P 1 -l --log-prefix=001_pgbench_log_1 --aggregate-interval=1'
+	  . ' -S -b se@2 --rate=20 --latency-limit=1000 -j ' . $nthreads
+	  . ' -c 3 -r',
+	0,
+	[   qr{type: multiple},
+		qr{clients: 3},
+		qr{threads: $nthreads},
+		# the shown duration is really -T argument value
+		qr{duration: 2 s},
+		qr{script 1: .* select only},
+		qr{script 2: .* select only},
+		qr{statement latencies in milliseconds},
+		qr{FROM pgbench_accounts} ],
+	[	qr{vacuum},
+		# hopefully at least expect some progress report?
+		qr{progress: \d\b} ],
+	'pgbench progress');
+
+# if the test has gone AWOL, coldly skip these detailed checks.
+if (abs($elapsed - 2.0) < 0.5)
+{
+	# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+	# only 1 or as many as 3 progress reports per thread.
+	check_pgbench_logs('001_pgbench_log_1', $nthreads, 1, 3,
+		qr{^\d+ \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+}
+
+# with sampling rate, not time sensitive
 pgbench(
 '-n -S -t 50 -c 2 --log --log-prefix=001_pgbench_log_2 --sampling-rate=0.5',
 	0,
 	[ qr{select only}, qr{processed: 100/100} ],
-	[qr{^$}],
+	[ qr{^$} ],
 	'pgbench logs');
 
+# random 50% of 2*50 transactions, accept between 8 and 92
+# probability of failure is about 2 * 2^-42 (?)
 check_pgbench_logs('001_pgbench_log_2', 1, 8, 92,
 	qr{^0 \d{1,2} \d+ \d \d+ \d+$});
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Craig Ringer
On 24 September 2017 at 07:41, Euler Taveira  wrote:


> It is difficult to
> coordinate a change like that having only one-way communication).
> 
>

I really think we need to fix that at some point, such that:

* Downstream connections can send CopyData messages *up* the COPY BOTH
protocol, where they're passed to a hook on the output plugin; and

* Output plugins can hook the walsender's event loop (latch set, etc) and
send their own messages without being driven by a logical decoding event.

I wanted to do that some time ago but ran into some issues and time
constraints. Because of the need to support older versions I'm now
committed to an approach using direct libpq connections and function calls
instead, but it seems like a real shame to do that when the replication
protocol connection is *right there*...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Craig Ringer
On 23 September 2017 at 06:28, Gregory Brail  wrote:


> Would the community support the development of another plugin that is
> distributed as part of "contrib" that addresses these issues?
>

Petr Jelinek and I tried just that with pglogical. Our submission was
knocked back with the complaint that there was no in-core user of the code,
and it couldn't be evaluated usefully without an in-core consumer/receiver.

It's possible we'd make more progress if we tried again now, since we could
probably write a test suite using the TAP test framework and a small
src/test/modules consumer. But now we'd probably instead get blocked with
the complaint that the output plugin used for logical replication should be
sufficient for any reasonable need. I anticipate that we'd have some
disagreements about what a reasonable need is, but ... *shrug*.

I personally think we _should_ have such a thing, and that it should be
separate to the logical replication plugin to allow us to evolve that
without worrying about out of core dependencies etc.

There's some common functionality that needs factoring out into the logical
decoding framework, like some sort of relation metadata cache, some concept
of "replication sets" or a set of tables to include/exclude, etc. Doing
that is non-trivial work, but it's unlikely that two plugins with similar
and overlapping implementations of such things would be accepted; in that
case I'd be firmly in the "no" camp too.

Code in Pg has a cost, and we do have to justify that cost when we drop
things in contrib/. It's not a free slush pile. So a solid argument does
need to be made for why having this module living in github/whatever isn't
good enough.

I'd be happy to submit a patch, or GitHub repo, or whatever works best as
> an example. (Also, although Transicator uses protobuf, I'm happy to have it
> output a simple binary format as well.)
>

PostgreSQL tends to be very, very conservative about dependencies and
favours (not-)invented-here rather heavily. Optional dependencies are
accepted sometimes when they can be neatly isolated to one portion of the
codebase and/or abstracted away, so it's not impossible you'd get
acceptance for something like protocol buffers. But there's pretty much
zero chance you'll get it as a hard dependency, you'll need a simple text
and binary protocol too.

At which point the question will arise, why aren't these 3 separate output
plugins? The text one, the binary one for in-core and the protobuf one to
be maintained out of core.

That's a pretty sensible question. The answer is that they'll all need to
share quite a bit of common infrastructure. But if that's infrastructure
all plugins need, shouldn't it be pushed "up" into the logical decoding
layer's supporting framework? Patches welcome for the next major release
cycle.

Thus, that's where I think you should actually start. Extract (and where
necessary generalize) key parts of your code that should be provided by
postgres itself, not implemented by each plugin. And submit it so all
plugins can share it and yours can be simpler. Eventually to the point
where output plugins are often simple format wrappers.

You might want to look at

* pglogical's output plugin; and
* bottled-water

for ideas about things that would benefit from shared infrastructure, and
ways to generalize it. I will be very happy to help there as time permits.


> As a side note, doing this would also help making logical decoding a
> useful feature for customers of Amazon and Google's built-in Postgres
> hosting options.
>

Colour me totally unconvinced there. Either, or both, can simply bless
out-of-tree plugins as it is; after all, they can and do patch the core
server freely too.

It'd *help* encourage them both to pick the same plugin, but that's about
it. And only if the plugin could satisfy their various constraints about no
true superuser access, etc.

I guess I'm a bit frustrated, because *I tried this*, and where was anyone
from Google or Amazon then? But now there's a new home-invented plugin that
we should adopt, ignoring any of the existing ones. Why?


https://github.com/apigee-labs/transicator/tree/master/pgoutput
>

No README?

Why did this need to be invented, rather than using an existing plugin?

I don't mind, I mean, it's great that you're using the plugin
infrastructure and using postgres. I'm just curious what bottled-water,
pglogical, etc lacked, what made you go your own way?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] OpenFile() Permissions Refactor

2017-09-24 Thread David Steele
On 9/23/17 10:22 AM, Peter Eisentraut wrote:
> On 9/13/17 10:26, David Steele wrote:
>> Here's a new patch based on your review.  Where I had a question I made
>> a choice as described below:
> 
> I have committed the changes to the file APIs and a fix for the umask
> save/restore issue.

Thank you!

> The mkdir changes didn't really inspire me.  Replacing mkdir() with
> MakeDirectoryPerm() without any change in functionality doesn't really
> improve clarity.  

OK.  I had hoped removing the need to specify the mode at every call
site was functionality enough.  Even so, I'm a little surprised you
didn't keep PG_DIR_MODE_DEFAULT.

> Maybe we'll revisit that when your next patches arrive.

The next patch set was to be this same refactor applied to the tools (initdb,
pg_rewind, etc), but if you think the mkdir refactor did not add enough
value then I'll rethink my plans.

I may need to present all the patches in one CF so it's clear where all
this is going: allowing group read on $PGDATA.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-24 Thread Peter Eisentraut
On 9/22/17 12:25, Peter Geoghegan wrote:
> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
>  wrote:
>> I agree.  The attached patch should do it.
> 
> I see one small issue here: You'll now need to set ssup->comparator
> back to NULL before you return early in the Windows' libc case. That
> way, a shim comparator (that goes through bttextcmp(), in the case of
> text) will be installed within FinishSortSupportFunction(). Other than
> that, looks good to me.

committed accordingly

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-09-24 Thread Thomas Munro
On Wed, Sep 20, 2017 at 7:57 AM, Peter Eisentraut
 wrote:
> In the 0001 patch, I would move the ldap_unbind() calls after the
> ereport(LOG) calls.  We do all the other resource cleanup (pfree() etc.)
> after the ereport() calls, so it would be weird to do this one
> differently.  Also, in the second patch you move one of the
> ldap_unbind() calls down anyway.

Fair point.  In that case there are a few others we should consider
moving down too for consistency, like in the attached.

> In the 0002 patch, I think this is a bit repetitive and could be
> refactored even more.  The end result could look like
>
> ereport(LOG,
> (errmsg("blah"),
>  errdetail_for_ldap(ldap)));
> ldap_unbind(ldap);

Thanks, that is much tidier.  Done that way in the attached.

Here also is a small addition to your TAP test which exercises the
non-NULL code path because slapd rejects TLS by default with a
diagnostic message.  I'm not sure if this is worth adding, since it
doesn't actually verify that the code path is reached (though you can
see that it is from the logs).

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Improve-LDAP-cleanup-code-in-error-paths.patch
Description: Binary data


0002-Log-diagnostic-messages-if-errors-occur-during-LDAP-.patch
Description: Binary data


0003-Add-a-regression-test-to-log-an-LDAP-diagnostic-mess.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-24 Thread Alvaro Hernandez



On 24/09/17 02:41, Euler Taveira wrote:

2017-09-23 14:01 GMT-03:00 Alvaro Hernandez :

 However, AFAIK, AWS's DMS uses it for production purposes (see
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html).


It seems a bad idea. AFAICS test_decoding was not designed to be a
ready-for-production plugin. It is just a proof of concept for logical
decoding.


    Yes, this is what I heard and read.

    However, if DMS uses it for what I'd call production use, I assume 
it is actually production quality. I bet they do enough testing, and 
don't ship software to potentially millions of customers if it doesn't 
work well. So... first, I'd consider this a sign of robustness.
Second, hats off to the plugin code ;)
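
    For context, what DMS consumes here is test_decoding's plain-text
output, which is easy to inspect from SQL - the slot and table names below
are illustrative, and wal_level = logical plus a free replication slot are
assumed:

  SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
  INSERT INTO some_table VALUES (1, 'foo');
  -- peek without consuming; rows look roughly like
  --   table public.some_table: INSERT: id[integer]:1 payload[text]:'foo'
  SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);
  SELECT pg_drop_replication_slot('test_slot');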



 I would be happy to see another logical decoding plugin into core
starting on 11. However, this also poses a bit of a challenge for middleware
implementors: you need to support one for 9.4-9.5 (test_decoding), another
for 10 (pgoutput) and maybe another for 11 onwards. The idea of asking users
to install a binary plugin is very unsexy, so these are the options
available.


wal2json works for 9.4+ (besides the WAL messages I committed a month
ago). Since this boat was already shipped we can arrange some packages
for 9.4-10 (an external project) and ask vendors to support the
backward-compatible plugin. The middleware implementor will have to
support this new plugin format. JSON being a widespread format, it is
easier to refactor the code to parse JSON.


    I agree it's far better to parse JSON than the test_decoding output.
But asking any potential user to install a dynamic library from a third
party website, which will need to be compiled for many potential
OSes/Archs, or which may even be impossible to install when running in a
managed environment... is not a great experience. Unless PostgreSQL were
to backport a plugin and ship it in newer releases, if test_decoding is
good enough, I'd rather stick to it.





 However, having said that, and while json is a great output format for
interoperability, if there's a discussion on which plugin to include next,
I'd also favor one that has some more compact representation format (or that
supports several formats, not only json).


We could certainly extend pgoutput to support more than one format
(like pglogical did AFAIR), however, we wouldn't reuse code (different
formats) and will have a fat plugin (I don't foresee a plugin using
different formats in the same connection. It is difficult to
coordinate a change like that having only one-way communication).



    I think pgoutput is what it is. Maybe instead of growing it, my +1
would be to add a new plugin that, rather than being JSON only, would
also support other formats, like an efficient binary serialization.



    Álvaro


--

Alvaro Hernandez


---
OnGres



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-24 Thread Fabien COELHO


Hello Tom,


> # progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
> # progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
> # progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0
> skipped
>
> (BTW, the "-nan" bits suggest an actual pgbench bug, independently of
> anything else.)


From my point of view, NaN is expected when no transactions were executed in
the interval: if there was no transaction, it does not make sense to talk
about its latency, so NaN is the right answer.


However, the above "6.9 tps, lat 0.000, stddev 0.000, lag 0.000" is
inconsistent: since 6.9 = 18 / 2.6, the tps figure is counting the skipped
transactions, which means the progress tps calculation should exclude
skipped transactions...


The attached patch attempts to report more consistent figures in the
progress and final reports when transactions are skipped.


  sh> cat sleep-100.sql
  \sleep 100 ms
  SELECT 1;

  sh> ./pgbench -P 1 -t 100 -f sleep-100.sql -R 20 -L 1
  [...]
  progress: 1.0 s, 7.0 tps, lat 100.145 ms stddev 0.042, lag 0.000 ms, 16 
skipped
  progress: 2.0 s, 6.0 tps, lat 100.133 ms stddev 0.040, lag 0.021 ms, 7 skipped
  progress: 3.0 s, 9.0 tps, lat 100.115 ms stddev 0.016, lag 0.000 ms, 11 
skipped
  [...]
  number of transactions actually processed: 38/100
  number of transactions skipped: 62 (62.000 %)
  number of transactions above the 1.0 ms latency limit: 38 (38.000 %)
  latency average = 100.142 ms
  tps = 7.091010 (including connections establishing)
  tps = 7.094144 (excluding connections establishing)
  script statistics:
   - number of transactions skipped: 62 (62.000%)

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..9ca9734 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2584,7 +2584,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		doLog(thread, st, agg, skipped, latency, lag);
 
 	/* XXX could use a mutex here, but we choose not to */
-	if (per_script_stats)
+	if (per_script_stats || latency_limit)
 		accumStats(_script[st->use_file].stats, skipped, latency, lag);
 }
 
@@ -3522,11 +3522,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	double		time_include,
 tps_include,
 tps_exclude;
+	int64		ntx = total->cnt - total->skipped;
 
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
-	tps_include = total->cnt / time_include;
-	tps_exclude = total->cnt / (time_include -
-(INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+	/* tps is about actually executed transactions */
+	tps_include = ntx / time_include;
+	tps_exclude = ntx /
+		(time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
 
 	/* Report test parameters. */
 	printf("transaction type: %s\n",
@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt - total->skipped, nxacts * nclients);
+			   total->cnt, nxacts * nclients);
 	}
 	else
 	{
@@ -4660,7 +4663,8 @@ threadRun(void *arg)
 			{
 /* generate and show report */
 StatsData	cur;
-int64		run = now - last_report;
+int64		run = now - last_report,
+			ntx;
 double		tps,
 			total_run,
 			latency,
@@ -4675,7 +4679,7 @@ threadRun(void *arg)
  * XXX: No locking. There is no guarantee that we get an
  * atomic snapshot of the transaction count and latencies, so
  * these figures can well be off by a small amount. The
- * progress is report's purpose is to give a quick overview of
+ * progress report's purpose is to give a quick overview of
  * how the test is going, so that shouldn't matter too much.
  * (If a read from a 64-bit integer is not atomic, you might
  * get a "torn" read and completely bogus latencies though!)
@@ -4689,15 +4693,14 @@ threadRun(void *arg)
 	cur.skipped += thread[i].stats.skipped;
 }
 
+/* we count only actually executed transactions */
+ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
 total_run = (now - thread_start) / 100.0;
-tps = 100.0 * (cur.cnt - last.cnt) / run;
-latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-	(cur.cnt - last.cnt);
-sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-	/ (cur.cnt - last.cnt);
+tps = 100.0 * ntx / run;
+latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
 stdev = 0.001 * sqrt(sqlat - 100.0 * latency * latency);
-lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-	(cur.cnt - last.cnt);
+lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
 
 if (progress_timestamp)
 {
@@ -4714,6 +4717,7 @@ threadRun(void *arg)
 			 (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
 }
 else
+	/* round seconds are expected, but the thread may