Re: [HACKERS] add more NLS to bin

2016-10-30 Thread Michael Paquier
On Thu, Oct 27, 2016 at 10:02 AM, Peter Eisentraut
 wrote:
> Here is a series of patches to add NLS to the remaining bin programs,
> which were moved from contrib a while ago.  (If you're missing pgbench,
> I'm skipping that for now because it's more complicated.)  I'll add this
> to the commit fest.

I have been looking at this patch set, and this is definitely a good
change to make.
1) 0001 for pg_archivecleanup: nothing is missing.

2) For 0002 and pg_test_fsync, I am seeing a missing entry:
printf(NA_FORMAT, "n/a*\n");
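
For reference, picking that string up for translation would presumably just
mean wrapping it in the usual gettext shorthand, something like (sketch only):

    /* mark the literal for translation; the format string itself needs no marking */
    printf(NA_FORMAT, _("n/a*\n"));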

3) For pg_test_timing, the doc changes could be into a separate
change, but that's fine to group them as well. I am seeing no missing
strings for translations.

4) 0004 and pg_upgrade... In check.c, three places like that:
if (!db_used)
{
fprintf(script, "Database: %s\n", active_db->db_name);
db_used = true;
}

In exec.c:
#endif
fprintf(log, "command: %s\n", cmd);
#ifdef WIN32

+GETTEXT_FLAGS= \
+pg_fatal:1:c-format \
+pg_log:2:c-format \
+prep_status:1:c-format \
+report_stats:2:c-forma
s/report_stats/report_status/

In info.c, aren't some entries missing in report_unmatched_relation() when
reporting unmatched relations?

In parseCommandLine() of option.c, the "pg_upgrade run on" string
needs to be handled.

In util.c, doesn't pg_log_v() need to handle strings used in fprintf?

In version.c, this one:
if (!db_used)
{
fprintf(script, "Database: %s\n", active_db->db_name);
db_used = true;
}


5) 0005 and pg_xlogdump, I am not seeing a missing entry.
-- 
Michael




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-10-30 Thread David Rowley
On 8 April 2016 at 06:49, Tom Lane  wrote:
> David Rowley  writes:
> Just had a thought about this, which should have crystallized a long
> time ago perhaps.  Where I'd originally imagined you were going with
> this idea is to do what the thread title actually says, and check for
> joins in which the *outer* side is unique.  I can't see that that's
> of any value for nestloop or hash joins, but for merge joins, knowing
> that the outer side is unique could be extremely valuable because
> we could skip doing mark/restore backups on the inner side, hugely
> reducing the cost when the inner side has many duplicates.  Now I'm
> not asking for the initially-committed version of the patch to do that,
> but I think we need to design it to be readily extensible to do so.

I've rebased the changes I made back in April to address this onto current master.

The changes get rid of the JOIN_INNER-to-JOIN_SEMI conversion and revert
to the original "inner_unique" marking of the joins. In addition to this
I've also added "outer_unique", which is used only in merge join to
control whether the outer side needs to enable mark and restore.

However, having said that, I'm not sure why we'd need outer_unique
available so we'd know that we could skip mark/restore. I think
inner_unique is enough for this purpose. Take the comment from
nodeMergejoin.c:

* outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1
 * inner: (1 ^3 5 5 5 5 6) current tuple: 3
 ...
*
 * Consider the above relations and suppose that the executor has
 * just joined the first outer "5" with the last inner "5". The
 * next step is of course to join the second outer "5" with all
 * the inner "5's". This requires repositioning the inner "cursor"
 * to point at the first inner "5". This is done by "marking" the
 * first inner 5 so we can restore the "cursor" to it before joining
 * with the second outer 5. The access method interface provides
 * routines to mark and restore to a tuple.

If only one inner "5" can exist (unique_inner), then isn't the inner
side already in the correct place, and no restore is required?
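
As a rough illustration of what skipping the mark step could look like in
ExecMergeJoin (sketch only; the flag name is a placeholder, not something
taken from the patch):

    /*
     * If at most one inner tuple can match each outer tuple, there is
     * never a second matching inner tuple to reposition back to, so
     * don't bother marking the inner position at all.
     */
    if (!node->mj_skip_mark_restore)    /* hypothetical flag derived from inner_unique */
        ExecMarkPos(innerPlanState(node));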


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_2016-10-31.patch
Description: Binary data



Re: [HACKERS] commit fest manager for CF 2016-11?

2016-10-30 Thread Michael Paquier
On Mon, Oct 31, 2016 at 1:49 PM, Peter Eisentraut
 wrote:
> Commit fest CF 2016-11 is supposed to start in about a day.  I don't
> think a commit fest manager was chosen yet.  Volunteers?

I'd like to pass on being CFM this time, but I can put in some
time for the January one.
-- 
Michael




[HACKERS] commit fest manager for CF 2016-11?

2016-10-30 Thread Peter Eisentraut
Commit fest CF 2016-11 is supposed to start in about a day.  I don't
think a commit fest manager was chosen yet.  Volunteers?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Microvacuum support for Hash Index

2016-10-30 Thread Ashutosh Sharma
Hi Amit,

Thanks for showing your interest and reviewing my patch. I have
started looking into your review comments. I will share the updated
patch in a day or two.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Fri, Oct 28, 2016 at 4:42 PM, Amit Kapila  wrote:
> On Mon, Oct 24, 2016 at 2:21 PM, Ashutosh Sharma  
> wrote:
>> Hi All,
>>
>> I have added a microvacuum support for hash index access method and
>> attached is the v1 patch for the same.
>>
>
> This is important functionality for hash indexes, as we already
> have the same functionality for other index types such as btree.
>
>> The patch basically takes care
>> of the following things:
>>
>> 1. Firstly, it changes the marking of dead tuples from
>> 'tuple-at-a-time' to 'page-at-a-time' during a hash index scan. For this
>> we accumulate the heap TIDs and offsets of all hash index tuples flagged
>> by kill_prior_tuple during the scan, and then mark all accumulated TIDs
>> as LP_DEAD either while stepping from one page to another (in both the
>> forward and backward scan directions), at the end of the hash index
>> scan, or during a rescan.
>>
>> 2. Secondly, when inserting a tuple into the hash index, if not enough
>> space is found on the current page, it ensures that we first clean any
>> dead tuples found in the current hash index page before moving to the
>> next page in the bucket chain or going for a bucket split. This
>> increases page reusability and reduces the number of page splits,
>> thereby reducing the overall size of the hash index.
>>
>
> Few comments on patch:
>
> 1.
> +static void
> +hash_xlog_vacuum_one_page(XLogReaderState *record)
> +{
> + XLogRecPtr lsn = record->EndRecPtr;
> + xl_hash_vacuum *xldata = (xl_hash_vacuum *) XLogRecGetData(record);
> + Buffer bucketbuf = InvalidBuffer;
> + Buffer buffer;
> + Buffer metabuf;
> + Page page;
> + XLogRedoAction action;
>
> While replaying the delete/vacuum record on the standby, it can conflict
> with already-running queries.  Basically, the replay can remove rows
> that are still visible on the standby.  You need to resolve conflicts
> similarly to what we do for btree delete records (refer to
> btree_xlog_delete).
>
> 2.
> + /*
> + * Write-lock the meta page so that we can decrement
> + * tuple count.
> + */
> + _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
> +
> + _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +  (buf == bucket_buf) ? true : false);
> +
> + _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> It seems the meta page lock here is held for longer than required, and
> it is not needed at all when there are no deletable items on the page.
> You can take the metapage lock just before decrementing the count.
>
> 3.
>   Assert(offnum <= maxoff);
> +
>
> Spurious space.  There are some other similar spurious whitespace
> changes in the patch; remove them as well.
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
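
For point 1 above, the shape of the fix would presumably mirror
btree_xlog_delete's hot-standby handling, roughly along these lines (a
sketch only; computing latestRemovedXid and carrying it in xl_hash_vacuum
is assumed here, not shown):

    if (InHotStandby)
    {
        RelFileNode rnode;

        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
        ResolveRecoveryConflictWithSnapshot(xldata->latestRemovedXid, rnode);
    }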




Re: [HACKERS] make coverage-html on OS X

2016-10-30 Thread Jim Nasby

On 10/29/16 3:31 PM, Jim Nasby wrote:

tl;dr: It's critical that you actually do a make install, or at least it
is if you've set --prefix with configure. If you don't, then even if you
do make check you're going to get the *installed* libpq, and not the
*built* libpq.


Actually, sometimes that isn't even enough. I had one checkout that was 
refusing to successfully build coverage (producing an old .gcda for 
*everything*) until I rm'd the entire install directory (make uninstall 
was not enough). I also killed the running postmaster as part of that, 
so it's possible that's what was causing the problems.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-30 Thread Jim Nasby

On 10/30/16 1:32 PM, Tomas Vondra wrote:


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
some sort of CPU / OS scheduling artifact. For example, the system has
36 physical cores, 72 virtual ones (thanks to HT). I find it strange
that the "good" client counts are always multiples of 72, while the
"bad" ones fall in between.

  72 = 72 * 1   (good)
 108 = 72 * 1.5 (bad)
 144 = 72 * 2   (good)
 180 = 72 * 2.5 (bad)
 216 = 72 * 3   (good)
 252 = 72 * 3.5 (bad)
 288 = 72 * 4   (good)

So maybe this has something to do with how OS schedules the tasks, or
maybe some internal heuristics in the CPU, or something like that.


It might be enlightening to run a series of tests that are 72*.1 or *.2 
apart (say, 72, 79, 86, ..., 137, 144).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Jim Nasby

On 10/30/16 9:12 AM, Tom Lane wrote:

I think there will be a lot of howls.  People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order.  We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.


Leaving a 30% performance improvement on the floor because some people 
don't grok how sets work seems insane to me.


We could have a GUC to disable this. I suspect ORDER BY ctid would be 
another option.


BTW, I've sometimes wished for a mode where queries would silently have 
result ordering intentionally futzed, to eliminate any possibility of 
dependence on tuple ordering (as well as having sequences start at some 
random value). I guess with the hooks that are in place today it 
wouldn't be hard to stick an ORDER BY random() in if there wasn't already 
a Sort node at the top level...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-30 Thread Karl O. Pinc
Hi Gilles,

On Sat, 29 Oct 2016 22:00:08 +0200
Gilles Darold  wrote:

> The attached v10 of the current_logfiles patch 

Attached is a patch to apply on top of the v10 patch.

It updates current_logfiles only once per log rotation.
I see no reason to open and write the file twice
if both csvlog and stderr logging is happening.

I have 2 more questions.

I don't understand why you're calling 
logfile_writename(last_file_name, last_csv_file_name);
in the SIGHUP code.  Presumably you've already
written the old logfile names to current_logfiles.
On SIGHUP you want to write the new names, not
the old ones.  But the point of the SIGHUP code
is to look for changes in logfile writing and then
call logfile_rotate() to make those changes.  And
logfile_rotate() calls logfile_writename() as appropriate.
Shouldn't the logfile_writename call in the SIGHUP
code be eliminated?

Second, and I've not paid really close attention here,
you're calling logfile_writename() with last_file_name
and last_csv_file_name in a number of places.  Are you
sure that last_file_name and last_csv_file_name are reset
to the empty string if stderr/csvlog logging is turned
off and the config file re-read?  You might be recording
that logging is going somewhere that's been turned off
by a config change.  I've not noticed last_file_name and
(especially) last_csv_file_name getting reset to the empty
string.

FYI, ultimately, in order to re-try writes to current_logfiles
after ENFILE and EMFILE failure, I'm thinking that I'll probably
wind up with logfile_writename() taking no arguments.  It will
always write last_file_name and last_csv_file_name.  Then it's
a matter of making sure that these variables contain accurate
values.  It would be helpful to let me know if you have any
insight regards config file re-read and resetting of these
variables before I dive into writing code which retries writes to
current_logfiles.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v10.diff.only_once
Description: Binary data



Re: [HACKERS] Radix tree for character conversion

2016-10-30 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 28 Oct 2016 09:42:25 -0400, Tom Lane  wrote in 
<13049.1477662...@sss.pgh.pa.us>
> Robert Haas  writes:
> > On Thu, Oct 27, 2016 at 3:23 AM, Kyotaro HORIGUCHI
> >  wrote:
> >> Perhaps we can put the files into our repositoy by providing some
> >> notifications.
> 
> > Uggh, I don't much like advertising clauses.
> 
> Even if the license were exactly compatible with ours, I'd be -1 on
> bloating our tarballs with these files.  They're large and only a
> tiny fraction of developers, let alone end users, will ever care
> to look at them.

I understood that the intention of Heikki's suggestion, namely that
these files might be included in PostgreSQL's repository, is to gain a
kind of stability, or consistency. The source files are not
revision-managed. If the authority files receive unwanted changes or
become unavailable, the .map files would have to be edited independently
of the authority files. Actually, some map files have already lost their
authority file, and other map files have received several direct
modifications. We would be free from such disturbances by keeping
"frozen" authority files in the tree.

On the other hand, I also agree that the advertising requirement and the
additional bloat of the source repository are a nuisance.

> I think it's fine as long as we have a README file that explains
> where to get them.  (I'm not even very thrilled with the proposed
> auto-download script, as it makes undesirable assumptions about
> which internet tools you use, not to mention that it won't work
> at all on Windows.)

Mmm. It would be a pain in the neck. Some of the files are already
stored in the "OBSOLETE" directory on the Unicode consortium FTP site,
and one of them has vanished and is only available from another place,
as part of the ICU source tree. On the other hand, map files are assumed
to be generated from the scripts and direct editing of them is
discouraged. Radix map files are not editable and are currently made
from the authority files. If some authority files are gone, additional
edits would have to be made directly to the map files, and they would in
turn become the authority for the radix files.  (It's quite easy to
change the authority to the current map files, though.)

By the way, the following phrase of the terms of license.

http://www.unicode.org/copyright.html

| COPYRIGHT AND PERMISSION NOTICE
| 
| Copyright (c) 1991-2016 Unicode, Inc. All rights reserved.
| Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
| 
| Permission is hereby granted, free of charge, to any person obtaining
| a copy of the Unicode data files and any associated documentation
| (the "Data Files") or Unicode software and any associated documentation
| (the "Software") to deal in the Data Files or Software
| without restriction, including without limitation the rights to use,
| copy, modify, merge, publish, distribute, and/or sell copies of
| the Data Files or Software, and to permit persons to whom the Data Files
| or Software are furnished to do so, provided that either
| (a) this copyright and permission notice appear with all copies
| of the Data Files or Software, or
| (b) this copyright and permission notice appear in associated
| Documentation.

I'm afraid that the .map (and _radix.map) files are translations
of the "Data Files", and 'translate' is a form of 'modify'.

Whether the notice is necessary or not, if we decide to wipe the
'true' authority files out of our source tree, I'd like to make the
map files (preferably with comments) the secondary authority, with the
_radix.map files generated from them, since the latter are not
editable.

> I'd actually vote for getting rid of the reference files we
> have in the tree now (src/backend/utils/mb/Unicode/*txt), on
> the same grounds.  That's 600K of stuff that does not need to
> be in our tarballs.

Anyway, I'd like to register this as an item in this CF.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Logical Replication WIP

2016-10-30 Thread Steve Singer

On 10/24/2016 09:22 AM, Petr Jelinek wrote:

Hi,

attached is updated version of the patch.

There are quite a few improvements and restructuring, I fixed all the
bugs and basically everything that came up from the reviews and was
agreed on. There are still a couple of things missing, i.e. the column type
definition in the protocol and some things related to the existing data copy.


Here are a few things I've noticed so far.

+CREATE SUBSCRIPTION mysub WITH CONNECTION dbname=foo host=bar
user=repuser PUBLICATION mypub;

The documentation above doesn't match the syntax; CONNECTION needs to be
in single quotes, not double quotes.

I think you want

+CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar
user=repuser' PUBLICATION mypub;



I am not sure if this is a known issue covered by your comments about 
data copy but I am still having issues with error reporting on a failed 
subscription.


I created a subscription, dropped the subscription and created a second 
one.  The second subscription isn't active but shows no errors.



P: create publication mypub for table public.a;
S: create subscription mysub with connection 'dbname=test host=localhost 
port=5440' publication mypub;

P:  insert into a(b) values ('t');
S: select * FROM a;
 a | b
---+---
 1 | t
(1 row)

Everything is good
Then I do
S: drop subscription mysub;

S: create subscription mysub2 with connection 'dbname=test 
host=localhost port=5440' publication mypub;

P:  insert into a(b) values ('f');
S: select * FROM a;
 a | b
---+---
 1 | t

The data doesn't replicate


select * FROM pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 16398 | mysub2  |     |       |              |                    |                       |                |
(1 row)


The only thing in my log is

2016-10-30 15:27:27.038 EDT [6028] NOTICE:  dropped replication slot 
"mysub" on publisher
2016-10-30 15:27:36.072 EDT [6028] NOTICE:  created replication slot 
"mysub2" on publisher

2016-10-30 15:27:36.082 EDT [6028] NOTICE:  synchronized table states


I'd expect an error in the log or something.
However, if I delete everything from the table on the subscriber then 
the subscription proceeds


I think there are still problems with signal handling in the initial sync

If I try to drop mysub2 (while the subscription is stuck, instead of
deleting the data), the drop hangs.
If I then try to kill the postmaster for the subscriber nothing happens;
I have to send it a -9 to make it go away.


However once I do that and then restart the postmaster for the 
subscriber I start to see the duplicate key errors in the log


2016-10-30 16:00:54.635 EDT [7018] ERROR:  duplicate key value violates 
unique constraint "a_pkey"

2016-10-30 16:00:54.635 EDT [7018] DETAIL:  Key (a)=(1) already exists.
2016-10-30 16:00:54.635 EDT [7018] CONTEXT:  COPY a, line 1
2016-10-30 16:00:54.637 EDT [7007] LOG:  worker process: logical 
replication worker 16400 sync 16387 (PID 7018) exited with exit code 1


I'm not sure why I didn't get those until I restarted the postmaster, but
it seems to happen whenever I drop a subscription and then create a new
one. Creating the second subscription from the same psql session in which
I create/drop the first seems important in reproducing this.





I am also having issues dropping a second subscription from the same 
psql session


(table a is empty on both nodes to avoid duplicate key errors)
S: create subscription sub1 with connection  'host=localhost dbname=test 
port=5440' publication mypub;
S: create subscription sub2 with connection  'host=localhost dbname=test 
port=5440' publication mypub;

S: drop subscription sub1;
S: drop subscription sub2;

At this point the drop subscription hangs.








The biggest changes are:

I added one more prerequisite patch (the first one) which adds ephemeral
slots (or rather, implements a UI on top of the code that was mostly already
there). The ephemeral slots are different in that they go away either on
error or when the session is closed. This means the initial data sync does
not have to worry about cleaning up the slots after itself. I think this
will be useful in other places as well (for example basebackup). I
originally wanted to call them temporary slots in the UI but since the
behavior is a bit different from temp tables I decided to go with what the
underlying code calls them in the UI as well.

I also split out the libpqwalreceiver rewrite into a separate patch which
does just the re-architecture and does not really add new functionality.
And I did the re-architecture a bit differently based on the review.

There is now a new executor module in execReplication.c, with no new nodes
but several utility commands. I moved the tuple lookup functions there from
apply and also wrote new interfaces for doing inserts/updates/deletes to

Re: [HACKERS] sources.sgml typo

2016-10-30 Thread Tatsuo Ishii
>> In doc/src/sgml/sources.sgml:
>> 
>> When the definition an inline function references symbols
>> (i.e. variables, functions) that are only available as part of the
>> backend, the function may not be visible when included from frontend
>> code.
>> 
>> Shinichi Matsuda reported that there should be "of" between
>> "definition" and "an inline function" and I think he is right. Patch
>> attached. If there's no objection, I will commit it.
> 
> Sounds good.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-30 Thread Tomas Vondra

Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:

On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra
 wrote:


FWIW I plan to run the same test with logged tables - if it shows similar
regression, I'll be much more worried, because that's a fairly typical
scenario (logged tables, data set > shared buffers), and we surely can't
just go and break that.



Sure, please do those tests.



OK, so I do have results for those tests - that is, scale 3000 with 
shared_buffers=16GB (so continuously writing out dirty buffers). The 
following reports show the results slightly differently - all three "tps 
charts" next to each other, then the speedup charts and tables.


Overall, the results are surprisingly positive - look at these results 
(all ending with "-retest"):


[1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest

[2] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest


[3] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest


All three show significant improvement, even with fairly low client
counts. For example, with 72 clients the tps improves by 20%, without
significantly affecting the variability of the results (measured as
stddev; more on this later).


It's however interesting that "no_content_lock" is almost exactly the 
same as master, while the other two cases improve significantly.


The other interesting thing is that "pgbench -N" [3] shows no such 
improvement, unlike regular pgbench and Dilip's workload. Not sure why, 
though - I'd expect to see significant improvement in this case.


I have also repeated those tests with clog buffers increased to 512 (so 
4x the current maximum of 128). I only have results for Dilip's workload 
and "pgbench -N":


[4] 
http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512


[5] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512


The results are somewhat surprising, I guess, because the effect is 
wildly different for each workload.


For Dilip's workload increasing clog buffers to 512 pretty much 
eliminates all benefits of the patches. For example, with 288 clients, the 
group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 
512 buffers [4].


With "pgbench -N", the effect is exactly the opposite - while with 128 
buffers there was pretty much no benefit from any of the patches [3], 
with 512 buffers we suddenly get almost 2x the throughput, but only for 
group_update and master (while the other two patches show no improvement 
at all).


I don't have results for the regular pgbench ("noskip") with 512 buffers 
yet, but I'm curious what that will show.


In general I however think that the patches don't show any regression in 
any of those workloads (at least not with 128 buffers). Based solely on 
the results, I like the group_update patch more, because it performs as well
as master or significantly better.



2. We do see in some cases that the granular_locking and
no_content_lock patches have shown a significant increase in
contention on CLOGControlLock. I have already shared my analysis
of this upthread [8].




I've read that analysis, but I'm not sure I see how it explains the "zig 
zag" behavior. I do understand that shifting the contention to some 
other (already busy) lock may negatively impact throughput, or that the 
group_update may result in updating multiple clog pages, but I don't 
understand two things:


(1) Why this should result in the fluctuations we observe in some of the 
cases. For example, why should we see 150k tps with 72 clients, then drop
to 92k with 108 clients, then back to 130k with 144 clients, then 84k with
180 clients, etc.? That seems fairly strange.


(2) Why this should affect all three patches, when only group_update has 
to modify multiple clog pages.


For example consider this:

http://tvondra.bitbucket.org/index2.html#dilip-300-logged-async

For example looking at % of time spent on different locks with the 
group_update patch, I see this (ignoring locks with ~1%):


 event_type     wait_event          36   72  108  144  180  216  252  288
 --------------------------------------------------------------------------
 -              -                   60   63   45   53   38   50   33   48
 Client         ClientRead          33   23    9   14    6   10    4    8
 LWLockNamed    CLogControlLock      2    7   33   14   34   14   33   14
 LWLockTranche  buffer_content       0    2    9   13   19   18   26   22

I don't see any sign of contention shifting to other locks, just 
CLogControlLock fluctuating between 14% and 33% for some reason.


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's 
some sort of CPU / OS scheduling artifact. For example, the system has 
36 physical cores, 72 virtual ones (thanks to HT). I find it strange 
that the "good" client counts are always multiples of 72, while the 
"bad" ones fall in between.


  72 = 72 * 1   (good)
 

Re: [HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite

2016-10-30 Thread Tom Lane
I wrote:
> Also, I think this is outright *wrong* for phrase search --- dropping some
> of the child nodes without any other adjustment isn't valid, is it?

After further digging, it seems there's no bug because the tree is
originally binary and QTNTernary won't try to flatten OP_PHRASE nodes.
So we can't actually get to this logic for such nodes.  But seems like
an Assert for that wouldn't be a bad thing.
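
Something as simple as this near the top of that logic would presumably do
(a sketch; exact variable naming per the surrounding code):

    /* QTNTernary never flattens phrase operators, so we should never be
     * asked to drop children of one here */
    Assert(node->valnode->qoperator.oper != OP_PHRASE);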

regards, tom lane




Re: [HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite

2016-10-30 Thread Tom Lane
Andreas Seltenreich  writes:
> testing with sqlsmith yielded an uncancellable backend hogging CPU time.
> Gdb showed it was busy in findeq() of tsquery_rewrite.c.  This function
> appears to have exponential complexity wrt. the size of the involved
> tsqueries.  The following query runs for 12s on my machine with no way
> to cancel it and incrementing the length of the first argument by 1
> doubles this time.

> select ts_rewrite(
>   (select string_agg(i::text, '&')::tsquery from generate_series(1,32) g(i)),
>   (select string_agg(i::text, '&')::tsquery from generate_series(1,19) g(i)),
>   'foo');

> The attached patch adds a CHECK_FOR_INTERRUPTS to make it cancellable.

A CHECK_FOR_INTERRUPTS is probably a good idea, but man is this code
stupid.  It seems to be checking for subset inclusion by forming every
possible subset of the test node and then checking for exact equality
to the target set.  Seems like we should be able to do better.
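
For instance, if the operand lists of both nodes are kept sorted, containment
can be checked with one linear merge pass instead of enumerating subsets -- a
sketch of the idea in isolation, not tied to the QTNode representation:

    /* Is every element of sorted array b present in sorted array a?  O(n+m). */
    static bool
    contains_all(const int *a, int na, const int *b, int nb)
    {
        int         i = 0,
                    j = 0;

        while (i < na && j < nb)
        {
            if (a[i] == b[j])
            {
                i++;
                j++;
            }
            else if (a[i] < b[j])
                i++;
            else
                return false;   /* b[j] not present in a */
        }
        return j == nb;
    }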

Also, I think this is outright *wrong* for phrase search --- dropping some
of the child nodes without any other adjustment isn't valid, is it?

regards, tom lane




Re: [HACKERS] delta relations in AFTER triggers

2016-10-30 Thread Kevin Grittner
On Sun, Oct 2, 2016 at 11:20 PM, Michael Paquier
 wrote:
> On Sat, Sep 10, 2016 at 7:28 AM, Kevin Grittner  wrote:
>> v6 fixes recently-introduced bit-rot.
>
> Not as big as I thought, only 2k when both patches are combined... The
> patch without noapi in its name needs to be applied first, and after
> the patch with noapi can be applied.
>  60 files changed, 2073 insertions(+), 63 deletions(-)
> Moved to next CF.

In an attempt to make this patch more digestible for reviewers, I
have split it up as follows:

transition-c-triggers-only-v7.diff
 contrib/pgstattuple/pgstattuple.c|   2 +
 doc/src/sgml/catalogs.sgml   |  16 ++
 doc/src/sgml/ref/create_trigger.sgml |  94 +--
 src/backend/commands/tablecmds.c |   5 +-
 src/backend/commands/trigger.c   | 327 -
 src/backend/nodes/copyfuncs.c|  16 ++
 src/backend/nodes/equalfuncs.c   |  14 ++
 src/backend/nodes/outfuncs.c |  13 +
 src/backend/parser/gram.y|  70 +-
 src/backend/utils/adt/ruleutils.c|  23 ++
 src/bin/pg_basebackup/walmethods.c   |   5 -
 src/include/catalog/pg_trigger.h |  13 +-
 src/include/commands/trigger.h   |   2 +
 src/include/nodes/nodes.h|   1 +
 src/include/nodes/parsenodes.h   |  17 ++
 src/include/parser/kwlist.h  |   3 +
 src/include/utils/reltrigger.h   |   7 +
 17 files changed, 581 insertions(+), 47 deletions(-)

This part is virtually unchanged (just curing bit-rot) since
August, 2014, when I believe I had addressed all issues raised by
reviewers.  It does provide a barely usable feature, since the
syntax for transition tables is added and tuplestores are created
when needed (and only when needed), with references stored in the
TriggerData structure.  No new execution nodes are provided, so
only C triggers can use these relations, and must explicitly and
directly access the tuplestores from within the C code -- there is
no support for referencing these tuplestores from within queries.
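
For illustration, a C trigger consuming one of these tuplestores would look
roughly like this (a sketch under assumptions: the TriggerData field name for
the "new rows" tuplestore is a placeholder, not necessarily what the patch
uses):

    #include "postgres.h"
    #include "fmgr.h"
    #include "commands/trigger.h"
    #include "executor/tuptable.h"
    #include "utils/rel.h"
    #include "utils/tuplestore.h"

    PG_MODULE_MAGIC;
    PG_FUNCTION_INFO_V1(count_new_rows);

    Datum
    count_new_rows(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;
        Tuplestorestate *newtab;
        TupleTableSlot *slot;
        long        n = 0;

        if (!CALLED_AS_TRIGGER(fcinfo))
            elog(ERROR, "count_new_rows: not called by trigger manager");

        newtab = trigdata->tg_newtable;     /* assumed field name */
        slot = MakeSingleTupleTableSlot(trigdata->tg_relation->rd_att);

        if (newtab != NULL)
        {
            /* walk the transition tuplestore row by row */
            while (tuplestore_gettupleslot(newtab, true, false, slot))
                n++;
        }
        ExecDropSingleTupleTableSlot(slot);

        elog(NOTICE, "%ld rows in the NEW transition table", n);
        PG_RETURN_NULL();       /* result is ignored for AFTER triggers */
    }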

This is basic infrastructure needed for the more complete feature.
As far as I know there are no objections to what is implemented
here.  I have pulled it out to make the review of the more
controversial portions easier.  Since it had quite a bit of review
two years ago, I will do some testing to make sure that nothing has
broken and then push this part in a few days if there are no
objections.

transition-noapi.diff
 contrib/pgstattuple/pgstattuple.c |   2 -
 doc/src/sgml/spi.sgml | 279 
 src/backend/commands/explain.c|  10 +
 src/backend/executor/Makefile |   3 +-
 src/backend/executor/execAmi.c|   6 +
 src/backend/executor/execProcnode.c   |  14 +
 src/backend/executor/nodeTuplestorescan.c | 200 ++
 src/backend/nodes/copyfuncs.c |  25 ++
 src/backend/nodes/nodeFuncs.c |   2 +
 src/backend/nodes/outfuncs.c  |  20 ++
 src/backend/nodes/print.c |   4 +
 src/backend/nodes/readfuncs.c |   7 +
 src/backend/optimizer/path/allpaths.c |  44 +++
 src/backend/optimizer/path/costsize.c |  66 +
 src/backend/optimizer/plan/createplan.c   |  71 +
 src/backend/optimizer/plan/setrefs.c  |  11 +
 src/backend/optimizer/plan/subselect.c|   5 +
 src/backend/optimizer/prep/prepjointree.c |   2 +
 src/backend/optimizer/util/pathnode.c |  25 ++
 src/backend/optimizer/util/plancat.c  |   4 +-
 src/backend/optimizer/util/relnode.c  |   1 +
 src/backend/parser/Makefile   |   3 +-
 src/backend/parser/analyze.c  |  11 +
 src/backend/parser/parse_clause.c |  23 +-
 src/backend/parser/parse_node.c   |   2 +
 src/backend/parser/parse_relation.c   | 115 +++-
 src/backend/parser/parse_target.c |   2 +
 src/backend/parser/parse_tuplestore.c |  34 +++
 src/backend/tcop/utility.c|   3 +-
 src/backend/utils/adt/ruleutils.c |   1 +
 src/backend/utils/cache/Makefile  |   2 +-
 src/backend/utils/cache/tsrcache.c| 111 
 src/bin/pg_basebackup/walmethods.c|   5 +
 src/include/executor/nodeTuplestorescan.h |  24 ++
 src/include/nodes/execnodes.h |  18 ++
 src/include/nodes/nodes.h |   2 +
 src/include/nodes/parsenodes.h|  11 +-
 src/include/nodes/plannodes.h |  10 +
 src/include/optimizer/cost.h  |   3 +
 src/include/optimizer/pathnode.h  |   2 +
 src/include/parser/parse_node.h   |   2 +
 src/include/parser/parse_relation.h   |   4 +
 src/include/parser/parse_tuplestore.h |  24 ++
 src/include/utils/tsrcache.h  |  27 ++
 src/include/utils/tsrmd.h |  29 ++
 src/include/utils/tsrmdcache.h|  31 +++
 src/include/utils/tuplestore.h|  14 +
 src/pl/plpgsql/src/pl_comp.c  |  

Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Tom Lane
Andres Freund  writes:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

> I do wonder however if it's acceptable to change the result order of
> sequential scans.

I think there will be a lot of howls.  People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order.  We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.

If you were talking about 3X then maybe it would be worth it, but for 30%
(on a subset of queries) I am not excited.

I wonder whether we could instead adjust the rules for insertion so
that tuples tend to be physically in order by itemid.  I'm imagining
leaving two "holes" in a page and sometimes (hopefully not often)
having to shift data during insert to preserve the ordering.

regards, tom lane




Re: [HACKERS] sources.sgml typo

2016-10-30 Thread Andres Freund
On 2016-10-30 21:30:45 +0900, Tatsuo Ishii wrote:
> In doc/src/sgml/sources.sgml:
> 
> When the definition an inline function references symbols
> (i.e. variables, functions) that are only available as part of the
> backend, the function may not be visible when included from frontend
> code.
> 
> Shinichi Matsuda reported that there should be "of" between
> "definition" and "an inline function" and I think he is right. Patch
> attached. If there's no objection, I will commit it.

Sounds good.




[HACKERS] sources.sgml typo

2016-10-30 Thread Tatsuo Ishii
In doc/src/sgml/sources.sgml:

When the definition an inline function references symbols
(i.e. variables, functions) that are only available as part of the
backend, the function may not be visible when included from frontend
code.

Shinichi Matsuda reported that there should be "of" between
"definition" and "an inline function" and I think he is right. Patch
attached. If there's no objection, I will commit it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index d132ee4..877fced 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -898,7 +898,7 @@ BETTER: unrecognized node type: 42
  expressions of various types need to be passed to the macro.
 
 
- When the definition an inline function references symbols
+ When the definition of an inline function references symbols
  (i.e. variables, functions) that are only available as part of the
  backend, the function may not be visible when included from frontend
  code.



Re: [HACKERS] Mention column name in error messages

2016-10-30 Thread Franck Verrot
On Sun, Oct 30, 2016 at 12:04 AM, Michael Paquier  wrote:
>
> Okay, so I have reworked the patch a bit and finished with the
> attached, adapting the context message to give more information. I
> have noticed as well a bug in the patch: the context callback was set
> before checking if the column used for transformation is checked on
> being a system column or not, the problem being that the callback
> could be called without the fields set.
>

Interesting. I wasn't sure it was set at the right time indeed.

> I have updated the regression tests that I found, the main portion of
> the patch being dedicated to that and being sure that all the
> alternate outputs are correctly refreshed. In this case int8, float4,
> float8, xml and contrib/citext are the ones impacted by the change
> with alternate outputs.
>

Thanks! I couldn't find time to update it last week, so thanks for chiming
in.


> I am passing that down to a committer for review. The patch looks
> large, but at 95% it involves diffs in the regression tests,
> alternative outputs taking a large role in the bloat.
>

Great, thanks Michael!

--
Franck


Re: [HACKERS] JIT compiler for expressions

2016-10-30 Thread Andres Freund
Hi
,
On 2016-10-28 14:47:35 +0300, Dmitry Melnik wrote:
> We'd like to present our work on adding LLVM JIT compilation of expressions
> in SQL queries for PostgreSQL.

Great!  I'm also working in this area, albeit with, I think, a somewhat
different approach[1].  Is your goal to integrate this work into postgres
proper, or is this more of an academic research project?

If the former, lets try to collaborate. If the latter, lets talk, so
we're not integrating something dumb ;)

[1]. So far I've basically converted expression evaluation, and tuple
deforming, into small interpreted (switch/computed goto opcode dispatch)
mini-languages, which then can be JITed. Adding a small handwritten
x86-64 JIT (for fun, not because I think that's a good approach) also
resulted in quite noticeable speedups.  Did you experiment with JITing
tuple deforming as well?  The reason I was thinking of going in this
direction is that it's a lot faster to compile such mini pieces of
code, and it already gives significant speedups. There still are
function calls to postgres functions, but they're all direct function
calls, instead of indirect ones.
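
To make the dispatch idea concrete, here's a toy standalone version of such a
loop with an invented step layout (illustrative only, nothing like the real
expression representation):

    #include <stdio.h>

    typedef enum { OP_CONST, OP_ADD, OP_DONE } OpCode;

    typedef struct ExprStep
    {
        OpCode      opcode;
        long        value;          /* operand for OP_CONST / OP_ADD */
    } ExprStep;

    /*
     * GCC/Clang computed-goto dispatch: each handler jumps straight to the
     * next step's handler, avoiding a central switch and helping the branch
     * predictor.
     */
    static long
    exec_expr(const ExprStep *step)
    {
        static const void *dispatch[] = { &&do_const, &&do_add, &&do_done };
        long        result = 0;

    #define DISPATCH() goto *dispatch[step->opcode]
        DISPATCH();

    do_const:
        result = step->value;
        step++;
        DISPATCH();
    do_add:
        result += step->value;
        step++;
        DISPATCH();
    do_done:
        return result;
    #undef DISPATCH
    }

    int
    main(void)
    {
        ExprStep    prog[] = {{OP_CONST, 40}, {OP_ADD, 2}, {OP_DONE, 0}};

        printf("%ld\n", exec_expr(prog));   /* prints 42 */
        return 0;
    }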


> Currently, our JIT is used to compile expressions in every query, so for
> short-running queries JIT compilation may take longer than query execution
> itself. We plan to address it by using planner estimate to decide whether
> it worth JIT compiling, also we may try parallel JIT compilation. But for
> now we recommend testing it on a large workload  in order to pay off the
> compilation (we've tested on 40GB database for Q1).

Could you give some estimates about how long JITing takes for you, say
for Q1?  Different approaches here obviously have very different
tradeoffs.


> This work is a part of our greater effort on implementing full JIT compiler
> in PostgreSQL, where along with JITting expressions we've changed the
> iteration model from Volcano-style to push-model and reimplemented code
> generation with LLVM for most of Scan/Aggregation/Join methods. That
> approach gives much better speedup (x4-5 times on Q1), but it takes many
> code changes, so we're developing it as PostgreSQL extension.

FWIW, I think long term, we're going to want something like that in
core. I'm currently working on a "gradual" transformation towards that,
by *optionally* dealing with "batches" of tuples which get passed
around.  Noticeable speedups, but not in the 4-5x range.


> Also we're going to give a lightning talk at upcoming PGConf.EU in Tallinn,
> and discuss the further development with PostgreSQL community. We'd
> appreciate any feedback!

Cool, lets chat a bit, I'm also here.


Greetings,

Andres Freund




Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Andres Freund
Hi,

On 2016-10-30 00:36:55 -0700, Andres Freund wrote:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

One more point: Over IM Robert commented that it's not guaranteed that
itemid order correlates precisely with reverse "tuple data" order. After
PageRepairFragmentation() that'll not be the case anymore. That's true,
but I suspect in many cases with larger analytics queries the
correlation will still be significant, and we also could make it
guaranteed at the price of making PageRepairFragmentation() a bit more
expensive.

Greetings,

Andres Freund




Re: [HACKERS] JIT compiler for expressions

2016-10-30 Thread Greg Stark
This sounds amazing.

My only comment is that LLVM 3.7 is kind of old in the accelerated world of
LLVM. If you have patches to LLVM you need you won't have much success
submitting them as patches on 3.7.

The current stable release is 3.9 and the development snapshots are 4.0. I
know LLVM moves quickly and that makes it hard to try to track the
development. If you worked with 4.0 you might find the APIs you're using
getting deprecated and rewritten several times while your project is under
development.


[HACKERS] sequential scan result order vs performance

2016-10-30 Thread Andres Freund
Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped).  A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.

When iterating over a page we return tuples in itemid order, which means
they are returned in *descending* order address-wise, as tuples are stored
starting from the end of the page.  But when actually accessing a tuple,
we access it in increasing address order (header, and then column by
column). It appears that this, quite understandably, confuses the prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.
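
In heapgetpage()-like terms the change amounts to walking the line pointer
array backwards, roughly (a sketch, not the actual patch):

    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    OffsetNumber offnum;

    /*
     * Visit line pointers from last to first, so the tuple bodies (which
     * grow from the end of the page downwards) are touched in increasing
     * address order, which is friendlier to the hardware prefetcher.
     */
    for (offnum = maxoff; offnum >= FirstOffsetNumber; offnum = OffsetNumberPrev(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        if (!ItemIdIsNormal(itemid))
            continue;

        /* ... form and process the tuple as before ... */
    }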

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everywhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-30 Thread Karl O. Pinc
On Tue, 28 Jun 2016 11:06:24 +0200
Gilles Darold  wrote:

> Le 07/04/2016 08:30, Karl O. Pinc a écrit :

> > "src/backend/postmaster/syslogger.c expects to see fopen() fail
> > with  
> ENFILE and EMFILE.  What will you do if you get these?"
> 
> - Nothing. If the problem occurs during the log rotate call, then
> log file rotation is disabled, so logfile_writename() will not be
> called. A case where the problem occurs between the rotation call and
> the logfile_writename() call is possible, but I don't think it
> will be useful to try again. In this last case the log filename will be
> updated during the next rotation.

The case I'm interested in is when the rotation call succeeds but
you get ENFILE or EMFILE in logfile_writename() and current_logfiles
is not updated.

This looks like an ugly problem that only happens
sporadically under load.  If I've set log
rotation to, say, 1 week, and I'm post-processing my logs based
on the current_logfiles content, and the logs rotate but
current_logfiles is not updated, then I lose a week of log
post-processing.

I'm experimenting with some code that retries writing current_logfiles,
if it failed due to ENFILE or EMFILE, the next time a log message
is written.  If that's too often it'd be easy enough to add
a backoff counter that retries progressively less frequently
based on a "clock tick" per log message write.
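
The shape I have in mind is roughly this (purely a sketch of the idea; the
pending flag is invented, and I'm assuming logfile_writename() can report
success or failure):

    /* called from the per-message write path in syslogger.c */
    static void
    retry_current_logfiles(void)
    {
        static int  backoff = 1;        /* messages to wait before retrying */
        static int  countdown = 0;

        if (!current_logfiles_pending)  /* invented flag: an earlier write failed */
            return;

        if (--countdown > 0)
            return;

        if (logfile_writename(last_file_name, last_csv_file_name))
        {
            current_logfiles_pending = false;
            backoff = 1;
        }
        else
        {
            backoff = Min(backoff * 2, 256);    /* retry progressively less often */
            countdown = backoff;
        }
    }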

However, per my last email, it'll be Tuesday before I really get
back to this.   Let me know if, instead, you want to jump in
and write the code, have a better idea, think this is a really 
stupid approach, or have other reasons why I should abandon
this plan.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-30 Thread Karl O. Pinc
On Sat, 29 Oct 2016 22:00:08 +0200
Gilles Darold  wrote:

> The attached v10 of the current_logfiles patch includes your last
> changes on documentation but not the v9 patch about the
> user-supplied GUC values. I think the v10 patch is ready for committers
> and that the additional patch to add src/include/utils/guc_values.h
> to define user GUC values is something that needs to be handled outside
> this one. Imo, those GUC values (stderr, csvlog) are not expected to
> change often enough to require a global definition, but why not; if
> committers think this must be done I can add it to a v11 patch.

I agree that the guc_values.h patches should be submitted separately
to the committers, when your patch is submitted.  Creating symbols
for these values is a matter of coding style that they should resolve.
(IMO it's not whether the values will change, it's whether
someone reading the code can read the letters "stdout" and know
to what they refer and where to find related usage elsewhere in 
the code.)
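
For example, something along these lines, purely to illustrate the naming
question rather than a proposal for the header's actual contents:

    /* hypothetical src/include/utils/guc_values.h */
    #define GUC_LOG_DESTINATION_STDERR  "stderr"
    #define GUC_LOG_DESTINATION_CSVLOG  "csvlog"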

I'll keep up the guc_values.h patches and have them ready for
submission along with your patch.

I don't think your patch is quite ready for submission to
the committers.

Attached is a patch to be applied on top of your v10 patch
which does basic fixup to logfile_writename().

I have some questions about logfile_writename():

 Why does the logfile_open() call fail silently?
 Why not use ereport() here?  (With a WARNING level.)

 Why do the ereport() invocations all have a LOG level?
 You're not recovering and the errors are unexpected
 so I'd think WARNING would be the right level.
 (I previously, IIRC, suggested LOG level -- only if
 you are retrying and recovering from an error.)
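
 I.e., instead of failing silently, something like (a sketch, using the
 usual idiom for file-access errors):

    ereport(WARNING,
            (errcode_for_file_access(),
             errmsg("could not open file \"%s\": %m",
                    CURRENT_LOG_FILENAME)));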


Have you given any thought to my proposal to change
CURRENT_LOG_FILENAME to LOG_METAINFO_FILE?


I'm not sure I've looked at every bit of your patch
yet.  I won't have much time to do more real work
until after Tuesday morning.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v10.diff.logfile_writename
Description: Binary data
