Re: JIT compiling with LLVM v9.0

2018-01-27 Thread Jeff Davis
On Sat, Jan 27, 2018 at 5:15 PM, Andres Freund  wrote:
>> Also, I'm sure you considered this, but I'd like to ask if we can try
>> harder to make the JIT itself happen in an extension. It has some pretty
>> huge benefits:
>
> I'm very strongly against this. To the point that I'll not pursue JITing
> further if that becomes a requirement.

I would like to see this feature succeed and I'm not making any
specific demands.

> infeasible because quite frequently both non-JITed code and JITed code
> need adjustments. That'd solve your concern about

Can you explain further?

> I think it's a fool's errand to try to keep in sync with core changes on
> the expression evaluation and struct definition side of things. There's
> planner integration, error handling integration and similar related
> things too, all of which require core changes. Therefore I don't think
> there's a reasonable chance of success of doing this outside of core
> postgres.

I wasn't suggesting the entire patch be done outside of core. Core
will certainly need to know about JIT compilation, but I am not
convinced that it needs to know about the details of LLVM. All the
references to the LLVM library itself are contained in a few files, so
you've already got it well organized. What's stopping us from putting
that code into a "jit provider" extension that implements the proper
interfaces?

> Well, but doing this outside of core would pretty much prohibit doing so
> forever, no?

First of all, building .bc files at build time is much less invasive
than linking to the LLVM library. Any version of clang will produce
bitcode that can be read by any LLVM library or tool later (more or
less).

Second, we could change our minds later. Mark any extension APIs as
experimental, and decide we want to move LLVM into postgres whenever
it is needed.

Third, there's lots of cool stuff we can do here:
  * put the source in the catalog
  * an extension could have its own catalog and build the source into
bitcode and cache it there
  * the source for functions would flow to replicas, etc.
  * security-conscious environments might even choose to run some of
the C code in a safe C interpreter rather than machine code

So I really don't see this as permanently closing off our options.

Regards,
 Jeff Davis



Re: Reorder C includes in partition.c

2018-01-27 Thread Bruce Momjian
On Thu, Jan 25, 2018 at 06:58:40PM +0900, Etsuro Fujita wrote:
> Here is a small patch for $SUBJECT.
> 
> Best regards,
> Etsuro Fujita

> diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
> index 8adc4ee..e69bbc0 100644
> --- a/src/backend/catalog/partition.c
> +++ b/src/backend/catalog/partition.c
> @@ -46,11 +46,11 @@
>  #include "utils/array.h"
>  #include "utils/builtins.h"
>  #include "utils/datum.h"
> -#include "utils/memutils.h"
>  #include "utils/fmgroids.h"
>  #include "utils/hashutils.h"
>  #include "utils/inval.h"
>  #include "utils/lsyscache.h"
> +#include "utils/memutils.h"
>  #include "utils/rel.h"
>  #include "utils/ruleutils.h"
>  #include "utils/syscache.h"

Patch applied to head.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Cancelling parallel query leads to segfault

2018-01-27 Thread Andres Freund
Hi Peter (and others who mucked around with related code),

While testing another patch I found that cancelling a parallel query on
master segfaults the leader in an interesting manner:

#0  0x5648721cb361 in tas (lock=0x7fbd8e025360 ) at 
/home/andres/src/postgresql/src/include/storage/s_lock.h:228
#1  0x5648721cc2d6 in shm_mq_detach_internal (mq=0x7fbd8e025360) at 
/home/andres/src/postgresql/src/backend/storage/ipc/shm_mq.c:812
#2  0x5648721cc25c in shm_mq_detach (mqh=0x564874956bb0) at 
/home/andres/src/postgresql/src/backend/storage/ipc/shm_mq.c:778
#3  0x56487201e8fc in ExecParallelFinish (pei=0x5648749514f8) at 
/home/andres/src/postgresql/src/backend/executor/execParallel.c:1011
#4  0x56487203706c in ExecShutdownGatherWorkers (node=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/nodeGather.c:383
#5  0x5648720370b9 in ExecShutdownGather (node=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/nodeGather.c:400
#6  0x564872036c84 in ExecEndGather (node=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/nodeGather.c:241
#7  0x564872020eff in ExecEndNode (node=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/execProcnode.c:609
#8  0x5648720522f5 in ExecEndSort (node=0x564874906eb0) at 
/home/andres/src/postgresql/src/backend/executor/nodeSort.c:260
#9  0x564872021053 in ExecEndNode (node=0x564874906eb0) at 
/home/andres/src/postgresql/src/backend/executor/execProcnode.c:695
#10 0x56487203193e in ExecEndAgg (node=0x564874906788) at 
/home/andres/src/postgresql/src/backend/executor/nodeAgg.c:3329
#11 0x564872021075 in ExecEndNode (node=0x564874906788) at 
/home/andres/src/postgresql/src/backend/executor/execProcnode.c:703
#12 0x56487201a102 in ExecEndPlan (planstate=0x564874906788, 
estate=0x564874906538) at 
/home/andres/src/postgresql/src/backend/executor/execMain.c:1604
#13 0x56487201823b in standard_ExecutorEnd (queryDesc=0x56487485e808) at 
/home/andres/src/postgresql/src/backend/executor/execMain.c:493
#14 0x564872018167 in ExecutorEnd (queryDesc=0x56487485e808) at 
/home/andres/src/postgresql/src/backend/executor/execMain.c:464
#15 0x564871fb136e in PortalCleanup (portal=0x5648748a7b48) at 
/home/andres/src/postgresql/src/backend/commands/portalcmds.c:301
#16 0x5648723a2148 in AtAbort_Portals () at 
/home/andres/src/postgresql/src/backend/utils/mmgr/portalmem.c:781
#17 0x564871e84168 in AbortTransaction () at 
/home/andres/src/postgresql/src/backend/access/transam/xact.c:2540
#18 0x564871e865e2 in AbortOutOfAnyTransaction () at 
/home/andres/src/postgresql/src/backend/access/transam/xact.c:4387
#19 0x564872378c40 in ShutdownPostgres (code=1, arg=0) at 
/home/andres/src/postgresql/src/backend/utils/init/postinit.c:1156
#20 0x5648721c2a63 in shmem_exit (code=1) at 
/home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:228
#21 0x5648721c293a in proc_exit_prepare (code=1) at 
/home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:185
#22 0x5648721c28a2 in proc_exit (code=1) at 
/home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:102
#23 0x5648723638cd in errfinish (dummy=0) at 
/home/andres/src/postgresql/src/backend/utils/error/elog.c:543
#24 0x5648721f5f54 in ProcessInterrupts () at 
/home/andres/src/postgresql/src/backend/tcop/postgres.c:2921
#25 0x564872022630 in ExecScanFetch (node=0x564874907c18, 
accessMtd=0x564872050b32 , recheckMtd=0x564872050bfd ) at 
/home/andres/src/postgresql/src/backend/executor/execScan.c:41
#26 0x564872022840 in ExecScan (node=0x564874907c18, 
accessMtd=0x564872050b32 , recheckMtd=0x564872050bfd ) at 
/home/andres/src/postgresql/src/backend/executor/execScan.c:162
#27 0x564872050c4b in ExecSeqScan (pstate=0x564874907c18) at 
/home/andres/src/postgresql/src/backend/executor/nodeSeqscan.c:130
#28 0x56487202c115 in ExecProcNode (node=0x564874907c18) at 
/home/andres/src/postgresql/src/include/executor/executor.h:245
#29 0x56487202c5af in fetch_input_tuple (aggstate=0x5648749074a0) at 
/home/andres/src/postgresql/src/backend/executor/nodeAgg.c:406
#30 0x56487202eb03 in agg_fill_hash_table (aggstate=0x5648749074a0) at 
/home/andres/src/postgresql/src/backend/executor/nodeAgg.c:1915
#31 0x56487202e38c in ExecAgg (pstate=0x5648749074a0) at 
/home/andres/src/postgresql/src/backend/executor/nodeAgg.c:1534
#32 0x564872020c76 in ExecProcNodeFirst (node=0x5648749074a0) at 
/home/andres/src/postgresql/src/backend/executor/execProcnode.c:446
#33 0x5648720366fc in ExecProcNode (node=0x5648749074a0) at 
/home/andres/src/postgresql/src/include/executor/executor.h:245
#34 0x564872036d83 in gather_getnext (gatherstate=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/nodeGather.c:285
#35 0x564872036c0d in ExecGather (pstate=0x564874907148) at 
/home/andres/src/postgresql/src/backend/executor/nodeGather.c:216
#36 0x564872020c76 in 

Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2018-01-27 Thread Amit Kapila
On Thu, Jan 25, 2018 at 7:29 PM, Alexander Korotkov
 wrote:
> On Sat, Jan 20, 2018 at 4:24 PM, Amit Kapila 
> wrote:
>>
>> On Fri, Sep 29, 2017 at 8:20 PM, Alexander Korotkov
>>  wrote:
>> > +1,
>> > Very nice idea!  Locking hash values directly seems to be superior over
>> > locking hash index pages.
>> > Shubham, do you have any comment on this?
>> >
>>
>> As Shubham seems to be running out of time, I thought of helping him
>> by looking into the above-suggested idea.  I think one way to lock a
>> particular hash value is we can use TID of heap tuple associated with
>> each index entry (index entry for the hash index will be hash value).
>
>
> Sorry, I didn't get what do you particularly mean.  If locking either TID of
> associated heap tuple or TID of hash index tuple, then what will we lock
> in the case when nothing found?  Even if we found nothing, we have
> to place some lock according to search key in order to detect cases when
> somebody has inserted the row which we might see according to that search
> key.
>

Okay, but if you use the hash value as the lock tag (which is possible), how
will we deal with things like page splits?  I think even if we use the
block number/page/bucket number corresponding to the hash value along
with the hash value in the lock tag, it still doesn't appear to work.  I
think using page-level locks for the index makes sense, especially because
it will be convenient to deal with page splits.  Also, as predicate
locks stay in memory, creating too many such locks doesn't sound
like a nice strategy, even though we have a way to upgrade them to the next
level (page), as that has a separate cost to it.

>>
>> However, do we really need it for implementing predicate locking for
>> hash indexes?  If we look at the "Index AM implementations" section of
>> README-SSI, it doesn't seem to be required.  Basically, if we look at
>> the strategy of predicate locks in btree [1], it seems to me locking
>> at page level for hash index seems to be a right direction as similar
>> to btree, the corresponding heap tuple read will be locked.
>
>
> Btree uses leaf-pages locking because it supports both range searches
> and exact value searches.  And it needs to detect overlaps between
> these two kinds of searches.  Therefore, btree locks leaf-pages in both
> cases.
>

Also, using page-level locks probably makes it easier to deal with index
operations like page splits.
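To make that concrete, here is a minimal sketch (not Shubham's patch; the
function below is a hypothetical call site) of what page-level predicate
locking looks like on the scan side, mirroring what btree already does on
its leaf pages; the insert side would similarly check for serializable
conflicts on the bucket page before writing:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"
#include "utils/rel.h"
#include "utils/snapshot.h"

/* hypothetical helper: record an SSI read of the bucket page we scanned */
static void
hash_predicate_lock_bucket_page(Relation rel, Buffer buf, Snapshot snapshot)
{
	/*
	 * Locking the page (rather than individual hash values) means a later
	 * page split only has to deal with page-level locks, the same way
	 * btree handles leaf page splits.
	 */
	PredicateLockPage(rel, BufferGetBlockNumber(buf), snapshot);
}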

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: JIT compiling with LLVM v9.0

2018-01-27 Thread Jeff Davis
On Sat, Jan 27, 2018 at 1:20 PM, Andres Freund  wrote:
> b) The optimizations to take advantage of the constants and make the
>code faster with the constant tupledesc is fairly slow (you pretty
>much need at least an -O2 equivalent), whereas the handrolled tuple
>deforming is faster than the slot_getsomeattrs with just a single,
>pretty cheap, mem2reg pass.  We're talking about ~1ms vs 70-100ms in
>a lot of cases.  The optimizer often will not actually unroll the
>loop with many attributes despite that being beneficial.

This seems like the major point. We would have to customize the
optimization passes a lot and/or choose carefully which ones we apply.

> I think in most cases using the approach you advocate makes sense, to
> avoid duplication, but tuple deforming is such a major bottleneck that I
> think it's clearly worth doing it manually. Being able to use llvm with
> just a always-inline and a mem2reg pass makes it so much more widely
> applicable than doing the full inlining and optimization work.

OK.

On another topic, I'm trying to find a way we could break this patch
into smaller pieces. For instance, if we concentrate on tuple
deforming, maybe it would be committable in time for v11?

I see that you added some optimizations to the existing generic code.
Do those offer a measurable improvement, and if so, can you commit
those first to make the JIT stuff more readable?

Also, I'm sure you considered this, but I'd like to ask if we can try
harder to make the JIT itself happen in an extension. It has some pretty
huge benefits:
  * The JIT code is likely to go through a lot of changes, and it
would be nice if it wasn't tied to a yearly release cycle.
  * Would mean postgres itself isn't dependent on a huge library like
llvm, which just seems like a good idea from a packaging standpoint.
  * May give GCC or something else a chance to compete with its own JIT.
  * It may make it easier to get something in v11.

It appears reasonable to make the slot deforming and expression
evaluator parts an extension. execExpr.h only exports a couple new
functions; heaptuple.c has a lot of changes but they seem like they
could be separated (unless I'm missing something).

The biggest problem is that the inlining would be much harder to
separate out, because you are building the .bc files at build time. I
really like the idea of inlining, but it doesn't necessarily need to
be in the first commit.

Regards,
 Jeff Davis



Re: [HACKERS] GnuTLS support

2018-01-27 Thread Bruce Momjian
On Wed, Jan 17, 2018 at 10:02:35PM -0500, Tom Lane wrote:
> That is a really good point.  For precedent, note that darn near nobody
> seems to know whether their psql contains readline or libedit.  If we
> force the issue by giving the settings different names, then they'll be
> forced to figure out which SSL implementation they have.
> 
> On the other hand, you could argue that there are more user-friendly
> ways to expose that information than demanding that users play twenty
> questions with their config files.  I'd at least want us to recognize
> when somebody tries to set "openssl_foo" in a gnutls implementation,
> and respond with "you need to twiddle the gnutls_xxx variables instead"
> rather than just "unrecognized configuration parameter".  Maybe that'd
> be good enough, though.

To open another can of worms, are we ever going to rename the "ssl"
parameters to "tls", since TLS is the protocol used by all modern secure
communication libraries?  SSL was deprecated in 2015:

https://www.globalsign.com/en/blog/ssl-vs-tls-difference/
Both SSL 2.0 and 3.0 have been deprecated by the IETF (in 2011
and 2015, respectively).

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-01-27 Thread Bruce Momjian
On Mon, Jan 22, 2018 at 06:51:08PM +0100, Tomas Vondra wrote:
> > Yes, external connection pooling is more flexible. It allows us to
> > perform pooling either at the client side or at the server side (or even
> > combine the two approaches).
> > Also, external connection pooling for PostgreSQL is not limited to
> > pgbouncer/pgpool.
> > There are many frameworks maintaining their own connection pool, for
> > example J2EE, jboss, hibernate, ...
> > I have a feeling that about 70% of enterprise systems working with
> > databases are written in Java and doing connection pooling in their
> > own way.
> 
> True, but that does not really mean we don't need "our" connection
> pooling (built-in or not). The connection pools are usually built into
> the application servers, so each application server has their own
> independent pool. With larger deployments (a couple of application
> servers) that quickly causes problems with max_connections.

I found this thread and the pthread thread very interesting.  Konstantin,
thank you for writing prototypes and giving us very useful benchmarks
for ideas I thought I might never see.

As much as I would like to move forward with coding, I would like to
back up and understand where we need to go with these ideas.

First, it looks like pthreads and a builtin pooler help mostly with
1000+ connections.  It seems like you found that pthreads wasn't
sufficient and the builtin pooler was better.  Is that correct?

Is there anything we can do differently about allowing long-idle
connections to reduce their resource usage, e.g. free their caches? 
Remove from PGPROC?  Could we do it conditionally, e.g. only sessions
that don't have open transactions or cursors?

It feels like user and db mismatches are always going to cause pooling
problems.  Could we actually exit and restart connections that have
default session state?

Right now, if you hit max_connections, we start rejecting new
connections.  Would it make sense to allow an option to exit idle
connections when this happens so new users can connect?

I know we have relied on external connection poolers to solve all the
high connection problems but it seems there might be simple things we
can do to improve matters.  FYI, I did write a blog entry comparing
external and internal connection poolers:

https://momjian.us/main/blogs/pgblog/2017.html#April_21_2017

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: \describe*

2018-01-27 Thread David Fetter
On Sat, Jan 27, 2018 at 10:54:07PM +0100, Vik Fearing wrote:
> On 01/27/2018 05:39 PM, David Fetter wrote:
> > On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
> >> On 01/26/2018 03:49 PM, David Fetter wrote:
> >>> I propose that we do what at least MySQL, Oracle, and DB2 do and
> >>> implement DESCRIBE as its own command.
> >> Hard pass.
> > 
> > Would you be so kind as to expand on this?  "Pass" might indicate a
> > lack of interest in doing the work, but "hard pass" seems to indicate
> > that you have reasons the work should not be done.  Have I interpreted
> > this correctly?
> 
> Andreas said it quite well.  I don't like having client commands look
> like server commands.  I don't mind exceptions for "help" and "quit",
> but I see no reason for anything more.

I did not propose a client command mimicking a server command.  I
thought I made that clear by mentioning that the \ commands are
unavailable to clients other than psql, and offering an alternative.

What I propose is in fact a server command, which at least three of
the other popular RDBMSs already have.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-27 Thread Tomas Vondra


On 01/27/2018 10:45 PM, Tom Lane wrote:
> David Rowley  writes:
>> I'd offer to put it back to the order of the enum, but I want to
>> minimise the invasiveness of the patch. I'm not sure yet if it should
>> be classed as a bug fix or a new feature.
> 
> FWIW, I'd call it a new feature.
> 

I'm not sure what exactly the feature would be? I mean "keep statistics
even if you only ask for indexes" does not seem particularly helpful to
me. So I see it more like a bug - I certainly think it should have been
handled differently in 10.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-27 Thread Tomas Vondra
Hi,

On 01/27/2018 10:09 PM, David Rowley wrote:
> On 27 January 2018 at 00:03, Tels  wrote:
>> Looking at the patch, at first I thought the order was sorted and you
>> swapped STORAGE and STATISTICS by accident. But then, it seems the order
>> is semi-random. Should that list be sorted or is it already sorted by some
>> criteria that I don't see?
>>
>> -  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
>> CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING
>> COMMENTS.
>> +  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
>> CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING STATISTICS
>> INCLUDING COMMENTS.
> 
> It looks like they were in order of how they're defined in enum
> TableLikeOption up until [1], then I'm not so sure what the new order
> is based on after that.
> 
> I'd offer to put it back to the order of the enum, but I want to
> minimise the invasiveness of the patch. I'm not sure yet if it should
> be classed as a bug fix or a new feature.
> 
> On looking at this I realised I missed changing the syntax synopsis.
> The attached adds this.
> 

Thanks for working on a patch. This should have been in the statistics
patch, no doubt about that.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Write lifetime hints for NVMe

2018-01-27 Thread Tomas Vondra

On 01/27/2018 08:06 PM, Dmitry Dolgov wrote:
>> On 27 January 2018 at 16:03, Tomas Vondra  
>> wrote:
>>
>> Aren't those numbers far lower than you'd expect from NVMe storage? I do
>> have a NVMe drive (Intel 750) in my machine, and I can do thousands of
>> transactions on it with two clients. Seems a bit suspicious.
> 
> Maybe an NVMe storage can provide much higher numbers in general, but there 
> are
> resource limitations from AWS itself. I was using c5.large, which is the
> smallest possible instance of type c5, so maybe that can explain absolute
> numbers - but anyway I can recheck, just in case if I missed something.
> 

According to [1] the C5 instances don't have actual NVMe devices (say,
storage in a PCIe slot or connected using M.2) but EBS volumes exposed as
NVMe devices. That would certainly explain the low IOPS numbers, as
EBS has built-in throttling. I don't know how many of the NVMe features
this EBS variant supports.

Amazon actually does provide instance types (f1 and i3) with real NVMe
devices. That's what I'd be testing.


I can do some testing on my system with NVMe storage, to see if there
really is any change thanks to the patch.

[1]
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-01-27 Thread Peter Eisentraut
On 1/26/18 12:46, Robert Haas wrote:
> On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut
>  wrote:
>>> Does the SQL spec mention the matter? How do other systems
>>> handle such cases?
>>
>> In Oracle you get the same overflow error.
> 
> That seems awful.  If a user says "SELECT * FROM tab" and it fails,
> how are they supposed to recover, or even understand what the problem
> is?  I think we should really try to at least generate an errcontext
> here:
> 
> ERROR:  integer out of range
> CONTEXT: while generating virtual column "b"
> 
> And maybe a hint, too, like "try excluding this column".

This is expanded in the rewriter, so there is no context like that.
This is exactly how views work, e.g.,

create table t1 (id int, length int);
create view v1 as select id, length * 10 as length_in_nanometers
from t1;
insert into t1 values (1, 5);
select * from v1;
ERROR:  integer out of range

I think this is not a problem in practice.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GnuTLS support

2018-01-27 Thread Peter Eisentraut
On 1/25/18 09:07, Peter Eisentraut wrote:
> On 1/19/18 13:43, Peter Eisentraut wrote:
>> Comparing the existing {be,fe}-secure-openssl.c with the proposed
>> {be,fe}-secure-gnutls.c, and with half an eye on the previously proposed
>> Apple Secure Transport implementation, I have identified a few more
>> areas of refactoring that should be done in order to avoid excessive
>> copy-and-pasting in the new implementations:
> 
> And here is another place that needs cleaning up, where the OpenSSL API
> was used directly.

And here is another one.  The last one for now I think.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ab0bd8f655a5c75c8040b462a9d2c0111fbf323a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 27 Jan 2018 13:47:52 -0500
Subject: [PATCH] Refactor client-side SSL certificate checking code

Separate the parts specific to the SSL library from the general logic.

The previous code structure was

open_client_SSL()
calls verify_peer_name_matches_certificate()
calls verify_peer_name_matches_certificate_name()
calls wildcard_certificate_match()

and was completely in fe-secure-openssl.c.  The new structure is

open_client_SSL() [openssl]
calls pq_verify_peer_name_matches_certificate() [generic]
calls pgtls_verify_peer_name_matches_certificate_guts() [openssl]
calls openssl_verify_peer_name_matches_certificate_name() [openssl]
calls pq_verify_peer_name_matches_certificate_name() [generic]
calls wildcard_certificate_match() [generic]
---
 src/interfaces/libpq/fe-secure-openssl.c | 209 ---
 src/interfaces/libpq/fe-secure.c | 185 ++-
 src/interfaces/libpq/libpq-int.h |  20 +++
 3 files changed, 228 insertions(+), 186 deletions(-)

diff --git a/src/interfaces/libpq/fe-secure-openssl.c 
b/src/interfaces/libpq/fe-secure-openssl.c
index 9ab317320a..228ea5897a 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -60,9 +60,8 @@
 #endif
 #include 
 
-static bool verify_peer_name_matches_certificate(PGconn *);
 static int verify_cb(int ok, X509_STORE_CTX *ctx);
-static int verify_peer_name_matches_certificate_name(PGconn *conn,
+static int openssl_verify_peer_name_matches_certificate_name(PGconn *conn,

  ASN1_STRING *name,

  char **store_name);
 static void destroy_ssl_system(void);
@@ -492,76 +491,16 @@ verify_cb(int ok, X509_STORE_CTX *ctx)
 
 
 /*
- * Check if a wildcard certificate matches the server hostname.
- *
- * The rule for this is:
- * 1. We only match the '*' character as wildcard
- * 2. We match only wildcards at the start of the string
- * 3. The '*' character does *not* match '.', meaning that we match only
- *a single pathname component.
- * 4. We don't support more than one '*' in a single pattern.
- *
- * This is roughly in line with RFC2818, but contrary to what most browsers
- * appear to be implementing (point 3 being the difference)
- *
- * Matching is always case-insensitive, since DNS is case insensitive.
- */
-static int
-wildcard_certificate_match(const char *pattern, const char *string)
-{
-   int lenpat = strlen(pattern);
-   int lenstr = strlen(string);
-
-   /* If we don't start with a wildcard, it's not a match (rule 1 & 2) */
-   if (lenpat < 3 ||
-   pattern[0] != '*' ||
-   pattern[1] != '.')
-   return 0;
-
-   if (lenpat > lenstr)
-   /* If pattern is longer than the string, we can never match */
-   return 0;
-
-   if (pg_strcasecmp(pattern + 1, string + lenstr - lenpat + 1) != 0)
-
-   /*
-* If string does not end in pattern (minus the wildcard), we 
don't
-* match
-*/
-   return 0;
-
-   if (strchr(string, '.') < string + lenstr - lenpat)
-
-   /*
-* If there is a dot left of where the pattern started to 
match, we
-* don't match (rule 3)
-*/
-   return 0;
-
-   /* String ended with pattern, and didn't have a dot before, so we match 
*/
-   return 1;
-}
-
-/*
- * Check if a name from a server's certificate matches the peer's hostname.
- *
- * Returns 1 if the name matches, and 0 if it does not. On error, returns
- * -1, and sets the libpq error message.
- *
- * The name extracted from the certificate is returned in *store_name. The
- * caller is responsible for freeing it.
+ * OpenSSL-specific wrapper around
+ * pq_verify_peer_name_matches_certificate_name(), converting the ASN1_STRING
+ * into a plain C string.
  */
 static int
-verify_peer_name_matches_certificate_name(PGconn 

Re: \describe*

2018-01-27 Thread Vik Fearing
On 01/27/2018 05:39 PM, David Fetter wrote:
> On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
>> On 01/26/2018 03:49 PM, David Fetter wrote:
>>> I propose that we do what at least MySQL, Oracle, and DB2 do and
>>> implement DESCRIBE as its own command.
>> Hard pass.
> 
> Would you be so kind as to expand on this?  "Pass" might indicate a
> lack of interest in doing the work, but "hard pass" seems to indicate
> that you have reasons the work should not be done.  Have I interpreted
> this correctly?

Andreas said it quite well.  I don't like having client commands look
like server commands.  I don't mind exceptions for "help" and "quit",
but I see no reason for anything more.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-27 Thread Tom Lane
David Rowley  writes:
> I'd offer to put it back to the order of the enum, but I want to
> minimise the invasiveness of the patch. I'm not sure yet if it should
> be classed as a bug fix or a new feature.

FWIW, I'd call it a new feature.

I think the ordering of these items suffers greatly from "add new stuff
at the end no matter what" syndrome.  Feel free to design an ordering
that makes more sense, but then please apply it consistently.

regards, tom lane



Re: JIT compiling with LLVM v9.0

2018-01-27 Thread Andres Freund
Hi,

On 2018-01-26 22:52:35 -0800, Jeff Davis wrote:
> The version of LLVM that I tried this against had a linker option
> called "InternalizeLinkedSymbols" that would prevent the visibility
> problem you mention (assuming I understand you correctly).

I don't think they're fully solvable - you can't really internalize a
reference to a mutable static variable in another translation
unit. Unless you modify that translation unit, which doesn't work while
postgres is running.


> That option is no longer there so I will have to figure out how to do
> it with the current LLVM API.

Look at the llvmjit_wrap.c code invoking FunctionImporter - that pretty
much does that.  I'll push a cleaned up version of that code sometime
this weekend (it'll then live in llvmjit_inline.cpp).


> > Afaict that's effectively what I've already implemented. We could export
> > more input as constants to the generated program, but other than that...
>
> I brought this up in the context of slot_compile_deform(). In your
> patch, you have code like:
>
> +   if (!att->attnotnull)
> +   {
> ...
> +   v_nullbyte = LLVMBuildLoad(
> +   builder,
> +   LLVMBuildGEP(builder, v_bits,
> +_nullbyteno, 1, 
> ""),
> +   "attnullbyte");
> +
> +   v_nullbit = LLVMBuildICmp(
> +   builder,
> +   LLVMIntEQ,
> +   LLVMBuildAnd(builder, v_nullbyte,
> v_nullbytemask, ""),
> +   LLVMConstInt(LLVMInt8Type(), 0, false),
> +   "attisnull");
> ...
>
> So it looks like you are reimplementing the generic code, but with
> conditional code gen. If the generic code changes, someone will need
> to read, understand, and change this code, too, right?

Right. Not that that's code that has changed that much...


> With my approach, then it would initially do *un*conditional code gen,
> and be less efficient and less specialized than the code generated by
> your current patch. But then it would link in the constant tupledesc,
> and optimize, and the optimizer will realize that they are constants
> (hopefully) and then cut out a lot of the dead code and specialize it
> to the given tupledesc.

Right.


> This places a lot of faith in the optimizer and I realize it may not
> happen as nicely with real code as it did with my earlier experiments.
> Maybe you already tried and you are saying that's a dead end? I'll
> give it a shot, though.

I did that, yes. There's two major downsides:

a) The code isn't as efficient as the handrolled code. The handrolled
   code e.g. can take into account that it doesn't need to access the
   NULL bitmap for a NOT NULL column and we don't need to check the
   tuple's number of attributes if there's a following NOT NULL
   attribute. Those save a good number of cycles.

b) The optimizations to take advantage of the constants and make the
   code faster with the constant tupledesc is fairly slow (you pretty
   much need at least an -O2 equivalent), whereas the handrolled tuple
   deforming is faster than the slot_getsomeattrs with just a single,
   pretty cheap, mem2reg pass.  We're talking about ~1ms vs 70-100ms in
   a lot of cases.  The optimizer often will not actually unroll the
   loop with many attributes despite that being beneficial.

I think in most cases using the approach you advocate makes sense, to
avoid duplication, but tuple deforming is such a major bottleneck that I
think it's clearly worth doing it manually. Being able to use llvm with
just an always-inline and a mem2reg pass makes it so much more widely
applicable than doing the full inlining and optimization work.
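For reference, the cheap pipeline I'm talking about is essentially just
this (a minimal sketch using the llvm-c pass manager API; run_mem2reg_only
is mine for illustration, it's not something in the patch):

#include <llvm-c/Core.h>
#include <llvm-c/Transforms/Scalar.h>

/* run only the mem2reg pass over one freshly emitted function */
static void
run_mem2reg_only(LLVMModuleRef mod, LLVMValueRef fn)
{
	LLVMPassManagerRef fpm = LLVMCreateFunctionPassManagerForModule(mod);

	LLVMAddPromoteMemoryToRegisterPass(fpm);	/* the "mem2reg" pass */
	LLVMInitializeFunctionPassManager(fpm);
	LLVMRunFunctionPassManager(fpm, fn);
	LLVMFinalizeFunctionPassManager(fpm);
	LLVMDisposePassManager(fpm);
}

That plus an always-inline pass is all the handrolled deforming needs,
which is why it stays in the ~1ms range.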


> >> I experimented a bit before and it works for basic cases, but I'm not
> >> sure if it's as good as your hand-generated LLVM.
> >
> > For deforming it doesn't even remotely get as good in my experiments.
>
> I'd like some more information here -- what didn't work? It didn't
> recognize constants? Or did recognize them, but didn't optimize as
> well as you did by hand?

It didn't optimize as well as I did by hand, without significantly
complicating (and slowing) the originating code. It sometimes
decided not to unroll the loop, and it takes a *lot* longer than the
direct emission of the code.

I'm hoping to work on making more of the executor JITed, and there I do
think it's largely going to be what you're proposing, due to the sheer
mass of code.

Greetings,

Andres Freund



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-27 Thread David Rowley
On 27 January 2018 at 00:03, Tels  wrote:
> Looking at the patch, at first I thought the order was sorted and you
> swapped STORAGE and STATISTICS by accident. But then, it seems the order
> is semi-random. Should that list be sorted or is it already sorted by some
> criteria that I don't see?
>
> -  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
> CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING
> COMMENTS.
> +  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
> CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING STATISTICS
> INCLUDING COMMENTS.

It looks like they were in order of how they're defined in enum
TableLikeOption up until [1], then I'm not so sure what the new order
is based on after that.

I'd offer to put it back to the order of the enum, but I want to
minimise the invasiveness of the patch. I'm not sure yet if it should
be classed as a bug fix or a new feature.

On looking at this I realised I missed changing the syntax synopsis.
The attached adds this.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3217327053638085d24dd4d276e7c1f7ac2c4c6b

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


create_table_like_statistics_v2.patch
Description: Binary data


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-27 Thread Daniel Gustafsson
> On 27 Jan 2018, at 00:36, Tom Lane  wrote:
> 
> I've pushed this mostly as-is.  

Thanks!

> I also took out the parser changes related to
> allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
> is not a goal I consider worthy of adding extra grammar complexity.
> We don't document that PARALLEL works there, and there has never been
> any expectation that deprecated legacy syntax would grow new options
> as needed to have feature parity with the modern syntax.

Ok, I didn't know the old syntax isn't extended to support new options; I only
went after it because I ran into the regression tests using it.

> I also trimmed the new regression test cases a bit, as most of them seemed
> pretty duplicative.

Makes sense, they were made verbose to aid review but were too chatty for
inclusion.

cheers ./daniel


Re: [HACKERS] PoC: custom signal handler for extensions

2018-01-27 Thread Arthur Zakirov
Hello,

On Mon, Jan 22, 2018 at 02:34:58PM +0300, Maksim Milyutin wrote:
> ...
> I have attached a new version of patch and updated version of
> remote_effective_user function implementation that demonstrates the usage of
> custom signals API.

Thank you.

The patch applies and builds.

> +/*
> + * UnregisterCustomProcSignal
> + *  Release slot of specific custom signal.
> + *
> + * This function have to be called in _PG_init or _PG_fini functions of
> + * extensions at the stage of loading shared preloaded libraries. Otherwise 
> it
> + * throws fatal error. Also it throws fatal error if argument is not valid
> + * custom signal.
> + */
> +void
> +UnregisterCustomProcSignal(ProcSignalReason reason)
> +{
> + if (!process_shared_preload_libraries_in_progress)
> + ereport(FATAL, (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("cannot unregister 
> custom signal after startup")));

Is there an actual need for UnregisterCustomProcSignal() within _PG_init()?
An extension would register a handler and then unregister it, doing
nothing in between. It seems useless.

Also process_shared_preload_libraries_in_progress within _PG_fini() will
be false I think. _PG_fini() won't be called though, because
implementation of internal_unload_library() is disabled.

Actually, is there a need for UnregisterCustomProcSignal() at all? It
unregisters a handler only in the current backend; for actual unregistering
we would need to call it everywhere, if I'm not mistaken.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Write lifetime hints for NVMe

2018-01-27 Thread Dmitry Dolgov
> On 27 January 2018 at 16:03, Tomas Vondra  
> wrote:
>
> Aren't those numbers far lower than you'd expect from NVMe storage? I do
> have a NVMe drive (Intel 750) in my machine, and I can do thousands of
> transactions on it with two clients. Seems a bit suspicious.

Maybe NVMe storage can provide much higher numbers in general, but there are
resource limitations from AWS itself. I was using c5.large, which is the
smallest possible instance of type c5, so maybe that can explain the absolute
numbers - but anyway I can recheck, just in case I missed something.



Re: \describe*

2018-01-27 Thread David Fetter
On Fri, Jan 26, 2018 at 04:28:24PM +0100, Vik Fearing wrote:
> On 01/26/2018 03:49 PM, David Fetter wrote:
> > I propose that we do what at least MySQL, Oracle, and DB2 do and
> > implement DESCRIBE as its own command.
> Hard pass.

Would you be so kind as to expand on this?  "Pass" might indicate a
lack of interest in doing the work, but "hard pass" seems to indicate
that you have reasons the work should not be done.  Have I interpreted
this correctly?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-27 Thread Tom Lane
Oliver Ford  writes:
> On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers  wrote:
>> Regression tests only succeed for assert-disabled compiles; they fail when
>> assert-enabled:

> Problem seems to be with an existing Assert in catcache.c:1545:
> Assert(nkeys > 0 && nkeys < cache->cc_nkeys);
> The "<" needs to be "<=" (and is changed in the attached patch).

No, that Assert is correct, because it's in SearchCatCacheList.
It doesn't make any sense to use SearchCatCacheList for a lookup
that specifies all of the key columns, because then you necessarily
have at most one match; you might as well use regular SearchCatCache,
which is significantly more efficient.

> AFAICT this was never a problem before purely because no code before
> this patch called SearchSysCacheList4, so they always called with
> fewer keys than the number available.

Hm, probably we shouldn't even have that macro.
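To illustrate the distinction, a full-key lookup would normally be written
with the plain syscache API, roughly like this (a sketch only; the cache id
and keys are whatever the caller has, nothing here names a real catalog
cache):

#include "postgres.h"
#include "utils/syscache.h"

/* full-key lookup => at most one match => plain syscache, not a list search */
static bool
lookup_with_full_key(int cacheId, Datum k1, Datum k2, Datum k3, Datum k4)
{
	HeapTuple	tup = SearchSysCache4(cacheId, k1, k2, k3, k4);

	if (!HeapTupleIsValid(tup))
		return false;			/* nothing matches the complete key */

	/* ... examine the single matching tuple ... */
	ReleaseSysCache(tup);
	return true;
}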

regards, tom lane



Re: Write lifetime hints for NVMe

2018-01-27 Thread Tomas Vondra


On 01/27/2018 02:20 PM, Dmitry Dolgov wrote:
> Hi,
> 
> From what I see some time ago the write lifetime hints support for NVMe multi
> streaming was merged into Linux kernel [1]. Theoretically it allows data
> written together on media so they can be erased together, which minimizes
> garbage collection, resulting in reduced write amplification as well as
> efficient flash utilization [2]. I couldn't find any discussion about that on
> hackers, so I decided to experiment with this feature a bit. My idea was to
> test quite naive approach when all file descriptors, that are related to
> temporary files, have assigned `RWH_WRITE_LIFE_SHORT`, and rest of them
> `RWH_WRITE_LIFE_EXTREME`. Attached patch is a dead simple POC without any
> infrastructure around to enable/disable hints.
> 
> It turns out that it's possible to perform benchmarks on some EC2 instance
> types (e.g. c5) with the corresponding version of the kernel, since they 
> expose
> a volume as nvme device:
> 
> ```
> # nvme list
> Node          SN                    Model                       Namespace  Usage                  Format          FW Rev
> ------------  --------------------  --------------------------  ---------  ---------------------  --------------  ------
> /dev/nvme0n1  vol01cdbc7ec86f17346  Amazon Elastic Block Store  1          0.00   B /   8.59  GB   512   B +  0 B  1.0
> ```
> 
> To get some baseline results I've run several rounds of pgbench on these quite
> modest instances (dedicated, with optimized EBS) with slightly adjusted
> `max_wal_size` and with default configuration:
> 
> $ pgbench -s 200 -i
> $ pgbench -T 600 -c 2 -j 2
> 
> Analyzing `strace` output I can see that during this test there were some
> significant number of operations with pg_stat_tmp and xlogtemp, so I assume
> write lifetime hints should have some effect.
> 
> As a result I've got reduction of latency about 5-8% (but so far these numbers
> are unstable, probably because of virtualization).
> 
> ```
> # without patch
> number of transactions actually processed: 491945
> latency average = 2.439 ms
> tps = 819.906323 (including connections establishing)
> tps = 819.908755 (excluding connections establishing)
> ```
> 
> ```
> with patch
> number of transactions actually processed: 521805
> latency average = 2.300 ms
> tps = 869.665330 (including connections establishing)
> tps = 869.668026 (excluding connections establishing)
> ```
> 

Aren't those numbers far lower than you'd expect from NVMe storage? I do
have a NVMe drive (Intel 750) in my machine, and I can do thousands of
transactions on it with two clients. Seems a bit suspicious.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] proposal: psql command \graw

2018-01-27 Thread Pavel Stehule
2018-01-27 15:22 GMT+01:00 Daniel Verite :

> Pavel Stehule wrote:
>
> > We are able to generate html/tex/markdown formats on client side. CSV is
> > more primitive, but much more important data format, so it should not be
> a
> > problem. But I remember some objections related to code duplication.
>
> While experimenting with adding csv as an output format
> (similar to unaligned except that it quotes with CSV rules), I find out
> that we can already do this:
>
> \o | client-side-program with arguments
> or
> \o /path/to/file.csv
> COPY (select query) TO STDOUT CSV [HEADER] ;
> \o
>
> This can be used to route multiple resultsets into multiple programs
> or files without the minor drawbacks of \copy (single line
> formatting, no variables interpolation), or
> psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid
> (single redirect).
>
>
> OTOH I notice that this simpler form with a right-side argument of \g:
>  =# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv
> outputs to the console rather into a file. Not sure why it acts
> differently than \o and whether it's intentional.
>
> Anyway I think the above sequence with \o already does everything
> that is being proposed as an alternative to \graw, or am I missing
> something?
>

There is one difference, though it's hard to say how important it is - you have
to wrap your query in a COPY command. That is not too friendly.

Regards

Pavel



>
>
> What \o + COPY CSV lacks compared to a csv output format
> is the ability to export results from meta-commands in csv,
> such as \g in \x mode or \gx, \crosstabview, \d, \l...
>
> The other problem with \o |program + COPY CSV is that it's
> not easy to discover that combination. It's more obvious if
> we have csv explicitly listed as an output format.
>
> I'll post the patch implementing that in a new thread and we'll see
> how it goes.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [HACKERS] proposal: psql command \graw

2018-01-27 Thread Daniel Verite
Pavel Stehule wrote:

> We are able to generate html/tex/markdown formats on client side. CSV is
> more primitive, but much more important data format, so it should not be a
> problem. But I remember some objections related to code duplication.

While experimenting with adding csv as an output format
(similar to unaligned except that it quotes with CSV rules), I find out
that we can already do this:

\o | client-side-program with arguments
or
\o /path/to/file.csv
COPY (select query) TO STDOUT CSV [HEADER] ;
\o

This can be used to route multiple resultsets into multiple programs
or files without the minor drawbacks of \copy (single line
formatting, no variables interpolation), or
psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid
(single redirect).


OTOH I notice that this simpler form with a right-side argument of \g:
 =# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv
outputs to the console rather into a file. Not sure why it acts
differently than \o and whether it's intentional.

Anyway I think the above sequence with \o already does everything
that is being proposed as an alternative to \graw, or am I missing
something?


What \o + COPY CSV lacks compared to a csv output format
is the ability to export results from meta-commands in csv,
such as \g in \x mode or \gx, \crosstabview, \d, \l...

The other problem with \o |program + COPY CSV is that it's
not easy to discover that combination. It's more obvious if
we have csv explicitly listed as an output format.

I'll post the patch implementing that in a new thread and we'll see
how it goes.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Write lifetime hints for NVMe

2018-01-27 Thread Dmitry Dolgov
Hi,

From what I see, some time ago write lifetime hints support for NVMe multi
streaming was merged into the Linux kernel [1]. Theoretically it allows data
written together to be placed together on media so it can be erased together,
which minimizes garbage collection, resulting in reduced write amplification as
well as more efficient flash utilization [2]. I couldn't find any discussion
about that on hackers, so I decided to experiment with this feature a bit. My
idea was to test a quite naive approach where all file descriptors that are
related to temporary files get assigned `RWH_WRITE_LIFE_SHORT`, and the rest of
them `RWH_WRITE_LIFE_EXTREME`. The attached patch is a dead simple POC without
any infrastructure around to enable/disable hints.
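For anyone who wants to reproduce the idea quickly, the mechanism itself
boils down to a single fcntl() call per file descriptor. A minimal sketch
(not the attached patch; it assumes Linux >= 4.13, and the fallback
constants below are the uapi values, so please double-check them against
<linux/fcntl.h> on your system):

#define _GNU_SOURCE
#include <fcntl.h>		/* fcntl() */
#include <stdint.h>
#include <stdio.h>

/* fallbacks for older userspace headers; values as in <linux/fcntl.h> */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		1036	/* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT	2	/* e.g. for temporary files */
#endif
#ifndef RWH_WRITE_LIFE_EXTREME
#define RWH_WRITE_LIFE_EXTREME	5	/* e.g. for everything else */
#endif

/* tag an already-open file descriptor with a write lifetime hint */
static void
set_write_life_hint(int fd, uint64_t hint)
{
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		perror("fcntl(F_SET_RW_HINT)");
}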

It turns out that it's possible to perform benchmarks on some EC2 instance
types (e.g. c5) with the corresponding version of the kernel, since they expose
a volume as an nvme device:

```
# nvme list
Node          SN                    Model                       Namespace  Usage                  Format          FW Rev
------------  --------------------  --------------------------  ---------  ---------------------  --------------  ------
/dev/nvme0n1  vol01cdbc7ec86f17346  Amazon Elastic Block Store  1          0.00   B /   8.59  GB   512   B +  0 B  1.0
```

To get some baseline results I've run several rounds of pgbench on these quite
modest instances (dedicated, with optimized EBS) with slightly adjusted
`max_wal_size` and with default configuration:

$ pgbench -s 200 -i
$ pgbench -T 600 -c 2 -j 2

Analyzing `strace` output I can see that during this test there was a
significant number of operations on pg_stat_tmp and xlogtemp, so I assume
write lifetime hints should have some effect.

As a result I've got a latency reduction of about 5-8% (but so far these
numbers are unstable, probably because of virtualization).

```
# without patch
number of transactions actually processed: 491945
latency average = 2.439 ms
tps = 819.906323 (including connections establishing)
tps = 819.908755 (excluding connections establishing)
```

```
with patch
number of transactions actually processed: 521805
latency average = 2.300 ms
tps = 869.665330 (including connections establishing)
tps = 869.668026 (excluding connections establishing)
```

So I have a few questions:

* Does it sound interesting and worthwhile to create a proper patch?

* Maybe someone else has similar results?

* Any suggestions about what could be the best/worst case scenarios for using
  such hints?


[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c75b1d9421f80f4143e389d2d50ddfc8a28c8c35
[2]: 
https://regmedia.co.uk/2016/09/23/0_storage-intelligence-prodoverview-2015-0.pdf


nvme_write_lifetime_poc.patch
Description: Binary data


Re: Setting BLCKSZ 4kB

2018-01-27 Thread Tomas Vondra


On 01/27/2018 05:01 AM, Bruce Momjian wrote:
> On Fri, Jan 26, 2018 at 11:53:33PM +0100, Tomas Vondra wrote:
>>
>> ...
>>
>> FWIW even if it's not save in general, it would be useful to
>> understand what are the requirements to make it work. I mean,
>> conditions that need to be met on various levels (sector size of
>> the storage device, page size of of the file system, filesystem
>> alignment, ...).
> 
> I think you are fine as soon the data arrives at the durable
> storage, and assuming the data can't be partially written to durable
> storage. I was thinking more of a case where you have a file system,
> a RAID card without a BBU, and then magnetic disks. In that case,
> even if the file system were to write in 4k chunks, the RAID
> controller would also need to do the same, and with the same
> alignment. Of course, that's probably a silly example since there is
> probably no way to atomically write 4k to a magnetic disk.
> 
> Actually, what happens if a 4k write is being written to an SSD and
> the server crashes. Is the entire write discarded?
> 

AFAIK it's not possible to end up with a partial write, particularly not
such that would contain a mix of old and new data - that's because SSDs
can't overwrite a block without erasing it first.

So the write should either succeed or fail as a whole, depending on when
exactly the server crashes - it might be right before confirming the
flush back to the client, for example. That assumes the drive has 4kB
sectors (internal pages), and applies to drives with a volatile write cache
that support write barriers and cache flushes. On drives with a non-volatile
write cache (so with a battery/capacitor) the write should always succeed and
never get discarded.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add RANGE with values and exclusions clauses to the Window Functions

2018-01-27 Thread Erik Rijkers

On 2018-01-27 11:49, Oliver Ford wrote:

On Sat, Jan 27, 2018 at 7:40 AM, Erik Rijkers  wrote:

On 2018-01-27 00:35, Oliver Ford wrote:


Attached patch implements an extensible version of the RANGE with
values clause. It doesn't actually add any more type support than was


[...]


I've tested that the existing regression tests in previous versions
still pass, and also added new tests for descending mode.



Hi,

Regression tests only succeed for assert-disabled compiles; they fail when
assert-enabled:

I used (Centos 6.9):


Could you please try the attached version? It works for me with asserts 
enabled.




[0001-window-frame-v8.patch]

Yes, that fixed it, thanks.


Problem seems to be with an existing Assert in catcache.c:1545:

Assert(nkeys > 0 && nkeys < cache->cc_nkeys);

The "<" needs to be "<=" (and is changed in the attached patch).
AFAICT this was never a problem before purely because no code before
this patch called SearchSysCacheList4, so they always called with
fewer keys than the number available. But it's surely correct to
assert that the number of keys supplied is less than or equal to, not
less than, the number of keys in the cache.




Re: General purpose hashing func in pgbench

2018-01-27 Thread Ildar Musin
Hello Fabien,


On 26/01/2018 09:28, Fabien COELHO wrote:
>
> Hello Ildar,
>
> Applies, compiles, runs.
>
> I still have a few very minor comments, sorry for this (hopefully)
> last iteration from my part. I'm kind of iterative...
>
> The XML documentation source should avoid a paragraph on one very long
> line, but rather be indented like other sections.
>
> I'd propose simplify the second part:
>
>   Hash functions can be used, for example, to modify the distribution of
>   random_zipfian or
> random_exponential
>   functions in order to obtain scattered distribution.
>   Thus the following pgbench script simulates possible real world
> workload
>   typical for social media and blogging platforms where few accounts
>   generate excessive load:
>
> ->
>
>   Hash functions can be used to scatter the distribution of random
>   functions such as random_zipfian or
>   random_exponential.
>   For instance, the following pgbench script simulates possible real
>   world workload typical for social media and blogging platforms where
>   few accounts generate excessive load:
>
> Comment the Assert(0) as an internal error that cannot happen.
>
> I'd suggest to compact the execution code by declaring int64 variable
> and coerce to int in one go, like the integer bitwise functions. I'm
> in favor to keeping them in their own case and not reuse this one.
>
I did everything you mention here and attached a new version of the patch.
Thank you!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..503dd79 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -874,13 +874,18 @@ pgbench  options 
 d
 
  
   
-scale 
-   current scale factor
+client_id 
+   unique number identifying the client session (starts from 
zero)
   
 
   
-client_id 
-   unique number identifying the client session (starts from 
zero)
+default_seed 
+   random seed used in hash functions by default
+  
+
+  
+scale 
+   current scale factor
   
  
 
@@ -1246,6 +1251,27 @@ pgbench  options 
 d
5
   
   
+   hash(a [, 
seed ] )
+   integer
+   alias for hash_murmur2()
+   hash(10, 5432)
+   -5817877081768721676
+  
+  
+   hash_fnv1a(a [, 
seed ] )
+   integer
+   https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function;>FNV-1a
 hash
+   hash_fnv1a(10, 5432)
+   -7793829335365542153
+  
+  
+   hash_murmur2(a [, 
seed ] )
+   integer
+   https://en.wikipedia.org/wiki/MurmurHash;>MurmurHash2 hash
+   hash_murmur2(10, 5432)
+   -5817877081768721676
+  
+  

int(x)
integer
cast to int
@@ -1423,6 +1449,30 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) 
/

 
   
+Hash functions hash, hash_murmur2 and
+hash_fnv1a accept an input value and an optional seed 
parameter.
+In case the seed isn't provided the value of 
:default_seed
+is used, which is initialized randomly unless set by the command-line
+-D option. Hash functions can be used to scatter the
+distribution of random functions such as random_zipfian 
or
+random_exponential. For instance, the following pgbench
+script simulates possible real world workload typical for social media and
+blogging platforms where few accounts generate excessive load:
+
+
+\set r random_zipfian(0, 1, 1.07)
+\set k abs(hash(:r)) % 100
+
+
+In some cases several distinct distributions are needed which don't 
correlate with each other and this is when implicit seed parameter comes in 
handy:
+
+
+\set k1 abs(hash(:r), :default_seed + 123) % 100
+\set k2 abs(hash(:r), :default_seed + 321) % 100
+
+  
+
+  
As an example, the full definition of the built-in TPC-B-like
transaction is:
 
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index e23ca51..fc42c47 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,6 +16,10 @@
 
 #include "pgbench.h"
 
+#define PGBENCH_NARGS_VARIABLE (-1)
+#define PGBENCH_NARGS_CASE (-2)
+#define PGBENCH_NARGS_HASH (-3)
+
 PgBenchExpr *expr_parse_result;
 
 static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
@@ -226,9 +230,13 @@ make_uop(yyscan_t yyscanner, const char *operator, 
PgBenchExpr *expr)
 /*
  * List of available functions:
  * - fname: function name, "!..." for special internal functions
- * - nargs: number of arguments
- * -1 is a special value for least & greatest meaning 
#args >= 1
- * -2 is for the "CASE WHEN ..." function, which has #args 
>= 3 and odd
+ * - nargs: number of arguments. Special cases:
+ * - PGBENCH_NARGS_VARIABLE is a special value for least 

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-27 Thread Amit Kapila
On Fri, Jan 26, 2018 at 12:36 PM, Amit Kapila  wrote:
> On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan  wrote:
>> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila  
>> wrote:
 At this point, my preferred solution is for someone to go implement
 Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
 like the logical person for the job).

>>>
>>> I can implement it and share a prototype patch with you which you can
>>> use to test parallel sort stuff.
>>
>> That would be great. Thank you.
>>
>>> I would like to highlight the
>>> difference which you will see with WaitForParallelWorkersToAttach as
>>> compare to WaitForParallelWorkersToFinish() is that the former will
>>> give you how many of nworkers_launched workers are actually launched
>>> whereas latter gives an error if any of the expected workers is not
>>> launched.  I feel former is good and your proposed way of calling it
>>> after the leader is done with its work has alleviated the minor
>>> disadvantage of this API which is that we need for workers to startup.
>>
>> I'm not sure that it makes much difference, though, since in the end
>> WaitForParallelWorkersToFinish() is called anyway, much like
>> nodeGather.c. Have I missed something?
>>
>
> Nopes, you are right.  I had in my mind that if we have something like
> what I am proposing, then we don't even need to detect failures in
> WaitForParallelWorkersToFinish and we can finish the work without
> failing.
>
>> I had imagined that WaitForParallelWorkersToAttach() would give me an
>> error in the style of WaitForParallelWorkersToFinish(), without
>> actually waiting for the parallel workers to finish.
>>
>
> I think that is also doable.  I will give it a try and report back if
> I see any problem with it.
>

I have written the patch for the above API and posted it on a new
thread [1].  Do let me know either here or on that thread whether the patch
suffices your need.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2Be2MzyouF5bg%3DOtyhDSX%2B%3DAo%3D3htN%3DT-r_6s3gCtKFiw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Wait for parallel workers to attach

2018-01-27 Thread Amit Kapila
During the recent development of a parallel operation (parallel create
index)[1], a need has arisen for $SUBJECT.  The idea is to allow the
leader backend to rely on the number of workers that have successfully
started.  This API allows the leader to wait for all the workers to start,
or to error out if any of the workers fails to attach.  We consider
workers started/attached once they are attached to the error queue.  This
will ensure that any error after the workers are attached won't be
silently ignored by the leader.
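To give an idea of the intended usage, the leader side would look roughly
like this (a sketch only; the call to the new API follows the attached
patch, the helper function name is just for illustration):

#include "postgres.h"
#include "access/parallel.h"

static void
launch_and_wait_for_workers(ParallelContext *pcxt)
{
	LaunchParallelWorkers(pcxt);

	if (pcxt->nworkers_launched > 0)
	{
		/*
		 * Returns once all launched workers have attached to their error
		 * queues, or raises an error if any of them failed to start, so
		 * from here on the leader can rely on pcxt->nworkers_launched.
		 */
		WaitForParallelWorkersToAttach(pcxt);
	}
}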

I have used wait event as WAIT_EVENT_BGWORKER_STARTUP similar to
WaitForReplicationWorkerAttach, but we might want to change it.

I have tested this patch by calling this API in nodeGather.c and then
introducing failures at various places: (a) inducing fork failure for
background workers (force_fork_failure_v1.patch), (b) Exit parallel
worker before attaching to the error queue
(exit_parallel_worker_before_error_queue_attach_v1.patch) and (c) Exit
parallel worker after attaching to the error queue
(exit_parallel_worker_after_error_queue_attach_v1.patch).

In all above cases, I got the errors as expected.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KgmdT3ivm1vG%2BrJzKOKeYQU2XLhp6ma5LMHxaG89brsA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


WaitForParallelWorkersToAttach_v1.patch
Description: Binary data


modify_gather_to_wait_for_attach_v1.patch
Description: Binary data


force_fork_failure_v1.patch
Description: Binary data


exit_parallel_worker_before_error_queue_attach_v1.patch
Description: Binary data


exit_parallel_worker_after_error_queue_attach_v1.patch
Description: Binary data