Re: JIT compiling with LLVM v9.0

2018-01-26 Thread Jeff Davis
Hi,

On Fri, Jan 26, 2018 at 6:40 PM, Andres Freund  wrote:
>> I would like to see if we can get a combination of JIT and LTO to work
>> together to specialize generic code at runtime.
>
> Well, LTO can't quite work. It relies on being able to mark code in
> modules linked together as externally visible - and clearly we can't do
> that for a running postgres binary. At least in all incarnations I'm
> aware of.  But that's why the tree I posted supports inlining of code.

I meant a more narrow use of LTO: since we are doing linking in step
#4 and optimization in step #5, it's optimizing the code after
linking, which is a kind of LTO (though perhaps I'm misusing the
term?).

The version of LLVM that I tried this against had a linker option
called "InternalizeLinkedSymbols" that would prevent the visibility
problem you mention (assuming I understand you correctly). That option
is no longer there so I will have to figure out how to do it with the
current LLVM API.
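
One route I might try is to internalize symbols by hand after linking --
a rough sketch, purely illustrative and using only LLVM-C Core calls;
"f_wrapper" here just stands for whatever entry point we want to keep
externally callable:

#include <llvm-c/Core.h>
#include <string.h>

/*
 * Illustrative only: after linking the generic module into the module
 * holding the specialized entry point, mark everything else internal so
 * the optimizer is free to inline and then drop the generic definitions.
 */
static void
internalize_all_but(LLVMModuleRef mod, const char *keep)
{
    for (LLVMValueRef fn = LLVMGetFirstFunction(mod);
         fn != NULL;
         fn = LLVMGetNextFunction(fn))
    {
        /* leave declarations and the entry point alone */
        if (LLVMIsDeclaration(fn) ||
            strcmp(LLVMGetValueName(fn), keep) == 0)
            continue;
        LLVMSetLinkage(fn, LLVMInternalLinkage);
    }
}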

> Afaict that's effectively what I've already implemented. We could export
> more input as constants to the generated program, but other than that...

I brought this up in the context of slot_compile_deform(). In your
patch, you have code like:

+   if (!att->attnotnull)
+   {
...
+   v_nullbyte = LLVMBuildLoad(
+   builder,
+   LLVMBuildGEP(builder, v_bits, &v_nullbyteno, 1, ""),
+   "attnullbyte");
+
+   v_nullbit = LLVMBuildICmp(
+   builder,
+   LLVMIntEQ,
+   LLVMBuildAnd(builder, v_nullbyte, v_nullbytemask, ""),
+   LLVMConstInt(LLVMInt8Type(), 0, false),
+   "attisnull");
...

So it looks like you are reimplementing the generic code, but with
conditional code gen. If the generic code changes, someone will need
to read, understand, and change this code, too, right?

With my approach, it would initially do *un*conditional code gen,
and be less efficient and less specialized than the code generated by
your current patch. But then it would link in the constant tupledesc,
and optimize, and the optimizer will realize that they are constants
(hopefully) and then cut out a lot of the dead code and specialize it
to the given tupledesc.
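
Roughly, in toy C terms (ToyAttr/ToyTupleDesc are invented stand-ins,
nothing from the patch; the point is only that a constant descriptor lets
the optimizer fold the runtime checks away):

#include <stdbool.h>

/* toy stand-ins for pg_attribute / TupleDesc, purely illustrative */
typedef struct { bool attnotnull; int attlen; } ToyAttr;
typedef struct { int natts; ToyAttr attrs[2]; } ToyTupleDesc;

/* "generic" code: checks attnotnull at runtime, no specialization */
static int
count_nullable(const ToyTupleDesc *desc)
{
    int     n = 0;

    for (int i = 0; i < desc->natts; i++)
        if (!desc->attrs[i].attnotnull)
            n++;
    return n;
}

/* the "bind" side: a constant descriptor the optimizer can see through */
static const ToyTupleDesc bind_desc = {2, {{true, 4}, {true, 8}}};

int
count_nullable_bound(void)
{
    /* after inlining and constant folding this should reduce to "return 0" */
    return count_nullable(&bind_desc);
}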

This places a lot of faith in the optimizer and I realize it may not
happen as nicely with real code as it did with my earlier experiments.
Maybe you already tried and you are saying that's a dead end? I'll
give it a shot, though.

> Now the JITed expressions tree currently makes it hard for LLVM to
> recognize some constant input as constant, but what's largely needed for
> that to be better is some improvements in where temporary values are
> stored (should be in alloca's rather than local memory, so mem2reg can
> do its thing).  It's a TODO... Right now LLVM will figure out constant
> inputs to non-strict functions, but not strict ones, but after fixing
> some of what I've mentioned previously it works pretty universally.
>
>
> Have I misunderstood and there's some significant functional difference?

I'll try to explain with code, and then we can know for sure ;-)

Sorry for the ambiguity, I'm probably misusing a few terms.

>> I experimented a bit before and it works for basic cases, but I'm not
>> sure if it's as good as your hand-generated LLVM.
>
> For deforming it doesn't even remotely get as good in my experiments.

I'd like some more information here -- what didn't work? It didn't
recognize constants? Or did it recognize them, but didn't optimize as
well as you did by hand?

Regards,
  Jeff Davis



Re: JIT compiling with LLVM v9.0

2018-01-26 Thread Jeff Davis
On Wed, Jan 24, 2018 at 11:02 PM, Andres Freund  wrote:
> Not entirely sure what you mean. You mean why I don't inline
> slot_getsomeattrs() etc and instead generate code manually?  The reason
> is that the generated code is a *lot* smarter due to knowing the
> specific tupledesc.

I would like to see if we can get a combination of JIT and LTO to work
together to specialize generic code at runtime.

Let's say you have a function f(int x, int y, int z). You want to be
able to specialize it on y at runtime, so that a loop gets unrolled in
the common case where y is small.

1. At build time, create bitcode for the generic implementation of f().
2. At run time, load the generic bitcode into a module (let's call it
the "generic module")
3. At run time, create a new module (let's call it the "bind module")
that only does the following things:
   a. declares a global variable bind_y, and initializes it to the value 3
   b. declares a wrapper function f_wrapper(int x, int z), and all the
function does is call f(x, bind_y, z)
4. Link the generic module and the bind module together (let's call
the result the "linked module")
5. Optimize the linked module

After sorting out a few details about symbols and inlining, what will
happen is that the generic f() will be inlined into f_wrapper, and it
will see that bind_y is a constant, and then unroll a "for" loop over
y.
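
To make step 3 a bit more concrete, a rough sketch of building the bind
module with the LLVM C API might look like this (illustrative only --
error handling and steps 4 and 5 are omitted, names match the example
above, and I'm using the older LLVMBuildLoad/LLVMBuildCall style calls):

#include <llvm-c/Core.h>

static LLVMModuleRef
make_bind_module(int y_value)
{
    LLVMModuleRef mod = LLVMModuleCreateWithName("bind");
    LLVMBuilderRef b = LLVMCreateBuilder();
    LLVMTypeRef i32 = LLVMInt32Type();

    /* 3a. global variable bind_y, initialized to the runtime constant */
    LLVMValueRef bind_y = LLVMAddGlobal(mod, i32, "bind_y");
    LLVMSetInitializer(bind_y, LLVMConstInt(i32, y_value, 0));
    LLVMSetGlobalConstant(bind_y, 1);

    /* declaration of the generic f(); its body arrives at link time */
    LLVMTypeRef fargs[3] = {i32, i32, i32};
    LLVMValueRef f = LLVMAddFunction(mod, "f",
                                     LLVMFunctionType(i32, fargs, 3, 0));

    /* 3b. f_wrapper(x, z) just calls f(x, bind_y, z) */
    LLVMTypeRef wargs[2] = {i32, i32};
    LLVMValueRef wrapper = LLVMAddFunction(mod, "f_wrapper",
                                           LLVMFunctionType(i32, wargs, 2, 0));
    LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(wrapper, "entry"));

    LLVMValueRef y = LLVMBuildLoad(b, bind_y, "y");
    LLVMValueRef callargs[3] = {LLVMGetParam(wrapper, 0), y,
                                LLVMGetParam(wrapper, 1)};
    LLVMBuildRet(b, LLVMBuildCall(b, f, callargs, 3, "result"));

    LLVMDisposeBuilder(b);
    return mod;
}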

I experimented a bit before and it works for basic cases, but I'm not
sure if it's as good as your hand-generated LLVM.

If we can make this work, it would be a big win for
readability/maintainability. The hand-generated LLVM is limited to the
bind module, which is very simple, and doesn't need to be changed when
the implementation of f() changes.

Regards,
 Jeff Davis



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 6:40 PM, Thomas Munro
 wrote:
> Thanks for looking into this.  Yeah.  I think you're right that it
> could add a bit of overhead in some cases (ie if you receive a lot of
> signals that AREN'T caused by fork failure, then you'll enter
> HandleParallelMessage() every time unnecessarily), and it does feel a
> bit kludgy.  The best idea I have to fix that so far is like this: (1)
> add a member fork_failure_count to struct BackgroundWorkerArray, (2)
> in do_start_bgworker() whenever fork fails, do
> ++BackgroundWorkerData->fork_failure_count (ie before a signal is sent
> to the leader), (3) in procsignal_sigusr1_handler where we normally do
> a bunch of CheckProcSignal(PROCSIG_XXX) stuff, if
> (BackgroundWorkerData->fork_failure_count !=
> last_observed_fork_failure_count) HandleParallelMessageInterrupt().
> As far as I know, as long as fork_failure_count is (say) int32 (ie not
> prone to tearing) then no locking is required due to the barriers
> implicit in the syscalls involved there.  This is still slightly more
> pessimistic than it needs to be (the failed fork may be for someone
> else's ParallelContext), but only in rare cases so it would be
> practically as good as precise PROCSIG delivery.  It's just that we
> aren't allowed to deliver PROCSIGs from the postmaster.  We are
> allowed to communicate through BackgroundWorkerData, and there is a
> precedent for cluster-visible event counters in there already.

I could sign on to that plan, but I don't think we should hold this
patch up for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict

2018-01-26 Thread Michael Paquier
On Fri, Jan 26, 2018 at 12:30:12PM -0500, Tom Lane wrote:
> And done.  I failed to resist the temptation to rename
> compute_attributes_sql_style, since the "sql_style" bit no longer
> conveys anything.  I'd always found compute_attributes_with_style
> to be confusingly named --- seemed like it should have a sibling
> compute_attributes_gracelessly, or something like that.

Nice, thanks for the commit and review!
--
Michael




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Thomas Munro
On Fri, Jan 26, 2018 at 7:30 PM, Peter Geoghegan  wrote:
> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila  wrote:
>> However, now I see that you and Thomas are trying to find a different
>> way to overcome this problem differently, so not sure if I should go
>> ahead or not.  I have seen that you told you wanted to look at
>> Thomas's proposed stuff carefully tomorrow, so I will wait for you
>> guys to decide which way is appropriate.
>
> I suspect that the overhead of Thomas' experimental approach is going
> to cause problems in certain cases. Cases that are hard to foresee.
> That patch makes HandleParallelMessages() set ParallelMessagePending
> artificially, pending confirmation of having launched all workers.
>
> It was an interesting experiment, but I think that your
> WaitForParallelWorkersToAttach() idea has a better chance of working
> out.

Thanks for looking into this.  Yeah.  I think you're right that it
could add a bit of overhead in some cases (ie if you receive a lot of
signals that AREN'T caused by fork failure, then you'll enter
HandleParallelMessage() every time unnecessarily), and it does feel a
bit kludgy.  The best idea I have to fix that so far is like this: (1)
add a member fork_failure_count to struct BackgroundWorkerArray, (2)
in do_start_bgworker() whenever fork fails, do
++BackgroundWorkerData->fork_failure_count (ie before a signal is sent
to the leader), (3) in procsignal_sigusr1_handler where we normally do
a bunch of CheckProcSignal(PROCSIG_XXX) stuff, if
(BackgroundWorkerData->fork_failure_count !=
last_observed_fork_failure_count) HandleParallelMessageInterrupt().
As far as I know, as long as fork_failure_count is (say) int32 (ie not
prone to tearing) then no locking is required due to the barriers
implicit in the syscalls involved there.  This is still slightly more
pessimistic than it needs to be (the failed fork may be for someone
else's ParallelContext), but only in rare cases so it would be
practically as good as precise PROCSIG delivery.  It's just that we
aren't allowed to deliver PROCSIGs from the postmaster.  We are
allowed to communicate through BackgroundWorkerData, and there is a
precedent for cluster-visible event counters in there already.
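
To make the shape of (3) a little more concrete, something along these
lines (fork_failure_count is of course hypothetical -- it doesn't exist in
BackgroundWorkerArray today, and it would have to be reachable from the
signal handler):

/* hypothetical sketch only; fork_failure_count is a proposed field */
static int32 last_observed_fork_failure_count = 0;

    /* ... in procsignal_sigusr1_handler, next to the CheckProcSignal() calls ... */
    if (BackgroundWorkerData->fork_failure_count !=
        last_observed_fork_failure_count)
    {
        last_observed_fork_failure_count =
            BackgroundWorkerData->fork_failure_count;
        HandleParallelMessageInterrupt();
    }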

I think you should proceed with Amit's plan.  If we ever make a plan
like the above work in future, it'd render that redundant by turning
every CFI() into a cancellation point for fork failure, but I'm not
planning to investigate further given the muted response to my
scheming in this area so far.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Setting BLCKSZ 4kB

2018-01-26 Thread Andres Freund
Hi,

On 2018-01-27 00:28:07 +0100, Tomas Vondra wrote:
> But does that make the internal page size relevant to the atomicity
> question? For example, let's say we write 4kB on a drive with 2kB
> internal pages, and the power goes out after writing the first 2kB of
> data (so the second 2kB gets lost). The disk however never
> confirmed the 4kB write, exactly because of the write barrier ...

That would be problematic, yes. That's *precisely* the torn page issue
we're worried about re full page writes.  Consider, as just one of many
examples, crashing during WAL apply: the first half of the page might be
new, the other old - we'd skip it the next time we try to apply because the
LSN in the page would indicate it's new enough. With FPWs that doesn't
happen because the first time through we'll reapply the whole write.


> I have to admit I'm not sure what happens at this point - whether the
> drive will produce torn page (with the first 2kB updated and 2kB old),
> or if it's smart enough to realize the write barrier was not reached.

I don't think you can rely on anything.


> But perhaps this (non-volatile write cache) is one of the requirements
> for disabling full page writes?

I don't think that's reliably doable due to the limited knowledge about
what exactly happens inside each and every model of drive.

Greetings,

Andres Freund



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-26 Thread Tom Lane
I've pushed this mostly as-is.  I fixed the missed places in reloptions
code as we discussed.  I also took out the parser changes related to
allowing unquoted PARALLEL in old-style CREATE AGGREGATE, because that
is not a goal I consider worthy of adding extra grammar complexity.
We don't document that PARALLEL works there, and there has never been
any expectation that deprecated legacy syntax would grow new options
as needed to have feature parity with the modern syntax.  I don't feel
a need to go out of our way to prevent new options from working in the
old syntax, if they happen to, but I definitely don't want to expend
code on making them do so.

Accordingly, that regression test that expected PARALLEL to work in
the old-style syntax was just ill-considered, and I changed it to use
the new-style syntax instead.

I also trimmed the new regression test cases a bit, as most of them seemed
pretty duplicative.  How many times do we need to verify that the lexer
doesn't downcase quoted identifiers?  I'm not really sure that checking
this behavior from SQL is useful at all, actually.  The interesting
question is whether there are code paths that can reach these string
comparisons with strings that *didn't* get processed as identifiers by the
lexer, and these tests do basically nothing to prove that there aren't.
Still, I think we can push this and wait to see if there are complaints.

regards, tom lane



Re: Setting BLCKSZ 4kB

2018-01-26 Thread Tomas Vondra


On 01/27/2018 12:06 AM, Andres Freund wrote:
> Hi,
> 
> On 2018-01-26 23:53:33 +0100, Tomas Vondra wrote:
>> But more importantly, I don't see why the size of the internal page
>> would matter here at all? SSDs have non-volatile write cache (DRAM with
>> battery), protecting all the internal writes to pages. If your SSD does
>> not do that correctly, it's already broken no matter what page size it
>> uses even with full_page_writes=on.
> 
> Far far from all SSDs have non-volatile write caches. And if they
> respect barrier requests (i.e. flush before returning), they're not
> broken.
>

That is true, thanks for the correction.

But does that make the internal page size relevant to the atomicity
question? For example, let's say we write 4kB on a drive with 2kB
internal pages, and the power goes out after writing the first 2kB of
data (so the second 2kB gets lost). The disk however never
confirmed the 4kB write, exactly because of the write barrier ...

I have to admit I'm not sure what happens at this point - whether the
drive will produce torn page (with the first 2kB updated and 2kB old),
or if it's smart enough to realize the write barrier was not reached.

But perhaps this (non-volatile write cache) is one of the requirements
for disabling full page writes?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-26 Thread Daniel Gustafsson
> On 26 Jan 2018, at 23:58, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 26 Jan 2018, at 22:32, Tom Lane  wrote:
>>> I notice that there are two reloptions-related
>>> "pg_strncasecmp" calls that did not get converted to "strncmp":
>>> reloptions.c:804
> 
>> The way I read transformRelOptions(), oldOptions is not guaranteed to
>> come from the parser (though in reality it probably will be).
> 
> Well, one response to that is that it should contain values that were
> deemed acceptable at some previous time.  If we allowed catalog contents
> to be migrated physically across major releases, we'd need to worry about
> having mixed-case reloptions in a v11 catalog ... but we don't, so we
> won't.  The old reloptions should always be ones that got through
> parseRelOptions before, and those now will always have to be exact case.

That’s a good point, the reloptions will only ever come from the same major
version.

> Another response is that leaving it like this would mean that
> transformRelOptions and parseRelOptions have different opinions about
> whether two option names are the same or not, which seems to me to be
> exactly the same sort of bug hazard that you were on about at the
> beginning of this thread.

Completely agree.

>> The namespace isn’t either
>> but passing an uppercase namespace should never be valid AFAICT, hence the
>> patch changing it to case sensitive comparison.
> 
> Well, it's no more or less valid than an uppercase option name …

Agreed.  Taking a step back and thinking it’s clear that the comparison in the
oldoptions loop should’ve been changed in the patch for it to be consistent
with the original objective.

cheers ./daniel


Re: Setting BLCKSZ 4kB

2018-01-26 Thread Andres Freund
Hi,

On 2018-01-26 23:53:33 +0100, Tomas Vondra wrote:
> But more importantly, I don't see why the size of the internal page
> would matter here at all? SSDs have non-volatile write cache (DRAM with
> battery), protecting all the internal writes to pages. If your SSD does
> not do that correctly, it's already broken no matter what page size it
> uses even with full_page_writes=on.

Far far from all SSDs have non-volatile write caches. And if they
respect barrier requests (i.e. flush before returning), they're not
broken.

Greetings,

Andres Freund



Re: PATCH: pgbench - option to build using ppoll() for larger connection counts

2018-01-26 Thread Fabien COELHO


Hello Doug,

Patch applies, compiles, tests ok.


   > [...] Replaced USE_PPOLL with HAVE_PPOLL as having both seems redundant.

   I'm okay with that. I'm wondering whether there should be a way to force
   using one or the other when both are available. Not sure.

Added option to force use of select(2) via:  -DUSE_SELECT


USE_SELECT could mean something somewhere. Maybe use something more 
specific like PGBENCH_USE_SELECT? Having this macro available simplifies 
testing.


I'm not sure why you do the following trick, could you explain?
  +#undef USE_SELECT
  +#define USE_SELECT

In the select implementation you do:

 return (socket_set *) pg_malloc0(sizeof(socket_set) * MAXCLIENTS);

but ISTM that socket_set is already an fd_set which represents a set of 
clients, so allocating a number of them should not be needed. The initial 
implementation just does "fd_set input_mask", whatever the number of 
clients, and it works fine.


--
Fabien.



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-26 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 26 Jan 2018, at 22:32, Tom Lane  wrote:
>> I notice that there are two reloptions-related
>> "pg_strncasecmp" calls that did not get converted to "strncmp":
>> reloptions.c:804

> The way I read transformRelOptions(), oldOptions is not guaranteed to
> come from the parser (though in reality it probably will be).

Well, one response to that is that it should contain values that were
deemed acceptable at some previous time.  If we allowed catalog contents
to be migrated physically across major releases, we'd need to worry about
having mixed-case reloptions in a v11 catalog ... but we don't, so we
won't.  The old reloptions should always be ones that got through
parseRelOptions before, and those now will always have to be exact case.

Another response is that leaving it like this would mean that
transformRelOptions and parseRelOptions have different opinions about
whether two option names are the same or not, which seems to me to be
exactly the same sort of bug hazard that you were on about at the
beginning of this thread.

> The namespace isn’t either
> but passing an uppercase namespace should never be valid AFAICT, hence the
> patch changing it to case sensitive comparison.

Well, it's no more or less valid than an uppercase option name ...

>> and reloptions.h:169.

> Oversight, completely missed that one.

It looks like no core code uses that macro, so it's got no effect
unless some third-party code does ... but I changed it anyway.

regards, tom lane



Re: Setting BLCKSZ 4kB

2018-01-26 Thread Tomas Vondra


On 01/26/2018 02:56 PM, Bruce Momjian wrote:
> On Wed, Jan 17, 2018 at 02:10:10PM +0100, Fabien COELHO wrote:
>>
>> Hello,
>>
>>> What are the cons of setting BLCKSZ as 4kB? When saw the results published
>>> on [...].
>>
>> There were other posts and publications which points to the same direction
>> consistently.
>>
>> This matches my deep belief is that postgres default block size is a
>> reasonable compromise for HDD, but is less pertinent for SSD for most OLTP
>> loads.
>>
>> For OLAP, I do not think it would lose much, but I have not tested it.
>>
>>> Does turning off FPWs will be safe if BLCKSZ is set to 4kB given page size
>>> of file system is 4kB?
>>
>> FPW = Full Page Write. I would not bet on turning off FPW, ISTM 
>> that SSDs can have "page" sizes as low as 512 bytes, but are 
>> typically 2 kB or 4 kB, and the information is not easily available
>> anyway.
> 

Is this referring to sector size or the internal SSD page size?

AFAIK there are only 512B and 4096B sectors, so I assume you must be
talking about the latter. I don't think I've ever heard about an SSD
with 512B pages though (generally the page sizes are 2kB to 16kB).

But more importantly, I don't see why the size of the internal page
would matter here at all? SSDs have non-volatile write cache (DRAM with
battery), protecting all the internal writes to pages. If your SSD does
not do that correctly, it's already broken no matter what page size it
uses even with full_page_writes=on.

On spinning rust the caches would be disabled and replaced by write
cache on a RAID controller with battery, but that's not possible on SSDs
where the on-disk cache is baked into the whole design.

What I think does matters here is the sector size (i.e. either 512B or
4096B) used to communicate with the disk. Obviously, if the kernel
writes a 4kB page as a series of independent 512B writes, that would be
unreliable. If it sends one 4kB write, why wouldn't that work?

> Yes, that is the hard part, making sure you have 4k granularity of 
> write, and matching write alignment. pg_test_fsync and diskchecker.pl
> (which we mention in our docs) will not help here. A specific
> alignment test based on diskchecker.pl would have to be written.
> However, if you look at the kernel code you might be able to verify
> quickly that the 4k atomicity is not guaranteed.
> 

Are you suggesting there's a part of the kernel code clearly showing
it's not atomic? Can you point us to that part of the kernel sources?

FWIW even if it's not safe in general, it would be useful to understand
what are the requirements to make it work. I mean, conditions that need
to be met on various levels (sector size of the storage device, page
size of the file system, filesystem alignment, ...).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-26 Thread Daniel Gustafsson
> On 26 Jan 2018, at 22:32, Tom Lane  wrote:
> 
> Michael Paquier  writes:
>> On Wed, Jan 24, 2018 at 09:47:50AM +0100, Daniel Gustafsson wrote:
>>> Attached is a rebased v7 patch which has your amendments (minus
>>> propname) which passes make check without errors.
> 
>> Confirmed. I am switching the status as ready for committer for
>> volatility-v7.patch then.
> 
> Poking through this,

Thanks!

> I notice that there are two reloptions-related
> "pg_strncasecmp" calls that did not get converted to "strncmp":
> reloptions.c:804

The way I read transformRelOptions(), oldOptions is not guaranteed to come from
the parser (though in reality it probably will be).  The namespace isn’t either
but passing an uppercase namespace should never be valid AFAICT, hence the
patch changing it to case sensitive comparison.

> and reloptions.h:169.

Oversight, completely missed that one.

cheers ./daniel


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-26 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Jan 24, 2018 at 09:47:50AM +0100, Daniel Gustafsson wrote:
>> Attached is a rebased v7 patch which has your amendments (minus
>> propname) which passes make check without errors.

> Confirmed. I am switching the status as ready for committer for
> volatility-v7.patch then.

Poking through this, I notice that there are two reloptions-related
"pg_strncasecmp" calls that did not get converted to "strncmp":
reloptions.c:804 and reloptions.h:169.

Is that an oversight, or intentional, and if intentional why?

regards, tom lane



Re: [HACKERS] why not parallel seq scan for slow functions

2018-01-26 Thread Robert Haas
On Tue, Jan 2, 2018 at 6:38 AM, Amit Kapila  wrote:
>  [ new patch ]

I think that grouping_planner() could benefit from a slightly more
extensive rearrangement.  With your patch applied, the order of
operations is:

1. compute the scan/join target
2. apply the scan/join target to all paths in current_rel's pathlist
3. generate gather paths, possibly adding more stuff to current_rel's pathlist
4. rerun set_cheapest
5. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist, for the benefit of upper planning
steps
6. clear the partial pathlist if the target list is not parallel safe

I at first thought this was outright broken because step #3 imposes
the scan/join target without testing it for parallel-safety, but then
I realized that generate_gather_paths will apply that target list by
using apply_projection_to_path, which makes an is_parallel_safe test
of its own.  But it doesn't seem good for step 3 to test the
parallel-safety of the target list separately for each path and then
have grouping_planner do it one more time for the benefit of upper
planning steps.  Instead, I suggest that we try to get rid of the
logic in apply_projection_to_path that knows about Gather and Gather
Merge specifically.  I think we can do that if grouping_planner does
this:

1. compute the scan/join target
2. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist
3. generate gather paths
4. clear the partial pathlist if the target list is not parallel safe
5. apply the scan/join target to all paths in current_rel's pathlist
6. rerun set_cheapest

That seems like a considerably more logical order of operations.  It
avoids not only the expense of testing the scanjoin_target for
parallel-safety multiple times, but the ugliness of having
apply_projection_to_path know about Gather and Gather Merge as a
special case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-26 Thread Robert Haas
On Thu, Jan 18, 2018 at 8:55 AM, Jeevan Chalke
 wrote:
> Attached patch with other review points fixed.

Committed 0001 and 0002 together, with some cosmetic changes,
including fixing pgindent damage.  Please pgindent your patches before
submitting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compiling with LLVM v9.0

2018-01-26 Thread Andres Freund
Hi,

On 2018-01-26 13:06:27 +0300, Konstantin Knizhnik wrote:
> One more question: do you have any idea how to profile JITed code?

Yes ;). It depends a bit on what exactly you want to do. Is it
sufficient to get time associated with the parent caller, or do you need
instruction-level access?


> There is no LLVMOrcRegisterPerf in LLVM 5, so jit_profiling_support option
> does nothing.

Right, it's a patch I'm trying to get into the next version of
llvm. With that you get access to the shared object and everything.


> And without it perf is not able to unwind stack trace for generated
> code.

You can work around that by using --call-graph lbr with a sufficiently
new perf. That'll not know function names et al, but at least the parent
will be associated correctly.


> But you are compiling code using LLVMOrcAddEagerlyCompiledIR
> and I find no way to pass the no-omit-frame-pointer option here.

It shouldn't be too hard to open code support for it, encapsulated in a
function:
// Set function attribute "no-frame-pointer-elim" based on
// NoFramePointerElim.
for (auto &F : *Mod) {
  auto Attrs = F.getAttributes();
  StringRef Value(options.NoFramePointerElim ? "true" : "false");
  Attrs = Attrs.addAttribute(F.getContext(), AttributeList::FunctionIndex,
 "no-frame-pointer-elim", Value);
  F.setAttributes(Attrs);
}
that's all that option did for mcjit.

Greetings,

Andres Freund



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Peter Geoghegan
On Fri, Jan 26, 2018 at 11:17 AM, Robert Haas  wrote:
> Hmm, I like the idea of making it a #define instead of having it
> depend on parallel_leader_participation.  Let's do that.  If the
> consensus is later that it was the wrong decision, it'll be easy to
> change it back.

WFM.

-- 
Peter Geoghegan



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 2:04 PM, Peter Geoghegan  wrote:
> On Fri, Jan 26, 2018 at 10:33 AM, Robert Haas  wrote:
>> I'm busy with other things, so no rush.
>
> Got it.
>
> There is one question that I should probably get clarity on ahead of
> the next revision, which is: Should I rip out the code that disallows
> a "degenerate parallel CREATE INDEX" when
> parallel_leader_participation=off, or should I instead rip out any
> code that deals with parallel_leader_participation, and always have
> the leader participate as a worker?
>
> If I did the latter, then leader non-participation would live on as a
> #define debug option within nbtsort.c. It definitely seems like we'd
> want to preserve that at a minimum.

Hmm, I like the idea of making it a #define instead of having it
depend on parallel_leader_participation.  Let's do that.  If the
consensus is later that it was the wrong decision, it'll be easy to
change it back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Peter Geoghegan
On Fri, Jan 26, 2018 at 10:33 AM, Robert Haas  wrote:
> I'm busy with other things, so no rush.

Got it.

There is one question that I should probably get clarity on ahead of
the next revision, which is: Should I rip out the code that disallows
a "degenerate parallel CREATE INDEX" when
parallel_leader_participation=off, or should I instead rip out any
code that deals with parallel_leader_participation, and always have
the leader participate as a worker?

If I did the latter, then leader non-participation would live on as a
#define debug option within nbtsort.c. It definitely seems like we'd
want to preserve that at a minimum.

-- 
Peter Geoghegan



Re: \describe*

2018-01-26 Thread Corey Huinker
>
> It would be about as hard to memorize \describe-schemas as it is to
> memorize \dn:
> You'd have to remember that it is "-" and not "_", that it is "describe",
> not "desc"
> and that it is "schemas", not "schema".
>

You wouldn't memorize them.  You'd discover them with tab completion.

Type "\d"  and you'll see
\d  \dA \dc \dd \ddp \des \deu \df \dFd \dFt \di \dL \dn \dO \drds \dS \dT
\dv \dy
\da \db \dC \dD \dE  \det \dew \dF \dFp \dg  \dl \dm \do \dp \ds   \dt \du
\dx

which is more heat than light. Yes, those are all the possibilities, but I,
Joe Newguy, want to list schemas, and \ds and \dS look like the good
guesses, neither of which is the right answer.  If, with this feature, I
typed \desc, I might see:

\describe  \describe-functions  \describe-schemas  \describe-tables
...

So my voyage of discovery would have been completed after typing
"\desc-sc", and if we add a note in interactive mode, I'd be shown
the hint that \dn is the shortcut for that just above the list of schemas.


Re: unique indexes on partitioned tables

2018-01-26 Thread Peter Eisentraut
On 1/22/18 17:55, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> Version 4 of this patch, rebased on today's master.

+   if (key->partattrs[i] == 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("unsupported %s constraint with
partition key definition",
+   constraint_type),
+errmsg("%s constraints cannot be used when
partition keys include expressions.",
+   constraint_type)));

Double errmsg().  (Maybe an Assert somewhere should help catch this?)
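
Presumably the second call was meant to be errdetail(), i.e. something
like:

ereport(ERROR,
        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         errmsg("unsupported %s constraint with partition key definition",
                constraint_type),
         errdetail("%s constraints cannot be used when partition keys include expressions.",
                   constraint_type)));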

+alter table idxpart add primary key (a);   -- not an incomplete one tho

"though"?

I would like to see some tests that the unique constraints are actually
enforced.  That is, insert some duplicate values and see it fail.  Throw
some null values in, to check PK behavior as well.  It should be
trivial, but seems kind of useful.

Other than that, this looks pretty good to me.  A logical extension of
the previous partitioned index patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost  wrote:
> >> I think you've chosen a terrible design and ought to throw the whole
> >> thing away and start over.
> >
> > I'm all for throwing away the existing test once we've got something
> > that covers at least what it does (ideally more, of course).
> 
> I'm for throwing this away now.  It's a nuisance for other people to
> maintain, and as Tom's reply makes clear (and it matches my
> suspicions), they are maintaining it without really knowing whether
> the updates are making are *correct*, just knowing that they *make the
> tests pass*.  It's nice to make things turn green on the code coverage
> report, but if we're not really verifying that the results are
> correct, we're just kidding ourselves.  We'd get the same amount of
> green on the code coverage report by running the pg_dump commands and
> sending the output to /dev/null, and it would be a lot less work to
> keep up to date.

There's not much to argue about if committers are simply hacking away at
it without actually considering if the test are correct or not.  I
suspect it's not quite as bad as being outright wrong- if individuals
are at least testing their changes thoroughly first and then making sure
that the tests pass then it's at least more likely that what's recorded
in the test suite is actually correct, but that's not great.  Further,
having contributors include updates to the test suite which are then
willfully thrown away is outright bad.

Whether that's worse than not having this test coverage for pg_dump, I'm not
sure.  What strikes me as likely is that removing this test from pg_dump
will just result in bugs being introduced because, for as much as
updating these tests are painful, actually testing all of the different
variations of pg_dump by hand for even a simple change is quite a bit of
work too.  I don't have any doubt that the reason bugs were found with
this is because there were permutations of pg_dump that weren't being
tested, either by individuals making the change or through the other
regression tests we have.  Indeed, the tests included were specifically
to get code coverage of cases which weren't already being tested.

> I'm glad this helped you find some bugs.  It is only worth keeping if
> it prevents other hackers from introducing bugs in the future.  I
> doubt that it will have that effect.

I'm not convinced that it won't, but I agree that we want something that
everyone will feel comfortable in maintaining and improving moving
forward.  I'll work on it and either submit a rework to make it easier
to maintain or a patch to remove it before the next CF.

Thanks!

Stephen




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 1:17 PM, Peter Geoghegan  wrote:
> On Fri, Jan 26, 2018 at 10:01 AM, Robert Haas  wrote:
>> On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan  wrote:
>>> I had imagined that WaitForParallelWorkersToAttach() would give me an
>>> error in the style of WaitForParallelWorkersToFinish(), without
>>> actually waiting for the parallel workers to finish.
>>
>> +1.  If we're going to go that route, and that seems to be the
>> consensus, then I think an error is more appropriate than returning an
>> updated worker count.
>
> Great.
>
> Should I wait for Amit's WaitForParallelWorkersToAttach() patch to be
> posted, reviewed, and committed, or would you like to see what I came
> up with ("The next revision of the patch will make the
> leader-participates-as-worker spool/Tuplesortstate start and finish
> sorting before the main leader spool/Tuplesortstate is even started")
> today?

I'm busy with other things, so no rush.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 7:45 AM, Michael Paquier
 wrote:
> There could be value in having a version dedicated to inheritance trees
> as well, true enough.  As well as value in having something that shows
> both.  Still let's not forget that partition sets are structured so as
> the parents have no data, so I see more value in having only partitions
> listed, without the INHERIT part.  Opinions from others are of course
> welcome.

Well, partitioning and inheritance can't be mixed.  A given table has
either partitions OR inheritance children OR neither.  If it has
either partitions or inheritance children, find_all_inheritors will
return them.  Otherwise, I think it'll just return the input OID
itself.  So I don't quite see, if we're going to add a convenience
function here, why we wouldn't just define it to return the same results
as find_all_inheritors does?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Peter Geoghegan
On Fri, Jan 26, 2018 at 10:01 AM, Robert Haas  wrote:
> On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan  wrote:
>> I had imagined that WaitForParallelWorkersToAttach() would give me an
>> error in the style of WaitForParallelWorkersToFinish(), without
>> actually waiting for the parallel workers to finish.
>
> +1.  If we're going to go that route, and that seems to be the
> consensus, then I think an error is more appropriate than returning an
> updated worker count.

Great.

Should I wait for Amit's WaitForParallelWorkersToAttach() patch to be
posted, reviewed, and committed, or would you like to see what I came
up with ("The next revision of the patch will make the
leader-participates-as-worker spool/Tuplesortstate start and finish
sorting before the main leader spool/Tuplesortstate is even started")
today?

-- 
Peter Geoghegan



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan  wrote:
> I had imagined that WaitForParallelWorkersToAttach() would give me an
> error in the style of WaitForParallelWorkersToFinish(), without
> actually waiting for the parallel workers to finish.

+1.  If we're going to go that route, and that seems to be the
consensus, then I think an error is more appropriate than returning an
updated worker count.

On the question of whether this is better or worse than using
barriers, I'm not entirely sure.  I understand that various objections
to the Barrier concept have been raised, but I'm not personally
convinced by any of them.  On the other hand, if we only have to call
WaitForParallelWorkersToAttach after the leader finishes its own sort,
then there's no latency advantage to the barrier approach.  I suspect
we might still end up reworking this if we add the ability for new
workers to join an index build in medias res at some point in the
future -- but, as Peter points out, maybe the whole algorithm would
get reworked in that scenario.  So, since other people like
WaitForParallelWorkersToAttach, I think we can just go with that for
now.  I don't want to kill this patch with unnecessary nitpicking.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] generated columns

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 10:26 PM, Peter Eisentraut
 wrote:
>> Does the SQL spec mention the matter? How do other systems
>> handle such cases?
>
> In Oracle you get the same overflow error.

That seems awful.  If a user says "SELECT * FROM tab" and it fails,
how are they supposed to recover, or even understand what the problem
is?  I think we should really try to at least generate an errcontext
here:

ERROR:  integer out of range
CONTEXT: while generating virtual column "b"

And maybe a hint, too, like "try excluding this column".
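
For reference, the usual way to attach such a context line would be an
ErrorContextCallback around the evaluation of the generation expression --
a rough sketch, with invented names:

/* rough sketch only, names invented */
static void
generated_column_errcontext(void *arg)
{
    const char *colname = (const char *) arg;

    errcontext("while generating virtual column \"%s\"", colname);
}

    /* ... around evaluating the generation expression ... */
    ErrorContextCallback errcallback;

    errcallback.callback = generated_column_errcontext;
    errcallback.arg = (void *) colname;     /* the column's name string */
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* evaluate the generation expression here */

    error_context_stack = errcallback.previous;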

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 12:30 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Thu, Jan 25, 2018 at 06:30:04PM -0500, Tom Lane wrote:
>>> In short, I'm on board with removing the WITH clause.  I've not
>>> reviewed the patch in detail, but will do so and push it if there's
>>> not objections pretty soon.
>
>> Glad to hear that, thanks!
>
> And done.  I failed to resist the temptation to rename
> compute_attributes_sql_style, since the "sql_style" bit no longer
> conveys anything.  I'd always found compute_attributes_with_style
> to be confusingly named --- seemed like it should have a sibling
> compute_attributes_gracelessly, or something like that.

+1 for compute_attributes_gracelessly!  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Removing WITH clause support in CREATE FUNCTION, for isCachable and isStrict

2018-01-26 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Jan 25, 2018 at 06:30:04PM -0500, Tom Lane wrote:
>> In short, I'm on board with removing the WITH clause.  I've not
>> reviewed the patch in detail, but will do so and push it if there's
>> not objections pretty soon.

> Glad to hear that, thanks!

And done.  I failed to resist the temptation to rename
compute_attributes_sql_style, since the "sql_style" bit no longer
conveys anything.  I'd always found compute_attributes_with_style
to be confusingly named --- seemed like it should have a sibling
compute_attributes_gracelessly, or something like that.

regards, tom lane



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 11:56 AM, Stephen Frost  wrote:
>> I think you've chosen a terrible design and ought to throw the whole
>> thing away and start over.
>
> I'm all for throwing away the existing test once we've got something
> that covers at least what it does (ideally more, of course).

I'm for throwing this away now.  It's a nuisance for other people to
maintain, and as Tom's reply makes clear (and it matches my
suspicions), they are maintaining it without really knowing whether
the updates they are making are *correct*, just knowing that they *make the
tests pass*.  It's nice to make things turn green on the code coverage
report, but if we're not really verifying that the results are
correct, we're just kidding ourselves.  We'd get the same amount of
green on the code coverage report by running the pg_dump commands and
sending the output to /dev/null, and it would be a lot less work to
keep up to date.

I'm glad this helped you find some bugs.  It is only worth keeping if
it prevents other hackers from introducing bugs in the future.  I
doubt that it will have that effect.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Tom Lane
Robert Haas  writes:
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

Yeah, that's a problem.  The last two times I touched that file,
I just moved things between "like" and "unlike" categories until the
test passed.  If there were anything useful it had to tell me, it was a
complete failure at doing so.  I frankly won't even think about adding
new test cases to it, either.

I don't know how to make it better exactly, but I concur with Robert that
that test needs fundamental redesign of some kind to be maintainable.

regards, tom lane



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost  wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> >> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera  
> >> wrote:
> >> > My proposal is that instead of looking at three hundred lines, you'd
> >> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> >> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
> >> > combinations of pg_dump switches: are we dumping the right set of
> >> > objects.
> >>
> >> My counter-proposal is that we remove the test entirely.  It looks
> >> like an unmaintainable and undocumented mess to me, and I doubt
> >> whether the testing value is sufficient to justify the effort of
> >> updating it every time anyone wants to change something in pg_dump.
> >
> > Considering it turned up multiple serious bugs, particularly in the
> > binary upgrade path, I can't disagree more.  If you have a counter
> > proposal which actually results in better test coverage, that'd be
> > fantastic, but I wholly reject the notion that we should be considering
> > reducing our test coverage of pg_dump.
> 
> I figured you would, but it's still my opinion.  I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests.  That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.

I don't mean to discount your opinion at all, but I'm quite concerned
about the code coverage of pg_dump.  I certainly agree that the testing
of pg_dump can and should be improved and I'd like to find time to work
on that, but simply throwing this out strikes me as a step backwards,
not forwards, even for its difficulty.

> For example, I notice that the file contains 166 copies of
> only_dump_test_schema  => 1 (with 4 different variations as to how
> much whitespace is included). Some of those are in the "like" clause
> and some are in the "unlike" clause.  If one of them were misplaced,
> and pg_dump is actually producing the wrong output, the two errors
> would cancel out and I suspect nobody would notice.  If somebody makes
> a change to pg_dump that changes which things get dumped when --schema
> is used, then a lot of those lines would need to be moved around.  That
> would be a huge nuisance.  If the author of such a patch just stuck
> those lines where they make the tests pass, they might fail to notice
> if one of them were actually in the wrong place, and a bug would go
> undiscovered.  Some people probably have the mental stamina to audit
> 166 separate cases and make sure that each one is properly positioned,
> but some people might not; and that's really just one test of many.
> Really, every pg_dump change that alters behavior needs to
> individually reconsider the position of every one of ~6000 lines in
> this file, and nobody is really going to do that.
> 
> There's some rule that articulates what the effect of --schema is
> supposed to be.  I don't know the exact rule, but it's probably
> something like "global objects shouldn't get dumped and schema objects
> should only get dumped if they're in that schema".  That rule ought to
> be encoded into this file in some recognizable form.  It's encoded
> there, but only in the positioning of hundreds of separate lines of
> code that look identical but must be positioned differently based on a
> human programmer's interpretation of how that rule applies to each
> object type.  That's not a good way of implementing the rule; nobody
> can look at this and say "oh, well I changed the rules for --schema,
> so here's which things need to be updated".  You're not going to be
> able to look at the diff somebody produces and have any idea whether
> it's right, or at least not without a lot of very time-consuming
> manual verification.  If you had just saved the output of pg_dump in a
> file (maybe with OIDs and other variable bits stripped out) and
> compared the new output against the old, it would give you just as
> much code coverage but probably be a lot easier to maintain.  If you
> had implemented the rules programmatically instead of encoding a giant
> manually precomputed data structure in the test case file it would be
> a lot more clearly correct and again easier to maintain.

The existing code already has some of these type of rules encoded in it,
specifically to reduce the number of like/unlike cases where it's
possible to do so- see the discussion of catch-all tests, described on
about line 282.  While there may be other such rules which can be
encoded, I don't believe they'd end up being nearly as clean as you're
suggesting unless we go through and make changes to pg_dump itself to
consistently behave according to 

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera  
>> wrote:
>> > My proposal is that instead of looking at three hundred lines, you'd
>> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
>> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
>> > combinations of pg_dump switches: are we dumping the right set of
>> > objects.
>>
>> My counter-proposal is that we remove the test entirely.  It looks
>> like an unmaintainable and undocumented mess to me, and I doubt
>> whether the testing value is sufficient to justify the effort of
>> updating it every time anyone wants to change something in pg_dump.
>
> Considering it turned up multiple serious bugs, particularly in the
> binary upgrade path, I can't disagree more.  If you have a counter
> proposal which actually results in better test coverage, that'd be
> fantastic, but I wholly reject the notion that we should be considering
> reducing our test coverage of pg_dump.

I figured you would, but it's still my opinion.  I guess my basic
objection here is to the idea that we somehow know that the 6000+ line
test case file actually contains only correct tests.  That vastly
exceeds the ability of any normal human being to verify correctness,
especially given what's already been said about the interdependencies
between different parts of the file and the lack of adequate
documentation.

For example, I notice that the file contains 166 copies of
only_dump_test_schema  => 1 (with 4 different variations as to how
much whitespace is included). Some of those are in the "like" clause
and some are in the "unlike" clause.  If one of them were misplaced,
and pg_dump is actually producing the wrong output, the two errors
would cancel out and I suspect nobody would notice.  If somebody makes
a change to pg_dump that changes which things get dumped when --schema
is used, then a lot of those lines would need to be moved around.  That
would be a huge nuisance.  If the author of such a patch just stuck
those lines where they make the tests pass, they might fail to notice
if one of them were actually in the wrong place, and a bug would go
undiscovered.  Some people probably have the mental stamina to audit
166 separate cases and make sure that each one is properly positioned,
but some people might not; and that's really just one test of many.
Really, every pg_dump change that alters behavior needs to
individually reconsider the position of every one of ~6000 lines in
this file, and nobody is really going to do that.

There's some rule that articulates what the effect of --schema is
supposed to be.  I don't know the exact rule, but it's probably
something like "global objects shouldn't get dumped and schema objects
should only get dumped if they're in that schema".  That rule ought to
be encoded into this file in some recognizable form.  It's encoded
there, but only in the positioning of hundreds of separate lines of
code that look identical but must be positioned differently based on a
human programmer's interpretation of how that rule applies to each
object type.  That's not a good way of implementing the rule; nobody
can look at this and say "oh, well I changed the rules for --schema,
so here's which things need to be updated".  You're not going to be
able to look at the diff somebody produces and have any idea whether
it's right, or at least not without a lot of very time-consuming
manual verification.  If you had just saved the output of pg_dump in a
file (maybe with OIDs and other variable bits stripped out) and
compared the new output against the old, it would give you just as
much code coverage but probably be a lot easier to maintain.  If you
had implemented the rules programmatically instead of encoding a giant
manually precomputed data structure in the test case file it would be
a lot more clearly correct and again easier to maintain.

I think you've chosen a terrible design and ought to throw the whole
thing away and start over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Boolean partitions syntax

2018-01-26 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 26, 2018 at 11:07 AM, Stephen Frost  wrote:
>> I've already had two people mention that it'd be neat to have PG support
>> it, so I don't believe it'd go unused.  As for if we should force people
>> to use quotes, my vote would be no because we don't require that for
>> other usage of true/false in the parser and I don't see any reason why
>> this should be different.

> OK.  Let's wait a bit and see if anyone else wants to weigh in.

I dunno, this patch seems quite bizarre to me.  IIUC, it results in
TRUE/FALSE behaving differently in a partbound_datum than they do
anywhere else in the grammar, to wit that they're effectively just
another spelling for the undecorated literals 't' and 'f', without
anything indicating that they're boolean.  That seems wrong from a
data typing standpoint.  And even if it's really true that we'd
rather lose type information for partbound literals, a naive observer
would probably think that these should decay to the strings "true"
and "false" not "t" and "f" (cf. opt_boolean_or_string).

I've not paid any attention to this thread up to now, so maybe there's
a rational explanation for this choice that I missed.  But mucking
with makeBoolAConst like that doesn't seem to me to pass the smell
test.  I'm on board with the stated goal of allowing TRUE/FALSE here,
but having to contort the grammar output like this suggests that
there's something wrong in parse analysis of partbound_datum.

regards, tom lane

PS: the proposed documentation wording is too verbose by half.
I'd just cut it down to "".



Re: Boolean partitions syntax

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 11:07 AM, Stephen Frost  wrote:
> I've already had two people mention that it'd be neat to have PG support
> it, so I don't believe it'd go unused.  As for if we should force people
> to use quotes, my vote would be no because we don't require that for
> other usage of true/false in the parser and I don't see any reason why
> this should be different.

OK.  Let's wait a bit and see if anyone else wants to weigh in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera  
> wrote:
> > My proposal is that instead of looking at three hundred lines, you'd
> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> > or not.  Quite a bit simpler for the guy adding a new test.  This tests
> > combinations of pg_dump switches: are we dumping the right set of
> > objects.
> 
> My counter-proposal is that we remove the test entirely.  It looks
> like an unmaintainable and undocumented mess to me, and I doubt
> whether the testing value is sufficient to justify the effort of
> updating it every time anyone wants to change something in pg_dump.

Considering it turned up multiple serious bugs, particularly in the
binary upgrade path, I can't disagree more.  If you have a counter
proposal which actually results in better test coverage, that'd be
fantastic, but I wholly reject the notion that we should be considering
reducing our test coverage of pg_dump.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Boolean partitions syntax

2018-01-26 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jan 25, 2018 at 8:44 PM, Amit Langote
>  wrote:
> > Attached updated patch.
> 
> I wonder if this patch is just parser bloat without any real benefit.
> It can't be very common to want to partition on a Boolean column, and
> if you do, all this patch does is let you drop the quotes.  That's not
> really a big deal, is it?

I've already had two people mention that it'd be neat to have PG support
it, so I don't believe it'd go unused.  As for if we should force people
to use quotes, my vote would be no because we don't require that for
other usage of true/false in the parser and I don't see any reason why
this should be different.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Custom compression methods

2018-01-26 Thread Ildar Musin

Hello Ildus,

I continue reviewing your patch. Here are some thoughts.

1. When I set column storage to EXTERNAL then I cannot set compression.
Seems reasonable:
create table test(id serial, msg text);
alter table test alter column msg set storage external;
alter table test alter column msg set compression pg_lz4;
ERROR:  storage for "msg" should be MAIN or EXTENDED

But if I reorder commands then it's ok:
create table test(id serial, msg text);
alter table test alter column msg set compression pg_lz4;
alter table test alter column msg set storage external;
\d+ test
Table "public.test"
 Column |  Type   |  ...  | Storage  | Compression
+-+  ...  +--+-
 id | integer |  ...  | plain|
 msg| text|  ...  | external | pg_lz4

So we could either allow the user to set compression settings even when
storage is EXTERNAL (but with a warning), or prohibit setting compression
and external storage at the same time.  The same applies to setting
storage PLAIN.



2. I think the TOAST_COMPRESS_SET_RAWSIZE macro could be rewritten as
follows to prevent overwriting the higher bits of 'info':

((toast_compress_header *) (ptr))->info = \
    (((toast_compress_header *) (ptr))->info & ~RAWSIZEMASK) | (len);

It may not matter at the moment since it is only used once, but it
could save some effort for other developers in the future.
In TOAST_COMPRESS_SET_CUSTOM(), instead of changing individual bits you
could do something like this:

#define TOAST_COMPRESS_SET_CUSTOM(ptr) \
    do { \
        ((toast_compress_header *) (ptr))->info = \
            (((toast_compress_header *) (ptr))->info & RAWSIZEMASK) | \
            ((uint32) 0x02 << 30); \
    } while (0)

Also it would be nice if bit flags were explained and maybe replaced by
a macro.


3. In AlteredTableInfo, BulkInsertStateData and some functions (e.g.
toast_insert_or_update) there is a hash table used to keep the list of
preserved compression methods per attribute. I think a simple array of
List* would be sufficient in this case.


4. In optionListToArray() you can use list_qsort() to sort the options
list instead of converting it manually into an array and then back to a list.



5. Redundant #includes:

In heap.c:
#include "access/reloptions.h"
In tsvector.c:
#include "catalog/pg_type.h"
#include "common/pg_lzcompress.h"
In relcache.c:
#include "utils/datum.h"

6. Just a minor thing: no reason to change formatting in copy.c
-   heap_insert(resultRelInfo->ri_RelationDesc, tuple, mycid,
-   hi_options, bistate);
+   heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+   mycid, hi_options, bistate);

7. Also, in utility.c an extra newline was added which isn't relevant
to this patch.


8. In parse_utilcmd.h the 'extern' keyword was removed from the
transformRuleStmt declaration, which doesn't make sense in this patch.


9. Comments. Again, they should be read by a native speaker. So just a 
few suggestions:

toast_prepare_varlena() - comment needed
invalidate_amoptions_cache() - comment format doesn't match other 
functions in the file


In htup_details.h:
/* tuple contain custom compressed
 * varlenas */
should be "contains"

--
Ildar Musin
i.mu...@postgrespro.ru



Re: Documentation of pgcrypto AES key sizes

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 8:19 PM, Michael Paquier
 wrote:
> On Fri, Jan 26, 2018 at 12:33:41PM +1300, Thomas Munro wrote:
>> I noticed that the documentation for encrypt()/decrypt() says "aes —
>> AES (Rijndael-128)", but in fact 192 and 256 bit keys are also
>> supported, whether you build --with-openssl or --without-openssl.
>> Should that say "AES (Rijndael-128, -192 or -256)" instead?
>
> Indeed. Instead of using the keysize as a prefix, I would personally
> find it less confusing if written as "AES (Rijndael with key sizes of 128,
> 192 or 256 bits)" instead of the phrasing you are proposing. Well, it
> is true that "Rijndael-128" and friends are wordings that can be found
> here and there..

encrypt() seems happy with a key of any length at all, although I
guess internally it must round up to the next larger size.

rhaas=# select v, min(n), max(n) from (select n, encrypt('hello
world'::bytea, ('\x' || repeat('00', n))::bytea, 'aes') v from
generate_series(1,10) n) x group by 1;
 v  | min |  max
+-+
 \x7489adda96bb9c30fb4932e07731571a |   1 | 16
 \x20a25e2af113663852f4e7b7870835ff |  17 | 24
 \x56cbe187babf7b5df62924d78a3a5099 |  25 | 10
(3 rows)

The breakpoints are at 16 bytes = 128 bits and 24 bytes = 192 bits, so
that is consistent with Thomas's theory about what's going on under
the hood.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera  wrote:
> My proposal is that instead of looking at three hundred lines, you'd
> look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not.  Quite a bit simpler for the guy adding a new test.  This tests
> combinations of pg_dump switches: are we dumping the right set of
> objects.

My counter-proposal is that we remove the test entirely.  It looks
like an unmaintainable and undocumented mess to me, and I doubt
whether the testing value is sufficient to justify the effort of
updating it every time anyone wants to change something in pg_dump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Boolean partitions syntax

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 8:44 PM, Amit Langote
 wrote:
> Attached updated patch.

I wonder if this patch is just parser bloat without any real benefit.
It can't be very common to want to partition on a Boolean column, and
if you do, all this patch does is let you drop the quotes.  That's not
really a big deal, is it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Redefining inet_net_ntop

2018-01-26 Thread Tom Lane
Emre Hasegeli  writes:
>> port.h declares inet_net_ntop and we always compile our own from
>> port/inet_net_ntop.c .

> There is another copy of it under backend/utils/adt/inet_cidr_ntop.c.
> The code looks different but does 90% the same thing.  Their naming
> and usage is confusing.

> I recently needed to format IP addresses as DNS PTR records in the
> database, and got annoyed by having no function that outputs IPv6
> addresses in an easily parseable format like
> :::::::.  I was going to send a patch
> to unify those C functions and add another SQL function to get
> addresses in such format.  Is this a good plan?  Where should those C
> functions be on the tree if they are not port of anything anymore?

Almost certainly, the thing to do is absorb updated code from bind,
not roll our own.

regards, tom lane



Re: Updating timezone data to 2018c

2018-01-26 Thread Tom Lane
Michael Paquier  writes:
> I have just bumped into tzdata (https://www.iana.org/time-zones), to
> notice that 2018c has been released. Surely, there will be a refresh for
> the next release?

Yeah, it's on my to-do list for next week.

> At the same time I have played with the instructions in
> src/timezone/README to generate the attached. That's always an
> experience to take. One abbreviation has been visibly removed as far as
> I can see.  I know that Tom does that before each release, I just got
> curious about how this gets updated, and I am wondering if I got the
> thing right ;)

Seems a bit odd, because I see nothing related to West Africa in
the announcement at
http://mm.icann.org/pipermail/tz-announce/2018-January/48.html

[ pokes around... ] This must be a delayed effect of this change from
2017b:

 Namibia will switch from +01 with DST to +02 all year on
 2017-09-03 at 02:00.  This affects UT offsets starting 2018-04-01
 at 02:00.  (Thanks to Steffen Thorsen.)

Probably, at the time I checked the abbrevs list when I installed 2017b,
WAST was still being detected as a currently-in-use abbreviation.
Now it isn't.  What I'll probably do is mark the abbreviation obsolete
but leave it in place in Africa.txt, since that's what we've generally
done in the past, especially for things that were very recently current.

regards, tom lane



Re: Invalid result from hash_page_items function

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 4:50 PM, Masahiko Sawada  wrote:
> This appears in PostgreSQL 10 and current HEAD. The cause of this
> seems to be that hash_page_items allocates the memory space for the page
> before switching the memory context. AFAICS there is no similar problem
> elsewhere in the pageinspect contrib module. The attached patch fixes it.

Committed and back-patched.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: \describe*

2018-01-26 Thread Andreas Karlsson
On 01/26/2018 03:49 PM, David Fetter wrote:
> They are indeed terse and cryptic, and what's worse, they're not
> available to clients other than psql, so I propose that we do what at
> least MySQL, Oracle, and DB2 do and implement DESCRIBE as its own
> command.
>
> Especially handy would be a variant DESCRIBE CREATE, which would do
> what it says on the label in a copy-and-paste-able form, but that's
> not strictly necessary for the first cut.


I am not a fan of this since I like how easy it is to explain to beginners 
that all backslash commands are processed by the client while everything 
else is handled by the server. Yes, "help" is an exception, but nobody 
really needs to know about that command.


As for the actual proposal, I do not care strongly either way. The \d 
commands are a bit cryptic and unfriendly to the occasional user, but I 
am not sure that having two ways to do it would be better.


Andreas



Re: JIT compiling with LLVM v9.0

2018-01-26 Thread David Fetter
On Thu, Jan 25, 2018 at 11:20:28AM -0800, Andres Freund wrote:
> On 2018-01-25 18:40:53 +0300, Konstantin Knizhnik wrote:
> > Another question is whether it is sensible to redundantly do
> > expensive work (llvm compilation) in all backends.
> 
> Right now we kinda have to, but I really want to get rid of that.
> There's some pointers included as constants in the generated code. I
> plan to work on getting rid of that requirement, but after getting
> the basics in (i.e. realistically not this release).  Even after
> that I'm personally much more interested in caching the generated
> code inside a backend, rather than across backends.   Function
> addresses et al being different between backends would add some
> complications, can be overcome, but I'm doubtful it's immediately
> worth it.

If we go with threading for this part, sharing that state may be
simpler.  It seems a lot of work is going into things that threading
does at a much lower developer cost, but that's a different
conversation.

> > So before starting code generation, ExecReadyCompiledExpr can first
> > build signature and check if correspondent library is already present.
> > Also it will be easier to control space used by compiled libraries in
> > this
> 
> Right, I definitely think we want to do that at some point not too far
> away in the future. That makes the applicability of JITing much broader.
> 
> More advanced forms of this are that you JIT in the background for
> frequently executed code (so not to incur latency the first time
> somebody executes). And/or that you emit unoptimized code the first
> time through, which is quite quick, and run the optimizer after the
> query has been executed a number of times.

Both sound pretty neat.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: \describe*

2018-01-26 Thread David Fetter
On Thu, Jan 25, 2018 at 08:11:00PM -0500, Corey Huinker wrote:
> Some of the discussions about making psql more user friendly (more
> tab completions help, exit, etc) got me thinking about other ways
> that psql could be more friendly, and the one that comes to mind is
> our terse but cryptic \d* commands.

They are indeed terse and cryptic, and what's worse, they're not
available to clients other than psql, so I propose that we do what at
least MySQL, Oracle, and DB2 do and implement DESCRIBE as its own
command.

Especially handy would be a variant DESCRIBE CREATE, which would do
what it says on the label in a copy-and-paste-able form, but that's
not strictly necessary for the first cut.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Should we introduce a "really strict" volatility about return values?

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 6:51 PM, Andres Freund  wrote:
> I don't plan to work on this anytime soon, but I though it's interesting
> enough to put out there and see what others think.

I mean, if it buys enough performance, it's probably worth doing.
Sort of annoying to have to introduce more syntax, but it could be
worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Updating timezone data to 2018c

2018-01-26 Thread Michael Paquier
Hi all,

I have just bumped into tzdata (https://www.iana.org/time-zones), to
notice that 2018c has been released. Surely, there will be a refresh for
the next release?

At the same time I have played with the instructions in
src/timezone/README to generate the attached. That's always an
experience to take. One abbreviation has been visibly removed as far as
I can see.  I know that Tom does that before each release, I just got
curious about how this gets updated, and I am wondering if I got the
thing right ;)

Thanks,
--
Michael
diff --git a/src/timezone/data/tzdata.zi b/src/timezone/data/tzdata.zi
index a818947009..afa8f85c43 100644
--- a/src/timezone/data/tzdata.zi
+++ b/src/timezone/data/tzdata.zi
@@ -1,3 +1,4 @@
+# version 2018c
 # This zic input file is in the public domain.
 R A 1916 o - Jun 14 23s 1 S
 R A 1916 1919 - O Sun>=1 23s 0 -
@@ -50,7 +51,6 @@ Li Africa/Abidjan Africa/Freetown
 Li Africa/Abidjan Africa/Lome
 Li Africa/Abidjan Africa/Nouakchott
 Li Africa/Abidjan Africa/Ouagadougou
-Li Africa/Abidjan Africa/Sao_Tome
 Li Africa/Abidjan Atlantic/St_Helena
 R B 1940 o - Jul 15 0 1 S
 R B 1940 o - O 1 0 0 -
@@ -237,6 +237,10 @@ Li Africa/Lagos Africa/Niamey
 Li Africa/Lagos Africa/Porto-Novo
 Z Indian/Reunion 3:41:52 - LMT 1911 Jun
 4 - +04
+Z Africa/Sao_Tome 0:26:56 - LMT 1884
+-0:36:45 - LMT 1912
+0 - GMT 2018 Ja 1 1
+1 - WAT
 Z Indian/Mahe 3:41:48 - LMT 1906 Jun
 4 - +04
 R H 1942 1943 - S Sun>=15 2 1 -
@@ -656,10 +660,10 @@ R Z 2013 ma - O lastSun 2 0 S
 Z Asia/Jerusalem 2:20:54 - LMT 1880
 2:20:40 - JMT 1918
 2 Z I%sT
-R a 1948 o - May Sun>=1 2 1 D
-R a 1948 1951 - S Sat>=8 2 0 S
-R a 1949 o - Ap Sun>=1 2 1 D
-R a 1950 1951 - May Sun>=1 2 1 D
+R a 1948 o - May Sat>=1 24 1 D
+R a 1948 1951 - S Sun>=9 0 0 S
+R a 1949 o - Ap Sat>=1 24 1 D
+R a 1950 1951 - May Sat>=1 24 1 D
 Z Asia/Tokyo 9:18:59 - LMT 1887 D 31 15u
 9 a J%sT
 R b 1973 o - Jun 6 0 1 S
@@ -3606,7 +3610,7 @@ Z America/Argentina/Ushuaia -4:33:12 - LMT 1894 O 31
 Li America/Curacao America/Aruba
 Z America/La_Paz -4:32:36 - LMT 1890
 -4:32:36 - CMT 1931 O 15
--4:32:36 1 BOST 1932 Mar 21
+-4:32:36 1 BST 1932 Mar 21
 -4 - -04
 R As 1931 o - O 3 11 1 S
 R As 1932 1933 - Ap 1 0 0 -
@@ -3658,12 +3662,13 @@ R As 2005 o - O 16 0 1 S
 R As 2006 o - N 5 0 1 S
 R As 2007 o - F 25 0 0 -
 R As 2007 o - O Sun>=8 0 1 S
-R As 2008 ma - O Sun>=15 0 1 S
+R As 2008 2017 - O Sun>=15 0 1 S
 R As 2008 2011 - F Sun>=15 0 0 -
 R As 2012 o - F Sun>=22 0 0 -
 R As 2013 2014 - F Sun>=15 0 0 -
 R As 2015 o - F Sun>=22 0 0 -
 R As 2016 2022 - F Sun>=15 0 0 -
+R As 2018 ma - N Sun>=1 0 1 S
 R As 2023 o - F Sun>=22 0 0 -
 R As 2024 2025 - F Sun>=15 0 0 -
 R As 2026 o - F Sun>=22 0 0 -
@@ -4024,6 +4029,7 @@ Z Etc/GMT+9 -9 - -09
 Z Etc/GMT+10 -10 - -10
 Z Etc/GMT+11 -11 - -11
 Z Etc/GMT+12 -12 - -12
+Z Factory 0 - -00
 Li Africa/Nairobi Africa/Asmera
 Li Africa/Abidjan Africa/Timbuktu
 Li America/Argentina/Catamarca America/Argentina/ComodRivadavia
@@ -4142,5 +4148,3 @@ Li Etc/UTC UTC
 Li Etc/UTC Universal
 Li Europe/Moscow W-SU
 Li Etc/UTC Zulu
-Li America/Los_Angeles US/Pacific-New
-Z Factory 0 - -00
diff --git a/src/timezone/known_abbrevs.txt b/src/timezone/known_abbrevs.txt
index eb48069d87..4db831c62d 100644
--- a/src/timezone/known_abbrevs.txt
+++ b/src/timezone/known_abbrevs.txt
@@ -96,7 +96,6 @@ SAST	7200
 SST	-39600
 UCT	0
 UTC	0
-WAST	7200	D
 WAT	3600
 WEST	3600	D
 WET	0
diff --git a/src/timezone/tznames/Africa.txt b/src/timezone/tznames/Africa.txt
index 0bd0c405f6..8c0eeed276 100644
--- a/src/timezone/tznames/Africa.txt
+++ b/src/timezone/tznames/Africa.txt
@@ -147,8 +147,6 @@ GMT 0# Greenwich Mean Time
 #  - SAST South Australian Standard Time (not in IANA database)
 SAST 7200# South Africa Standard Time
  # (Africa/Johannesburg)
-WAST 7200 D  # West Africa Summer Time
- # (Africa/Windhoek)
 WAT  3600# West Africa Time
  # (Africa/Bangui)
  # (Africa/Brazzaville)


signature.asc
Description: PGP signature


Re: PATCH: Exclude unlogged tables from base backups

2018-01-26 Thread Robert Haas
On Wed, Jan 24, 2018 at 1:25 PM, David Steele  wrote:
> I think you mean DEBUG1?  It's already at DEBUG2.
>
> I considered using DEBUG1 but decided against it.  The other exclusions
> will produce a limited amount of output because there are only a few of
> them.  In the case of unlogged tables there could be any number of
> exclusions and I thought that was too noisy for DEBUG1.

+1.  Even DEBUG2 seems pretty chatty for a message that just tells you
that something is working in an entirely expected fashion; consider
DEBUG3.  Fortunately, base backups are not so common that this should
cause enormous log spam either way, but keeping the amount of debug
output down to a reasonable level is an important goal.  Before
a43f1939d5dcd02f4df1604a68392332168e4be0, it wasn't really practical
to run a production server with log_min_messages lower than DEBUG2,
because you'd get so much log spam it would cause performance problems
(and maybe fill up the disk).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Aggregation push-down

2018-01-26 Thread Robert Haas
On Fri, Jan 26, 2018 at 8:04 AM, Antonin Houska  wrote:
> So one problem is that the grouping expression can be inappropriate for
> partial aggregation even if there's no type change during the
> translation. What I consider typical for this case is that the equality
> operator used to identify groups (SortGroupClause.eqop) can return true even
> if binary (stored) values of the inputs differ. The partial aggregation could
> only take place if we had a special equality operator which distinguishes the
> *binary* values (I don't know yet how to store this operator in the catalog ---
> in pg_opclass records for the hash opclasses?)..

We don't have an operator that tests for binary equality, but it's
certainly testable from C; see datumIsEqual.  I'm not sure how much
this really helps, though.  I think it would be safe to build an
initial set of groups based on datumIsEqual comparisons and then
arrange to later merge some of the groups.  But that's not something
we ever do in the executor today, so it might require quite a lot of
hacking.  Also, it seems like it might really suck in some cases.  For
instance, consider something like SELECT scale(one.a), sum(two.a) FROM
one, two WHERE one.a = two.a GROUP BY 1.  Doing a Partial Aggregate on
two.a using datumIsEqual semantics, joining to a, and then doing a
Finalize Aggregate looks legal, but the Partial Aggregate may produce
a tremendous number of groups compared to the Finalize Aggregate.  In
other words, this technique wouldn't merge any groups that shouldn't
be merged, but it might fail to merge groups that really need to be
merged to get good performance.
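
To make the distinction concrete, here is a minimal sketch (not from any
patch; it assumes a backend context and two numeric Datums holding '0'
and '0.00') of how operator equality and binary equality can disagree:

#include "postgres.h"

#include "fmgr.h"
#include "utils/datum.h"
#include "utils/fmgrprotos.h"

static void
compare_numeric_zeros(Datum zero, Datum zero_scaled)
{
    /* the type's equality operator: '0' = '0.00' yields true */
    bool    op_eq = DatumGetBool(DirectFunctionCall2(numeric_eq,
                                                     zero, zero_scaled));

    /* byte-wise comparison: the stored varlenas differ (dscale 0 vs. 2) */
    bool    bin_eq = datumIsEqual(zero, zero_scaled,
                                  false,    /* pass-by-reference */
                                  -1);      /* varlena */

    elog(NOTICE, "operator equality: %d, binary equality: %d",
         op_eq, bin_eq);
}

The operator reports the values as equal, while datumIsEqual() sees the
different display scales stored in the varlena and reports them as
different.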

> One of my ideas is to check whether the source and destination types are
> binary coercible (i.e. pg_cast(castfunc) = 0) but this might be a misuse of
> the binary coercibility.

Yeah, binary coercibility has nothing to do with this; that tells you
whether the two types are the same on disk, not whether they have the
same equality semantics.  For instance, I think text and citext are
binary coercible, but their equality semantics are not the same.

> Another idea is to allow only such changes that the
> destination type is in the same operator class as the source, and explicitly
> enumerate the "safe opclasses". But that would mean that user cannot define
> new opclasses within which the translation is possible --- not sure this is a
> serious issue.

Enumerating specific opclasses in the source code is a non-starter --
Tom Lane would roll over in his grave if he weren't still alive.  What
we could perhaps consider doing is adding some mechanism for an
opclass or opfamily to say whether its equality semantics happen to be
exactly datumIsEqual() semantics.  That's a little grotty because it
leaves data types for which that isn't true out in the cold, but on
the other hand it seems like it would be useful as a way of optimizing
a bunch of things other than this.  Maybe it could also include a way
to specify that the comparator happens to have the same semantics as C's
built-in < and > operators, which seems like a case that might also
lend itself to some optimizations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Setting BLCKSZ 4kB

2018-01-26 Thread Bruce Momjian
On Wed, Jan 17, 2018 at 02:10:10PM +0100, Fabien COELHO wrote:
> 
> Hello,
> 
> >What are the cons of setting BLCKSZ as 4kB? When saw the results published
> >on [...].
> 
> There were other posts and publications which points to the same direction
> consistently.
> 
> This matches my deep belief that postgres' default block size is a
> reasonable compromise for HDD, but is less pertinent for SSD for most OLTP
> loads.
> 
> For OLAP, I do not think it would lose much, but I have not tested it.
> 
> >Does turning off FPWs will be safe if BLCKSZ is set to 4kB given page size
> >of file system is 4kB?
> 
> FPW = Full Page Write. I would not bet on turning off FPW, ISTM that SSDs
> can have "page" sizes as low as 512 bytes, but are typically 2 kB or 4 kB,
> and the information is not easily available anyway.

Yes, that is the hard part, making sure you have 4k granularity of
write, and matching write alignment.  pg_test_fsync and diskchecker.pl
(which we mention in our docs) will not help here.  A specific alignment
test based on diskchecker.pl would have to be written.  However, if you
look at the kernel code you might be able to verify quickly that the 4k
atomicity is not guaranteed.
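
To spell out those two constraints, any such test would have to issue
writes like the following minimal sketch (Linux-specific, and only the
aligned 4k write itself, not the power-loss verification that
diskchecker.pl performs):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    void   *buf;
    int     fd;

    /* buffer must be aligned for O_DIRECT */
    if (posix_memalign(&buf, 4096, 4096) != 0)
        return 1;
    memset(buf, 'x', 4096);

    fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return 1;

    /* a single 4 kB write at a 4 kB-aligned offset */
    if (pwrite(fd, buf, 4096, 0) != 4096)
        return 1;

    close(fd);
    return 0;
}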

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Configuring messages language on Windows

2018-01-26 Thread a . parfenov
Hello hackers,

As mentioned in pg_locale.c, the variable LC_MESSAGES is ignored on
Windows (pg_locale.c:162). On other systems, this variable is used to
select the language of messages, but on Windows the language is selected
based on the system locale and cannot be changed via configuration.
Additionally, this affects the regression tests, since the language of
messages generated by psql is also configured via LC_MESSAGES; it is
ignored on Windows installations, which causes test failures on
non-English Windows installations.

I've written a little patch to fix that problem by using the LANGUAGE
variable on Windows systems. For more information about LC_MESSAGES,
LANGUAGE and the other variables used by GNU gettext, see the
documentation [1].
IMHO the patch is more of a workaround and I'm not sure that it is
safe for all combinations of systems/compilers. I think we can find a
better way to solve the problem.

Also, there is a problem of mixed encodings in the log when databases
with different encodings exist on one server. I didn't find any good
solution for such a case, because each backend should use its own
encoding in order to work with its data and client properly. This
problem is not Windows-specific, but most OSes use one Unicode
encoding for all languages (e.g. UTF-8).

The main point of this message is to describe the problem, try to
find an appropriate solution, and ask the community's opinion about it.

[1]
https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a3dc3be5a8..2548f6bbd2 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -216,6 +216,11 @@ pg_perm_setlocale(int category, const char *locale)
 			envvar = "LC_MESSAGES";
 			envbuf = lc_messages_envbuf;
 #ifdef WIN32
+			/*
+			 * Use LANGUAGE instead of LC_MESSAGES since Windows doesn't support
+			 * LC_MESSAGES environment variable.
+			 */
+			envvar = "LANGUAGE";
 			result = IsoLocaleName(locale);
 			if (result == NULL)
 result = (char *) locale;
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a1ee1041b4..631f31729e 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -760,6 +760,13 @@ initialize_environment(void)
 	unsetenv("LANGUAGE");
 	unsetenv("LC_ALL");
 	putenv("LC_MESSAGES=C");
+#ifdef WIN32
+	/*
+	 * Use LANGUAGE instead of LC_MESSAGES since Windows doesn't support
+	 * LC_MESSAGES environment variable.
+	 */
+	putenv("LANGUAGE=C");
+#endif
 
 	/*
 	 * Set encoding as requested

Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-26 Thread Robert Haas
On Thu, Jan 25, 2018 at 8:54 PM, Tsunakawa, Takayuki
 wrote:
> Yes, that's pg_test_fsync output.  Isn't pg_test_fsync the tool to determine 
> the value for wal_sync_method?  Is this manual misleading?

Hmm.  I hadn't thought about it as misleading, but now that you
mention it, I'd say that it probably is.  I suspect that there should
be a disclaimer saying that the fastest WAL sync method in terms of
ops/second is not necessarily the one that will deliver the best
database performance, and mention the issues around open_sync and
open_datasync specifically.  But let's see what your testing shows;
I'm talking based on now-fairly-old experience with this and a passing
familiarity with the relevant source code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: Aggregation push-down

2018-01-26 Thread Antonin Houska
Robert Haas  wrote:

> On Fri, Dec 22, 2017 at 10:43 AM, Antonin Houska  wrote:
> > Michael Paquier  wrote:
> >> On Sat, Nov 4, 2017 at 12:33 AM, Antonin Houska  wrote:
> >> > I'm not about to add any other features now. Implementation of the 
> >> > missing
> >> > parts (see the TODO comments in the code) is the next step. But what I'd
> >> > appreciate most is a feedback on the design. Thanks.
> >>
> >> I am getting a conflict after applying patch 5 but this did not get
> >> any reviews so moved to next CF with waiting on author as status.
> >
> > Attached is the next version.
> 
> I've been a bit confused for a while about what this patch is trying
> to do, so I spent some time today looking at it to try to figure it
> out.

Thanks.

> There's a lot I don't understand yet, but it seems like the general idea is
> to build, potentially for each relation in the join tree, not only the
> regular list of paths but also a list of "grouped" paths.  If the
> pre-grouped path wins, then we can get a final path that looks like Finalize
> Aggregate -> Some Joins -> Partial Aggregate -> Maybe Some More Joins ->
> Base Table Scan.

Yes. The regression test output shows some more plans.

> In some cases the patch seems to replace that uppermost Finalize Aggregate
> with a Result node.

This is a feature implemented in 09_avoid_agg_finalization.diff. You can
ignore it for now, it's of lower priority than the preceding parts and needs
more work to be complete.

> translate_expression_to_rels() looks unsafe.  Equivalence members are
> known to be equal under the semantics of the relevant operator class,
> but that doesn't mean that one can be freely substituted for another.
> For example:
> 
> rhaas=# create table one (a numeric);
> CREATE TABLE
> rhaas=# create table two (a numeric);
> CREATE TABLE
> rhaas=# insert into one values ('0');
> INSERT 0 1
> rhaas=# insert into two values ('0.00');
> INSERT 0 1
> rhaas=# select one.a, count(*) from one, two where one.a = two.a group by 1;
>  a | count
> ---+---
>  0 | 1
> (1 row)
> 
> rhaas=# select two.a, count(*) from one, two where one.a = two.a group by 1;
>   a   | count
> --+---
>  0.00 | 1
> (1 row)
> 
> There are, admittedly, a large number of data types for which such a
> substitution would work just fine.  numeric will not, citext will not,
> but many others are fine. Regrettably, we have no framework in the
> system for identifying equality operators which actually test identity
> versus some looser notion of equality.  Cross-type operators are a
> problem, too; if we have foo.x = bar.y in the query, one might be int4
> and the other int8, for example, but they can still belong to the same
> equivalence class.
> 
> Concretely, in your test query "SELECT p.i, avg(c1.v) FROM
> agg_pushdown_parent AS p JOIN agg_pushdown_child1 AS c1 ON c1.parent =
> p.i GROUP BY p.i" you assume that it's OK to do a Partial
> HashAggregate over c1.parent rather than p.i.  This will be false if,
> say, c1.parent is of type citext and p.i is of type text; values will
> get grouped together that shouldn't be.

I had completely missed this problem, thanks for bringing it up! Your
considerations made me think about the specifics of types like numeric or
citext. Although your example does not demonstrate what I consider the core
issue, I believe we are ultimately thinking of the same thing.

Consider this example query (using the tables you defined upthread):

SELECT   one.a
FROM     one, two
WHERE    one.a = two.a AND scale(two.a) > 1
GROUP BY 1

If we first try to aggregate "two", then evaluate the WHERE clause and then
finalize the aggregation, the partial aggregation of "two" can put various
binary values of the "a" column (0, 0.0, etc.) into the same group, so the
scale of the output (of the partial aggregation) will be undefined. Thus the
aggregation push-down would change the input of the WHERE clause.

So one problem is that the grouping expression can be inappropriate for
partial aggregation even if there's no type change during the
translation. What I consider typical for this case is that the equality
operator used to identify groups (SortGroupClause.eqop) can return true even
if binary (stored) values of the inputs differ. The partial aggregation could
only take place if we had a special equality operator which distinguishes the
*binary* values (I don't know yet how to store this operator in the catalog ---
in pg_opclass records for the hash opclasses?)..

On the other hand, if the grouping expression is not a plain Var and there's
no type change during the translation and the expression output type is not of
the "identity-unsafe type" (numeric, citext, etc.), input vars of the
"identity-unsafe type"should not prevent us from using the expression for
partial aggregation: in such a case the grouping keys are computed in the same
way they would be w/o the partial aggregation.

The other problem is 

Re: AS OF queries

2018-01-26 Thread Bruce Momjian
On Fri, Jan 26, 2018 at 10:56:06AM +0300, Konstantin Knizhnik wrote:
> >>Yeh, I suspected that just disabling autovacuum was not enough.
> >>I heard (but do no know too much) about microvacuum and hot updates.
> >>This is why I was a little bit surprised when my test didn't show loss of
> >>updated versions.
> >>May be it is because of vacuum_defer_cleanup_age.
> >Well vacuum and single-page pruning do 3 things:
> >
> >1.  remove expired updated rows
> >2.  remove deleted row
> >3.  remove rows from aborted transactions
> >
> >While time travel doesn't want #1 and #2, it probably wants #3.
> >
> Rows of aborted transactions are in any case excluded by visibility checks.
> Definitely skipping them costs some time, so a large percentage of aborted
> transactions may affect query speed.
> But query speed is reduced in any case if in order to support time travel we
> prohibit or postpone vacuum.
> 
> What is the expected ratio of committed to aborted transactions? I
> would expect that it should be much bigger than one (especially if we take
> into account only read-write transactions which have really updated the
> database). In this case the number of versions created by aborted
> transactions should be much smaller than the number of versions created by
> updates/deletes of successful transactions, so they should not have a
> significant impact on performance.

Uh, I think the big question is whether we are ready to agree that a
time-travel database will _never_ have aborted rows removed.  The
aborted rows are clearly useless for time travel, so the question is
whether we ever want to remove them.  I would think at some point we do.

Also, I am not sure we have any statistics on how many aborted rows are
in each table.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-26 Thread Michael Paquier
On Fri, Jan 26, 2018 at 07:00:43PM +0900, Amit Langote wrote:
> I wonder what pg_partition_tree_tables() should return when passed a table
> that doesn't have partitions under it?  Return a 1-member set containing
> itself?

Yes.  A table alone is itself part of a partition set, so the result
should be made of only itself.

> I also mean for tables that may have inheritance children established
> through plain old inheritance.

There could be value in having a version dedicated to inheritance trees
as well, true enough.  As well as value in having something that shows
both.  Still let's not forget that partition sets are structured so that
the parents have no data, so I see more value in having only partitions
listed, without the INHERIT part.  Opinions from others are of course
welcome.
--
Michael


signature.asc
Description: PGP signature


Re: list partition constraint shape

2018-01-26 Thread Etsuro Fujita

(2018/01/26 10:15), Amit Langote wrote:

On 2018/01/25 21:17, Etsuro Fujita wrote:

Some minor comments:

+   /*
+* Construct an ArrayExpr for the non-null partition
+* values
+*/
+   arrexpr = makeNode(ArrayExpr);
+   arrexpr->array_typeid =
+   !type_is_array(key->parttypid[0])
+   ? get_array_type(key->parttypid[0])
+   : key->parttypid[0];

We test the type_is_array() above in this bit, so I don't think we need to
test that again here.


Ah, you're right.  Fixed.


Thanks.  I think the updated version is fine, but I think we can 
simplify the change in this part a bit further, so I modified your 
patch.  I also adjusted some comments in that change a little bit. 
Attached is a modified version of the patch.  What do you think about 
that?  Please let me know.  If that is okay, I'll mark this as Ready for 
Committer.



Attached updated patch.  Thanks again.


Thanks for updating the patch!

Best regards,
Etsuro Fujita
*** a/src/backend/catalog/partition.c
--- b/src/backend/catalog/partition.c
***
*** 1625,1642  make_partition_op_expr(PartitionKey key, int keynum,
  	{
  		case PARTITION_STRATEGY_LIST:
  			{
! ScalarArrayOpExpr *saopexpr;
  
! /* Build leftop = ANY (rightop) */
! saopexpr = makeNode(ScalarArrayOpExpr);
! saopexpr->opno = operoid;
! saopexpr->opfuncid = get_opcode(operoid);
! saopexpr->useOr = true;
! saopexpr->inputcollid = key->partcollation[keynum];
! saopexpr->args = list_make2(arg1, arg2);
! saopexpr->location = -1;
  
! result = (Expr *) saopexpr;
  break;
  			}
  
--- 1625,1682 
  	{
  		case PARTITION_STRATEGY_LIST:
  			{
! List	   *elems = (List *) arg2;
! int			nelems = list_length(elems);
  
! Assert(keynum == 0);
! if (nelems > 1 &&
! 	!type_is_array(key->parttypid[keynum]))
! {
! 	ArrayExpr  *arrexpr;
! 	ScalarArrayOpExpr *saopexpr;
! 
! 	/* Construct an ArrayExpr for the right-hand inputs */
! 	arrexpr = makeNode(ArrayExpr);
! 	arrexpr->array_typeid =
! 	get_array_type(key->parttypid[keynum]);
! 	arrexpr->array_collid = key->parttypcoll[keynum];
! 	arrexpr->element_typeid = key->parttypid[keynum];
! 	arrexpr->elements = elems;
! 	arrexpr->multidims = false;
! 	arrexpr->location = -1;
! 
! 	/* Build leftop = ANY (rightop) */
! 	saopexpr = makeNode(ScalarArrayOpExpr);
! 	saopexpr->opno = operoid;
! 	saopexpr->opfuncid = get_opcode(operoid);
! 	saopexpr->useOr = true;
! 	saopexpr->inputcollid = key->partcollation[keynum];
! 	saopexpr->args = list_make2(arg1, arrexpr);
! 	saopexpr->location = -1;
! 
! 	result = (Expr *) saopexpr;
! }
! else
! {
! 	List	   *elemops = NIL;
! 	ListCell   *lc;
! 
! 	foreach (lc, elems)
! 	{
! 		Expr   *elem = lfirst(lc),
! 			   *elemop;
  
! 		elemop = make_opclause(operoid,
! 			   BOOLOID,
! 			   false,
! 			   arg1, elem,
! 			   InvalidOid,
! 			   key->partcollation[keynum]);
! 		elemops = lappend(elemops, elemop);
! 	}
! 
! 	result = nelems > 1 ? makeBoolExpr(OR_EXPR, elemops, -1) : linitial(elemops);
! }
  break;
  			}
  
***
*** 1758,1768  get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
  	PartitionKey key = RelationGetPartitionKey(parent);
  	List	   *result;
  	Expr	   *keyCol;
- 	ArrayExpr  *arr;
  	Expr	   *opexpr;
  	NullTest   *nulltest;
  	ListCell   *cell;
! 	List	   *arrelems = NIL;
  	bool		list_has_null = false;
  
  	/*
--- 1798,1807 
  	PartitionKey key = RelationGetPartitionKey(parent);
  	List	   *result;
  	Expr	   *keyCol;
  	Expr	   *opexpr;
  	NullTest   *nulltest;
  	ListCell   *cell;
! 	List	   *elems = NIL;
  	bool		list_has_null = false;
  
  	/*
***
*** 1828,1834  get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
  			false,	/* isnull */
  			key->parttypbyval[0]);
  
! 			arrelems = lappend(arrelems, val);
  		}
  	}
  	else
--- 1867,1873 
  			false,	/* isnull */
  			key->parttypbyval[0]);
  
! 			elems = lappend(elems, val);
  		}
  	}
  	else
***
*** 1843,1872  get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
  			if (val->constisnull)
  list_has_null = true;
  			else
! arrelems = lappend(arrelems, copyObject(val));
  		}
  	}
  
! 	if (arrelems)
  	{
! 		/* Construct an ArrayExpr for the non-null partition values */
! 		arr = makeNode(ArrayExpr);
! 		arr->array_typeid = !type_is_array(key->parttypid[0])
! 			? get_array_type(key->parttypid[0])
! 			: key->parttypid[0];
! 		arr->array_collid = key->parttypcoll[0];
! 		arr->element_typeid = key->parttypid[0];
! 		arr->elements = arrelems;
! 		arr->multidims = false;

Re: [HACKERS] [PATCH] Lockable views

2018-01-26 Thread Yugo Nagata
On Thu, 25 Jan 2018 20:51:41 +1300
Thomas Munro  wrote:

> On Sun, Dec 31, 2017 at 11:57 PM, Yugo Nagata  wrote:
> > On Fri, 29 Dec 2017 23:39:39 +0900 (JST)
> > Tatsuo Ishii  wrote:
> >> Your addition to the doc:
> >> +   Automatically updatable views (see )
> >> +   that do not have INSTEAD OF triggers or INSTEAD
> >> +   rules are also lockable. When a view is locked, its base relations are
> >> +   also locked recursively with the same lock mode.
> >
> > I added this point to the documentation.
> 
> +   Automatically updatable views (see )
> +   that do not have INSTEAD OF triggers or INSTEAD

Thank you for pointing this out.

Also, I've noticed this patch cannot be applied to HEAD, so I'll fix
these issues together and update the patch.

> 
> The documentation doesn't build: you now need to say 
> instead of , and you need to say .
> 
> -- 
> Thomas Munro
> http://www.enterprisedb.com
> 


-- 
Yugo Nagata 



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-01-26 Thread Tels
Moin,

On Fri, January 26, 2018 2:30 am, David Rowley wrote:
> On 21 January 2018 at 19:21, David Rowley 
> wrote:
>> On 20 January 2018 at 18:50, Tom Lane  wrote:
>>> Stephen Froehlich  writes:
 Are custom statistics in PG10 retained in LIKE (INCLUDING ALL) or do I
 need to recreate the statistics by hand each time I create a new
 table?
>>>
>>> LIKE doesn't know about those, no.  Perhaps it should.
>>
>> Agreed. ALL does not mean MOST.
>
> (Moving to -hackers)
>
> The attached implements this.
>
> Looking at "LIKE ALL" in more detail in the docs it claims to be
> equivalent to "INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
> CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS",
> so it didn't seem right to magically have LIKE ALL include something
> that there's no option to include individually, so I added INCLUDING
> STATISTICS.

Not sure if someone wants to exclude statistics, but it is good that one
can. So +1 from me.

Looking at the patch, at first I thought the order was sorted and you
swapped STORAGE and STATISTICS by accident. But then, it seems the order
is semi-random. Should that list be sorted or is it already sorted by some
criteria that I don't see?

-  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING
COMMENTS.
+  INCLUDING DEFAULTS INCLUDING IDENTITY INCLUDING
CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING STATISTICS
INCLUDING COMMENTS.

Best wishes,

Tels



Re: \describe*

2018-01-26 Thread Laurenz Albe
Corey Huinker wrote:
> Some of the discussions about making psql more user friendly (more tab 
> completions help, exit, etc) got me thinking about other ways that psql could 
> be more friendly, and the one that comes to mind is our terse but cryptic \d* 
> commands.
> 
> I think it would be helpful and instructive to have corresponding long-form 
> describe commands. 
> 
> Take describing schemas. Is \dn intuitive? Not really. In hindsight, you may 
> think "yeah, a schema is a namespace", but you never guessed 'n' on the first 
> try, or the second.
> 
> Looking over exec_command_d() a bit, I think it's a bit of a stretch do have 
> each command handle a long form like this:
> 
> \describe table my_table
> or
> \describe table verbose my_table
> 
> because then each \d-variant has to account for objects named "table" and 
> "verbose" and that's a path to unhappiness.
> 
> But if we dash-separated them, then all of the strcmps would be in the 'e' 
> subsection, and each one would just have to know it's long to short 
> translation, and call exec_command_d with the corresponding short command
> 
> describe => d
> describe-verbose => d+
> describe-aggregates-verbose => da+
> describe-roles => du
> 
> We could even presume the verbose flag in all cases (after all, the user was 
> being verbose...), which would also cut down on tab-completion results, and 
> we could check for interactive mode and display a message like
> 
> \describe-schemas (short: \dn+)
> 
> so that the person has the opportunity to learn the corresponding short 
> command.
> 
> In additional to aiding tab completion discovery of the commands (i.e. typing 
> "\desc" and then hitting tab, it would also make scripts a little more 
> self-documenting.
> 
> Thoughts?

I'm somewhat -1 on this.

It would be about as hard to memorize \describe-schemas as it is to memorize 
\dn:
You'd have to remember that it is "-" and not "_", that it is "describe", not 
"desc"
and that it is "schemas", not "schema".

Moreover, it would be as awkward to have
   \describe-schemas public
as it would be to list all schemas with
   \describe-schema

But my strongest criticism is that the \d* commands are for interactive use,
and who wants to type in a long string like that?  The beginner won't be able to
guess the correct command, and the experienced user would refuse to use it.


Having said all that, I can imagine that having \desc and \describe as an
alternative to \d would help beginners who come e.g. from Oracle,
but that would mean a change of the current behavior:

test=> \describe
 List of foreign servers
  Name  |  Owner   | Foreign-data wrapper 
+--+--
 oracle | postgres | oracle_fdw
(1 row)

This is because \des lists foreign servers, and the rest of the command is 
ignored.

Yours,
Laurenz Albe



Re: Redefining inet_net_ntop

2018-01-26 Thread Emre Hasegeli
> port.h declares inet_net_ntop and we always compile our own from
> port/inet_net_ntop.c .

There is another copy of it under backend/utils/adt/inet_cidr_ntop.c.
The code looks different but does 90% the same thing.  Their naming
and usage is confusing.

I recently needed to format IP addresses as DNS PTR records in the
database, and got annoyed by having no function that outputs IPv6
addresses in an easily parseable format like
:::::::.  I was going to send a patch
to unify those C functions and add another SQL function to get
addresses in such format.  Is this a good plan?  Where should those C
functions be on the tree if they are not port of anything anymore?
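
The formatting itself is simple; a standalone sketch (not PostgreSQL
code, just to illustrate the fully expanded form I have in mind) could
look like this:

#include <stdio.h>

/* print the 16 address bytes as eight zero-padded, colon-separated groups */
static void
format_ipv6_full(const unsigned char addr[16], char out[40])
{
    snprintf(out, 40,
             "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
             "%02x%02x:%02x%02x:%02x%02x:%02x%02x",
             addr[0], addr[1], addr[2], addr[3],
             addr[4], addr[5], addr[6], addr[7],
             addr[8], addr[9], addr[10], addr[11],
             addr[12], addr[13], addr[14], addr[15]);
}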



Re: JIT compiling with LLVM v9.0

2018-01-26 Thread Konstantin Knizhnik



On 26.01.2018 11:23, Andres Freund wrote:

Hi,

Thanks for testing things out!


Thank you for this work.
One more question: do you have any idea how to profile JITed code?
There is no LLVMOrcRegisterPerf in LLVM 5, so jit_profiling_support 
option does nothing.

And without it perf is not able to unwind the stack trace for generated code.
I attached the produced profile; it looks like the "unknown" bar corresponds to
JIT code.


There is a NoFramePointerElim option in the LLVMMCJITCompilerOptions
structure, but it requires use of the ExecutionEngine.

Something like this:

    mod = llvm_mutable_module(context);
    {
        struct LLVMMCJITCompilerOptions options;
        LLVMExecutionEngineRef jit;
        char* error;
        LLVMCreateExecutionEngineForModule(&jit, mod, &error);
        LLVMInitializeMCJITCompilerOptions(&options, sizeof(options));
        options.NoFramePointerElim = 1;
        LLVMCreateMCJITCompilerForModule(&jit, mod, &options,
                                         sizeof(options), &error);
    }
    ...

But you are compiling code using LLVMOrcAddEagerlyCompiledIR,
and I found no way to pass a no-omit-frame-pointer option there.
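
One possible workaround (an untested sketch on my side, not something
from the patch) would be to request frame pointers per function, by
attaching the target-dependent attribute to every function before the
module is handed to the ORC JIT:

#include <llvm-c/Core.h>

static void
force_frame_pointers(LLVMModuleRef mod)
{
    LLVMValueRef fn;

    for (fn = LLVMGetFirstFunction(mod); fn != NULL; fn = LLVMGetNextFunction(fn))
        LLVMAddTargetDependentFunctionAttr(fn, "no-frame-pointer-elim", "true");
}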


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-26 Thread Amit Langote
On 2018/01/22 11:44, Michael Paquier wrote:
> On Sun, Jan 21, 2018 at 07:16:38PM +1300, David Rowley wrote:
>> On 20 January 2018 at 23:14, Michael Paquier  
>> wrote:
>>> If pg_partition_tree_tables() uses the top of the partition as input,
>>> all the tree should show up. If you use a leaf, anything under the leaf
>>> should show up. If a leaf is defined and it has no underlying leaves,
>>> then only this outmost leaf should be listed.
>>
>> hmm, that's thoroughly confusing. Just in case anyone else is stuck on
>> that, I just need to mention that a leaf is something that does not have
>> branches, in nature or computer science.
> 
> OK, sorry if my words are confusing. Imagine that you have the following
> partition set:
> 
>p
>   / \
>  /   \
> p1   p2
> /  \
>/\
>   p21   p22
> 
> If pg_partition_tree_tables() uses 'p' as input argument, then I would
> imagine that it should return p, p1, p2, p21 and p22. If 'p2' is used,
> then p2, p21 and p22 are the results. If either one of p1, p21 or p22 is
> used as input, then the result is respectively p1, p21 or p22.
> 
> Amit's patch seems to be doing that kind of logic by using
> find_all_inheritors, which is good. We need to make the difference
> between relations that are part of a partition set and the other ones
> which are part of an INHERIT link, and, at least it seems to me, the
> patch is not careful with that. I haven't tested what is proposed
> though, but this portion likely needs more thoughts.

Yeah, I think I completely missed that part.

I wonder what pg_partition_tree_tables() should return when passed a table
that doesn't have partitions under it?  Return a 1-member set containing
itself?  I also mean for tables that may have inheritance children established
through plain old inheritance.

Thanks,
Amit




relispartition for index partitions

2018-01-26 Thread Amit Langote
Hi.

I noticed that relispartition isn't set for index's partitions.

create table p (a int) partition by list (a);
create table p12 partition of p for values in (1, 2);
create index on p (a);
select relname, relkind from pg_class where relnamespace =
'public'::regnamespace and relispartition is true;
 relname | relkind
-+-
 p12 | r
(1 row)

Is that intentional?

Thanks,
Amit




Re: JIT compiling with LLVM v9.0

2018-01-26 Thread Andres Freund
Hi,

Thanks for testing things out!


On 2018-01-26 10:44:24 +0300, Konstantin Knizhnik wrote:
> Also I noticed that parallel execution disables JIT.

Oh, oops, I broke that recently by moving where the decision about
whether to jit or not is. There actually is JITing, but only in the
leader.


> Are there any principle problems with combining JIT and parallel execution?

No, there's not, I just need to send the flag to JIT down to the
workers. Will look at it tomorrow.  If you want to measure / play around
till then you can manually hack the PGJIT_* checks in execExprCompile.c

with that done, on my laptop, tpch-Q01, scale 10:

SET max_parallel_workers_per_gather=0; SET jit_expressions = 1;
15145.508 ms
SET max_parallel_workers_per_gather=0; SET jit_expressions = 0;
23808.809 ms
SET max_parallel_workers_per_gather=4; SET jit_expressions = 1;
4775.170 ms
SET max_parallel_workers_per_gather=4; SET jit_expressions = 0;
7173.483 ms

(that's with inlining and deforming enabled too)

Greetings,

Andres Freund



Rewriting the test of pg_upgrade as a TAP test - take two

2018-01-26 Thread Michael Paquier
Hi all,

As promised on a recent thread, here is a second attempt at switching
pg_upgrade's test.sh to a TAP infrastructure.

This is a continuation of the following thread:
https://www.postgresql.org/message-id/CAB7nPqRdaN1A1YNjxNL9T1jUEWct8ttqq29dNv8W_o37%2Be8wfA%40mail.gmail.com

To begin with, compared to the last version, I found and fixed a couple of
bugs, and I also implemented an interface which allows for tests of
pg_upgrade across major versions.  The patch set is as follows:
- 0001 refactors PostgresNode.pm so that a node's binary path is
registered at creation time and can be specified by the test creating the
node via get_new_node.  That's useful for pg_upgrade because we want to
use different binary paths for the node to-be-upgraded and the new one.
This is also useful for test suites willing to work on cross-version
tests.
- 0002 begins the move to a TAP suite by introducing a base set of tests
for pg_upgrade.
- 0003 is the real meat, and removes test.sh in favor of a TAP script
aimed at supporting tests using the same version as well as
cross-version tests.

In order to do that, I have changed the interface used for those tests
a bit.  A user can set the following variables to run a test using a
past version, while the new version is the one of the tree from which the
TAP script is run:
export oldsrc=...somewhere/postgresql (old version's source tree)
export oldbindir=...otherversion/bin  (old version's installed bin dir)
export oldlibdir=...otherversion/lib  (old version's installed lib dir)
make check

That is documented as well in TESTING in the patch.  While the patch
works great when using the same major version, there is still an
issue I am facing related to the loading of shared libraries in
src/test/regress/ for the old server's version after upgrading it. Even
by tweaking LD_LIBRARY_PATH I could not get the libraries to load. The patch
has been tweaked so that the objects from regression tests that depend on
regress.so, autoinc.so and refint.so are not generated, allowing
pg_upgrade to work, which is definitely wrong.  I am sure that I missed
a trick here, but it is Friday here ;p

Note also that the diff of the dump still needs to be eyeballed for two
reasons:
- hash index handling is different in v10~, as those exist now in the
dumps taken.
- PostgreSQL origin version string shows up.
We could also apply some post-filters to allow the test to pass, but
that's one subject I would like to discuss on this thread.

Another topic that I would like to discuss is how well this interface fits
the buildfarm code.  After hacking on my stuff, I looked at the
buildfarm code and noticed that things like TestUpgradeXversion.pm do
*not* make use of test.sh, and that what I hacked is very similar to
what the buildfarm code is doing, which is itself roughly a copy of what
test.sh does.  Andrew, your feedback would be important here.

So, thoughts?
--
Michael
From 1294bb18426cea5c0764d0d07846fee291502b73 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 24 Jan 2018 16:19:53 +0900
Subject: [PATCH 1/3] Refactor PostgresNode.pm so as nodes can use custom
 binary path

This is a requirement for having a proper TAP test suite for pg_upgrade
where users have the possibility to manipulate nodes which use different
set of binaries during the upgrade processes.
---
 src/test/perl/PostgresNode.pm | 80 +--
 1 file changed, 62 insertions(+), 18 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 1d5ac4ee35..c0892fb8a0 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -133,7 +133,7 @@ INIT
 
 =over
 
-=item PostgresNode::new($class, $name, $pghost, $pgport)
+=item PostgresNode::new($class, $name, $pghost, $pgport, $bindir)
 
 Create a new PostgresNode instance. Does not initdb or start it.
 
@@ -144,13 +144,14 @@ of finding port numbers, registering instances for cleanup, etc.
 
 sub new
 {
-	my ($class, $name, $pghost, $pgport) = @_;
+	my ($class, $name, $pghost, $pgport, $bindir) = @_;
 	my $testname = basename($0);
 	$testname =~ s/\.[^.]+$//;
 	my $self = {
 		_port=> $pgport,
 		_host=> $pghost,
 		_basedir => "$TestLib::tmp_check/t_${testname}_${name}_data",
+		_bindir  => $bindir,
 		_name=> $name,
 		_logfile => "$TestLib::log_path/${testname}_${name}.log" };
 
@@ -284,6 +285,20 @@ sub data_dir
 
 =pod
 
+=item $node->bin_dir()
+
+Returns the path to the binary directory used by this node.
+
+=cut
+
+sub bin_dir
+{
+	my ($self) = @_;
+	return $self->{_bindir};
+}
+
+=pod
+
 =item $node->archive_dir()
 
 If archiving is enabled, WAL files go here.
@@ -328,6 +343,7 @@ sub info
 	open my $fh, '>', \$_info or die;
 	print $fh "Name: " . $self->name . "\n";
 	print $fh "Data directory: " . $self->data_dir . "\n";
+	print $fh "Binary directory: " . $self->bin_dir . "\n";
 	print $fh "Backup directory: " . $self->backup_dir . "\n";
 	print $fh "Archive