Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-08 Thread Claudio Freire
On Tue, Sep 6, 2016 at 8:28 PM, Peter Geoghegan  wrote:
>> In the meanwhile, I'll go and do some perf testing.
>>
>> Assuming the speedup is realized during testing, LGTM.
>
> Thanks. I suggest spending at least as much time on unsympathetic
> cases (e.g., only 2 or 3 tapes must be merged). At the same time, I
> suggest focusing on a type that has relatively expensive comparisons,
> such as collated text, to make differences clearer.

The tests are still running (the benchmark script I came up with runs
for a lot longer than I anticipated, about 2 days), but preliminary
results are very promising: I can see a clear and consistent speedup.
We'll have to wait for the complete results to see if there's any
significant regression, though. I'll post the full results when I have
them, but so far it all looks like this:

setup:

create table lotsofitext(i text, j text, w text, z integer, z2 bigint);
insert into lotsofitext select cast(random() * 10.0 as text)
|| 'blablablawblabla', cast(random() * 10.0 as text) ||
'blablablawjjjblabla', cast(random() * 10.0 as text) ||
'blablablawwwabla', random() * 10.0, random() * 1.0 from
generate_series(1, 1000);

timed:

select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t;

Unpatched Time: 100351.251 ms
Patched Time: 75180.787 ms

That's like a 25% speedup on random input. As we say over here, rather
badly translated, not a turkey's boogers (meaning "nice!")


On Tue, Sep 6, 2016 at 9:50 PM, Claudio Freire  wrote:
> On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan  wrote:
>> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire  
>> wrote:
>>> I noticed, but here n = state->memtupcount
>>>
>>> +   Assert(memtuples[0].tupindex == newtup->tupindex);
>>> +
>>> +   CHECK_FOR_INTERRUPTS();
>>> +
>>> +   n = state->memtupcount; /* n is heap's size,
>>> including old root */
>>> +   imin = 0;   /*
>>> start with caller's "hole" in root */
>>> +   i = imin;
>>
>> I'm fine with using "n" in the later assertion you mentioned, if
>> that's clearer to you. memtupcount is broken out as "n" simply because
>> that's less verbose, in a place where that makes things far clearer.
>>
>>> In fact, the assert on the patch would allow writing memtuples outside
>>> the heap, as in calling tuplesort_heap_root_displace if
>>> memtupcount==0, but I don't think that should be legal (memtuples[0]
>>> == memtuples[imin] would be outside the heap).
>>
>> You have to have a valid heap (i.e. there must be at least one
>> element) to call tuplesort_heap_root_displace(), and it doesn't
>> directly compact the heap, so it must remain valid on return. The
>> assertion exists to make sure that everything is okay with a
>> one-element heap, a case which is quite possible.
>
> More than using "n" or "memtupcount" what I'm saying is to assert that
> memtuples[imin] is inside the heap, which would catch the same errors
> the original assert would, and more.
>
> Assert(imin < state->memtupcount)
>
> If you prefer.
>
> The original assert allows any value of imin for memtupcount>1, and
> that's my main concern. It shouldn't.

So, for the assertions to properly guard against clobbering/reading
out-of-bounds memory, you need both the assert above:

 +*/
 +   memtuples[i] = memtuples[imin];
 +   i = imin;
 +   }
 +
>+   Assert(imin < state->memtupcount);
 +   memtuples[imin] = *newtup;
 +}

And another one at the beginning, asserting:

 +   SortTuple  *memtuples = state->memtuples;
 +   int n,
 +   imin,
 +   i;
 +
>+   Assert(state->memtupcount > 0 && memtuples[0].tupindex == 
>newtup->tupindex);
 +
 +   CHECK_FOR_INTERRUPTS();

It's worth making that change, IMHO, unless I'm missing something.
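
For reference, here is a minimal self-contained sketch of the operation under
discussion (this is not the actual tuplesort patch; a plain int min-heap
stands in for SortTuples, and all names are illustrative). It shows both
assertions: one on entry and one guarding the final write, so neither the
reads nor the write can land outside the heap:

#include <assert.h>

static void
heap_root_displace(int *heap, int n, int newval)
{
    int     i = 0;              /* caller's "hole" in the root */

    assert(n > 0);              /* must be called with a valid, non-empty heap */

    for (;;)
    {
        int     child = 2 * i + 1;      /* left child */

        if (child >= n)
            break;
        /* pick the smaller of the two children */
        if (child + 1 < n && heap[child + 1] < heap[child])
            child++;
        if (newval <= heap[child])
            break;
        heap[i] = heap[child];  /* pull the child up into the hole */
        i = child;
    }

    assert(i < n);              /* the final write must stay inside the heap */
    heap[i] = newval;
}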


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF app and patch series

2016-09-08 Thread Alvaro Herrera
Craig Ringer wrote:
> Hi all
> 
> Now that it's becoming more common to post patch series, not just
> standalone patches, it might be worth looking at how the CF app can
> help manage them.
> 
> Any ideas?

I agree that we don't consider this case at all currently and that it's
causing some pain.  

MIME parsing is mind-boggling.  Trying to figure out *one* patch file
from an email is already pretty difficult.  Trying to figure out more
than one might be nightmarish.  Maybe Magnus will contradict me and say
it's trivial to do -- that'd be great.

I don't have any great ideas for how to support this; I'd say it would
be something like: the CF entry has sub-entries, one for each patch in the
series, that can be closed independently.  This probably involves
surgery to the CF database and app which I'm not volunteering to write,
however.  Django-enabled contributors speak up now ...

I think for the time being we should keep the single entry as "needs
review" or whatever open state until there is nothing left in
committable state, then close as "committed"; the parts that were not
committed can be resent as a new series to a later CF.  In the
annotations section, add a link to the latest version each time it's
posted, along with some indication of how many parts are left[1].  Perhaps
if a part is committed and there's no new version submitted soon, also
add an annotation to indicate that.

[1] Do not, as I've seen some do in the past, delete the old annotations
while adding new ones.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for lazy_scan_heap

2016-09-08 Thread Masahiko Sawada
On Thu, Sep 8, 2016 at 11:27 PM, Jim Nasby  wrote:
> On 9/8/16 3:03 AM, Masahiko Sawada wrote:
>>
>> Current manual vacuum doesn't output how many all_frozen pages we
>> skipped according to the visibility map.
>> That's why I attached the 0001 patch which makes the manual vacuum emit
>> such information.
>
>
> I think we should add that, and info about all-frozen skips, regardless of
> the rest of these changes. Can you submit that separately to the commitfest?
> (Or I can if you want.)

Yeah, autovacuum emits such information but manual vacuum doesn't.
I submitted that patch before[1] but it was not committed, because the
verbose output is already complicated.
But I will submit it again and start a discussion about it.

[1] 
https://www.postgresql.org/message-id/CAD21AoCQGTKDxQ6YfrJ0%2B%3Dev6A3i3pt2ULdWxCtwPLKR2E77jg%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] to_date_valid()

2016-09-08 Thread Peter Eisentraut
On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote:
> postgres=# SELECT to_date('2011 12  18', 'YYYY MM   DD');
>   to_date
> ------------
>  2011-12-08
> (1 row)
> 
> 
> That is from the regression tests, and obviously handles the date 
> transformation wrong. My attempt catches this, because I compare the 
> date with the input date, and do not rely on a valid date only.

It's debatable what is correct here.

Using to_number, the behavior appears to be that a space in the pattern
ignores one character.  For example:

test=# select to_number('123 456', '999 999');
 to_number
-----------
    123456

test=# select to_number('123 456', '999  999');
 to_number
-----------
     12356

Considering that, the above to_date result is not incorrect.

So just squashing the spaces and converting the value back is not a
correct approach to detecting overflow.

I think using ValidateDate() was the right idea.  That is what we use
for checking date validity everywhere else.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2016-09-08 Thread Tom Lane
Peter Eisentraut  writes:
> On 9/6/16 1:40 PM, Doug Doole wrote:
>> We carried the ICU version numbers around on our collation and locale
>> IDs (such as fr_FR%icu36). The database would load multiple versions of
>> the ICU library so that something created with ICU 3.6 would always be
>> processed with ICU 3.6. This avoided the problems of trying to change
>> the rules on the user. (We'd always intended to provide tooling to allow
>> the user to move an existing object up to a newer version of ICU, but we
>> never got around to doing it.) In the code, this meant we were
>> explicitly calling the versioned API so that we could keep the calls
>> straight.

> I understand that in principle, but I don't see operating system
> providers shipping a bunch of ICU versions to facilitate that.  They
> will usually ship one.

I agree with that estimate, and I would further venture that even if we
wanted to bundle ICU into our tarballs, distributors would rip it out
again on security grounds.  I am dead certain Red Hat would do so; less
sure that other vendors have similar policies, but it seems likely.
They don't want to have to fix security bugs in more than one place.

This is a problem, if ICU won't guarantee cross-version compatibility,
because it destroys the argument that moving to ICU would offer us
collation behavior stability.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-08 Thread Claudio Freire
On Thu, Sep 8, 2016 at 11:54 AM, Pavan Deolasee
 wrote:
> For example, for a table with 60 bytes wide tuple (including 24 byte
> header), each page can approximately have 8192/60 = 136 tuples. Say we
> provision for 136*2 = 272 bits per page i.e. 34 bytes per page for the
> bitmap. First 272 offsets in every page are represented in the bitmap and
> anything greater than that goes in the overflow region. On the other hand, the current
> representation will need about 16 bytes per page assuming 2% dead tuples, 40
> bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead
> tuples. So bitmap will take more space for small tuples or when vacuum is
> run very aggressively, both seems unlikely for very large tables. Of course
> the calculation does not take into account the space needed by the overflow
> area, but I expect that to be small.

I thought about something like this, but it could be extremely
inefficient for mostly frozen tables, since the bitmap cannot account
for frozen pages without losing the O(1) lookup characteristic.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for lazy_scan_heap

2016-09-08 Thread Robert Haas
On Thu, Sep 8, 2016 at 4:03 AM, Masahiko Sawada  wrote:
> On Wed, Sep 7, 2016 at 4:11 PM, Simon Riggs  wrote:
>> On 7 September 2016 at 04:13, Masahiko Sawada  wrote:
>>
>>> Since current HEAD could scan visibility map twice, the execution time
>>> of Patched is approximately half of HEAD.
>>
>> Sounds good.
>>
>> To ensure we are doing exactly the same amount of work as before, did you
>> see the output of VACUUM VERBOSE?
>
> Sorry, the previous test result I posted was wrong.
> I reran the performance test and the results are:
>
> * 1TB Table(visibility map size is 32MB)
> HEAD : 4853.250 ms (00:04.853)
> Patched : 3805.789 ms (00:03.806)
>
> * 8TB Table(visibility map size is 257MB)
> HEAD : 37853.891 ms (00:37.854)
> Patched : 30153.943 ms (00:30.154)
>
> * 32TB Table(visibility map size is 1GB)
> HEAD: 151908.344 ms (02:31.908)
> Patched: 120560.037 ms (02:00.560)
>
> Since the visibility map pages can be cached in shared buffers or the OS cache
> by the first scan, the benefit of this patch seems not to be large.

Yeah, that's not nearly as good as the first set of results.  Maybe
it's still worth doing, but if you need to have a 32TB table to save
as much as 30 seconds, we don't have much of a problem here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-08 Thread Pavan Deolasee
On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire 
wrote:

> On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark  wrote:
> > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs 
> wrote:
> >> On 6 September 2016 at 19:59, Tom Lane  wrote:
> >>
> >>> The idea of looking to the stats to *guess* about how many tuples are
> >>> removable doesn't seem bad at all.  But imagining that that's going to
> be
> >>> exact is folly of the first magnitude.
> >>
> >> Yes.  Bear in mind I had already referred to allowing +10% to be safe,
> >> so I think we agree that a reasonably accurate, yet imprecise
> >> calculation is possible in most cases.
> >
> > That would all be well and good if it weren't trivial to do what
> > Robert suggested. This is just a large unsorted list that we need to
> > iterate throught. Just allocate chunks of a few megabytes and when
> > it's full allocate a new chunk and keep going. There's no need to get
> > tricky with estimates and resizing and whatever.
>
> I agree. While the idea of estimating the right size sounds promising
> a priori, considering the estimate can go wrong and over or
> underallocate quite severely, the risks outweigh the benefits when you
> consider the alternative of a dynamic allocation strategy.
>
> Unless the dynamic strategy has a bigger CPU impact than expected, I
> believe it's a superior approach.
>
>
How about a completely different representation for the TID array? Now this
is probably not something new, but I couldn't find if the exact same idea
was discussed before. I also think it's somewhat orthogonal to what we are
trying to do here, and will probably be a bigger change. But I thought I'd
mention it since we are on the topic.

What I propose is to use a simple bitmap to represent the tuples. If a
tuple at a given (block, offset) is dead then the corresponding bit in the bitmap
is set. So clearly the searches through dead tuples are O(1) operations,
which is important for very large tables and large arrays.

The challenge really is that a heap page can theoretically have MaxOffsetNumber
of line pointers (or to be precise, the maximum possible offset number). For an
8K block, that comes to about 2048. Having so many bits per page is neither
practical nor optimal. But in practice the largest offset on a heap page
should not be significantly greater than MaxHeapTuplesPerPage, which is a
more reasonable value of 291 on my machine. Again, that's with zero-sized
tuples; for real-life large tables with much wider tuples, the number
may go down even further.

So we cap the offsets represented in the bitmap to some realistic value,
computed by looking at page density and then multiplying it by a small
factor (not more than two) to take into account LP_DEAD and LP_REDIRECT
line pointers. That should practically represent the majority of the dead
tuples in the table, but we then keep an overflow area to record tuples
beyond the per-page limit. The search routine will do a direct
lookup for offsets less than the limit and search the sorted overflow
area for offsets beyond the limit.

For example, for a table with a 60-byte wide tuple (including the 24-byte
header), each page can have approximately 8192/60 = 136 tuples. Say we
provision for 136*2 = 272 bits per page, i.e. 34 bytes per page for the
bitmap. The first 272 offsets in every page are represented in the bitmap and
anything greater than that goes in the overflow region. On the other hand, the
current representation will need about 16 bytes per page assuming 2% dead
tuples, 40 bytes per page assuming 5% dead tuples and 80 bytes assuming 10%
dead tuples. So the bitmap will take more space for small tuples or when vacuum
is run very aggressively, both of which seem unlikely for very large tables. Of
course the calculation does not take into account the space needed by the
overflow area, but I expect that to be small.
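
To make the layout concrete, here is a rough sketch of that representation
and its lookup path (illustrative only, not a patch; the type and field
names are made up, and TIDs are shown as plain block/offset integers):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DeadTupleBitmap
{
    uint32_t    bits_per_page;  /* the per-page cap, e.g. 2 * estimated tuples/page */
    uint8_t    *bitmap;         /* nblocks * bits_per_page bits */
    struct
    {
        uint32_t    block;
        uint16_t    offset;
    }          *overflow;       /* sorted by (block, offset) */
    size_t      noverflow;
} DeadTupleBitmap;

static bool
dead_tuple_exists(const DeadTupleBitmap *dt, uint32_t block, uint16_t offset)
{
    if (offset <= dt->bits_per_page)
    {
        /* O(1) path: test the bit for this (block, offset); offsets are 1-based */
        uint64_t    bit = (uint64_t) block * dt->bits_per_page + (offset - 1);

        return (dt->bitmap[bit / 8] >> (bit % 8)) & 1;
    }
    else
    {
        /* rare path: binary search the sorted overflow area */
        size_t      lo = 0,
                    hi = dt->noverflow;

        while (lo < hi)
        {
            size_t      mid = lo + (hi - lo) / 2;

            if (dt->overflow[mid].block < block ||
                (dt->overflow[mid].block == block &&
                 dt->overflow[mid].offset < offset))
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo < dt->noverflow &&
               dt->overflow[lo].block == block &&
               dt->overflow[lo].offset == offset;
    }
}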

I guess we can make a choice between the two representations at the start,
looking at various table stats. We can also be smart and change from the bitmap
to the traditional representation as we scan the table and see many more tuples
in the overflow region than we provisioned for. There will be some
challenges in converting the representation mid-way, especially in terms of
memory allocation, but I think those can be sorted out if we think that the
idea has merit.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Logical Replication WIP

2016-09-08 Thread Petr Jelinek

Hi,

On 07/09/16 14:10, Erik Rijkers wrote:

On 2016-08-31 22:51, Petr Jelinek wrote:


and one more version with bug fixes, improved code docs and couple



I am not able to get the replication to work. Would you (or anyone) be
so kind as to point out what I am doing wrong?

Patches applied, compiled, make-checked, installed OK.

I have 2 copies compiled and installed,  logical_replication and
logical_replication2, to be publisher and subscriber, ports 6972 and
6973 respectively.


Logfile subscriber-side:
[...]
2016-09-07 13:47:44.441 CEST 21997 LOG:  MultiXact member wraparound
protections are now enabled
2016-09-07 13:47:44.528 CEST 21986 LOG:  database system is ready to
accept connections
2016-09-07 13:47:44.529 CEST 22002 LOG:  logical replication launcher
started
2016-09-07 13:52:11.319 CEST 22143 LOG:  logical replication apply for
subscription sub1 started
2016-09-07 13:53:47.010 CEST 22143 ERROR:  could not open relation with
OID 0
2016-09-07 13:53:47.012 CEST 21986 LOG:  worker process: logical
replication worker 24048 (PID 22143) exited with exit code 1
2016-09-07 13:53:47.018 CEST 22184 LOG:  logical replication apply for
subscription sub1 started
2016-09-07 13:53:47.028 CEST 22184 ERROR:  could not open relation with
OID 0
2016-09-07 13:53:47.030 CEST 21986 LOG:  worker process: logical
replication worker 24048 (PID 22184) exited with exit code 1
2016-09-07 13:53:52.041 CEST 22187 LOG:  logical replication apply for
subscription sub1 started
2016-09-07 13:53:52.045 CEST 22187 ERROR:  could not open relation with
OID 0
2016-09-07 13:53:52.046 CEST 21986 LOG:  worker process: logical
replication worker 24048 (PID 22187) exited with exit code 1
(repeat every few seconds)




It means the tables don't exist on the subscriber. I added a check and a proper
error message in my local dev branch; it will be part of the next update.
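
A hedged sketch of the kind of check described (not the actual patch; "rv"
stands for a hypothetical RangeVar built from the relation name the publisher
sends):

    Oid     relid = RangeVarGetRelid(rv, RowExclusiveLock, true /* missing_ok */ );

    if (!OidIsValid(relid))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("logical replication target relation \"%s.%s\" does not exist",
                        rv->schemaname ? rv->schemaname : "public",
                        rv->relname)));

That way the apply worker fails with a readable message instead of the
"could not open relation with OID 0" error seen in the log above.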


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-08 Thread Aleksander Alekseev
> Hello, Victor.
> 
> > I'm sending new version of patch.
> > 
> > I've replaced readonly option with target_server_type (with JDBC
> > compatibility alias targetServerType), 
> > 
> > use logic of setting defaults based on number of hosts in the connect
> > string instead of complicated condition in the state machine,
> > 
> > added checks for return values of memory allocation function.
> > 
> > Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
> > along with some other libpgport objects which are already linked there.
> > 
> > Thus client applications don't need to link with libpgport and libpq
>> shared library is self-contained.
> 
> This patch doesn't apply to master branch:
> 
> ```
> $ git apply ~/temp/libpq-failover-8.patch
> /home/eax/temp/libpq-failover-8.patch:184: trailing whitespace.
> check: 
> /home/eax/temp/libpq-failover-8.patch:252: trailing whitespace.
>   /* 
> /home/eax/temp/libpq-failover-8.patch:253: trailing whitespace.
>*  Validate target_server_mode option 
> /home/eax/temp/libpq-failover-8.patch:254: trailing whitespace.
>*/ 
> /home/eax/temp/libpq-failover-8.patch:306: trailing whitespace.
>   appendPQExpBuffer(>errorMessage, 
> error: src/interfaces/libpq/t/001-multihost.pl: already exists in
> working directory
> 
> $ git diff
> ```

Oops. Sorry, my mistake. I forgot to check for untracked files.
Everything is fine.

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-08 Thread Aleksander Alekseev
Hello, Victor.

> I'm sending new version of patch.
> 
> I've replaced readonly option with target_server_type (with JDBC
> compatibility alias targetServerType), 
> 
> use logic of setting defaults based on number of hosts in the connect
> string instead of complicated condition in the state machine,
> 
> added checks for return values of memory allocation function.
> 
> Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
> along with some other libpgport objects which are already linked there.
> 
> Thus client applications don't need to link with libpgport and libpq
> shared library is self-contained.

This patch doesn't apply to master branch:

```
$ git apply ~/temp/libpq-failover-8.patch
/home/eax/temp/libpq-failover-8.patch:184: trailing whitespace.
check: 
/home/eax/temp/libpq-failover-8.patch:252: trailing whitespace.
/* 
/home/eax/temp/libpq-failover-8.patch:253: trailing whitespace.
 *  Validate target_server_mode option 
/home/eax/temp/libpq-failover-8.patch:254: trailing whitespace.
 */ 
/home/eax/temp/libpq-failover-8.patch:306: trailing whitespace.
appendPQExpBuffer(>errorMessage, 
error: src/interfaces/libpq/t/001-multihost.pl: already exists in
working directory

$ git diff
```

-- 
Best regards,
Aleksander Alekseev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-08 Thread Jim Nasby

On 9/8/16 3:48 AM, Masahiko Sawada wrote:

If we replaced dead_tuples with an array-of-arrays, isn't there a
negative performance impact for lazy_tid_reap()?
As chunks are added, that performance would decrease.


Yes, it certainly would, as you'd have to do 2 binary searches. I'm not 
sure how much that matters though; presumably the index scans are 
normally IO-bound?


Another option would be to use the size estimation ideas others have 
mentioned to create one array. If the estimates prove to be wrong you 
could then create a single additional segment; by that point you should 
have a better idea of how far off the original estimate was. That means 
the added search cost would only be a compare and a second pointer redirect.
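
A minimal sketch of that two-segment lookup (names are made up, and TIDs
are shown as plain 64-bit keys rather than ItemPointers):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t   *seg[2];         /* seg[0]: sized from the estimate; seg[1]: optional overflow */
    size_t      nseg[2];
    uint64_t    boundary;       /* first key stored in seg[1] */
} DeadTids;

static bool
dead_tid_exists(const DeadTids *dt, uint64_t tid)
{
    /* the extra cost: one boundary compare to pick a segment, one extra pointer fetch */
    int         which = (dt->seg[1] != NULL && tid >= dt->boundary) ? 1 : 0;
    const uint64_t *a = dt->seg[which];
    size_t      lo = 0,
                hi = dt->nseg[which];

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (a[mid] < tid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < dt->nseg[which] && a[lo] == tid;
}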


Something else that occurred to me... AFAIK the only reason we don't 
support syncscan with VACUUM is because it would require sorting the TID 
list. If we just added a second TID list we would be able to support 
syncscan, swapping over to the 'low' list when we hit the end of the 
relation.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for lazy_scan_heap

2016-09-08 Thread Jim Nasby

On 9/8/16 3:03 AM, Masahiko Sawada wrote:

Current manual vacuum doesn't output how many all_frozen pages we
skipped according to the visibility map.
That's why I attached 0001 patch which makes the manual vacuum emit
such information.


I think we should add that, and info about all-frozen skips, regardless 
of the rest of these changes. Can you submit that separately to the 
commitfest? (Or I can if you want.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-08 Thread Rahila Syed
Thank you for comments.

>But there's a function in startup.c which might be the ideal location
>for the check, as it looks like the one and only place where the
>autocommit flag actually changes:

>static void
>autocommit_hook(const char *newval)

Thank you for pointing this out. This indeed is the only place where the
autocommit setting changes in psql. However,
including the check here would require returning the status of the command from
this hook, i.e. if we throw an error inside this
hook we would need to return false, indicating the value has not changed.
This would change the existing definition of the hook,
which returns void. This would also lead to changes in other implementations
of this hook apart from autocommit_hook.

>There are other values than ON: true/yes and theirs abbreviations,
>the value 1, and anything that doesn't resolve to OFF is taken as ON.
>ParseVariableBool() in commands.c already does the job of converting
>these to bool, the new code should probably just call that function
>instead of parsing the value itself.

I have included this in the attached patch.

Also, I have included the prior review comments by Rushabh Lathia and Amit
Langote in the attached patch.

Regarding the suggestion to move the code inside SetVariable(), I think it is
not a good idea, because
SetVariable() and the variable.c file in which it is present are about handling
psql variables: checks for
correct variable names, values etc. This check is about the correctness of the
command, so it is better placed
in exec_command().
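
A rough sketch of where such a check might sit in exec_command() (the psql
helper names and exact signatures here are assumptions for illustration, not
the final patch):

    /* on "\set AUTOCOMMIT ...": refuse to switch it on mid-transaction */
    if (pg_strcasecmp(opt0, "AUTOCOMMIT") == 0 &&
        !pset.autocommit &&
        ParseVariableBool(newval, "AUTOCOMMIT") &&      /* accepts on/true/yes/1 */
        pset.db != NULL &&
        PQtransactionStatus(pset.db) == PQTRANS_INTRANS)
    {
        psql_error("cannot enable AUTOCOMMIT inside a transaction; COMMIT or ROLLBACK first\n");
        success = false;        /* "success" stands for the command status being built */
    }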

Thank you,
Rahila Syed




On Sat, Sep 3, 2016 at 4:39 PM, Daniel Verite 
wrote:

> Rushabh Lathia wrote:
>
> > It might happen that SetVariable() can be called from other places in
> > future,
> > so its good to have all the set variable related checks inside
> > SetVariable().
>
> There's already another way to skip the \set check:
>   select 'on' as "AUTOCOMMIT" \gset
>
> But there's a function in startup.c which might be the ideal location
> for the check, as it looks like the one and only place where the
> autocommit flag actually changes:
>
> static void
> autocommit_hook(const char *newval)
> {
>  pset.autocommit = ParseVariableBool(newval, "AUTOCOMMIT");
> }
>
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


psql_error_on_autocommit_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-09-08 Thread Tom Lane
Dilip Kumar  writes:
> On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane  wrote:
>> In the case of record_in, it seems like it'd be easy enough to use
>> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc()
>> and then throw a user-facing error right there.

> Actually when we are passing an invalid type to "record_in", the error is
> thrown in the "lookup_type_cache" function, and this function does not
> take a "no_error" input (it's only limited to
> lookup_rowtype_tupdesc_internal).

Ah, should have dug a bit deeper.

> One option is to pass a "no_error" parameter to the "lookup_type_cache"
> function also, and throw the error only in the record_in function.
> But the problem is, "lookup_type_cache" has a lot of references. And also,
> will it be a good idea to throw one generic error instead of multiple
> specific errors?

We could make it work like that without breaking the ABI if we were
to add a NOERROR bit to the allowed "flags".  However, after looking
around a bit I'm no longer convinced what I said above is a good idea.
In particular, if we put the responsibility of reporting the error on
callers then we'll lose the ability to distinguish "no such pg_type OID"
from "type exists but it's only a shell".  So I'm now thinking it's okay
to promote lookup_type_cache's elog to a full ereport, especially since
it's already using ereport for some of the related error cases such as
the shell-type failure.

That still leaves us with what to do for domain_in.  A really simple
fix would be to move the InitDomainConstraintRef call before the
getBaseTypeAndTypmod call, because the former fetches the typcache
entry and would then benefit from lookup_type_cache's ereport.
But that seems a bit magic.

I'm tempted to propose that domain_state_setup start with

typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO);
if (typentry->typtype != TYPTYPE_DOMAIN)
    ereport(ERROR,
            (errcode(ERRCODE_DATATYPE_MISMATCH),
             errmsg("type %s is not a domain",
                    format_type_be(domainType))));

removing the need to verify that after getBaseTypeAndTypmod.

The cache loading is basically free because InitDomainConstraintRef
would do it anyway; so the extra cost here is only one dynahash search.
You could imagine buying back those cycles by teaching the typcache
to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful
that we really care.  This whole setup sequence only happens once per
query anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander  wrote:
>> Pretty sure this goes back to *our* XML support requiring both. As in you
>> couldn't turn on/off xslt independently, so the "xml requires xslt" comment
>> is that *our* xml support required both.
>> 
>> This may definitely not be true anymore, and that check has just not been
>> updated.
>> 
>> Also this was 10 years ago, so I'm of course not 100% sure, but I think it
>> was something like that...

> Was it for contrib/xml2? For example was it because this module could
> not be compiled with just libxml2?

The core code has never used xslt at all.  Some quick digging in the git
history suggests that contrib/xml2 wasn't very clean about this before
2008:

commit eb915caf92a6805740e949c3233ee32bc9676484
Author: Tom Lane 
Date:   Thu May 8 16:49:37 2008 +

Fix contrib/xml2 makefile to not override CFLAGS, and in passing make it
auto-configure properly for libxslt present or not.

or even 2010, depending on how large a value of "work" you want:

commit d6a6f8c6be4b6d6a9e90e92d91a83225bfe8d29d
Author: Tom Lane 
Date:   Mon Mar 1 18:07:59 2010 +

Fix contrib/xml2 so regression test still works when it's built without 
libxslt.

This involves modifying the module to have a stable ABI, that is, the
xslt_process() function still exists even without libxslt.  It throws a
runtime error if called, but doesn't prevent executing the CREATE FUNCTION
call.  This is a good thing anyway to simplify cross-version upgrades.


Both of those fixes postdate our native-Windows port, though I'm not sure
of the origin of the specific test you're wondering about.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Magnus Hagander
On Thu, Sep 8, 2016 at 2:53 PM, Michael Paquier 
wrote:

> On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander 
> wrote:
> > Pretty sure this goes back to *our* XML support requiring both. As in you
> > couldn't turn on/off xslt independently, so the "xml requires xslt"
> comment
> > is that *our* xml support required both.
> >
> > This may definitely not be true anymore, and that check has just not been
> > updated.
> >
> > Also this was 10 years ago, so I'm of course not 100% sure, but I think
> it
> > was something like that...
>
> Was it for contrib/xml2? For example was it because this module could
> not be compiled with just libxml2?
>

Yes, that's what I'm referring to.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Michael Paquier
On Thu, Sep 8, 2016 at 9:39 PM, Magnus Hagander  wrote:
> Pretty sure this goes back to *our* XML support requiring both. As in you
> couldn't turn on/off xslt independently, so the "xml requires xslt" comment
> is that *our* xml support required both.
>
> This may definitely not be true anymore, and that check has just not been
> updated.
>
> Also this was 10 years ago, so I'm of course not 100% sure, but I think it
> was something like that...

Was it for contrib/xml2? For example was it because this module could
not be compiled with just libxml2?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sample configuration files

2016-09-08 Thread Tom Lane
Vik Fearing  writes:
> I noticed that this patch has been marked Waiting on Author with no
> comment.  Peter, what more should I be doing right now while waiting for
> Martín's review?

FWIW, I agree with the upthread misgivings about whether this is actually
a useful effort.  Even if we installed the sample config files somewhere
(something there is not consensus for AFAICT), they would not actually
*do* anything useful as standalone files.  I suppose you are imagining
that people would either manually concatenate them onto postgresql.conf
or insert an include directive for them into postgresql.conf, but neither
of those things sound pleasant or maintainable.

Moreover, it's not clear why anyone would do that at all in the age of
ALTER SYSTEM SET.

I suggest that it'd be more fruitful to view this as a documentation
effort; that is, in each contrib module's SGML documentation file provide
a standardized section listing all its parameters and their default
settings.  That would be something that could be copied-and-pasted
into either an editor window on postgresql.conf for the old guard, or
an ALTER SYSTEM SET command for the new.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Michael Paquier
On Thu, Sep 8, 2016 at 9:27 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> But I don't understand the reason behind such a restriction to be
>> honest because libxml2 does not depend on libxslt. The contrary is
>> true: libxslt needs libxml2.
>
> Right.
>
>> Note as well that libxml2 does depend on ICONV though.
>
> Hm, is that true either?  I don't see any sign of such a dependency
> on Linux:
>
> $ ldd /usr/lib64/libxml2.so.2.7.6
> linux-vdso.so.1 =>  (0x7fff98f6b000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00377a60)
> libz.so.1 => /lib64/libz.so.1 (0x00377ae0)
> libm.so.6 => /lib64/libm.so.6 (0x003779e0)
> libc.so.6 => /lib64/libc.so.6 (0x003779a0)
> /lib64/ld-linux-x86-64.so.2 (0x00377960)

Peeking into the libxml2 code, there is a configure switch
--with-iconv, so that's an optional dependency. And the same exists
for Windows in their win32/ stuff. So I am mistaken and this could
just be removed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Magnus Hagander
On Thu, Sep 8, 2016 at 2:27 PM, Tom Lane  wrote:

> Michael Paquier  writes:
> > But I don't understand the reason behind such a restriction to be
> > honest because libxml2 does not depend on libxslt. The contrary is
> > true: libxslt needs libxml2.
>
> Right.
>

Pretty sure this goes back to *our* XML support requiring both. As in you
couldn't turn on/off xslt independently, so the "xml requires xslt" comment
is that *our* xml support required both.

This may definitely not be true anymore, and that check has just not been
updated.

Also this was 10 years ago, so I'm of course not 100% sure, but I think it
was something like that...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Declarative partitioning - another take

2016-09-08 Thread Rajkumar Raghuwanshi
On Wed, Sep 7, 2016 at 3:58 PM, Amit Langote 
wrote:

>
> Hi,
>
> On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote:
> > Hi,
> >
> > I have a query regarding list partitioning,
> >
> > For example if I want to store employee data in a table, with "IT" dept
> > employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and
> if
> > employee belongs to other than these two, should come in emp_p3
> partition.
> >
> > In this case not sure how to create partition table. Do we have something
> > like we have UNBOUNDED for range partition or oracle have "DEFAULT" for
> > list partition.
> >
> > create table employee (empid int, dept varchar) partition by list(dept);
> > create table emp_p1 partition of employee for values in ('IT');
> > create table emp_p2 partition of employee for values in ('HR');
> > create table emp_p3 partition of employee for values in (??);
>
> Sorry, no such feature is currently offered.  It might be possible to
> offer something like a "default" list partition which accepts values other
> than those specified for other existing partitions.  However, that means
> if we add a non-default list partition after a default one has been
> created, the implementation must make sure that it moves any values from
> the default partition that now belong to the newly created partition.
>
> Thanks,
> Amit
>

Thanks for clarifying. But I can see the same problem of moving data when
adding a new non-default partition with an unbounded range partition.

In the example given here, I initially created a partitioned table test with
test_p3 having an unbounded end,
then tried to change test_p3 to contain values 7-9 only, and to add a new
partition test_p4 containing 10 to unbounded.

--create partition table and some leafs
CREATE TABLE test (a int, b int) PARTITION BY RANGE(a);
CREATE TABLE test_p1 PARTITION OF test FOR VALUES START (1) END (4);
CREATE TABLE test_p2 PARTITION OF test FOR VALUES START (4) END (7);
CREATE TABLE test_p3 PARTITION OF test FOR VALUES START (7) END UNBOUNDED;

--insert some data
INSERT INTO test SELECT i, i*10 FROM generate_series(1,3) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(4,6) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(7,13) i;

--directly not able to attach test_p4 because of overlap error, hence
detached test_p3 and then attached test_p4
SELECT tableoid::regclass,* FROM test;
 tableoid | a  |  b
----------+----+-----
 test_p1  |  1 |  10
 test_p1  |  2 |  20
 test_p1  |  3 |  30
 test_p2  |  4 |  40
 test_p2  |  5 |  50
 test_p2  |  6 |  60
 test_p3  |  7 |  70
 test_p3  |  8 |  80
 test_p3  |  9 |  90
 test_p3  | 10 | 100
 test_p3  | 11 | 110
 test_p3  | 12 | 120
 test_p3  | 13 | 130
(13 rows)

ALTER TABLE test DETACH PARTITION test_p3;
CREATE TABLE test_p4 (like test);
ALTER TABLE test ATTACH PARTITION test_p4 FOR VALUES start (10) end
UNBOUNDED;

--now cannot attach test_p3 because of overlap with test_p4, causing data
loss from the main test table.
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (10);
ERROR:  source table contains a row violating partition bound specification
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (13);
ERROR:  partition "test_p3" would overlap partition "test_p4"

SELECT tableoid::regclass,* FROM test;
 tableoid | a | b
----------+---+----
 test_p1  | 1 | 10
 test_p1  | 2 | 20
 test_p1  | 3 | 30
 test_p2  | 4 | 40
 test_p2  | 5 | 50
 test_p2  | 6 | 60
(6 rows)


Re: [HACKERS] GiST penalty functions [PoC]

2016-09-08 Thread Михаил Бахтерев
Excuse me for intervening.

It depends. For instance, I run PostgreSQL on a modern MIPS CPU which
does not have sqrt support.

But you are right, it is supported in most cases. And if the execution speed
of this very function is of concern, sqrtf(x) should be used instead of
sqrt(x).

Despite this, Andrew's solution gives a more accurate representation of
values. And as far as I understand, this improves overall performance by
decreasing the overall number of instructions which must be executed.

It is possible to speed up Andrew's implementation and get rid of
warnings by using bit-masks and unions. Something like:

  union {
float f;
struct {
  unsigned int mantissa:23, exponent:8, sign:1;
} bits;
  }

I am sorry, I have no time to check this. But it is common wisdom to
avoid pointer-based memory accesses in high-performance code, as they
create a lot of false write-to-read dependencies.
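
A hypothetical completion of that idea (this is not Andrew's actual
pack_float; it only shows the union-based access pattern, stashing a 2-bit
tag in the low mantissa bits, and it assumes IEEE 754 single precision plus
a compiler that allocates bit-fields from the least significant bit, which
is implementation-defined):

#include <assert.h>

typedef union
{
    float       f;
    struct
    {
        unsigned int mantissa:23;
        unsigned int exponent:8;
        unsigned int sign:1;
    }           bits;
} float_bits;

static float
pack_float_union(float value, unsigned int tag)
{
    float_bits  u;

    assert(tag < 4);            /* only the two low mantissa bits are used */
    u.f = value;
    u.bits.mantissa = (u.bits.mantissa & ~3u) | tag;
    return u.f;
}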

- Mikhail, respectfully

On Wed, Sep 07, 2016 at 05:58:42PM -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > Unfortunately, sqrt(x) isn't very cheap.
> 
> You'd be surprised: sqrt is built-in on most modern hardware.  On my
> three-year-old workstation, sqrt(x) seems to take about 2.6ns.  For
> comparison, the pack_float version posted in
> 
> takes 3.9ns (and draws multiple compiler warnings, too).
> 
>   regards, tom lane


signature.asc
Description: PGP signature


Re: [HACKERS] Suggestions for first contribution?

2016-09-08 Thread Christian Convey
On Wed, Sep 7, 2016 at 10:44 AM, Stas Kelvich  wrote:
> There is also a list of projects for google summer of code: 
> https://wiki.postgresql.org/wiki/GSoC_2016
>
> Those topics are expected to be doable by a newcomer during several months. They are
> also slightly
> outdated, but you can always check the current state of things by searching this
> mailing list for the topic of interest.
>
> Also Postgres has several internal APIs like fdw[1] or gist[2] that allow
> you to implement something
> useful without digging too much into the Postgres internals.
>
> [1] https://www.postgresql.org/docs/current/static/fdwhandler.html
> [2] https://www.postgresql.org/docs/current/static/gist-extensibility.html

Thanks Stas, I'll have a look at those.  I appreciate the links.

- Christian


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestions for first contribution?

2016-09-08 Thread Christian Convey
On Wed, Sep 7, 2016 at 10:34 AM, Yury Zhuravlev
 wrote:
> Christian Convey wrote:
>>
>> Yury doesn't seem to need help
>> with CMake
>
> Hello.
> I am sorry that I am only answering now.
> I need help, but with writing CMake code:
> 1. Make ecpg tests
> 2. Make MinGW/Msys support
> 3. many many ...
> all targets and discussion here:
> https://github.com/stalkerg/postgres_cmake/issues
>
> If you can only test CMake on Ubuntu x86_64, that is not necessary.
> If I have not fully understood you - sorry.

Hi Yury, no problem.  It sounds like most of the platform testing you
want requires hardware I don't have, unfortunately.

- Christian


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts

2016-09-08 Thread Tom Lane
Michael Paquier  writes:
> But I don't understand the reason behind such a restriction to be
> honest because libxml2 does not depend on libxslt. The contrary is
> true: libxslt needs libxml2.

Right.

> Note as well that libxml2 does depend on ICONV though.

Hm, is that true either?  I don't see any sign of such a dependency
on Linux:

$ ldd /usr/lib64/libxml2.so.2.7.6
linux-vdso.so.1 =>  (0x7fff98f6b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00377a60)
libz.so.1 => /lib64/libz.so.1 (0x00377ae0)
libm.so.6 => /lib64/libm.so.6 (0x003779e0)
libc.so.6 => /lib64/libc.so.6 (0x003779a0)
/lib64/ld-linux-x86-64.so.2 (0x00377960)


regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suggestions for first contribution?

2016-09-08 Thread Christian Convey
On Wed, Sep 7, 2016 at 10:14 AM, Aleksander Alekseev
 wrote:
> Here is another idea for a contribution - refactoring.
>
> Currently there are a lot of procedures in PostgreSQL code that
> definitely don't fit on one screen (i.e. ~50 lines). Also many files are
> larger than say 1000 lines of code. For instance, psql_completion
> procedure is more than 2000 lines long! I think patches that would make
> such code easier to read would have a good chance to be accepted.

Thanks for the suggestion!  I can see how refactoring could make a lot
of sense for some functions.  That's probably a task I'd want to take
on after I had more experience with PG's code base.  I think being
familiar with most/all of PG's code would make that task go more
smoothly.

- Christian


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST penalty functions [PoC]

2016-09-08 Thread Andrew Borodin
>BTW, would someone explain to me why using a float here will not fail 
>catastrophically for inputs outside the range of float?

Indeed, it will. And that's one of the reasons I'm considering
advancing the GiST API instead of advancing extensions.

It won't crash, but it will produce a terrible index, not suitable for
performing efficient queries.
GiST treats penalty this way:
/* disallow negative or NaN penalty */
if (isnan(penalty) || penalty < 0.0)
    penalty = 0.0;

Any NaN is a "perfect fit".

Calculation of a real (say float) penalty and choosing the subtree with the
smallest penalty is an idea from '92. A modern spatial index requires
more than just a float to compare subtrees.

Best regards, Andrey Borodin, Octonica & Ural Federal University.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \timing interval

2016-09-08 Thread Pavel Stehule
2016-09-08 13:10 GMT+02:00 Craig Ringer :

> On 4 Sep. 2016 3:36 am, "Tom Lane"  wrote:
> >
>
> > After further thought I concluded that not providing any labeling of
> > days is a bad idea.
>
> Yeah. I think labeling days is definitely good. I'm glad you changed that.
>
> Personally I'd like to trim milliseconds when dealing with minute+ long
> runs and seconds from hour+ runs too, since it's all there in the ms output
> and the units output is for human readability. I see the value of retaining
> full precision too, though, and don't feel strongly about it.
>

It is hard to read without units. I know it is maybe late now,
but the current output is not very readable for me. I miss units - 10s,
1m20.5s, 1h 30m 5s.
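
A trivial sketch of that kind of unit-labelled formatting (purely
illustrative, not the committed psql code):

#include <stdio.h>

static void
print_elapsed(double ms)
{
    double      s = ms / 1000.0;
    int         h = (int) (s / 3600);
    int         m = (int) (s / 60) % 60;
    double      rest = s - h * 3600 - m * 60;

    if (h > 0)
        printf("%dh %dm %.0fs\n", h, m, rest);  /* e.g. "1h 30m 5s" */
    else if (m > 0)
        printf("%dm %.1fs\n", m, rest);         /* e.g. "1m 20.5s" */
    else
        printf("%.3fs\n", rest);                /* e.g. "10.000s" */
}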

Regards

Pavel


Re: [HACKERS] GiST penalty functions [PoC]

2016-09-08 Thread Tom Lane
Heikki Linnakangas  writes:
> BTW, I would be OK with the bit-twiddling hack, if we had an autoconf 
> check for IEEE 754 floats, and a graceful fallback for other systems. 

I would still object to the version submitted, on the grounds of the
compiler warnings it causes.  Possibly those could be avoided with
a union, though; Float4GetDatum doesn't produce such warnings.

BTW, would someone explain to me why using a float here will not
fail catastrophically for inputs outside the range of float?
Cubes are based on doubles, not floats.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-08 Thread Victor Wagner
On Tue, 6 Sep 2016 07:58:28 +0530
Mithun Cy  wrote:

> On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev <
> a.aleks...@postgrespro.ru> wrote:
> >After a brief examination I've found following ways to improve the
> >patch.  
> Adding to above comments.
> 
I'm sending new version of patch.

I've replaced readonly option with target_server_type (with JDBC
compatibility alias targetServerType), 

use logic of setting defaults based on number of hosts in the connect
string instead of complicated condition in the state machine,

added checks for return values of memory allocation function.

Also, I've solved pg_usleep problem by linking pgsleep.c into libpq
along with some other libpgport objects which are already linked there.

Thus client applications don't need to link with libpgport, and the libpq
shared library is self-contained.
 diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4e34f00..f0df877 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomtarget_server_type=any
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be several host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if connection to the first
+  fails.
+  
+  
+  If value is random host to connect is randomly
+  picked from the list. It allows to balance load between several
+  cluster nodes. However, currently PostgreSQL doesn't support
+  multimaster clusters. So, without use of third-party products,
+  only read-only connections can take advantage from the
+  load-balancing. See 
+  
+  
+  
+  
+  target_server_type
+  
+  
+  If this parameter is master upon successful connection
+  library checks if host is in recovery state, and if it is so,
+  tries next host in the connect string. If this parameter is
+  any, connection to standby nodes are considered
+  successful.
+  
+  
+  
   
+
port


@@ -985,7 +1042,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   connect_timeout
   
@@ -996,7 +1052,27 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
+ 
+ failover_timeout
+ 
+ 
+ Maximum time to cyclically retry all the hosts in connect string.
+ (as decimal integer number of seconds). If not specified, then
+ hosts are tried just once.
+ 
+ 
+ If we have replicating cluster, and master node fails, it might
+ take some time to promote one of standby nodes to the new master.
+ So clients which notice that connect to the 

Re: [HACKERS] \timing interval

2016-09-08 Thread Craig Ringer
On 4 Sep. 2016 3:36 am, "Tom Lane"  wrote:
>

> After further thought I concluded that not providing any labeling of
> days is a bad idea.

Yeah. I think labeling days is definitely good. I'm glad you changed that.

Personally I'd like to trim milliseconds when dealing with minute+ long
runs and seconds from hour+ runs too, since it's all there in the ms output
and the units output is for human readability. I see the value of retaining
full precision too, though, and don't feel strongly about it.


Re: [HACKERS] High-CPU consumption on information_schema (only) query

2016-09-08 Thread Craig Ringer
On 8 Sep. 2016 7:38 am, "Robins Tharakan"  wrote:
>
> Hi,
>
> An SQL (with only information_schema related JOINS) when triggered, runs
with max CPU (and never ends - killed after 2 days).
> - It runs similarly (very slow) on a replicated server that acts as a
read-only slave.
> - Top shows only postgres as hitting max CPU (nothing else). When query
killed, CPU near 0%.
> - When the DB is restored on a separate test server (with the exact
postgresql.conf) the same query works fine.
> - There is no concurrent usage on the replicated / test server (although
the primary is a Production server and has concurrent users).
>
> Questions:
> - If this was a postgres bug or a configuration issue, query on the
restored DB should have been slow too. Is there something very basic I am
missing here?
>
> If someone asks for it I could provide SQL + EXPLAIN, but it feels
irrelevant here. I am not looking for a specific solution but what else
should I be looking for here?

Get a series of stack traces.

Perf with stack output would be good too.

You need debug info for both.


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-09-08 Thread Etsuro Fujita

On 2016/09/07 13:21, Ashutosh Bapat wrote:

* with the patch:
postgres=# explain verbose delete from ft1 using ft2 where ft1.a =
ft2.a;
 QUERY PLAN

-
 Delete on public.ft1  (cost=100.00..102.04 rows=1 width=38)
   ->  Foreign Delete  (cost=100.00..102.04 rows=1 width=38)
 Remote SQL: DELETE FROM public.t1 r1 USING (SELECT ROW(a,
b), a FROM public.t2) ss1(c1, c2) WHERE ((r1.a = ss1.c2))
(3 rows)



The underlying scan on t2 requires ROW(a,b) for locking the row for
update/share. But clearly it's not required if the full query is being
pushed down.


Exactly.


Is there a way we can detect that ROW(a,b) is useless
column (not used anywhere in the other parts of the query like
RETURNING, DELETE clause etc.) and eliminate it?


I don't have a clear solution for that yet, but I'll try to remove that 
in the next version.



Similarly for a, it's
part of the targetlist of the underlying scan so that the WHERE clause
can be applied on it. But it's not needed if we are pushing down the
query. If we eliminate the targetlist of the query, we could construct a
remote query without having subquery in it, making it more readable.


Will try to do so also.

Thanks for the comments!

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-08 Thread Etsuro Fujita

On 2016/09/06 22:07, Ashutosh Bapat wrote:

On Fri, Sep 2, 2016 at 3:55 PM, Etsuro Fujita
> wrote:



On 2016/08/22 15:49, Ashutosh Bapat wrote:
1. deparsePlaceHolderVar looks odd - each of the deparse* functions is
named as deparse plus the node name. PlaceHolderVar is not a parser node,
so no string is going to be parsed as a PlaceHolderVar. Maybe you want to
name it deparseExerFromPlaceholderVar or something like that.



The name "deparseExerFromPlaceholderVar" seems long to me.  How
about "deparsePlaceHolderExpr"?



There isn't any node with name PlaceHolderExpr.


I'll rename it to "deparseExerInPlaceholderVar", then.


3. I think registerAlias stuff should happen really at the time of
creating paths and should be stored in fpinfo. Without that it
will be
computed every time we deparse the query. This means every time
we try
to EXPLAIN the query at various levels in the join tree and once
for the
query to be sent to the foreign server.



Hmm.  I think the overhead in calculating aliases would be
negligible compared to the overhead in explaining each remote query
for costing or sending the remote query for execution.  So, I
created aliases in the same way as remote params created and stored
into params_list in deparse_expr_cxt.  I'm not sure it's worth
complicating the code.



We defer remote parameter creation till deparse time since the
parameter numbers are dependent upon the sequence in which we deparse
the query. Creating them at the time of path creation and storing them
in fpinfo is not possible because a. those present in the joining
relations will conflict with each other and need some kind of
serialization at the time of deparsing b. those differ for differently
parameterized paths or paths with different pathkeys. We don't mind
deferring it because it's very light on performance.

That's not true with the alias information. As long as we detect which
relations need subqueries, their RTIs are enough to create unique
aliases e.g. if a relation involves RTIs 1, 2, 3 corresponding subquery
can have alias r123 without conflicting with any other alias.
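
For illustration, a tiny standalone sketch of that alias scheme (names
invented here, not postgres_fdw code):

#include <stdio.h>

/* Build "r123" from the set of range-table indexes a relation covers. */
static void
make_subselect_alias(const int *rtis, int nrtis, char *buf, int buflen)
{
    int     len = snprintf(buf, buflen, "r");

    for (int i = 0; i < nrtis; i++)
        len += snprintf(buf + len, buflen - len, "%d", rtis[i]);
}

int
main(void)
{
    int     rtis[] = {1, 2, 3};
    char    alias[32];

    make_subselect_alias(rtis, 3, alias, sizeof(alias));
    printf("%s\n", alias);      /* prints "r123" */
    return 0;
}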


Hmm.  But another concern about the approach of generating a subselect 
alias for a path, if needed, at path creation time would be that the 
path might be rejected by add_path, which would result in totally 
wasted cycles generating the subselect alias for that path.



However minimal that overhead might be, we will deparse the query every
time we create a path involving that kind of relation, i.e. for different
pathkeys, different parameterizations and different joins in which that
relation participates. This number can be significant. We want to
avoid this overhead if we can.


Exactly.  As I said above, I think the overhead would be negligible 
compared to the overhead in explaining each remote query for costing or 
the overhead in sending the final remote query for execution, though.



5. The blocks related to inner and outer relations in
deparseFromExprForRel() look same. We should probably separate
that code
out into a function and call it at two places.



Done.



I see you have created the function deparseOperandRelation() for the
same. I guess this should be renamed to deparseRangeTblRef(), since it
will create a RangeTblRef node when parsed back.


OK, if there are no objections from others, I'll rename it to the name 
you proposed, "deparseRangeTblRef".



6.
! if (is_placeholder)
! errcontext("placeholder expression at position %d in
select list",
!errpos->cur_attno);
A user wouldn't know what a placeholder expression is; there is
no such term in the documentation. We have to devise a better way
to provide an error context for an expression in general.



Though I proposed that, I don't think that it's that important to
let users know that the expression is from a PHV.  How about just
saying "expression", not "placeholder expression", so that we have
the message "expression at position %d in select list" in the context?



Hmm, that's better than placeholder expression, but not as explanatory
as it should be, since we won't be printing the "select list" in the
error context and the user won't have a clue as to what the error
context is about.


I don't think so.  Consider an example of the conversion error message, 
which is from the regression test:


SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND 
ft1.c1 = 1;

ERROR:  invalid input syntax for integer: "foo"
CONTEXT:  whole-row reference to foreign table "ft1"

As shown in the example, the error message is displayed under a remote 
query for execution.  So, ISTM it's reasonable to print something like 

Re: [HACKERS] Bug in two-phase transaction recovery

2016-09-08 Thread Simon Riggs
On 8 September 2016 at 07:43, Michael Paquier  wrote:
> On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich  
> wrote:
>> Some time ago two-phase state file format was changed to have variable size 
>> GID,
>> but several places that read that files were not updated to use new offsets. 
>> Problem
>> exists in master and 9.6 and can be reproduced on prepared transactions with
>> savepoints.
>
> Oops and meh. This merits an open item, and had better be fixed by
> 9.6.0. I am glad you noticed it, honestly. And we had better take
> care of this issue as soon as possible.

Looking now.

>> Also while looking at StandbyRecoverPreparedTransactions() i’ve noticed that 
>> buffer
>> for 2pc file is allocated in TopMemoryContext but never freed. That probably 
>> exists
>> for a long time.
>
> @@ -1886,6 +1886,8 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
> Assert(TransactionIdFollows(subxid, xid));
> SubTransSetParent(xid, subxid, overwriteOK);
> }
> +
> +   pfree(buf);
> }
> This one is a good catch. I have checked also the other callers of
> ReadTwoPhaseFile but I am not seeing any other leak. That's a leak,
> not critical though so applying it only on HEAD would be enough IMO.

Far from critical, but backpatched to 9.6 because it isn't just
executed once at startup, it is executed at every shutdown checkpoint.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-09-08 Thread Dilip Kumar
On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane  wrote:
> I really don't like this solution.  Changing this one function, out of the
> dozens just like it in lsyscache.c, seems utterly random and arbitrary.
>
> I'm actually not especially convinced that passing domain_in a random
> OID needs to not produce an XX000 error.  That isn't a user-facing
> function and people shouldn't be calling it by hand at all.  The same
> goes for record_in.  But if we do want those cases to avoid this,
> I think it's appropriate to fix it in those functions, not decree
> that essentially-random other parts of the system should bear the
> responsibility.  (Thought experiment: if we changed the order of
> operations in domain_in so that getBaseTypeAndTypmod were no longer
> the first place to fail, would we undo this change and then change
> wherever the failure had moved to?  That's pretty messy.)
>
> In the case of record_in, it seems like it'd be easy enough to use
> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc()
> and then throw a user-facing error right there.

Actually, when we pass an invalid type to "record_in", the error is
thrown in the "lookup_type_cache" function, and that function does not
take a "no_error" input (it's only limited to
lookup_rowtype_tupdesc_internal).

One option is to pass a "no_error" parameter to "lookup_type_cache"
as well, and throw the error only in record_in.
But the problem is that "lookup_type_cache" has a lot of callers. And
would it be a good idea to throw one generic error instead of multiple
specific errors?

Another option is to do for "record_in" the same thing you have
suggested for domain_in (an extra syscache lookup)?
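
For what it's worth, the "extra syscache lookup" idea for domain_in could
look roughly like this (a sketch only; the helper name is invented and
this is not the committed fix):

#include "postgres.h"

#include "catalog/pg_type.h"
#include "utils/syscache.h"

/* Hypothetical pre-check so getBaseTypeAndTypmod() is no longer the
 * first place to fail for a bogus type OID. */
static void
check_domain_type_exists(Oid domainType)
{
    if (!SearchSysCacheExists1(TYPEOID, ObjectIdGetDatum(domainType)))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("type with OID %u does not exist", domainType)));
}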

>Not sure about a
> nice solution for domain_in.  We might have to resort to an extra
> syscache lookup there, but even if we did, it should happen only
> once per query so that's not very awful.
>
>> 3. CheckFunctionValidatorAccess: This is being called from all
>> language validator functions.
>
> This part seems reasonable, since the validator functions are documented
> as something users might call, and CheckFunctionValidatorAccess seems
> like an apropos place to handle it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-08 Thread Emre Hasegeli
> Given that you are now familiar with the internals of how enums are
> implemented would it be possible to continue the work and add the "ALTER
> TYPE  DROP VALUE " command?

I was thinking of trying to develop it, but I would be more than happy
to help by testing and reviewing if someone else does.  I don't
think I have enough experience to think of all the details of this
feature.

The main problem that has been discussed before was the indexes.  One
way to tackle it is to reindex all the tables after the
operation.  Currently we do that when the datatype of an indexed
column changes.  So it should be possible, but very expensive.

Another way, which Thomas Munro suggested when we were talking about it,
would be to add a column to the catalog table to mark the deleted
rows.  We can check this column before allowing the value to be
used.  Indexes can continue using the deleted values.  We can also
re-use those entries when someone wants to add a new value in the
matching place.  It should be safe as long as we don't update the sort
order.
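
A rough sketch of what that check could look like at input time, assuming
a hypothetical "enumisdropped" column on pg_enum (nothing like this exists
today):

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_enum.h"
#include "utils/syscache.h"

static void
check_enum_label_usable(Oid enumTypeOid, const char *label)
{
    HeapTuple   tup = SearchSysCache2(ENUMTYPOIDNAME,
                                      ObjectIdGetDatum(enumTypeOid),
                                      CStringGetDatum(label));

    /* reject unknown labels and ones marked deleted (hypothetical column) */
    if (!HeapTupleIsValid(tup) ||
        ((Form_pg_enum) GETSTRUCT(tup))->enumisdropped)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input value for enum: \"%s\"", label)));

    ReleaseSysCache(tup);
}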




Re: [HACKERS] Bug in two-phase transaction recovery

2016-09-08 Thread Pavan Deolasee
On Thu, Sep 8, 2016 at 12:13 PM, Michael Paquier 
wrote:

> On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich 
> wrote:
> > Some time ago two-phase state file format was changed to have variable
> size GID,
> > but several places that read that files were not updated to use new
> offsets. Problem
> > exists in master and 9.6 and can be reproduced on prepared transactions
> with
> > savepoints.
>
> Oops and meh. This merits an open item, and had better be fixed by
> 9.6.0. I am glad you noticed it, honestly. And we had better take
> care of this issue as soon as possible.
>
>
Indeed, it's a bug. Thanks Stas for tracking it down and Michael for the
review and checking other places. I also looked at the patch and it seems
fine to me. FWIW I looked at all other places where TwoPhaseFileHeader is
referred and they look safe to me.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Stopping logical replication protocol

2016-09-08 Thread Craig Ringer
On 7 September 2016 at 15:44, Vladimir Gordiychuk  wrote:

> Fixed patch in attach.

It'd be helpful if you summarized the changes made when posting revisions.

>> * There are no tests. I don't see any really simple way to test this,
>> though.
>
> I will be grateful if you specify the best way to do it. I test these
> patches with pgjdbc driver tests that also build on head of postgres. You
can
> check test scenario for both patches in
> PRhttps://github.com/pgjdbc/pgjdbc/pull/550
>
>
org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive
>
org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive

Yeah, testing it isn't trivial in postgres core, since of course none of
the tools know to send a client-initiated CopyDone.

My suggestion is to teach pg_recvlogical to send CopyDone on the first
SIGINT (control-C) and wait until the server ends the data-stream and
returns to command mode. A second control-C should unilaterally close the
connection and it should close after a timeout too. This will be backward
compatible with 9.4/5/6 (just with a delay on exit by sigint).
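
A minimal sketch of that control-C behaviour (invented names; libpq calls
only, not pg_recvlogical's actual code):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#include "libpq-fe.h"

static volatile sig_atomic_t sigint_count = 0;

static void
handle_sigint(int sig)
{
    (void) sig;                 /* registered with signal(SIGINT, handle_sigint) */
    sigint_count++;
}

static void
maybe_stop_stream(PGconn *conn)
{
    if (sigint_count == 1)
    {
        /* first ^C: send client CopyDone, then keep reading until the
         * server replies with its own CopyDone and a command status */
        if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn) < 0)
            fprintf(stderr, "could not send CopyDone: %s", PQerrorMessage(conn));
    }
    else if (sigint_count > 1)
    {
        /* second ^C: stop waiting and close unilaterally */
        PQfinish(conn);
        exit(1);
    }
}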

Then in a TAP test in src/test/recovery set up a logical decoding session
with test_decoding and pg_recvlogical. Do a test xact then sigint
pg_recvlogical and verify that it exits. To make sure it exited by mutual
agreement with the server, probably run pg_recvlogical at a higher debug level
and text-search for a message you print from pg_recvlogical when it gets the
server CopyDone in response to the client CopyDone. I don't think a
different pg_recvlogical numeric exit code could be justified for this.

It sounds like more work than I think it would actually be.

-- 
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-08 Thread Masahiko Sawada
On Wed, Sep 7, 2016 at 4:03 AM, Josh Berkus  wrote:
> On 08/29/2016 06:52 AM, Fujii Masao wrote:
>> Also I like the following Simon's idea.
>>
>> https://www.postgresql.org/message-id/canp8+jlhfbvv_pw6grasnupw+bdk5dctu7gwpnap-+-zwvk...@mail.gmail.com
>> ---
>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>> * any k (n1, n2, n3) – would release waiters as soon as we have the
>> responses from k out of N standbys. “any k” would be faster, so is
>> desirable for performance and resilience
>
> What are we going to do for backwards compatibility, here?
>
> So, here's the dilemma:
>
> If we want to keep backwards compatibility with 9.6, then:
>
> "k (n1, n2, n3)" == "first k (n1, n2, n3)"
>
> However, "first k" is not what most users will want, most of the time;
> users of version 13, years from now, will be getting constantly confused
> by "first k" behavior when they wanted quorum.  So the sensible default
> would be:
>
> "k (n1, n2, n3)" == "any k (n1, n2, n3)"
>

+1.

"k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
compatibility but most users would think "k(n1, n2, n3)" as quorum
after introduced quorum.
I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
n3)" style before 9.6 releasing if we got consensus.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [COMMITTERS] pgsql: Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL

2016-09-08 Thread Simon Riggs
On 7 September 2016 at 14:58, Tom Lane  wrote:

>> That may not be perceived as a "fix" by everybody, so we should not do
>> it without an explicit agreement by many.
>
> Agreed, but I vote with Fujii-san for back-patching.

No problem with backpatching, just wanted some +1s before I did it.

Will do that tomorrow.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-08 Thread Masahiko Sawada
On Wed, Sep 7, 2016 at 12:47 AM, Masahiko Sawada  wrote:
> On Tue, Sep 6, 2016 at 11:08 PM, Simon Riggs  wrote:
>> On 29 August 2016 at 14:52, Fujii Masao  wrote:
>>> On Sat, Aug 6, 2016 at 6:36 PM, Petr Jelinek  wrote:
 On 04/08/16 06:40, Masahiko Sawada wrote:
>
> On Wed, Aug 3, 2016 at 3:05 PM, Michael Paquier
>  wrote:
>>
>> On Wed, Aug 3, 2016 at 2:52 PM, Masahiko Sawada 
>> wrote:
>>>
>>> I was thinking that the syntax for quorum method would use '[ ... ]'
>>> but it will be confused with '( ... )' priority method used.
>>> 001 patch adds 'Any N ( ... )' style syntax but I know that we still
>>> might need to discuss about better syntax, discussion is very welcome.
>>> Attached draft patch, please give me feedback.
>>
>>
>> I am +1 for using either "{}" or "[]" to define a quorum set, and -1
>> for the addition of a keyword in front of the integer defining for how
>> many nodes server need to wait for.
>
>
> Thank you for reply.
> "{}" or "[]" are not bad but because these are not intuitive, I
> thought that it will be hard for uses to use different method for each
> purpose.
>

 I think the "any" keyword is more explicit and understandable, also closer
 to SQL. So I would be in favor of doing that.
>>>
>>> +1
>>>
>>> Also I like the following Simon's idea.
>>>
>>> https://www.postgresql.org/message-id/canp8+jlhfbvv_pw6grasnupw+bdk5dctu7gwpnap-+-zwvk...@mail.gmail.com
>>> ---
>>> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
>>> * any k (n1, n2, n3) – would release waiters as soon as we have the
>>> responses from k out of N standbys. “any k” would be faster, so is
>>> desirable for performance and resilience
>>> ---
>>
>> +1
>>
>> "synchronous_method" -> "synchronization_method"
>
> Thanks, will fix.
>
>> I'm concerned about the performance of this code. Can we work out a
>> way of measuring it, so we can judge how successful we are at
>> releasing waiters quickly? Thanks
>
> I will measure the performance effect of this code.
> I'm expecting that performances are,
> 'first 1 (n1, n2)'  > 'any 1(n1, n2)' > 'first 2(n1, n2)'
> 'first 1 (n1, n2)' will be highest throughput.
>

Sorry, that's wrong.
'any 1(n1, n2)' will have the highest throughput, or the same as 'first 1(n1, n2)'.

>> For 9.6 we implemented something that allows the DBA to define how
>> slow programs are. Previously, since 9.1 this was something specified
>> on the application side. I would like to put it back that way, so we
>> end up with a parameter on client e.g. commit_quorum = k. Forget the
>> exact parameters/user API for now, but I'd like to allow the code to
>> work with user defined settings. Thanks.
>
> I see. The parameter on the client should take effect for the priority
> method as well. And similar to synchronous_commit, the client can specify
> how many standbys the master waits for at commit according to the
> synchronization method, even if s_s_names is defined.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] autonomous transactions

2016-09-08 Thread Craig Ringer
On 8 Sep. 2016 1:38 pm, "Andres Freund"  wrote:
>
> I kind of dislike this approach for a different reason than already
> mentioned in this thread: Namely it's not re-usable for implementing
> streaming logical replication of large transaction, i.e. allow to decode
> & apply un-committed changes.  The issue there is that one really needs
> to have multiple transactions active in the same connection, which this
> approach does not allow.

I've been thinking about this recently with an eye to handling the majority
of transactions on a typical system that neither perform DDL nor are
concurrent with it.

The following might be fairly nonsensical if I've misunderstood the problem
as I've not had a chance to more than glance at it, but:

I wonder if some kind of speculative concurrent decoding+output is possible
without being able to have multiple xacts on a backend. We could use a
shared xact and snapshot for all concurrent xacts. If any of them do
anything that requires invalidations etc we dump the speculatively
processed ones and fall back to reorder buffering and serial output at
commit time until we pass the xact that caused an invalidation and don't
have more pending. Add a callback to the output plugin to tell it that
speculative decoding of an xact has been aborted.

Or maybe even just put that speculative decoding on hold and pick up where
we left off partway through the reorder buffer when we get around to
processing it serially. That way we don't have to discard work already
done. More usefully we could avoid having to enqueue stuff into the reorder
buffer just in case we have to switch to fallback serial decoding.

Failing that:

Since logical decoding only needs read only xacts that roll back, it only
needs multiple backend private virtualxacts. They don't need
SERIALIZABLE/SSI which I think (?) means other backends don't need to be
able to wait on them. That seems simpler than what autonomous xacts would
need since there we need them exposed in PgXact, waitable-on for txid
locks, etc. Right? In the same way that historical snapshots are cut-down
maybe logical decoding xacts can be too.

I suspect that waiting for full in-process multiple xact support to do
interleaved decoding/replay will mean a long wait indeed.


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-08 Thread Masahiko Sawada
On Wed, Sep 7, 2016 at 2:39 AM, Robert Haas  wrote:
> On Tue, Sep 6, 2016 at 10:28 PM, Claudio Freire  
> wrote:
>>> The problem with this is that we allocate the entire amount of
>>> maintenance_work_mem even when the number of actual dead tuples turns
>>> out to be very small.  That's not so bad if the amount of memory we're
>>> potentially wasting is limited to ~1 GB, but it seems pretty dangerous
>>> to remove the 1 GB limit, because somebody might have
>>> maintenance_work_mem set to tens or hundreds of gigabytes to speed
>>> index creation, and allocating that much space for a VACUUM that
>>> encounters 1 dead tuple does not seem like a good plan.
>>>
>>> What I think we need to do is make some provision to initially
>>> allocate only a small amount of memory and then grow the allocation
>>> later if needed.  For example, instead of having
>>> vacrelstats->dead_tuples be declared as ItemPointer, declare it as
>>> ItemPointer * and allocate the array progressively in segments.  I'd
>>> actually argue that the segment size should be substantially smaller
>>> than 1 GB, like say 64MB; there are still some people running systems
>>> which are small enough that allocating 1 GB when we may need only 6
>>> bytes can drive the system into OOM.
>>
>> This would however incur the cost of having to copy the whole GB-sized
>> chunk every time it's expanded. It woudln't be cheap.
>
> No, I don't want to end up copying the whole array; that's what I
> meant by allocating it progressively in segments.  Something like what
> you go on to propose.
>
>> I've monitored the vacuum as it runs and the OS doesn't map the whole
>> block unless it's touched, which it isn't until dead tuples are found.
>> Surely, if overcommit is disabled (as it should), it could exhaust the
>> virtual address space if set very high, but it wouldn't really use the
>> memory unless it's needed, it would merely reserve it.
>
> Yeah, but I've seen actual breakage from exactly this issue on
> customer systems even with the 1GB limit, and when we start allowing
> 100GB it's going to get a whole lot worse.
>
>> To fix that, rather than repalloc the whole thing, dead_tuples would
>> have to be an ItemPointer** of sorted chunks. That'd be a
>> significantly more complex patch, but at least it wouldn't incur the
>> memcpy.
>
> Right, this is what I had in mind.  I don't think this is actually
> very complicated, because the way we use this array is really simple.
> We basically just keep appending to the array until we run out of
> space, and that's not very hard to implement with an array-of-arrays.
> The chunks are, in some sense, sorted, as you say, but you don't need
> to do qsort() or anything like that.  You're just replacing a single
> flat array with a data structure that can be grown incrementally in
> fixed-size chunks.
>

If we replaced dead_tuples with an array-of-arrays, wouldn't there be a
negative performance impact on lazy_tid_reap()?
As chunks are added, that performance would decrease.
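
To make the array-of-arrays idea concrete, here is a hedged standalone
sketch with a simplified TID type (all names invented, not the patch):
lookups binary-search over segments by each segment's last TID and then
within the segment, so a lazy_tid_reap()-style probe stays O(log n)
overall rather than degrading as segments are added.

#include <stdbool.h>
#include <stdlib.h>

typedef struct { unsigned int block; unsigned short offset; } Tid;
typedef struct { Tid *tids; int ntids; } TidSegment;
typedef struct { TidSegment *segments; int nsegments; } DeadTidStore;  /* zero-init */

#define SEGMENT_TIDS (64 * 1024)    /* stands in for a ~64MB segment */

static int
tid_cmp(const void *pa, const void *pb)
{
    const Tid  *a = pa;
    const Tid  *b = pb;

    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;
}

/* TIDs arrive in heap order; grow by whole segments, never copy old ones */
static void
dead_tid_append(DeadTidStore *store, Tid tid)
{
    if (store->nsegments == 0 ||
        store->segments[store->nsegments - 1].ntids == SEGMENT_TIDS)
    {
        store->segments = realloc(store->segments,
                                  (store->nsegments + 1) * sizeof(TidSegment));
        store->segments[store->nsegments].tids = malloc(SEGMENT_TIDS * sizeof(Tid));
        store->segments[store->nsegments].ntids = 0;
        store->nsegments++;
    }

    {
        TidSegment *seg = &store->segments[store->nsegments - 1];

        seg->tids[seg->ntids++] = tid;
    }
}

static bool
dead_tid_lookup(const DeadTidStore *store, Tid tid)
{
    int         lo = 0,
                hi = store->nsegments - 1;

    if (store->nsegments == 0)
        return false;

    /* find the first segment whose last TID is >= the probed TID */
    while (lo < hi)
    {
        int         mid = (lo + hi) / 2;
        const TidSegment *seg = &store->segments[mid];

        if (tid_cmp(&seg->tids[seg->ntids - 1], &tid) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return bsearch(&tid, store->segments[lo].tids, store->segments[lo].ntids,
                   sizeof(Tid), tid_cmp) != NULL;
}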

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] GiST penalty functions [PoC]

2016-09-08 Thread Andrew Borodin
>autoconf check for IEEE 754 floats
The Autoconf manual says the following:
>it is safe to assume IEEE-754 in most portable code these days
https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability

> A union might be more readable
Here is the union version of the patch. It's about 10% slower than the
original cube and the dereference version. I have no idea why.
Select performance is improved as in v3.

Also, I've investigated a bit why the linear packing failed. It's useful
to think in terms of bijections, not in terms of arithmetic functions.

float 1 is 1065353216 hex 3f800000
FLT_MAX / ( 8 >> 3 ) is 2139095039 hex 7f7fffff
FLT_MAX / ( 8 >> 2 ) is 2130706431 hex 7effffff
FLT_MAX / ( 8 >> 1 ) is 2122317823 hex 7e7fffff
FLT_MAX / ( 8 >> 0 ) is 2113929215 hex 7dffffff

These realm borders are completely wrong.
That means I used 800 thousand values to pack 2 billion values of
realms 1-3, and the other 2.1 billion values to pack realm 0. The fragile
arithmetic was done wrong.

What I had to use was
Realm 3
INT32_MAX is nan (or something little less than nan)
3*INT32_MAX/4 is 3.689348e+19
Total little less than 512kk different values

Realm 2
3*INT32_MAX/4 is 3.689348e+19
INT32_MAX/2 is 2.00e+00
Total 512kk different values

Realm 1
INT32_MAX/2 is 2.00e+00
INT32_MAX/4 is 1.084202e-19
Total 512kk different values

Realm 0
INT32_MAX/4 is 1.084202e-19
0 is 0.00e+00
Total 512kk different values

I haven't tested it yet, though. I'm not sure, BTW, that hardcoding these
constants is a good idea.
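
For anyone who wants to double-check those borders, a standalone snippet
(assumes IEEE-754 single precision) that reinterprets k * (INT32_MAX / 4)
as a float:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float
bits_to_float(int32_t bits)
{
    float       f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

int
main(void)
{
    for (int k = 0; k <= 4; k++)
    {
        int32_t     bits = (INT32_MAX / 4) * k;

        /* k = 4 lands in the NaN range (exponent bits all set) */
        printf("border %d: bits %d -> %e\n", k, bits, bits_to_float(bits));
    }
    return 0;
}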

Best regards, Andrey Borodin, Octonica & Ural Federal University.
diff --git a/contrib/cube/cube.c b/contrib/cube/cube.c
index 3feddef..ded639b 100644
--- a/contrib/cube/cube.c
+++ b/contrib/cube/cube.c
@@ -91,14 +91,16 @@ PG_FUNCTION_INFO_V1(cube_enlarge);
 /*
 ** For internal use only
 */
-int32  cube_cmp_v0(NDBOX *a, NDBOX *b);
-bool   cube_contains_v0(NDBOX *a, NDBOX *b);
-bool   cube_overlap_v0(NDBOX *a, NDBOX *b);
-NDBOX *cube_union_v0(NDBOX *a, NDBOX *b);
-void   rt_cube_size(NDBOX *a, double *sz);
-NDBOX *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep);
-bool   g_cube_leaf_consistent(NDBOX *key, NDBOX *query, StrategyNumber 
strategy);
-bool   g_cube_internal_consistent(NDBOX *key, NDBOX *query, 
StrategyNumber strategy);
+static int32   cube_cmp_v0(NDBOX *a, NDBOX *b);
+static boolcube_contains_v0(NDBOX *a, NDBOX *b);
+static boolcube_overlap_v0(NDBOX *a, NDBOX *b);
+static NDBOX  *cube_union_v0(NDBOX *a, NDBOX *b);
+static float   pack_float(float actualValue, int realm);
+static voidrt_cube_size(NDBOX *a, double *sz);
+static voidrt_cube_edge(NDBOX *a, double *sz);
+static NDBOX  *g_cube_binary_union(NDBOX *r1, NDBOX *r2, int *sizep);
+static boolg_cube_leaf_consistent(NDBOX *key, NDBOX *query, 
StrategyNumber strategy);
+static boolg_cube_internal_consistent(NDBOX *key, NDBOX *query, 
StrategyNumber strategy);
 
 /*
 ** Auxiliary funxtions
@@ -419,7 +421,32 @@ g_cube_decompress(PG_FUNCTION_ARGS)
}
PG_RETURN_POINTER(entry);
 }
+/*
+ * Function to pack floats of different realms
+ * This function serves to pack bit flags inside float type
+ * The resulting value can be from four different "realms".
+ * Every value from realm 3 is greater than any value from realms 2, 1 and 0.
+ * Every value from realm 2 is less than every value from realm 3 and greater
+ * than any value from realm 1 and 0, and so on. Values from the same realm
+ * lose two bits of precision. This technique is possible due to floating
+ * point numbers specification according to IEEE 754: exponent bits are highest
+ * (excluding sign bits, but here penalty is always positive). If float a is
+ * greater than float b, integer A with same bit representation as a is greater
+ * than integer B with same bits as b.
+ */
+static float
+pack_float(float actualValue, int realm)
+{
+   union
+   {
+   float fp;
+   int32 bits;
+   } buffer;
 
+   buffer.fp = actualValue;
+   buffer.bits = (buffer.bits >> 2) + (INT32_MAX / 4) * realm;
+   return buffer.fp;
+}
 
 /*
 ** The GiST Penalty method for boxes
@@ -428,6 +455,11 @@ g_cube_decompress(PG_FUNCTION_ARGS)
 Datum
 g_cube_penalty(PG_FUNCTION_ARGS)
 {
+   /* REALM 0: No extension is required, volume is zero, return edge   
*/
+   /* REALM 1: No extension is required, return nonzero volume 
*/
+   /* REALM 2: Volume extension is zero, return nonzero edge extension 
*/
+   /* REALM 3: Volume extension is nonzero, return it  
*/
+
GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
float  *result = (float *) PG_GETARG_POINTER(2);
@@ -441,9 +473,33 @@ g_cube_penalty(PG_FUNCTION_ARGS)

Re: [HACKERS] [PATCH] Alter or rename enum value

2016-09-08 Thread Matthias Kurz
On 7 September 2016 at 22:14, Tom Lane  wrote:

> ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> > Here is version 6 of the patch, which just adds RENAME VALUE with no IF
> > [NOT] EXISTS, rebased onto current master (particularly the
> > transactional ADD VALUE patch).
>
> Pushed with some adjustments.  The only thing that wasn't a matter of
> taste is you forgot to update the backend/nodes/ support functions
> for struct AlterEnumStmt.
>
> regards, tom lane
>

Thank you all for committing this feature!

Given that you are now familiar with the internals of how enums are
implemented would it be possible to continue the work and add the "ALTER
TYPE  DROP VALUE " command?

Thank you!
Regards, Matthias


Re: [HACKERS] Optimization for lazy_scan_heap

2016-09-08 Thread Masahiko Sawada
On Wed, Sep 7, 2016 at 4:11 PM, Simon Riggs  wrote:
> On 7 September 2016 at 04:13, Masahiko Sawada  wrote:
>
>> Since current HEAD could scan visibility map twice, the execution time
>> of Patched is approximately half of HEAD.
>
> Sounds good.
>
> To ensure we are doing exactly the same amount of work as before, did you
> see the output of VACUUM VERBOSE?

Sorry, the previous test result I posted was wrong.
I reran the performance test and the results are:

* 1TB Table(visibility map size is 32MB)
HEAD : 4853.250 ms (00:04.853)
Patched : 3805.789 ms (00:03.806)

* 8TB Table(visibility map size is 257MB)
HEAD : 37853.891 ms (00:37.854)
Patched : 30153.943 ms (00:30.154)

* 32TB Table(visibility map size is 1GB)
HEAD: 151908.344 ms (02:31.908)
Patched: 120560.037 ms (02:00.560)

Since the visibility map pages can be cached in shared buffers or the OS
cache by the first scan, the benefit of this patch seems not to be large.

Here are outputs of VACUUM VERBOSE for 32TB table.

* HEAD
INFO:  vacuuming "public.vm_skip_test"
INFO:  "vm_skip_test": found 0 removable, 0 nonremovable row versions
in 0 out of 4294967294 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
Skipped 4294967294 all-frozen pages according to visibility map.
0 pages are entirely empty.
CPU 1.06s/148.11u sec elapsed 149.20 sec.
VACUUM
Time: 151908.344 ms (02:31.908)

* Patched
INFO:  vacuuming "public.vm_skip_test"
INFO:  "vm_skip_test": found 0 removable, 0 nonremovable row versions
in 0 out of 4294967294 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
Skipped 4294967294 all-frozen pages according to visibility map.
0 pages are entirely empty.
CPU 0.65s/117.15u sec elapsed 117.81 sec.
VACUUM
Time: 120560.037 ms (02:00.560)

Currently manual vacuum doesn't output how many all_frozen pages we
skipped according to the visibility map.
That's why I attached the 0001 patch, which makes manual vacuum emit
such information.

>
> Can we produce a test that verifies the result patched/unpatched?
>

I attached a test shell script, but because I don't have such a large
disk, I've measured the performance benefit in a somewhat unofficial
way.

To simulate a situation where the table is extremely large and has a
correspondingly large visibility map, I applied the 0002 patch and made
a fake visibility map.
The attached 0002 patch adds a GUC parameter cheat_vacuum_table_size
which artificially defines the table size being vacuumed.
For example, if we do
  SET cheat_vacuum_table_size = 4;
  VACUUM vm_test;
then in lazy_scan_heap, the vm_test table is processed as an
8TB (MaxBlockNumber / 4) table.
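
Presumably the hack boils down to scaling the apparent number of blocks;
a standalone back-of-the-envelope check of the 8TB figure (the GUC here is
just the hypothetical knob described above, not the patch's code):

#include <stdio.h>

typedef unsigned int BlockNumber;
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)   /* as in PostgreSQL's block.h */

int
main(void)
{
    int         cheat_vacuum_table_size = 4;    /* the hypothetical GUC value */
    BlockNumber nblocks = MaxBlockNumber / cheat_vacuum_table_size;

    /* 1073741823 blocks * 8KB ~= 8TB */
    printf("fake nblocks = %u (~%.1f TB at 8KB/block)\n",
           nblocks, (double) nblocks * 8192.0 / 1024 / 1024 / 1024 / 1024);
    return 0;
}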

The attached test shell script makes fake visibility map files and
executes the performance tests for 1TB, 8TB and 32TB tables.
Please confirm it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From bbda8e4144101a41804865d88592a15bf0150340 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 8 Sep 2016 11:24:21 -0700
Subject: [PATCH 1/2] lazy_scan_heap outputs how many all_frozen pages are
 skipped.

---
 src/backend/commands/vacuumlazy.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5fb325..20d8334 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1331,6 +1331,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	"Skipped %u pages due to buffer pins.\n",
 	vacrelstats->pinskipped_pages),
 	 vacrelstats->pinskipped_pages);
+	appendStringInfo(, ngettext("Skipped %u all-frozen page according to visibility map.\n",
+	"Skipped %u all-frozen pages according to visibility map.\n",
+	vacrelstats->frozenskipped_pages),
+	 vacrelstats->frozenskipped_pages);
 	appendStringInfo(, ngettext("%u page is entirely empty.\n",
 	"%u pages are entirely empty.\n",
 	empty_pages),
-- 
2.8.1

From 01e493a44ba6c5bb9b59a1016bfce4b9bcf75e95 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 8 Sep 2016 11:36:28 -0700
Subject: [PATCH 2/2] Add cheat_vacuum_table_size.

---
 contrib/pg_visibility/pg_visibility.c |  6 +-
 src/backend/commands/vacuumlazy.c |  9 +++--
 src/backend/utils/misc/guc.c  | 10 ++
 src/include/commands/vacuum.h |  1 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7034066..37ace99 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -12,6 +12,7 @@
 #include "access/visibilitymap.h"
 #include "catalog/pg_type.h"
 #include "catalog/storage_xlog.h"
+#include "commands/vacuum.h"
 #include "funcapi.h"
 #include 

Re: [HACKERS] WAL consistency check facility

2016-09-08 Thread Michael Paquier
On Wed, Sep 7, 2016 at 7:22 PM, Kuntal Ghosh  wrote:
> Hello,

Could you avoid top-posting please? More reference here:
http://www.idallen.com/topposting.html

> - If WAL consistency check is enabled for a rmgrID, we always include
> the backup image in the WAL record.

What happens if wal_consistency has different settings on a standby
and its master? If for example it is set to 'all' on the standby, and
'none' on the master, or vice-versa, how do things react? An update of
this parameter should be WAL-logged, no?

> - I've extended the RmgrTable with a new function pointer
> rm_checkConsistency, which is called after rm_redo. (only when WAL
> consistency check is enabled for this rmgrID)
> - In each rm_checkConsistency, both backup pages and buffer pages are
> masked accordingly before any comparison.

This leads to heavy code duplication...

> - In postgresql.conf, a new guc variable named 'wal_consistency' is
> added. Default value of this variable is 'None'. Valid values are
> combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist,
> BRIN, Generic and XLOG. It can also be set to 'All' to enable all the
> values.

Lower-case is the usual policy for parameter values for GUC parameters.

> - In recovery tests (src/test/recovery/t), I've added wal_consistency
> parameter in the existing scripts. This feature doesn't change the
> expected output. If there is any inconsistency, it can be verified in
> corresponding log file.

I am afraid that just generating a WARNING message is going to be
useless for the buildfarm. If we want to detect errors, we could for
example have an additional GUC to trigger an ERROR or a FATAL, taking
down the cluster, and allowing things to show in red on a platform.

> Results
> 
>
> I've tested with installcheck and installcheck-world in master-standby
> set-up. Followings are the configuration parameters.

So you tested as well the recovery tests, right?

> I got two types of inconsistencies as following:
>
> 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup
> page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after
> redo, only BTP_DELETED flag is set in buffer page. I assume that we
> should clear all btpo_flags before setting BTP_DELETED in
> _bt_unlink_halfdead_page().

The page is deleted, it does not matter, so you could just mask all
the flags for a deleted page...
[...]
+   /*
+* Mask everything on a DELETED page.
+*/
+   if (((BTPageOpaque) PageGetSpecialPointer(page_norm))->btpo_flags
& BTP_DELETED)
And that's what is happening.

> 2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in
> REVMAP page. This happens only for two cases. I'm not sure what the
> reason can be.

Hm? This smells like a block reference bug. What are the cases you are
referring to?

> I haven't done sufficient tests yet to measure the overhead of this
> modification. I'll do that next.

I did a first pass on your patch, and I think that things could be
really reduced. There is much code duplication, but see below for the
details..

 #include "access/xlogutils.h"
-
+#include "storage/bufmask.h"
I know that I am a noisy one on the matter, but please double-check
for such useless noise in your patch. And there is not only one.

+   newwalconsistency = (bool *) guc_malloc(ERROR,(RM_MAX_ID + 1)*sizeof(bool));
This spacing is not project-style. You may want to go through that:
https://www.postgresql.org/docs/devel/static/source.html

+$node_master->append_conf(
+   'postgresql.conf', qq(
+wal_consistency = 'All'
+));
Instead of duplicating that 7 times, you could just do it once in the
init() method of PostgresNode.pm. This really has meaning if enabled
by default.

+   /*
+* Followings are the rmids which can have backup blocks.
+* We'll enable this feature only for these rmids.
+*/
+   newwalconsistency[RM_HEAP2_ID] = true;
+   newwalconsistency[RM_HEAP_ID] = true;
+   newwalconsistency[RM_BTREE_ID] = true;
+   newwalconsistency[RM_HASH_ID] = true;
+   newwalconsistency[RM_GIN_ID] = true;
+   newwalconsistency[RM_GIST_ID] = true;
+   newwalconsistency[RM_SEQ_ID] = true;
+   newwalconsistency[RM_SPGIST_ID] = true;
+   newwalconsistency[RM_BRIN_ID] = true;
+   newwalconsistency[RM_GENERIC_ID] = true;
+   newwalconsistency[RM_XLOG_ID] = true;
Here you can just use MemSet with RM_MAX_ID and simplify this code maintenance.

+   for(i = 0; i < RM_MAX_ID + 1 ; i++)
+   newwalconsistency[i] = false;
+   break;
Same here you can just use MemSet.

+   else if (pg_strcasecmp(tok, "NONE") == 0)
[...]
+   else if (pg_strcasecmp(tok, "ALL") == 0)
It seems to me that using NONE or ALL with any other keywords should
not be allowed.

+   if (pg_strcasecmp(tok, "Heap2") == 0)
+   {
+   newwalconsistency[RM_HEAP2_ID] 

Re: [HACKERS] Sample configuration files

2016-09-08 Thread Vik Fearing
On 08/29/2016 03:34 AM, Vik Fearing wrote:
> We have sample configuration files for postgresql.conf and
> recovery.conf, but we do not have them for contrib modules.  This patch
> attempts to add them.
> 
> Although the patch puts the sample configuration files in the tree, it
> doesn't try to install them.  That's partly because I think it would
> need an extension version bump and that doesn't seem worth it.
> 
> Also, while writing this patch, it crossed my mind that the plpgsql
> variables should probably be in the main postgresql.conf file because we
> install it by default.  I will happily write that patch if desired,
> independently of this patch.
> 
> Current as of 26fa446, added to next commitfest.

I noticed that this patch has been marked Waiting on Author with no
comment.  Peter, what more should I be doing right now while waiting for
Martín's review?
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] GiST penalty functions [PoC]

2016-09-08 Thread Heikki Linnakangas

On 09/08/2016 09:39 AM, Михаил Бахтерев wrote:

Excuse me for intervention.

It depends. For instance, I run PostgreSQL on a modern MIPS CPU which
does not have sqrt support.

But you are right, it is supported in most cases. And if the execution
speed of this very function is of concern, sqrtf(x) should be used
instead of sqrt(x).

Despite this, Andrew's solution gives a more accurate representation of
values. And as far as I understand, this improves overall performance by
decreasing the overall number of instructions that must be executed.


BTW, I would be OK with the bit-twiddling hack, if we had an autoconf 
check for IEEE 754 floats, and a graceful fallback for other systems. 
The fallback could be simply the current penalty function. You wouldn't 
get the benefit from the better penalty function on non-IEEE systems, 
then, but it would still be correct.



It is possible to speed up Andrew's implementation and get rid of
warnings by using bit-masks and unions. Something like:

  union {
float f;
struct {
  unsigned int mantissa:23, exponent:8, sign:1;
} bits;
  }

I am sorry, I have no time to check this. But it is common wisdom to
avoid pointer-based memory accesses in high-performance code, as they
create a lot of false write-to-read dependencies.


The compiler should be smart enough to generate the same instructions 
either way. A union might be more readable, though. (We don't need to 
extract the mantissa, exponent and sign, so a union of float and int32 
would do.)


- Heikki





Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-08 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro
> HORIGUCHI
> 
> $ time psql postgres -c 'select t.a from t, generate_series(0, )' >
> /dev/null
> 
> real  0m22.696s
> user  0m16.991s
> sys   0m0.182s>
> 
> Using binsearch the result for the same operation was
> real  0m35.296s
> user  0m17.166s
> sys   0m0.216s
> 
> Returning in UTF-8 bloats the result string by about 1.5 times so it doesn't
> seem to make sense comparing with it. But it takes real = 47.35s.

Cool, 36% speedup!  Does this difference vary depending on the actual 
characters used, e.g. would the speedup be greater if most of the 
characters are ASCII?

Regards
Takayuki Tsunakawa






Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?

2016-09-08 Thread Heikki Linnakangas

On 09/08/2016 03:36 AM, Peter Geoghegan wrote:

On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane  wrote:

The reason it's called siftup is that that's what Knuth calls it.
See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of
Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.


I see that Knuth explains that these steps form the siftup procedure.
What steps does tuplesort_heap_insert correspond to, if any?


Hmm. The loop in tuplesort_heap_siftup() indeed looks the same as 
Knuth's steps H3-H8. But the start is different. Knuth's algorithm 
begins with the situation that the top node of the sub-heap is in wrong 
place, and it pushes it downwards, until the whole sub-heap satisfies 
the heap condition again. But tuplesort_heap_siftup() begins by moving 
the last value in the heap to the top, and then it does the H3-H8 steps 
to move it to the right place.


Using Wikipedia's terms, tuplesort_heap_siftup() function as whole is 
the Extract operation. And the loop inside it corresponds to the 
Max-Heapify operation, which is the same as Knuth's "siftup".


I still think tuplesort_heap_siftup is a confusing name, although I'm 
not sure that Peter's "compact" is much better. I suggest that we rename 
it to tuplesort_heap_delete_top(). In comments within the function, 
explain that the *loop* corresponds to the "siftup" in Knuth's book.


Interestingly, as Knuth explains the siftup algorithm, it performs a 
"replace the top" operation, rather than "remove the top". But we use it 
to remove the top, by moving the last element to the top and running the 
algorithm after that. Peter's other patch, to change the way we 
currently replace the top node, to do it in one pass rather than as 
delete+insert, is actually Knuth's siftup algorithm.


Peter, looking at your "displace" patch in this light, I think 
tuplesort_heap_root_displace() and tuplesort_heap_delete_top() (as I'm 
calling it now), should share a common subroutine. Displace replaces the 
top node with the new node passed as argument, and then calls the 
subroutine to push it down to the right place. Delete_top replaces the 
top node with the last value in the heap, and then calls the subroutine. 
Or perhaps delete_top should remove the last value from the heap, and 
then call displace() with it, to re-insert it.
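
To make the suggested refactoring concrete, a minimal standalone sketch on
a plain int min-heap (invented names, not tuplesort code): one shared
sift-down routine, used both by "replace the top" (the displace patch,
i.e. Knuth's siftup) and by "delete the top".

/* Push newval down from the root until the heap property holds again. */
static void
heap_sift_down(int *heap, int n, int newval)
{
    int         i = 0;

    for (;;)
    {
        int         child = 2 * i + 1;

        if (child >= n)
            break;
        if (child + 1 < n && heap[child + 1] < heap[child])
            child++;            /* pick the smaller child (min-heap) */
        if (newval <= heap[child])
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = newval;
}

/* "displace": replace the root with a caller-supplied value (n must be >= 1) */
static void
heap_replace_top(int *heap, int n, int newval)
{
    heap_sift_down(heap, n, newval);
}

/* "delete_top": move the last element to the root and shrink the heap */
static int
heap_delete_top(int *heap, int *n)
{
    int         top = heap[0];

    (*n)--;
    if (*n > 0)
        heap_sift_down(heap, *n, heap[*n]);
    return top;
}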


- Heikki





Re: [HACKERS] Suggestions for first contribution?

2016-09-08 Thread Noah Misch
On Wed, Sep 07, 2016 at 05:14:44PM +0300, Aleksander Alekseev wrote:
> Here is another idea for a contribution - refactoring.

Refactoring is not a good early contribution.  Refactoring is more likely to
succeed once you internalize community code practices through much study of
functional patches.  However, even committer-initiated refactoring has a high
failure rate.




Re: [HACKERS] Parallel build with MSVC

2016-09-08 Thread Michael Paquier
On Thu, Sep 8, 2016 at 3:54 PM, Christian Ullrich  wrote:
> Much apologizings for coming in late again, but I just realized it would be
> better if the user-controlled flags came after all predefined options the
> user might want to override. Right now that is only /verbosity in both build
> and clean operations.
>
> Patch attached, also reordering the ecpg-clean command line in clean.bat to
> match the others that have the project file first.

-"msbuild $buildwhat.vcxproj $msbflags /verbosity:normal
/p:Configuration=$bconf"
+"msbuild $buildwhat.vcxproj /verbosity:normal $msbflags
/p:Configuration=$bconf"

Why not... If people are willing to switch to /verbosity:detailed and
double the amount of build time...
-- 
Michael




Re: [HACKERS] Parallel build with MSVC

2016-09-08 Thread Christian Ullrich

* Noah Misch wrote:


Committed.


Much apologizings for coming in late again, but I just realized it would 
be better if the user-controlled flags came after all predefined options 
the user might want to override. Right now that is only /verbosity in 
both build and clean operations.


Patch attached, also reordering the ecpg-clean command line in clean.bat 
to match the others that have the project file first.


--
Christian


>From 09a5f3945e2b87b56ca3525a56db970e9ecf8ffd Mon Sep 17 00:00:00 2001
From: Christian Ullrich 
Date: Thu, 8 Sep 2016 08:34:45 +0200
Subject: [PATCH] Let MSBFLAGS be used to override predefined options.

---
 src/tools/msvc/build.pl  | 4 ++--
 src/tools/msvc/clean.bat | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/tools/msvc/build.pl b/src/tools/msvc/build.pl
index 5273977..a5469cd 100644
--- a/src/tools/msvc/build.pl
+++ b/src/tools/msvc/build.pl
@@ -54,7 +54,7 @@ elsif (uc($ARGV[0]) ne "RELEASE")
 if ($buildwhat and $vcver >= 10.00)
 {
system(
-   "msbuild $buildwhat.vcxproj $msbflags /verbosity:normal 
/p:Configuration=$bconf"
+   "msbuild $buildwhat.vcxproj /verbosity:normal $msbflags 
/p:Configuration=$bconf"
);
 }
 elsif ($buildwhat)
@@ -63,7 +63,7 @@ elsif ($buildwhat)
 }
 else
 {
-   system("msbuild pgsql.sln $msbflags /verbosity:normal 
/p:Configuration=$bconf");
+   system("msbuild pgsql.sln /verbosity:normal $msbflags 
/p:Configuration=$bconf");
 }
 
 # report status
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index e21e37f..b972ddf 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -107,6 +107,6 @@ REM for /r %%f in (*.sql) do if exist %%f.in del %%f
 cd %D%
 
 REM Clean up ecpg regression test files
-msbuild %MSBFLAGS% /NoLogo ecpg_regression.proj /t:clean /v:q
+msbuild ecpg_regression.proj /NoLogo /v:q %MSBFLAGS% /t:clean 
 
 goto :eof
-- 
2.10.0.windows.1




Re: [HACKERS] Truncating/vacuuming relations on full tablespaces

2016-09-08 Thread Haribabu Kommi
On Mon, Jun 20, 2016 at 9:28 PM, Asif Naeem  wrote:

> Thank you for useful suggestions. PFA patch, I have tried to cover all the
> points mentioned.
>
>>
>>
The patch needs a rebase; it is failing to apply on latest master.
And there are also some pending comment fixes from Robert.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Bug in two-phase transaction recovery

2016-09-08 Thread Michael Paquier
On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich  wrote:
> Some time ago two-phase state file format was changed to have variable size 
> GID,
> but several places that read that files were not updated to use new offsets. 
> Problem
> exists in master and 9.6 and can be reproduced on prepared transactions with
> savepoints.

Oops and meh. This merits an open item, and had better be fixed by
9.6.0. I am glad you noticed it, honestly. And we had better take
care of this issue as soon as possible.

> For example:
>
> create table t(id int);
> begin;
> insert into t values (42);
> savepoint s1;
> insert into t values (43);
> prepare transaction 'x';
> begin;
> insert into t values (142);
> savepoint s1;
> insert into t values (143);
> prepare transaction 'y’;
>
> and restart the server. Startup process will hang. Fix attached.

In crash recovery this is very likely to fail with an assertion if
those are enabled:
TRAP: FailedAssertion("!(TransactionIdFollows(subxid, xid))", File:
"twophase.c", Line: 1767)
And the culprit is e0694cf9, which makes this open item owned by Simon.

I also had a look at the patch you are proposing, and I think that's a
correct fix. I also looked at the other code paths scanning the 2PC
state file and did not notice extra problems. The good news is that
the 2PC file generation is not impacted, only its scan at recovery is,
so the impact of the bug is limited for existing 9.6 deployments where
both 2PC and subtransactions are involved.

> Also while looking at StandbyRecoverPreparedTransactions() i’ve noticed that 
> buffer
> for 2pc file is allocated in TopMemoryContext but never freed. That probably 
> exists
> for a long time.

@@ -1886,6 +1886,8 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
Assert(TransactionIdFollows(subxid, xid));
SubTransSetParent(xid, subxid, overwriteOK);
}
+
+   pfree(buf);
}
This one is a good catch. I have checked also the other callers of
ReadTwoPhaseFile but I am not seeing any other leak. That's a leak,
not critical though so applying it only on HEAD would be enough IMO.
-- 
Michael




Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-08 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 07 Sep 2016 16:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20160907.161304.112519789.horiguchi.kyot...@lab.ntt.co.jp>
> > Implementing radix tree code, then redefining the format of mapping table
> > > to suppot radix tree, then modifying mapping generator script are needed.
> > > 
> > > If no one oppse to this, I'll do that.

So, I did that as a PoC. The radix tree takes a little less than
100k bytes (far smaller than expected:) and it is definitely
faster than binsearch.


The attached patch does the following things.

- Defines a struct for static radix tree
  (utf_radix_tree). Currently it supports up to 3-byte encodings.

- Adds a map generator script UCS_to_SJIS_radix.pl, which
  generates utf8_to_sjis_radix.map from utf8_to_sjis.map.

- Adds a new conversion function utf8_to_sjis_radix.

- Modifies UtfToLocal so as to allow map to be NULL.

- Modifies utf8_to_sjis to use the new conversion function
  instead of ULmapSJIS.
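
For readers unfamiliar with the approach, a hedged sketch of such a lookup
(a byte-indexed trie; field names and layout are invented here, not the
ones in the attached patch):

#include <stdint.h>

typedef struct
{
    uint16_t    children[256];  /* 0 = no child for that input byte */
    uint16_t    sjis;           /* SJIS code if a sequence ends here, else 0 */
} RadixNodeSketch;

/* Walk one node per input byte; up to three bytes of UTF-8 input. */
static uint16_t
utf8_to_sjis_radix_sketch(const RadixNodeSketch *nodes,
                          const unsigned char *utf8, int len)
{
    uint16_t    node = 0;       /* node 0 is the root */

    for (int i = 0; i < len; i++)
    {
        node = nodes[node].children[utf8[i]];
        if (node == 0)
            return 0;           /* no mapping; fall back to the error path */
    }
    return nodes[node].sjis;
}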


The following remain to be done.

- utf8_to_sjis_radix could be more generic.

- SJIS->UTF8 is not implemented but it could easily be done since
  there's no difference in using the radix tree mechanism.
  (but the output character is currently assumed to be 2 bytes long)

- It doesn't support 4-byte codes so this is not applicable to
  sjis_2004. Extending the radix tree to support 4-byte wouldn't
  be hard.


The following is the result of a simple test.

=# create table t (a text); alter table t alter column a storage plain;
=# insert into t values ('... 7130 characters containing (I believe) all 
characters in SJIS encoding');
=# insert into t values ('... 7130 characters containing (I believe) all 
characters in SJIS encoding');

# Doing that twice is just my mistake.

$ export PGCLIENTENCODING=SJIS

$ time psql postgres -c '
$ psql -c '\encoding' postgres
SJIS


$ time psql postgres -c 'select t.a from t, generate_series(0, )' > 
/dev/null

real0m22.696s
user0m16.991s
sys 0m0.182s>

Using binsearch the result for the same operation was 
real0m35.296s
user0m17.166s
sys 0m0.216s

Returning in UTF-8 bloats the result string by about 1.5 times so
it doesn't seem to make sense comparing with it. But it takes
real = 47.35s.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


0001-Use-radix-tree-to-encoding-characters-of-Shift-JIS.patch.gz
Description: Binary data


