Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Chapman Flack
On 01/13/18 21:36, Everaldo Canuto wrote:

> I don't want to be irritating, but every Postgres user I've talked to
> really can't understand why the original patch was not accepted; the
> people against it did not show good reasons.

As I was watching this thread go by, the reasons being shown seemed good
to me, and they were being raised by some of the people most responsible
(as one quickly learns by scanning old archives), over the years, for
postgres being the excellent thing that it is, so it would not be my
first instinct to dismiss them.

One of the things I have come to really appreciate about this community
(in which I am still a relative newcomer) is that when disagreements arise,
you can see them being worked through by consideration of technical points
and somewhat measurable principles. In that light, if there's a better
conclusion you think is worth advocating, that would fit right in if done
in ways that respond to, and address in some way, the points already
raised; that adds more to the conversation than just saying they were
not "good".

-Chap



PQhost() undefined behavior when the connection string contains both host and hostaddr types

2018-01-13 Thread Haribabu Kommi
While working on [1], we found an inconsistency in PQhost() behavior
when the connection string that is passed to connect to the server contains
multiple hosts with both host and hostaddr types. For example,

host=host1,host2 hostaddr=127.0.0.1,127.0.0.1 port=5434,5432

As hostaddr is given preference when both host and hostaddr are
specified, the connection type for both addresses of the above
conninfo is CHT_HOST_ADDRESS. PQhost() therefore returns the
conn->pghost value, i.e. "host1,host2", instead of the actual host that
is connected.

Instead of checking the connection type when returning the host
details, it should check whether the host is NULL. With this change it
returns the expected value for all connection types.
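
A minimal sketch of that approach, assuming libpq's internal
connhost/whichhost bookkeeping (an illustration only, not the attached
patch):

    char *
    PQhost(const PGconn *conn)
    {
        if (!conn)
            return NULL;

        /* Return the host actually being connected to, when known. */
        if (conn->connhost != NULL &&
            conn->connhost[conn->whichhost].host != NULL &&
            conn->connhost[conn->whichhost].host[0] != '\0')
            return conn->connhost[conn->whichhost].host;

        /* Otherwise fall back to the raw connection-string value. */
        return conn->pghost;
    }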

Patch attached.

[1] -
https://www.postgresql.org/message-id/CAJrrPGeE0zY03phJByjofnN2TV9bSf4fu2yy%3Dv4DyWw_Dj%3DbRQ%40mail.gmail.com

Regards,
Hari Babu
Fujitsu Australia


PQhost-update-to-return-proper-host-details_v2.patch
Description: Binary data


Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Everaldo Canuto
On Sat, Jan 13, 2018 at 11:31 PM, Tom Lane  wrote:

> Vik Fearing  writes:
> > After re-reading the thread, I think the original patch is optimal.
>
> Hm, few other people thought that.
>
>
I don't want to be irritating, but every Postgres user I've talked to
really can't understand why the original patch was not accepted; the
people against it did not show good reasons.

> In particular, I currently hate how pasting a query with tabs screws
> > everything up with unhelpful completions.  I don't want to add to that
> > with the column example Tom exposed in [1] and especially with the
> > plpgsql example Vladimir Svedov exposed in [2].
>
> That seems like sort of an independent problem, though.  I wonder
> whether it's possible to find out from libreadline that the input
> text came from a paste rather than being typed.
>
> > I quite like the idea of blanking the database name in PROMPT2 as
> > suggested by Daniel in [3], I would be happy to work on that patch if
> > others agree.
>
> I think that the part of that about inserting a line number is important.
> What about combining the two ideas: in PROMPT2, the database name is
> *replaced* by a line number, while (trying to) keep the width the same?
>

If you can confirm the consensus about this patch, I can spend
some time and implement this PROMPT2 idea in a separate patch submission.

Regards, Everaldo


Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Stephen Frost
Tom, Vik, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Vik Fearing  writes:
> > In particular, I currently hate how pasting a query with tabs screws
> > everything up with unhelpful completions.  I don't want to add to that
> > with the column example Tom exposed in [1] and especially with the
> > plpgsql example Vladimir Svedov exposed in [2].
> 
> That seems like sort of an independent problem, though.  I wonder
> whether it's possible to find out from libreadline that the input
> text came from a paste rather than being typed.

I'm not sure exactly how, but I know that some software is able to
figure that out (irssi, in particular).  Might be just a timing-based
heuristic though.

> > I quite like the idea of blanking the database name in PROMPT2 as
> > suggested by Daniel in [3], I would be happy to work on that patch if
> > others agree.
> 
> I think that the part of that about inserting a line number is important.
> What about combining the two ideas: in PROMPT2, the database name is
> *replaced* by a line number, while (trying to) keep the width the same?

Seems like a good approach to me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Tom Lane
Vik Fearing  writes:
> After re-reading the thread, I think the original patch is optimal.

Hm, few other people thought that.

> In particular, I currently hate how pasting a query with tabs screws
> everything up with unhelpful completions.  I don't want to add to that
> with the column example Tom exposed in [1] and especially with the
> plpgsql example Vladimir Svedov exposed in [2].

That seems like sort of an independent problem, though.  I wonder
whether it's possible to find out from libreadline that the input
text came from a paste rather than being typed.

> I quite like the idea of blanking the database name in PROMPT2 as
> suggested by Daniel in [3], I would be happy to work on that patch if
> others agree.

I think that the part of that about inserting a line number is important.
What about combining the two ideas: in PROMPT2, the database name is
*replaced* by a line number, while (trying to) keep the width the same?
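
For illustration, a hypothetical rendering of that combined idea, with
the line number right-justified to the width of the database name:

    postgres=# SELECT 1 +
           2-# 2;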

regards, tom lane



Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Stephen Frost
Vik,

* Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
> On 01/13/2018 10:52 PM, Stephen Frost wrote:
> > * Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
> >> On 01/09/2018 02:29 AM, Alvaro Herrera wrote:
> >>> Everaldo Canuto wrote:
> >>>> Changing my patch will make psql different from the other SQL clients I use
> >>>> (sqlplus and mysql).
> >>>
> >>> Well, I don't think we're too hot about copying their behavior exactly.
> >>> This thread discussed the behavior at length and a consensus was found
> >>> after much thinking.  No one is imposing anything.
> >>>
> >>>> I am happy with my custom version of psql, so you can cancel/delete/ignore
> >>>> my patch.
> >>>
> >>> We're not happy to lose a potential contributor, but of course that's
> >>> your prerogative.
> >>>
> >>> Can we find somebody else willing to implement the agreed-upon behavior?
> >>
> >> I'm currently reviewing two other patches in the commitfest (in my
> >> limited time).  When I'm done with those I would be happy to pick this
> >> up if no one else has.
> > 
> > Looks like the status of this should really be 'waiting on author' (even
> > if the current author isn't really interested in updating it, sounds
> > like Vik might be), so I've adjusted its status.
> > 
> > I'm also of the opinion that this would be good to have, though I don't
> > know that I'll have time to bring it over the line for v11.
> 
> After re-reading the thread, I think the original patch is optimal.  I
> don't like what the consensus is and so I have no interest in
> implementing it.
> 
> In particular, I currently hate how pasting a query with tabs screws
> everything up with unhelpful completions.  I don't want to add to that
> with the column example Tom exposed in [1] and especially with the
> plpgsql example Vladimir Svedov exposed in [2].

The issue expressed in [1] was pretty clearly addressed, imv, by only
printing out a message instead of actually aborting things, as suggested
later by Robert.  That same approach can be used to deal with the
plpgsql 'exit;' case mentioned in your [2].

> I quite like the idea of blanking the database name in PROMPT2 as
> suggested by Daniel in [3], I would be happy to work on that patch if
> others agree.

I'm not against that as it's largely orthogonal and seems pretty
reasonable by itself.  I don't know that having it really takes anything
away from having help/quit/exit be detected by psql and a message sent
back to the user that maybe they are looking for help.

If this is picked up by psql then we can also do things like *not* throw
the message out there when we know what we're working with came from a \ef
or similar.  That way it would only happen if you're actually pasting in
some big function that happens to have 'exit;' in it, and then you get a
message back, but otherwise the function definition works just fine and
you can ignore it and move on...

Maybe we should have a \pset parameter to disable this too, so that
people who really don't want to see the warnings kicked back have a way
to get rid of them.  That certainly seems like it'd be pretty simple to
do and something we could include in the help output...

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Vik Fearing
On 01/13/2018 10:52 PM, Stephen Frost wrote:
> Vik, all,
> 
> * Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
>> On 01/09/2018 02:29 AM, Alvaro Herrera wrote:
>>> Everaldo Canuto wrote:
>>>> Changing my patch will make psql different from the other SQL clients I use
>>>> (sqlplus and mysql).
>>>
>>> Well, I don't think we're too hot about copying their behavior exactly.
>>> This thread discussed the behavior at length and a consensus was found
>>> after much thinking.  No one is imposing anything.
>>>
>>>> I am happy with my custom version of psql, so you can cancel/delete/ignore
>>>> my patch.
>>>
>>> We're not happy to lose a potential contributor, but of course that's
>>> your prerogative.
>>>
>>> Can we find somebody else willing to implement the agreed-upon behavior?
>>
>> I'm currently reviewing two other patches in the commitfest (in my
>> limited time).  When I'm done with those I would be happy to pick this
>> up if no one else has.
> 
> Looks like the status of this should really be 'waiting on author' (even
> if the current author isn't really interested in updating it, sounds
> like Vik might be), so I've adjusted its status.
> 
> I'm also of the opinion that this would be good to have, though I don't
> know that I'll have time to bring it over the line for v11.

After re-reading the thread, I think the original patch is optimal.  I
don't like what the consensus is and so I have no interest in
implementing it.

In particular, I currently hate how pasting a query with tabs screws
everything up with unhelpful completions.  I don't want to add to that
with the column example Tom exposed in [1] and especially with the
plpgsql example Vladimir Svedov exposed in [2].

I quite like the idea of blanking the database name in PROMPT2 as
suggested by Daniel in [3], I would be happy to work on that patch if
others agree.

[1] https://postgr.es/m/31478.1512760...@sss.pgh.pa.us

[2]
https://postgr.es/m/CADqDLE-hgRA0qLeiC3a3DJ41-yR5kj=k0o679K=zk32vlza...@mail.gmail.com

[3]
https://postgr.es/m/ad75eab6-71ea-4e1f-8aa5-68876ff50...@manitou-mail.org
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: WIP: a way forward on bootstrap data

2018-01-13 Thread Tom Lane
John Naylor  writes:
> On 1/12/18, Alvaro Herrera  wrote:
>> Pushed 0001 with some changes of my own.  I'm afraid I created a few
>> conflicts for the later patches in your series; please rebase.

> Attached, rebased over 255f14183ac.

I decided that I wasn't going to get answers about the things I cared
about without looking through the patchset, so I've now done so,
in a once-over-lightly fashion.  Here's a quick summary of what it
does, for those who may be equally confused, and then some comments
(not really a review).

The patch removes DATA and (SH)DESCR lines from all the catalog/pg_*.h
files, as well as the #defines for OID-value macros, and puts that
information into pg_*.dat files corresponding to the .h files, in a form
that is easily readable and writable by Perl scripts.  Comments associated
with this info are also transferred to the .dat files, and the scripts
that can rewrite the .dat files are able to preserve the comments.

genbki.pl is able to generate postgres.bki and other initdb input files
directly from the .dat files.  It also creates a single header file
src/include/catalog/oid_symbols.h that contains all of the OID-value
macros that were formerly in the pg_*.h files.

The patch gets rid of the need to write hard-wired OIDs when referencing
operators, opfamilies, etc in the .dat files; now you can write their
names instead.  genbki.pl will look up the names and substitute numeric
OIDs in the emitted postgres.bki file.  There are also provisions to
shorten the .dat files through the use of abbreviated field names,
default values for fields, and some other less-general techniques.
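
For concreteness, an entry in one of the .dat files looks roughly like
this (an illustration of the general shape; the exact abbreviations and
defaulting rules are the patch's own):

    { oid => '1242', descr => 'I/O',
      proname => 'boolin', prorettype => 'bool',
      proargtypes => 'cstring', prosrc => 'boolin' },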

--

OK, now some comments:

I'm not sure about the decision to move all the OID macros into
one file; that seems like namespace pollution.  It's especially
odd that you did that but did not consolidate fmgroids.h with
the macros for symbols from other catalogs.  Now it's true that
we need all those symbols to be distinct, since it needs to be
possible for .c files to include any combination of pg_*.h headers,
but I don't think it's an especially good idea that you have to
include all of them or none.  Even if we're willing to put up with
that namespace pollution for backend code, there are clients that
currently include pg_*.h headers to get some of those macros, and
they're likely to be less happy about it.

The design I'd kind of imagined was one generated file of #define's
per pg_*.h file, not just one giant one.

It would be really nice, also, if the attribute number macros
(Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
Manually renumbering those is one of the bigger pains in the rear
when adding catalog columns.  It was less of a pain than adjusting
the DATA lines of course, so I never figured it was worth doing
something about in isolation --- but with this infrastructure in
place, that's manual work we shouldn't have to do anymore.
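
Presumably the generated output would look just like the hand-maintained
macros do today, e.g. for pg_proc (values illustrative only):

    #define Natts_pg_proc                29
    #define Anum_pg_proc_proname         1
    #define Anum_pg_proc_pronamespace    2
    #define Anum_pg_proc_proowner        3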

Another thing that I'd sort of hoped might happen from this patchset
is to cure the problem of keeping some catalog headers safe for
client-side inclusion, because some clients want the OID value macros
and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
currently have to #include those headers or else hard-code the values.
We've worked around that to date with ad-hoc solutions like splitting
function declarations out to pg_*_fn.h files, but I never liked that
much.  With the OID value macros being moved out to separate generated
file(s), there's now a possibility that we could fix this once and for all
by making client-side code include those file(s) not pg_type.h et al
themselves.  But we'd need a way to put the column-value macros into
those files too; maybe that's too messy to make it practical.

The .dat files need to have header comments that follow project
conventions, in particular they need to contain copyright statements.
Likewise for generated files.

I've got zero faith that the .h files will hold still long enough
for these patches to be reviewed and applied.  The ones that touch
significant amounts of data need to be explained as "run this script
on the current data", rather than presented as static diffs.

I'm not really thrilled by the single-purpose "magic" behaviors added
in 0007, such as computing prosrc from proname.  I think those will
add more confusion than they're worth.

In 0010, you relabel the types of some OID columns so that genbki.pl
will know which lookup to apply to them.  That's not such a problem for
the relabelings that are just macros and genbki.pl converts back to
type OID in the .bki file.  But you also did things like s/Oid/regtype/,
and that IS a problem because it will affect what client code sees in
those catalog columns.  We've discussed changing those columns to
regfoo types in the past, and decided not to, because of the likelihood
of breaking client queries.  I do not think this patch gets to change
that policy.  So the way to identify the lookup 

Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2018-01-13 Thread Thomas Munro
On Fri, Jan 12, 2018 at 4:35 PM, Stephen Frost  wrote:
> Appears to compile and pass make check-world still (not sure why Thomas'
> bot currently thinks the build fails, since it worked here..).

It looks good now.  There was a brief window when all Travis CI builds
said this while updating various packages at startup:

W: GPG error: http://repo.mongodb.org/apt/ubuntu
trusty/mongodb-org/3.4 Release: The following signatures were invalid:
KEYEXPIRED 1515625755
W: The repository 'http://repo.mongodb.org/apt/ubuntu
trusty/mongodb-org/3.4 Release' is not signed.

That's because the apt.sources happens to have that repository in it.
I didn't look into it because it fixed itself at the next rebuild.  I
speculate that MongoDB's package repository is eventually consistent.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: proposal: alternative psql commands quit and exit

2018-01-13 Thread Stephen Frost
Vik, all,

* Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
> On 01/09/2018 02:29 AM, Alvaro Herrera wrote:
> > Everaldo Canuto wrote:
> >> Changing my patch will make psql different from the other SQL clients I use
> >> (sqlplus and mysql).
> > 
> > Well, I don't think we're too hot about copying their behavior exactly.
> > This thread discussed the behavior at length and a consensus was found
> > after much thinking.  No one is imposing anything.
> > 
> >> I am happy with my custom version of psql, so you can cancel/delete/ignore
> >> my patch.
> > 
> > We're not happy to lose a potential contributor, but of course that's
> > your prerogative.
> > 
> > Can we find somebody else willing to implement the agreed-upon behavior?
> 
> I'm currently reviewing two other patches in the commitfest (in my
> limited time).  When I'm done with those I would be happy to pick this
> up if no one else has.

Looks like the status of this should really be 'waiting on author' (even
if the current author isn't really interested in updating it, sounds
like Vik might be), so I've adjusted its status.

I'm also of the opinion that this would be good to have, though I don't
know that I'll have time to bring it over the line for v11.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PROPOSAL] Shared Ispell dictionaries

2018-01-13 Thread Tomas Vondra


On 01/13/2018 04:22 PM, Arthur Zakirov wrote:
> Hello,
> 
> Thank you Tomas for your review.
> 
> On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote:
>> allocate memory in the buildCxt. What about adding tmpstrdup to copy a
>> string into the context? I admit this is mostly nitpicking though.
> 
> [... snip ...]
>> 8) One more thing - I've noticed that the hash table uses this key:
>> ...
>> That is, full paths to the two files, and I'm not sure that's a very
>> good idea. Firstly, it's a bit wasteful (1kB per path). But more
>> importantly it means all dictionaries referencing the same files will
>> share the same chunk of shared memory - not only within a single
>> database, but across the whole cluster. That may lead to surprising
>> behavior, because e.g. unloading a dictionary in one database will
>> affect dictionaries in all other databases referencing the same files.
> 
> Hm, indeed. It's worth using only file names instead of full paths. And
> it is a good idea to use more information besides file names. It could be
> the Oid of a database and the Oid of a namespace maybe, because a
> dictionary can be created in different schemas.
> 

I doubt using filenames (without the directory paths) solves anything,
really. The keys still have to be MAXPGPATH because someone could create
a very long filename. But I don't think memory consumption is such a big
deal, really. With 1000 dictionaries it's still just ~2MB of data, which
is negligible compared to the amount of memory saved by sharing the
dictionaries.
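
Roughly, the key would be something like this (a hypothetical struct,
just to make the arithmetic concrete: two 1kB paths, so ~2kB per
dictionary):

    typedef struct SharedDictKey
    {
        char    dictfile[MAXPGPATH];    /* 1kB */
        char    afffile[MAXPGPATH];     /* 1kB */
    } SharedDictKey;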

Not sure if we really need to add the database/schema OIDs. I mentioned
the unexpected consequences (cross-db sharing) but maybe that's a
feature we should keep (it reduces memory usage). So perhaps this should
be another CREATE TEXT SEARCH DICTIONARY parameter, allowing the
dictionary to be shared with other databases? Something like the sketch
below.
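
    -- hypothetical option name, purely to illustrate the idea
    CREATE TEXT SEARCH DICTIONARY english_shared (
        TEMPLATE = ispell,
        DictFile = en_us,
        AffFile = en_us,
        Shared = true
    );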

Aren't we overengineering this?

> I think your proposals may be implemented in several patches, so they can
> be applied independently but consistently. I suppose I will prepare a new
> version of the patch with fixes and with an initial design of new functions
> and commands soon.
> 

Yes, splitting patches into smaller, more focused bits is a good idea.

BTW the current patch fails to document the dictionary sharing. It only
mentions it when describing the shared_dictionaries GUC. IMHO the right
place for additional details is

https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] ECPG bug fix in preproc when indicator struct is shorter than record struct

2018-01-13 Thread Michael Meskes
> Do you want patches for the back ports as well?
> I noticed that between 9.6 (which is what we're using with this
> customer) and 10 the variable arrsiz was renamed to arrsize, so
> slight differences. Did not check earlier releases yet.

Na, don't worry, git cherry-pick and conflict resolution will do the
trick. But thanks for the heads-up.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-13 Thread Peter Geoghegan
On Sat, Jan 13, 2018 at 4:32 AM, Amit Kapila  wrote:
> Yeah, but this would mean that now with parallel create index, it is
> possible that some tuples from the transaction would end up in index
> and others won't.

You mean some tuples from some past transaction that deleted a bunch
of tuples and committed, but not before someone acquired a still-held
snapshot that didn't see the deleter's transaction as committed yet?

I guess that that is different, but it doesn't matter. All that
matters is that in the end, the index contains all entries for all
heap tuples visible to any possible snapshot (though possibly
excluding some existing old snapshots iff we detect broken HOT chains
during builds).

> In general, this makes me slightly nervous mainly
> because such a case won't be possible without the parallel option for
> create index, but if you and Robert are okay with it as there is no
> fundamental problem, then we might as well leave it as it is or maybe
> add a comment saying so.

Let me try to explain this another way, in terms of the high-level
intuition that I have about it (Robert can probably skip this part).

GetOldestXmin() returns a value that is inherently a *conservative*
cut-off. In hot standby mode, it's possible for the value it returns
to go backwards from a value previously returned within the same
backend.

Even with serial builds, the exact instant that GetOldestXmin() gets
called could vary based on something like the OS scheduling of the
process that runs CREATE INDEX. It could have a different value based
only on that. It follows that it won't matter if parallel CREATE INDEX
participants have a slightly different value, because the cut-off is
all about the consistency of the index with what the universe of
possible snapshots could see in the heap, not the consistency of
different parts of the index with each other (the parts produced from
heap tuples read from each participant).

Look at how the pg_visibility module calls GetOldestXmin() to recheck
-- it has to call GetOldestXmin() a second time, with a buffer lock
held on a heap page throughout. It does this to conclusively establish
that the visibility map is corrupt (otherwise, it could just be that
the cut-off became stale).

Putting all of this together, it would be safe for the
HEAPTUPLE_RECENTLY_DEAD case within IndexBuildHeapRangeScan() to call
GetOldestXmin() again (a bit like pg_visibility does), to avoid having
to index an actually-fully-dead-by-now tuple (we could call
HeapTupleSatisfiesVacuum() a second time for the heap tuple, hoping to
get HEAPTUPLE_DEAD the second time around). This optimization wouldn't
work out a lot of the time (it would only work out when an old
snapshot went away during the CREATE INDEX), and would add
ProcArrayLock traffic, so we don't do it. But AFAICT it's feasible.
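
In code terms, the idea would be roughly this (a sketch using assumed
names from IndexBuildHeapRangeScan()'s scan loop, not a proposal):

    if (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin, buffer) ==
        HEAPTUPLE_RECENTLY_DEAD)
    {
        /* refresh the conservative cut-off and recheck */
        OldestXmin = GetOldestXmin(heapRelation, true);
        if (HeapTupleSatisfiesVacuum(heapTuple, OldestXmin, buffer) ==
            HEAPTUPLE_DEAD)
            continue;       /* provably dead by now; skip indexing it */
    }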

> Another point is that the information about broken hot chains
> indexInfo->ii_BrokenHotChain is getting lost.  I think you need to
> coordinate this information among backends that participate in
> parallel create index.  Test to reproduce the problem is as below:
>
> create table tbrokenchain(c1 int, c2 varchar);
> insert into tbrokenchain values(3, 'aaa');
>
> begin;
> set force_parallel_mode=on;
> update tbrokenchain set c2 = 'bbb' where c1=3;
> create index idx_tbrokenchain on tbrokenchain(c1);
> commit;
>
> Now, check the value of indcheckxmin in pg_index; it should be true,
> but with the patch it is false.  You can try with the patch by not
> changing the value of force_parallel_mode.

Ugh, you're right. That's a real howler. Will fix.

Note that my stress-testing strategy has had a lot to do with
verifying that a serial build has relfiles that are physically
identical to parallel builds. Obviously that couldn't have caught
this, because this only concerns the state of the pg_index catalog.

> The patch uses both parallel_leader_participation and
> force_parallel_mode, but it seems the definition is different from
> what we have in Gather.  Basically, even with force_parallel_mode, the
> leader is participating in parallel build. I see there is some
> discussion above about both these parameters and still, there is not
> complete agreement on the best way forward.  I think we should have
> parallel_leader_participation as that can help in testing if nothing
> else.

I think that you're quite right that parallel_leader_participation
needs to be supported for testing purposes. I had some sympathy for
the idea that we should remove leader participation as a worker from
the patch entirely, but the testing argument seems to clinch it. I'm
fine with killing force_parallel_mode, though, because it will be
possible to force the use of parallelism by using the existing
parallel_workers table storage param in the next version of the patch,
regardless of how small the table is.

Thanks for the review.
-- 
Peter Geoghegan



Re: GSoC 2018

2018-01-13 Thread Alexander Korotkov
Hi!

On Fri, Jan 5, 2018 at 5:26 AM, Stephen Frost  wrote:

> * Alexander Korotkov (a.korot...@postgrespro.ru) wrote:
> > On Thu, Jan 4, 2018 at 11:36 PM, Stephen Frost 
> wrote:
> > > * Stephen Frost (sfr...@snowman.net) wrote:
> > > > The deadline for Mentoring organizations to apply is: January 23.
> > >
> > > We currently only have four (4) projects for 2018 listed on our
> > > projects page here:
> > >
> > > https://wiki.postgresql.org/index.php?title=GSoC_2018
> >
> > Could 2017 project ideas be reused this year?  IIRC, only 3 of the
> > project ideas were used last year.
>
> Yes!  As I mentioned in my initial email, they simply need to be updated
> and we need to make sure that we have mentors for them.
>
> If you're willing to mentor for one or multiple, please review the
> description, remove the previous mentors and put yourself and then
> update the '2017' to be '2018' and we'll include it.  Also, feel free to
> contact the other former mentors if you believe they'll be interested in
> mentoring again this year.
>
> What I don't want to do is assume that people who volunteered to mentor
> last year are willing to do so again this year.
>

Great!  I've pulled the following projects for 2018:

 * GiST API advancement
 * TOAST'ing in slices
 * Table density estimation for approximate queries
 * Extract scanning strategy to the separate entity from GiST/GIN/SP-GiST
opclasses

Oleg Bartunov, Teodor Sigaev, Andrey Borodin and I have confirmed that we
will volunteer to mentor these projects this year.  The only thing to
clarify: are you volunteering to mentor TOAST'ing in slices this year?  If
not, please correct the wiki accordingly.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master make check fails on Solaris 10

2018-01-13 Thread Tom Lane
Marina Polyakova  writes:
> On 12-01-2018 21:00, Tom Lane wrote:
>> Hm ... so apparently, that compiler has bugs in handling nondefault
>> alignment specs.  You said upthread it was gcc, but what version
>> exactly?

> This is 5.2.0:

Ugh ... protosciurus has 3.4.3, but I see that configure detects that
as *not* having __int128.  Probably what's happening on your machine
is that gcc knows __int128 but generates buggy code for it when an
alignment spec is given.  So that's unfortunate, but it's not really
a regression from 3.4.3.

I'm not sure there's much we can do about this.  Dropping the use
of the alignment spec isn't a workable option.  If there were a
simple way for configure to detect that the compiler generates bad
code for that, we could have it do so and reject use of __int128,
but it'd be up to you to come up with a workable test.
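
Just to sketch the shape of such a test: a small program configure could
compile and run, rejecting __int128 if it exits nonzero (entirely
illustrative; the hard part is making it trip over the actual bug):

    typedef __int128 int128a __attribute__((aligned(8)));

    struct pair
    {
        char    pad;            /* provoke interesting member offsets */
        int128a a;
        int128a b;
    };

    int
    main(void)
    {
        struct pair p;

        p.a = ((int128a) 1) << 80;
        p.b = p.a + 1;
        /* buggy codegen for aligned __int128 could get this wrong */
        return (p.b > p.a && (long) (p.b - p.a) == 1) ? 0 : 1;
    }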

In the end this might just be an instance of the old saw about
avoiding dot-zero releases.  Have you tried a newer gcc?
(Digging in their bugzilla finds quite a number of __int128 bugs
fixed in 5.4.x, though none look to be specifically about
misaligned data.)

Also, if it still happens with current gcc on that hardware,
there'd be grounds for a new bug report to them.

regards, tom lane



Re: Minor fix for pgbench documentation

2018-01-13 Thread Fabien COELHO



Here is a patch that adds missing random_zipfian func to the paragraph
in pgbench documentation about random functions parameterization.


Indeed.

Patch applies cleanly, doc build ok. Marked as "ready".

I have added it to the next commitfest.

--
Fabien.



Re: [HACKERS] Replication status in logical replication

2018-01-13 Thread Simon Riggs
On 9 January 2018 at 04:36, Masahiko Sawada  wrote:

>> This patch appears to cause this DEBUG1 message
>>
>> "standby \"%s\" has now caught up with primary"
>>
>> which probably isn't the right message, but might be OK to backpatch.
>>
>> Thoughts on better wording?
>>
>
> I think that this DEBUG1 message appears when any WAL is sent after
> catching up, even without this patch. This patch makes this message
> appear at the proper timing. Or am I missing something?

We're not talking about standbys, so the message is incorrect.

-- 
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()

2018-01-13 Thread Simon Riggs
On 12 January 2018 at 15:45, Tom Lane  wrote:

>> I have some reservations about whether this makes the mechanism less
>> reliable.
>
> Yeah, it scares me too.  The xl_prev field is our only way of detecting
> that we're looking at old WAL data when we cross a sector boundary.
> I have no faith that we can prevent old WAL data from reappearing in the
> file system across an OS crash, so I find Simon's assertion that we can
> dodge the problem through file manipulation to be simply unbelievable.

Not really sure what you mean by "file manipulation". Maybe the
proposal wasn't clear.

We need a way of detecting that we are looking at old WAL data. More
specifically, we need to know whether we are looking at a current file
or an older file. My main assertion here is that the detection only
needs to happen at file-level, not at record level, so it is OK to
lose some bits of information without changing our ability to protect
data - they were not being used productively.

Let's do the math to see if it is believable, or not.

The new two byte value is protected by CRC. The 2 byte value repeats
every 32768 WAL files. Any bit error in that value that made it appear
to be a current value would require a rare set of circumstances.

1. We would need to suffer a bit error that did not get caught by the CRC.

2. An old WAL record would need to occur right on the boundary of the
last WAL record.

3. The bit error would need to occur within the 2 byte value. WAL
records are usually fairly long, so this has a probability of
<1/16.

4. The bit error would need to change an old value to the current
value of the new 2 byte field. If the current value is N, and the
previous value is M, then a single bit error that takes M -> N can
only happen if N-M is a power of two. The maximum probability of an
issue would occur when we reuse WAL every 3 files, so the probability of
such a change would be 1/16. If the distance between M and N is not a
power of two then a single bit error cannot change M into N. So what
probability do we assign to the situation that M and N are exactly a
power of two apart?
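
(Roughly: 1/16 for point 3 times 1/16 for point 4 gives 1/256,
conditional on the undetected bit error from point 1 having occurred at
all.)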

So this scenario requires a single undetectable bit error, and even
then it would happen less than 1 in 256 times, arguably much less.
Notice that this probability is therefore at least 2 orders of
magnitude smaller than the chance that a single bit error occurs and
simply corrupts data, a mere rounding error in risk.

I don't find that unbelievable at all.

If you still do, then I would borrow Andres' idea of using the page
header. If we copy the new 2 byte value into the page header, we can
use that to match against in the case of error. XLogPageHeaderData can
be extended by 2 bytes without increasing its size when using 8 byte
alignment. The new 2 byte value is the same anywhere in the file, so
that works quickly and easily. And it doesn't increase the size of the
header.

So with that change it looks completely viable.
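
For concreteness, the extended header might look like this (the new
field's name is hypothetical):

    typedef struct XLogPageHeaderData
    {
        uint16      xlp_magic;      /* magic value for correctness checks */
        uint16      xlp_info;       /* flag bits */
        TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
        XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */
        uint32      xlp_rem_len;    /* total len of remaining record data */
        uint16      xlp_seqid;      /* hypothetical: the new 2 byte value */
    } XLogPageHeaderData;

With 8 byte alignment the existing 20 bytes of fields already pad out to
24, so the extra uint16 fits without growing the header.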

-- 
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Transform for pl/perl

2018-01-13 Thread Andrew Dunstan


On 01/12/2018 03:47 AM, Anthony Bykov wrote:
>
> The problem was that different perl compilers use different infinity
> representations. Some of them use "Inf", others use "inf". So, in the
> attachments there is a new version of the patch.
>


There's a bit of an impedance mismatch and inconsistency here. I think
we need to deal with json scalars (particularly numerics) the same way
we do for plain scalar arguments. We don't convert a numeric argument to
an SvNV. We just do this in plperl_call_perl_func():

    tmp = OutputFunctionCall(&(desc->arg_out_func[i]),
                             fcinfo->arg[i]);
    sv = cstr2sv(tmp);
    pfree(tmp);

    [...]

    PUSHs(sv_2mortal(sv));

Large numerics won't work as SvNV values, which have to fit in a
standard double. So I think we should treat them the same way we do for
plain scalar arguments.

(This also suggests that the tests are a bit deficient in not testing
jsonb with large numeric values.)

I'm going to set this back to waiting on author pending discussion.


cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] ECPG bug fix in preproc when indicator struct is shorter than record struct

2018-01-13 Thread Rader, David
Thank you!

On Sat, Jan 13, 2018 at 9:02 AM, Michael Meskes 
wrote:

> > Attached is a proposed patch to fix a bug in the ECPG preprocessor
> > that generates application code that core dumps at run-time. When the
> > input pgc code uses a record struct for returning query results and
> > uses an indicator struct that has fewer fields than the record
> > struct, the generated .c code will compile with no warning but core
> > dump. This situation comes up when a developer adds a field to an
> > existing query, adds the field to the record struct and forgets to
> > add the field to the indicator struct.
>
> Thanks for spotting and fixing, committed.
>
> > The attached sample files are a simple sample of pgc code that can be
> > used to see the difference in before and after generation and the
> > before and after generated code.
>
> Next time it would be nice if the test case was self-contained. Wasn't
> that difficult to figure out the table layout, though. :)


Got it - will add next time.


>
> > If accepted, this bug fix can be back ported to earlier versions of
> > ecpg as well.
>
> As usual this will be done after a couple of days, if no problems
> appear. I'm pretty sure there won't but sticking to my workflow here.
>

Do you want patches for the back ports as well?
I noticed that between 9.6 (which is what we're using with this customer)
and 10 the variable arrsiz was renamed to arrsize, so slight differences.
Did not check earlier releases yet.



> Michael
> --
> Michael Meskes
> Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
> Meskes at (Debian|Postgresql) dot Org
> Jabber: michael at xmpp dot meskes dot org
> VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL
>


Re: [PATCH] ECPG bug fix in preproc when indicator struct is shorter than record struct

2018-01-13 Thread Michael Meskes
> Attached is a proposed patch to fix a bug in the ECPG preprocessor
> that generates application code that core dumps at run-time. When the
> input pgc code uses a record struct for returning query results and
> uses an indicator struct that has fewer fields than the record
> struct, the generated .c code will compile with no warning but core
> dump. This situation comes up when a developer adds a field to an
> existing query, adds the field to the record struct and forgets to
> add the field to the indicator struct.

Thanks for spotting and fixing, committed.

> The attached sample files are a simple sample of pgc code that can be
> used to see the difference in before and after generation and the
> before and after generated code.

Next time it would be nice if the test case was self-contained. Wasn't
that difficult to figure out the table layout, though. :)

> If accepted, this bug fix can be back ported to earlier versions of
> ecpg as well.

As usual this will be done after a couple of days, if no problems
appear. I'm pretty sure there won't but sticking to my workflow here.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-13 Thread Amit Kapila
On Sat, Jan 13, 2018 at 6:02 PM, Amit Kapila  wrote:
>
> The patch uses both parallel_leader_participation and
> force_parallel_mode, but it seems the definition is different from
> what we have in Gather.  Basically, even with force_parallel_mode, the
> leader is participating in parallel build. I see there is some
> discussion above about both these parameters and still, there is not
> complete agreement on the best way forward.  I think we should have
> parallel_leader_participation as that can help in testing if nothing
> else.
>

Or maybe just have force_parallel_mode.  I think one of these is
required to facilitate some form of testing of the parallel code
easily.  As you can see from my previous email, it was quite easy to
demonstrate a test with force_parallel_mode.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-13 Thread Amit Kapila
On Sat, Jan 13, 2018 at 1:25 AM, Peter Geoghegan  wrote:
> On Fri, Jan 12, 2018 at 6:14 AM, Robert Haas  wrote:
>> On Fri, Jan 12, 2018 at 8:19 AM, Amit Kapila  wrote:
>>> 1.
>>> + if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
>>>   {
>>> - snapshot = RegisterSnapshot(GetTransactionSnapshot());
>>> - OldestXmin = InvalidTransactionId; /* not used */
>>> + OldestXmin = GetOldestXmin(heapRelation, true);
>>>
>>> I think leader and workers should have the same idea of oldestXmin for
>>> the purpose of deciding the visibility of tuples.  I think this is
>>> ensured in all form of parallel query as we do share the snapshot,
>>> however, same doesn't seem to be true for Parallel Index builds.
>>
>> Hmm.  Does it break anything if they use different snapshots?  In the
>> case of a query that would be disastrous because then you might get
>> inconsistent results, but if the snapshot is only being used to
>> determine what is and is not dead then I'm not sure it makes much
>> difference ... unless the different snapshots will create confusion of
>> some other sort.
>
> I think that this is fine. GetOldestXmin() is only used when we have a
> ShareLock on the heap relation, and the snapshot is SnapshotAny. We're
> only talking about the difference between HEAPTUPLE_DEAD and
> HEAPTUPLE_RECENTLY_DEAD here. Indexing a heap tuple when that wasn't
> strictly necessary by the time you got to it is normal.
>

Yeah, but this would mean that now with parallel create index, it is
possible that some tuples from the transaction would end up in the index
and others won't.  In general, this makes me slightly nervous mainly
because such a case won't be possible without the parallel option for
create index, but if you and Robert are okay with it as there is no
fundamental problem, then we might as well leave it as it is or maybe
add a comment saying so.


Another point is that the information about broken hot chains
indexInfo->ii_BrokenHotChain is getting lost.  I think you need to
coordinate this information among backends that participate in
parallel create index.  Test to reproduce the problem is as below:

create table tbrokenchain(c1 int, c2 varchar);
insert into tbrokenchain values(3, 'aaa');

begin;
set force_parallel_mode=on;
update tbrokenchain set c2 = 'bbb' where c1=3;
create index idx_tbrokenchain on tbrokenchain(c1);
commit;

Now, check the value of indcheckxmin in pg_index; it should be true,
but with the patch it is false.  You can try with the patch by not
changing the value of force_parallel_mode.

The patch uses both parallel_leader_participation and
force_parallel_mode, but it seems the definition is different from
what we have in Gather.  Basically, even with force_parallel_mode, the
leader is participating in parallel build. I see there is some
discussion above about both these parameters and still, there is not
complete agreement on the best way forward.  I think we should have
parallel_leader_participation as that can help in testing if nothing
else.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: a way forward on bootstrap data

2018-01-13 Thread John Naylor
On 1/13/18, Tom Lane  wrote:

> According to my understanding, part of what's going on here is that
> we're going to teach genbki.pl to parse these object references and
> convert them to hard-coded OIDs in the emitted BKI file.  That seems
> good to me, but one thing we're going to need is a spec for how
> genbki.pl knows what to do.

I don't know if it qualifies as a spec, but here's my implementation:

Use dummy type aliases in the header files: regam, regoper, regopf, and regtype.
These are #defined away in genbki.h:

+/* ----------------
+ * Some columns of type Oid have human-readable entries that are
+ * resolved when creating postgres.bki.
+ * ----------------
+ */
+#define regam Oid
+#define regoper Oid
+#define regopf Oid
+#define regtype Oid

Likewise, in genbki.pl (and I just noticed a typo, s/names/types/):

+# We use OID aliases to indicate when to do OID lookups, so column names
+# have to be turned back into 'oid' before writing the CREATE command.
+my %RENAME_REGOID = (
+   regam => 'oid',
+   regoper => 'oid',
+   regopf => 'oid',
+   regtype => 'oid');
+

When genbki.pl sees one of these type aliases, it consults the
appropriate lookup table, exactly as we do now for regproc. One
possibly dubious design point is that I declined to teach the
pg_attribute logic about this, so doing lookups in tables with schema
macros has to be done explicitly. There is only one case of this right
now, and I noted the tradeoff:

+   # prorettype
+   # Note: We could handle this automatically by using the
+   # 'regtype' alias, but then we would have to teach
+   # morph_row_for_pgattr() to change the attribute type back to
+   # oid. Since we have to treat pg_proc differently anyway,
+   # just do the type lookup manually here.
+   my $rettypeoid = $regtypeoids{ $bki_values{prorettype} };
+   $bki_values{prorettype} = $rettypeoid
+     if defined($rettypeoid);

This is all in patch 0011.

-John Naylor



Re: General purpose hashing func in pgbench

2018-01-13 Thread Fabien COELHO


Hello Ildar,


so that different instances of hash function within one script would
have different seeds. Yes, that is a good idea, I can do that.


Added this feature in the attached patch. But on second thought this could
be something that users won't expect. For example, they may want to run
pgbench with two scripts:
- the first one updates row by key that is a hashed random_zipfian value;
- the second one reads row by key generated the same way
(that is actually what YCSB workloads A and B do)

It feels natural to write something like this:
\set rnd random_zipfian(0, 100, 0.99)
\set key abs(hash(:rnd)) % 1000
in both scripts and expect that they both would have the same
distribution. But they wouldn't. We could of course describe this
implicit behaviour in documentation, but ISTM that shared seed would be
more clear.


I think that it depends on the use case; both can be useful, so there
should be a way to do either.


With "always different" default seed, distinct distributions are achieved
with:

   -- DIFF distinct seeds inside and between runs
   \set i1 abs(hash(:r1)) % 1000
   \set j1 abs(hash(:r2)) % 1000

and the same distribution can be done with an explicit seed:

   -- DIFF same seed inside and between runs
   \set i1 abs(hash(:r1, 5432)) % 1000
   \set j1 abs(hash(:r2, 5432)) % 1000

The drawback is that the same seed is used between runs in this case, 
which is not desirable. This could be circumvented by adding the random 
seed as an automatic variable and using it, e.g.:


   -- DIFF same seed inside run, distinct between runs
   \set i1 abs(hash(:r1, :random_seed + 5432)) % 1000
   \set j1 abs(hash(:r2, :random_seed + 2345)) % 1000


Now with a shared hash_seed, the same distribution is the default:

   -- SHARED same underlying hash_seed inside run, distinct between runs
   \set i1 abs(hash(:r1)) % 1000
   \set j1 abs(hash(:r2)) % 1000

However some trick is needed now to get distinct seeds. With

   -- SHARED distinct seed inside run, but same between runs
   \set i1 abs(hash(:r1, 5432)) % 1000
   \set j1 abs(hash(:r2, 2345)) % 1000

We are back to the same issue as the previous case, because then the
distribution is the same from one run to the next, which is not desirable.
I found this workaround trick:


   -- SHARED distinct seeds inside and between runs
   \set i1 abs(hash(:r1, hash(5432))) % 1000
   \set j1 abs(hash(:r2, hash(2345))) % 1000

Or with a new :hash_seed or :random_seed automatic variable, we could also 
have:


   -- SHARED distinct seeds inside and between runs
   \set i1 abs(hash(:r1, :hash_seed + 5432)) % 1000
   \set j1 abs(hash(:r2, :hash_seed + 2345)) % 1000

It provides controllable seeds: distinct between runs, but equal ones if
desired, by reusing the same value to be hashed as a seed.


I also agree with your argument that the user may reasonably expect that
hash(5432) == hash(5432) inside and between scripts, at least in the same
run, and so would be surprised that it is not the case.


So I've changed my mind; I'm sorry for making you go back and forth on
the subject. I'm now okay with one shared 64 bit hash seed, with clear
documentation about that fact, and an outline of the trick to achieve
distinct distributions inside a run if desired, and why that would be
desirable to avoid correlations. Also, I think that providing the seed as
an automatic variable (:hash_seed or :hseed or whatever) would make some
sense as well. Maybe this could be used as a way to fix the seed
explicitly, e.g.:


   pgbench -D hash_seed=1234 ...

This would use that value instead of the randomly generated one. Also, with
that, the default inserted second argument could simply be ":hash_seed",
which would simplify the executor, since it would not have to check for an
optional second argument.


--
Fabien.