Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 11:18 PM, Fabien COELHO  wrote:
> I can only concur!
>
> The "Performance Tips" chapter (II.14) is more user/query oriented. The
> "Server Administration" bool (III) does not discuss this much.

That's definitely one area in which the docs are lacking -- I've heard
several complaints about this myself. I think we've been hesitant to
do more in part because the docs must always be categorically correct,
and must not use weasel words. I think it's hard to talk about
performance while maintaining the general tone of the documentation. I
don't know what can be done about that.

-- 
Peter Geoghegan




Re: [HACKERS] checkpointer continuous flushing

2016-03-10 Thread Fabien COELHO



I just pushed the two major remaining patches in this thread.


Hurray! Nine months to get this baby out :-)

--
Fabien.




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Fabien COELHO



As you wish. I thought that understanding the underlying performance model,
with sequential writes written in chunks, is important for the admin, and as
this GUC has an impact on performance it should be hinted about, including
the limits of its effect, where large bases will converge to random IO
performance. But maybe that is not the right place.


I do agree that that's something interesting to document somewhere. But
I don't think any of the current places in the documentation are a good
fit, and it's a topic much more general than the feature we're debating
here.  I'm not volunteering, but a good discussion of storage and the
interactions with postgres surely would be a significant improvement to
the postgres docs.


I can only concur!

The "Performance Tips" chapter (II.14) is more user/query oriented. The 
"Server Administration" bool (III) does not discuss this much.


There is a wiki about performance tuning, but it is not integrated into 
the documentation. It could be a first documentation source.


Also, the READMEs in some development directories are very interesting,
although they contain too many details about the implementation.


There have been a lot of presentations over the years, and blog posts.

--
Fabien.




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-10 Thread Amit Kapila
On Fri, Mar 11, 2016 at 12:28 AM, Robert Haas  wrote:
>
>
> Committed with some further editing.  In particular, the way you
> determined whether we could safely access the tranche information for
> any given ID was wrong; please check over what I did and make sure
> that isn't also wrong.
>

There are a few typos which I have tried to fix with the attached patch.  Can
you tell me what was wrong with the way it was done in the patch?


@@ -4541,9 +4542,10 @@ AbortSubTransaction(void)
  */
  LWLockReleaseAll();

+ pgstat_report_wait_end();
+ pgstat_progress_end_command();
  AbortBufferIO();
  UnlockBuffers();
- pgstat_progress_end_command();

  /* Reset WAL record construction state */
  XLogResetInsertion();
@@ -4653,6 +4655,9 @@ AbortSubTransaction(void)
  */
  XactReadOnly = s->prevXactReadOnly;

+ /* Report wait end here, when there is no further possibility of wait */
+ pgstat_report_wait_end();
+
  RESUME_INTERRUPTS();
 }

AbortSubTransaction() does call pgstat_report_wait_end() twice; is this
intentional?  I have kept it at the end because there is a chance that
in-between APIs can again set the state to wait, and also by that time we
have not released buffer pins and heavyweight locks, so I am not sure if it
makes sense to report wait end at the earlier stage.  I have noticed that in
WaitOnLock(), on error the wait end is set, but now again thinking on it,
it seems it will be better to set it in
AbortTransaction/AbortSubTransaction at the end.  What do you think?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_typo_lwlock_v1.patch
Description: Binary data



Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-10 Thread Thomas Munro
On Fri, Mar 11, 2016 at 6:31 PM, Thomas Munro wrote:
> I'm not sure what to make of the pre-existing comment about following
> HOT-chains and concurrent index builds (which I moved).  Does it mean
> there is some way that CREATE INDEX CONCURRENTLY could cause us to
> consider the wrong tuple and miss an SSI conflict?

No, because the check is done entirely on the basis of the index
page.  The question may arise if we discover that we also need a
conflict-out check here though, because it would be based on the tuple
that has been found by heap_hot_search_buffer.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Amit Langote
On 2016/03/11 13:16, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote wrote:
>> So, from what I understand here, we should not put the total count of index
>> pages into st_progress_param; rather, have the client (reading
>> pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
>> when necessary.  However, only the server is able to tell the current position
>> within an index vacuuming round (or how many pages into a given index
>> vacuuming round), so report that using some not-yet-existent mechanism.
> 
> Isn't that mechanism what you are trying to create in 0003?

Right, 0003 should hopefully become that mechanism.

Thanks,
Amit






Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-10 Thread Thomas Munro
On Fri, Mar 11, 2016 at 10:00 AM, Simon Riggs  wrote:
> On 10 March 2016 at 20:36, Thomas Munro wrote:
>>
>> On Fri, Mar 11, 2016 at 8:50 AM, Simon Riggs wrote:
>> > On 3 February 2016 at 23:12, Thomas Munro wrote:
>> >
>> >>
>> >>  It quacks suspiciously like a bug.
>> >
>> >
>> > Agreed
>> >
>> > What's more important is that it is very publicly a bug in the eyes of
>> > others
>> > and should be fixed and backpatched soon.
>> >
>> > We have a maintenance release coming in a couple of weeks and I'd like
>> > to
>> > see this in there.
>>
>> As I understand it, the approach I've taken here can't be backpatched
>> because it changes the aminsert_function interface (it needs the
>> current snapshot when inserting), so I was proposing this as an
>> improvement for 9.6.  I guess there are other ways to get the right
>> snapshot into btinsert (and thence _bt_check_unique), but I didn't
>> think it would be very classy to introduce a 'current snapshot' global
>> variable to smuggle it in.
>
>
> But this is a Serializable transaction, so it only has one snapshot...
>
> This is where adding comments on patch theory would help.

Here's a much simpler version with more comments, and an explanation
below.  It handles the same set of isolation test specs.

I dropped the calls to PredicateLockPage and
CheckForSerializableConflictOut, which means I don't even need a
snapshot (and if I ever do, you're right, doh, I should just use
GetTransactionSnapshot()).  They were part of my (so far unsuccessful)
attempt to detect a conflict for read-write-unique-4.spec permutation
"r2 w1 w2 c1 c2", where one transaction only writes.   That leaves
only a "hypothetical" CheckForSerializableConflictIn check (see
below).

This version seems to handle the obvious overlapping read-then-write
scenarios I've contrived and seen in bug reports.  I need to do more
study of the SSI code and theory, and see if there are other
conflicting schedules that are missed but could be detected at this
point.  (Possibly including that "r2 w1 w2 c1 c2" case.)

Explanation:

In unpatched master, when _bt_check_unique discovers that a key
already exists in the index, it checks if the heap tuple is live by
calling heap_hot_search, which in turn calls heap_hot_search_buffer,
and then throws away the buffer and returns true or false.  If the
result is true, a unique constraint violation is raised.

This patch introduces a drop-in replacement
check_unique_tuple_still_live to call instead of heap_hot_search.  The
replacement function also calls heap_hot_search_buffer, but while it
has the buffer it takes the opportunity to do an SSI conflict-in
check.  At that point we know that the transaction is already doomed
to ereport, it's just a question of figuring out whether it should
ereport 40001 first.

The new conflict-in check tells the SSI machinery that we are going to
insert at this index page, even though we aren't.  It mirrors the one
that happens right before _bt_findinsertloc and _bt_insertonpg, which
would be reached if this were a non-unique index.  It gives the SSI
machinery a chance to detect conflicts that would be created by
writing to this page.  If it doesn't detect a conflict, behaviour is
unchanged and the usual unique constraint violation will be raised.
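
In code terms, the shape is roughly this (a hedged sketch only, not the
patch itself: the argument list, locking, and error handling are
simplified, and the helper's exact signature is assumed):

#include "access/heapam.h"
#include "storage/bufmgr.h"
#include "storage/predicate.h"

/*
 * Sketch of a drop-in replacement for heap_hot_search(): the same
 * heap_hot_search_buffer() call, but while the heap buffer is still
 * pinned we run an SSI conflict-in check against the index page.
 */
static bool
check_unique_tuple_still_live(ItemPointer tid, Relation heapRel,
                              Snapshot snapshot,
                              Relation indexRel, Buffer indexBuf)
{
    HeapTupleData heapTuple;
    bool        all_dead;
    bool        found;
    Buffer      buffer;

    buffer = ReadBuffer(heapRel, ItemPointerGetBlockNumber(tid));
    LockBuffer(buffer, BUFFER_LOCK_SHARE);

    /* the call heap_hot_search() would have made on our behalf */
    found = heap_hot_search_buffer(tid, heapRel, buffer, snapshot,
                                   &heapTuple, &all_dead, true);

    if (found)
    {
        /*
         * Pretend we are about to insert on the index page, mirroring the
         * check before _bt_findinsertloc()/_bt_insertonpg(), so that a
         * doomed transaction ereports 40001 before the unique constraint
         * violation is raised.
         */
        CheckForSerializableConflictIn(indexRel, NULL, indexBuf);
    }

    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    ReleaseBuffer(buffer);

    return found;
}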

I'm not sure what to make of the pre-existing comment about following
HOT-chains and concurrent index builds (which I moved).  Does it mean
there is some way that CREATE INDEX CONCURRENTLY could cause us to
consider the wrong tuple and miss an SSI conflict?

(Tangentially, I believe the nearby ON CONFLICT code needs some SSI
checks and I may investigate that some time if someone doesn't beat me
to it, but not as part of this patch.)

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-read-write-unique-v3.patch
Description: Binary data



Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-10 Thread Joshua D. Drake

On 03/10/2016 08:28 PM, Craig Ringer wrote:

On 11 March 2016 at 03:07, Igal @ Lucee.org wrote:


I noticed that you usually don't put html in the emails here, but I
think that it's appropriate here to show the information in a clear
way (also, according to my computer it's 2016).


Pretty sure we have at least one person here using mailreader software
that's old enough to vote in most countries, but I tend to share the
sentiment. At least when there's actually a functional reason like this :)


That person needs to suck it up. Email is no longer just fixed-width
text and hasn't been for a decade.


JD

--
Command Prompt, Inc.  http://the.postgres.company/
 +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Tom Lane
Andres Freund  writes:
> On 2016-03-10 15:03:41 -0500, Tom Lane wrote:
>> What it encourages is having module boundaries that actually mean
>> something, as well as code that can be refactored without having
>> to worry about which extensions will complain about it.

> I personally think it's entirely fine to break extensions if it's adding
> or removing a few parameters or somesuch. That's easy enough fixed.

I don't want to promise that the *behavior* of those functions remains
stable.  As an example, none of them any longer do any internal cost
calculations, which is a change that doesn't directly show in their
argument lists but will break extensions just as surely (and more
silently) as an argument-list change would do.  And no, that change
is NOT going to get undone.

> Would you rather add back the exports or should I?

I'll do it ... just send me the list.

regards, tom lane




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-10 Thread Craig Ringer
On 11 March 2016 at 03:07, Igal @ Lucee.org  wrote:

>
> I noticed that you usually don't put html in the emails here, but I think
> that it's appropriate here to show the information in a clear way (also,
> according to my computer it's 2016).
>

Pretty sure we have at least one person here using mailreader software
that's old enough to vote in most countries, but I tend to share the
sentiment. At least when there's actually a functional reason like this :)

Thanks so much for doing this testing.


>   I hope that it will be rendered properly:
>
>
>                  MySQL            DB2              SQL Server (MS)  SQL Server (jTDS)  Oracle
>  Returned Type   SET              SET              ROW              ROW                ROW
>  Column Name     GENERATED_KEY    [name of         GENERATED_KEYS   ID                 ROWID
>                                   identity col]
>  Column Type     Unknown          integer          numeric          numeric            ROWID
>                  (numeric)
>  Value           Each inserted    Each inserted    Last inserted    Last inserted      internal address
>                  value to         value to         value to         value to           location that does
>                  identity column  identity column  identity column  identity column    not change on UPDATE
>  Example         (1), (2)         (1), (2)         (2)              (2)                AAAE5nAABAAALCxAAM
> Some notes and observations:
>
> It's the Wild West!  Each implementation does something completely
> different.
>

I honestly didn't expect that. I knew Oracle returned ROWID, but I have to
admit I thought the others would probably just return the key column(s).

When you supply the column type, does that (with the exception of Oracle)
match the column type of the generated key col?

Did you try GENERATED ALWAYS cols (where supported), UNIQUE columns with
DEFAULTs, composite columns, etc?  Part of the question for Pg is what
exactly we should and should not be returning.


> (Side note:  This was my first, and hopefully my last, experience with
> Oracle database, and it's been a real PITA.  If I had tried it out some 20
> years ago then the experience would have probably led me to sell the stock
> short, which would have probably ended with my bankruptcy.  Go figure...)
>

I rather less than fondly recall my own attempts to get Oracle Express
installed and running for some test or another a while ago. Amazing that it
can be that fiddly. MS-SQL on the other hand "just worked" and dropped me
into the most gorgeously wonderful admin tool and SQL editor ever.

I wonder if any of these drivers have extension options and compat flags
that you have to turn on to get better behaviour like returning a set? Or
if they're just that limited?

Anyway, from the sounds of this we have a fair bit of freedom to define
what we want at both the Pg and driver level so long as we satisfy the
basic constraint that we should return a set of generated keys in the case
where a statement does an insert that adds rows to a table with a SERIAL
(or an owned SEQUENCE). Seems like we could do pretty much whatever we want
for multiple-generated-columns cases etc.
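
As a point of reference, here is a minimal libpq sketch of the
set-returning behaviour Pg can already provide through RETURNING (the
connection string, table "t" and columns "id"/"val" are hypothetical):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    PGresult   *res;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* a set: one generated key per inserted row */
    res = PQexec(conn, "INSERT INTO t (val) VALUES (1), (2) RETURNING id");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        for (i = 0; i < PQntuples(res); i++)
            printf("generated id: %s\n", PQgetvalue(res, i, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}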

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Pavel Stehule
>
> Yes, I think we use this rubric quite often, and I agree it's a good one.
>
> > Trying to e.g. select a different number of columns into a different
> > number of variables in a PL/pgSQL function doesn't throw an error.
> > Bad. :(
>
> Yeah, I'm sympathetic to that request.  That seems like poor error
> checking and nothing else.
>
> (But note that I do not rule here.)
>

I am not sure, but maybe this issue is covered by plpgsql_check. But it is
not possible to check it when dynamic SQL is used.

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Andres Freund
Hi,

On 2016-03-10 15:03:41 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Primarily because create_plan(), and/or its children, have to know about
> > what you're doing; you can hide some, but not all, things below
> > CustomScan nodes.
> 
> And which of those things does e.g. setrefs.c not need to know about?

For CustomScans? You can mostly get by with ->custom_exprs.


> > ISTM, that there's good enough reasons to go either way; I don't see
> > what we're gaining by making these private. That just encourages
> > copy-paste coding.
> 
> What it encourages is having module boundaries that actually mean
> something, as well as code that can be refactored without having
> to worry about which extensions will complain about it.

I personally think it's entirely fine to break extensions if it's adding
or removing a few parameters or somesuch. That's easy enough fixed.


> I will yield on this point because it's not worth my time to argue about
> it, but I continue to say that it's a bad decision you will regret.

FWIW, I do agree that it'd be much nicer to use the new API; the biggest
user in Citus should be able to work with that.  But it's not that easy
to do that and still support postgres 9.4/9.5.


> Which functions did you need exactly?  I'm not exporting more than
> I have to.

I'll try to do the port tomorrow; to make sure I have the definitive
list. Afaics it's just make_seqscan, make_sort_from_sortclauses,
make_limit, make_agg.  I'd not however be surprised if other extensions
also use some of these.

Would you rather add back the exports or should I?


Greetings,

Andres




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 7:22 PM, Peter Eisentraut  wrote:
> Arguably, if everyone followed "my" approach, this should be very easy
> to fix everywhere.

I don't think that there is any clear indication that the OpenSSL
people would share that view. Or my view. Or anything that's sensible
or practical or actionable.

> Instead, reading through the PHP bug report, they
> are coming up with a fairly complex solution for clearing the error
> queue afterwards so as to not leave "landmines" for the next guy.  But
> their code will (AFAICT) still be wrong because they are not clearing
> the error *before* the API calls where it is required per documentation.
>  So "everyone" (sample of 2) is scrambling to clean up for the next guy
> instead of doing the straightforward fix of following the API
> documentation and cleaning up before their own calls.

It will be less wrong, though.

PostgreSQL is the project that doesn't trust a C90 standard library
function to not blithely write past the bounds of a passed buffer,
all because of a bug in certain versions of Solaris based systems that
was several years old at the time. See commit be8b06c364. My view that
that wasn't really worth worrying about was clearly the minority view
when this was discussed (a minority of 1, and a majority of 4 or 5,
IIRC). I think that this case vastly exceeds that standard for
worrying about other people's broken code; in this case, we ourselves
made the same mistake for years and years.

> I also see the clean-up-afterwards approach in the Python ssl module.  I
> fear there is de facto a second API specification that requires you to
> clean up errors after yourself and gives an implicit guarantee that the
> error queue is empty whenever you want to make any SSL calls.  I don't
> think this actually works in all cases, but maybe if everyone else is
> convinced of that (in plain violation of the published OpenSSL
> documentation, AFAICT) we need to get on board with that for
> interoperability.

I didn't know that Python's ssl module did that. That seems to lend
support to my view, which is that we should similarly clear the
thread's queue lest anyone else be affected. Yes, this approach is
fairly scatter-gun, but frankly that's just the situation we find
ourselves in.

>>> Also, there is nothing that
>>> says that an error produces exactly one entry in the error queue; it
>>> could be multiple.  Or that errors couldn't arise at random times
>>> between the reset and whatever happens next.
>>
>> I think that it's kind of implied, since calling ERR_get_error() pops
>> the stack. But even if that isn't so, it might be worth preventing bad
>> things from happening to client applications only sometimes.
>
> The lore on the internet suggests that multiple errors could definitely
> happen.  So popping one error afterwards isn't going to fix it, it just
> moves the edge case around.

Are you sure, specifically, that an I/O function is known to add more
than one error to the per-thread queue? Obviously there can be more
than one error in the queue. But I haven't seen any specific
indication, either in the docs or in the lore, that more than one
error can be added by a single call to an I/O function such as
SSL_read(). Perhaps you can share where you encountered the lore.

>> doesn't it give you pause? Heikki seemed to think that clearing our
>> own queue was important when he looked at this a year ago:
>>
>> http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com
>
> I think that message suggests that we should clear the queue before each
> call, not after.

Uh, it very clearly *is* Heikki's view that we should clear the queue
*afterwards*. Certainly, I think Heikki also wanted us to clear the
queue before, so we aren't screwed, "just to be sure", as he puts it
-- but nobody disputes that that's necessary anyway. That it might not
be *sufficient* to just call ERR_get_error() is the new information in
the bug report. Heikki said:

"""

The OpenSSL manual doesn't directly require you to call
ERR_clear_error() before every SSL_* call. It just requires that you
ensure that the error queue is empty. Libpq ensures that by always
clearing the queue *after* an error happens, in SSLerrmessage().


"""

The problem with this, as Heikki goes on to say, is that we might not
get to that point in SSLerrmessage(); we may not be able to pop the
queue/call ERR_get_error(), more or less by accident (e.g. I noticed
an OOM could do that). That's why I proposed to fix that by calling
ERR_get_error() early and unambiguously. If we must rely on that
happening, it should not be from such a long distance (i.e. from
within SSLerrmessage(), which is kind of far removed from the original
I/O function calls).
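
To illustrate the belt-and-braces pattern I have in mind (a sketch only,
not libpq's actual code; the wrapper name is made up):

#include <openssl/ssl.h>
#include <openssl/err.h>

/*
 * Clear the thread's error queue immediately before the I/O call (what
 * the documentation requires), pop the first error eagerly right after
 * it rather than from a long distance, then drain any remaining entries
 * so no "landmine" is left for other OpenSSL users in the process.
 */
static int
guarded_SSL_read(SSL *ssl, void *buf, int len, unsigned long *errcode)
{
    int     n;

    ERR_clear_error();
    n = SSL_read(ssl, buf, len);

    *errcode = 0;
    if (n <= 0)
    {
        *errcode = ERR_get_error();     /* oldest queued error, if any */
        while (ERR_get_error() != 0)    /* one call may queue several */
            ;
    }
    return n;
}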

Thanks
--
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Pavel Stehule
2016-03-11 0:17 GMT+01:00 Tom Lane :

> Robert Haas  writes:
> > Or ... maybe this is intentional behavior?  Now that I think about it,
> > doesn't each backend cache this info the first time its transaction
> > reads the data?
>
> Your view of pg_stat_activity is supposed to hold still within a
> transaction, yes.  Otherwise it'd be really painful to do any complicated
> joins.  I think there may be a function to explicitly flush the cache,
> if you really need to see intratransaction changes.
>

I understand.

This behaviour has an impact on PL functions that try repeated checks of
pg_stat_activity. But this use case is not frequent.

Thank you.

Regards

Pavel



> regards, tom lane
>


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 9:04 PM, Amit Langote wrote:
> So, from what I understand here, we should not put the total count of index
> pages into st_progress_param; rather, have the client (reading
> pg_stat_progress_vacuum) derive it using pg_indexes_size() (?), as and
> when necessary.  However, only the server is able to tell the current position
> within an index vacuuming round (or how many pages into a given index
> vacuuming round), so report that using some not-yet-existent mechanism.

Isn't that mechanism what you are trying to create in 0003?  But
otherwise, yes, you've accurately summarized what I think we should do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 10:49 PM, Joel Jacobson  wrote:
> On Fri, Mar 11, 2016 at 9:36 AM, Robert Haas  wrote:
>> Well, this was discussed.  If we keep the Boolean "waiting" column, then 
>> either:
>
> Oh, sorry for missing out on that discussion.
>
>> 1. We make it true only for heavyweight lock waits, and false for
>> other kinds of waits.  That's pretty strange.
>> 2. We make it true for all kinds of waits that we now know how to
>> report.  That still breaks compatibility.
>
> Why not 3: We make it true for exactly the same type of situations as
> in previous versions. Or is it not possible to do so for some reason?

3 = 1.

> Off topic, but related to the backward-compatibility subject:
>
> Is there any written policy/wiki/thread/document on the topic "When
> breaking backward-compatibility is acceptable"?

Not to my knowledge.  We end up hashing it out on a case-by-case basis.

> It would be helpful to get a better understanding of this, as some ideas
> on how to improve things can quickly be ruled out or ruled in
> depending on what is acceptable or not.
> For instance, there are some minor but annoying flaws in PL/pgSQL that
> I would love to get fixed,
> but the main arguments against doing so have been that it might break
> some users' code somewhere,
> even though doing so would probably be a good thing as the user could
> have a bug in the code.
> See: https://wiki.postgresql.org/wiki/Improving_PL/PgSQL_(September_2014)

I think that with respect to this particular set of improvements, the
problem is basically that there are just a lot of things that you
could hypothetically change, and it's not altogether clear which ones
of those individually would please more people than they displeased,
and it's not clear how much change we want to allow in total for the
sake of preserving backward compatibility, and then, too, the designs
for a lot of the individual features are fertile ground for
bikeshedding.  I'm not direly opposed to most of what's on that page,
but I'm not excited about most of it, either.  I bet if we canvassed
10 different companies that made heavy use of PL/pgsql they'd all have
a list of proposed changes like that, and I bet some of them would
conflict with each other, and I bet if we did all that stuff the
average PL/pgsql user's life would not be much better, but the manual
would be much longer.

(Also, I bet the variable assignments thing would break large amounts
of code that is working as designed.)

> I think one general rule should be "Breaking backward-compatibility is
> acceptable if the new major pg-version throws an error in a situation
> where the old major pg-version would conceal a bug or allow misuse of
> a feature".
> Trying to select the now removed "waiting" column throws an error.
> Good! That lead me as a user here to figure out why I can't and
> shouldn't use it. :)

Yes, I think we use this rubric quite often, and I agree it's a good one.

> Trying to e.g. select a different number of columns into a different
> number of variables in a PL/pgSQL function doesn't throw an error.
> Bad. :(

Yeah, I'm sympathetic to that request.  That seems like poor error
checking and nothing else.

(But note that I do not rule here.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-10 Thread Dilip Kumar
On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:

> I don't think we can rely on the median that much if we have only 3 runs.
> For 3 runs we can only apply the Kornfeld method, which claims that the
> confidence interval should be between the lower and upper values.
> Since the confidence intervals for master and patched versions overlap,
> we can't conclude that the expected TPS numbers are different.
> Dilip, could you do more runs? 10, for example. Using such statistics we
> would be able to conclude something.
>

Here is the reading for 10 runs


Median Result
---

Client   Base    Patch
---
1        19873   19739
2        38658   38276
4        68812   62075

Full Results of 10 runs...

 Base
-
 Runs   1 Client   2 Client   4 Client
-
 1      19442      34866      49023
 2      19605      35139      51695
 3      19726      37104      53899
 4      19835      38439      55708
 5      19866      38638      67473
 6      19880      38679      70152
 7      19973      38720      70920
 8      20048      38737      71342
 9      20057      38815      71403
 10     20344      41423      77953
-


Patch
---
 Runs   1 Client   2 Client   4 Client
--
 1      18881      30161      54928
 2      19415      32031      59741
 3      19564      35022      61060
 4      19627      36712      61839
 5      19670      37659      62011
 6      19808      38894      62139
 7      19857      39081      62983
 8      19910      39923      75358
 9      20169      39937      77481
 10     20181      40003      78462
--


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Amit Kapila
On Fri, Mar 11, 2016 at 9:19 AM, Joel Jacobson  wrote:
>
> On Fri, Mar 11, 2016 at 9:36 AM, Robert Haas wrote:
> > Well, this was discussed.  If we keep the Boolean "waiting" column, then either:
>
> Oh, sorry for missing out on that discussion.
>
> > 1. We make it true only for heavyweight lock waits, and false for
> > other kinds of waits.  That's pretty strange.
> > 2. We make it true for all kinds of waits that we now know how to
> > report.  That still breaks compatibility.
>
> Why not 3: We make it true for exactly the same type of situations as
> in previous versions. Or is it not possible to do so for some reason?
>

That's exactly Robert's first point (1).  One thing that will be strange,
according to me, is that in some cases waiting will be false but wait_event
and wait_event_type will still contain some wait information, and I think
that will look odd to anybody new looking at the view.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-10 Thread Andres Freund
On 2016-03-11 04:50:45 +0100, Michael Paquier wrote:
> On Thu, Mar 10, 2016 at 11:40 PM, Robert Haas  wrote:
> > We need to decide what to do about this.  I disagree with Peter: I
> > think that regardless of stdbool, what we've got right now is sloppy
> > coding - bad style if nothing else.  Furthermore, I think that while C
> > lets you use any non-zero value to represent true, our bool type is
> > supposed to contain only one of those two values.  Therefore, I think
> > we should commit the full patch, back-patch it as far as somebody has
> > the energy for, and move on.  But regardless, this patch can't keep
> > sitting in the CommitFest - we either have to take it or reject it,
> > and soon.

I plan to commit something like this, unless there's very loud protest
from Peter's side.


> +1, I would suggest moving ahead; !! is not really Postgres-like anyway.

The !! bit is a minor sideshow to this, afaics. It just came up when
discussing the specifics of the fixed macros, and some people expressed a
clear preference for not using !!, so I fixed the occurrences I
introduced.

Andres




Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-10 Thread Andres Freund
On 2016-03-02 21:48:16 -0500, Peter Eisentraut wrote:
> After reviewing this thread and relevant internet lore, I think this
> might be the wrong way to address this problem.  It is in general not
> guaranteed in C that a Boolean-sounding function or macro returns 0 or
> 1.  Prime examples are ctype.h functions such as isupper().  This is
> normally not a problem because built-in conditionals such as if, &&, ||
> handle this just fine.  So code like
> 
> -   Assert(!create || !!txn);
> +   Assert(!create || txn != NULL);
> 
> is arguably silly either way.  There is no risk in writing just
> 
> Assert(!create || txn);

Yes, obviously. I doubt that anyone misunderstood that.  I personally
find the !! easier to read when contrasting to a negated value (as in
the above assert).


> The problem only happens if you compare two "Boolean" values directly
> with each other;

That's not correct. It also happens if you compare an expression with a
stored version, i.e. only one side is a 'proper bool'.


> A quick look through the code based on the provided patch shows that
> approximately the only place affected by this is
> 
> if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
> elog(ERROR, "right sibling of GIN page is of different type");
> 
> and that's not actually a problem because isLeaf and isData are earlier
> populated by the same macros.  It would still be worth fixing, but a
> localized fix seems better.

That's maybe the only place where we compare stored booleans to
expressions, but it's definitely not the only place where some
expression is stored in a boolean variable. And in all those cases
there's a conversion from int32 (or whatever type the expression has) to
bool (usually a 1-byte char).  That's definitely problematic.
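
A standalone illustration of that truncation hazard (the flag, macro and
type names here are made up; the GinPageIs* macros behave analogously):

#include <stdio.h>

typedef char pg_bool;               /* 1-byte bool, as pre-stdbool code uses */

#define F_LEAF  0x00010000          /* flag bit living above the low byte */
#define PageIsLeaf(flags) ((flags) & F_LEAF)    /* yields 0 or 0x10000, never 1 */

int
main(void)
{
    int     flags = F_LEAF;
    pg_bool isLeaf = PageIsLeaf(flags); /* int -> char drops the high bits: 0 */

    /* prints "expression: 1 stored: 0" -- the stored "bool" is wrong */
    printf("expression: %d stored: %d\n", PageIsLeaf(flags) != 0, (int) isLeaf);
    return 0;
}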


> But the lore on the internet casts some doubt on that: There is no
> guarantee that bool is 1 byte, that bool can be passed around like char,
> or even that bool arrays are laid out like char arrays.  Maybe this all
> works out okay, just like it has worked out so far that int is 4 bytes,
> but we don't know enough about it.  We could probably add some configure
> tests around that.

So?


> My proposal on this particular patch is to do nothing.  The stdbool
> issues should be looked into, for the sake of Windows and other
> future-proofness.  But that will likely be an entirely different patch.

That seems to entirely miss the part of this discussion dealing with
high bits set and such?

Greetings,

Andres Freund




Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-10 Thread Michael Paquier
On Thu, Mar 10, 2016 at 11:40 PM, Robert Haas  wrote:
> We need to decide what to do about this.  I disagree with Peter: I
> think that regardless of stdbool, what we've got right now is sloppy
> coding - bad style if nothing else.  Furthermore, I think that while C
> lets you use any non-zero value to represent true, our bool type is
> supposed to contain only one of those two values.  Therefore, I think
> we should commit the full patch, back-patch it as far as somebody has
> the energy for, and move on.  But regardless, this patch can't keep
> sitting in the CommitFest - we either have to take it or reject it,
> and soon.

+1, I would suggest moving ahead; !! is not really Postgres-like anyway.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Joel Jacobson
On Fri, Mar 11, 2016 at 9:36 AM, Robert Haas  wrote:
> Well, this was discussed.  If we keep the Boolean "waiting" column, then 
> either:

Oh, sorry for missing out on that discussion.

> 1. We make it true only for heavyweight lock waits, and false for
> other kinds of waits.  That's pretty strange.
> 2. We make it true for all kinds of waits that we now know how to
> report.  That still breaks compatibility.

Why not 3: We make it true for exactly the same type of situations as
in previous versions. Or is it not possible to do so for some reason?

> I do understand that changing this is backward-incompatible and a lot
> of people are going to have to update their monitoring tools.  But I
> think that's the best alternative.  If we choose option #1, we're
> going to be saddled with a weird backward-compatibility column
> forever, and ten years from now we'll be explaining that even if
> waiting = false you might still be waiting depending on the value of
> some other column.  If we choose option #2, it won't be
> backward-compatible and some people's queries will still break, just
> less obviously.  Neither of those things seems very appealing.

I understand it's necessary to break backward-compatibility if it's
not possible to return the same boolean value for the "waiting"
column in exactly the same situations.
Actually, even if it would be possible, I agree with you it's better
to force people to learn how to improve their tools by using the new
features.

Off topic, but related to the backward-compatibility subject:

Is there any written policy/wiki/thread/document on the topic "When
breaking backward-compatibility is acceptable"?

It would be helpful to get a better understanding of this, as some ideas
on how to improve things can quickly be ruled out or ruled in
depending on what is acceptable or not.
For instance, there are some minor but annoying flaws in PL/pgSQL that
I would love to get fixed,
but the main arguments against doing so have been that it might break
some users' code somewhere,
even though doing so would probably be a good thing as the user could
have a bug in the code.
See: https://wiki.postgresql.org/wiki/Improving_PL/PgSQL_(September_2014)

I think one general rule should be "Breaking backward-compatibility is
acceptable if the new major pg-version throws an error in a situation
where the old major pg-version would conceal a bug or allow misuse of
a feature".
Trying to select the now removed "waiting" column throws an error.
Good! That led me as a user here to figure out why I can't and
shouldn't use it. :)
Trying to e.g. select a different number of columns into a different
number of variables in a PL/pgSQL function doesn't throw an error.
Bad. :(
Here I would argue it's better to throw an error, just like when
trying to select from "waiting". It will hopefully save the day for
some users out there who can't find the bug in their complicated
PL/pgSQL application with millions of lines of code.

Sorry if this was completely off-topic, maybe I should start a new
thread or read some old thread in the archives on
backward-compatibility instead.




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-10 Thread Masahiko Sawada
On Fri, Mar 11, 2016 at 6:16 AM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 1:41 PM, Masahiko Sawada  
> wrote:
>> On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas  wrote:
>>> This 001 patch looks so little like what I was expecting that I
>>> decided to start over from scratch.  The new version I wrote is
>>> attached here.  I don't understand why your version tinkers with the
>>> logic for setting the all-frozen bit; I thought that what I already
>>> committed dealt with that already, and in any case, your version
>>> doesn't even compile against latest sources.  Your version also leaves
>>> the scan_all terminology intact even though it's not accurate any
>>> more, and I am not very convinced that the updates to the
>>> page-skipping logic are actually correct.  Please have a look over
>>> this version and see what you think.
>>
>> Thank you for your advice.
>> Sorry, the optimising logic of the previous patch was old by mistake.
>> The attached latest patch incorporates your suggestions with a little revising.
>
> Thanks.  I adopted some of your suggestions, rejected others, fixed a
> few minor things that I missed previously, and committed this.  If you
> think any of the changes that I rejected still have merit, please
> resubmit those changes as separate patches.
>

Thank you for your effort on this feature and for committing it.
I guess that I couldn't do good work on this feature at the final stage,
but I really appreciate all your advice and suggestions.

> I think what I really want is some logic so that if we have a 1-page
> visibility map in the old cluster and the second half of that page is
> all zeroes, we only create a 1-page visibility map in the new cluster
> rather than a 2-page visibility map.
>
> Or more generally, if the old VM is N pages, but the last half of the
> last page is empty, then let the output VM be 2*N-1 pages instead of
> 2*N pages.
>

I got your point.
The attached latest patch can skip writing the last part of the last old
page if it's empty.
Please review it.

Regards,

--
Masahiko Sawada
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 2a99a28..7783b8a 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -9,10 +9,16 @@
 
 #include "postgres_fe.h"
 
+#include "access/visibilitymap.h"
 #include "pg_upgrade.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
+#include <sys/stat.h>
 #include <fcntl.h>
 
+#define BITS_PER_HEAPBLOCK_OLD 1
 
 
 #ifndef WIN32
@@ -138,6 +144,156 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 #endif
 
 
+/*
+ * rewriteVisibilityMap()
+ *
+ * In versions of PostgreSQL prior to catversion 201603011, PostgreSQL's
+ * visibility map included one bit per heap page; it now includes two.
+ * When upgrading a cluster from before that time to a current PostgreSQL
+ * version, we could refuse to copy visibility maps from the old cluster
+ * to the new cluster; the next VACUUM would recreate them, but at the
+ * price of scanning the entire table.  So, instead, we rewrite the old
+ * visibility maps in the new format.  That way, the all-visible bit
+ * remains set for the pages for which it was set previously.  The
+ * all-frozen bit is never set by this conversion; we leave that to
+ * VACUUM.
+ */
+const char *
+rewriteVisibilityMap(const char *fromfile, const char *tofile, bool force)
+{
+	int			src_fd = 0;
+	int			dst_fd = 0;
+	char		buffer[BLCKSZ];
+	ssize_t		bytesRead;
+	ssize_t		src_filesize;
+	int			rewriteVmBytesPerPage;
+	BlockNumber new_blkno = 0;
+	struct stat	statbuf;
+
+	/* Compute how many old page bytes we need to rewrite one new page */
+	rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2;
+
+	if ((fromfile == NULL) || (tofile == NULL))
+		return "Invalid old file or new file";
+
+	if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
+		return getErrorText();
+
+	if (fstat(src_fd, &statbuf) != 0)
+	{
+		close(src_fd);
+		return getErrorText();
+	}
+
+	if ((dst_fd = open(tofile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), S_IRUSR | S_IWUSR)) < 0)
+	{
+		close(src_fd);
+		return getErrorText();
+	}
+
+	/* Save old file size */
+	src_filesize = statbuf.st_size;
+
+	/*
+	 * Turn each visibility map page into 2 pages one by one. Each new page
+	 * has the same page header as the old one.  If the last section of the
+	 * last page is empty, we skip writing it. That is, more generally, if the
+	 * old visibility map is N pages but the last part of the last page is
+	 * empty, this routine outputs (BITS_PER_HEAPBLOCK / BITS_PER_HEAPBLOCK_OLD) * N - 1
+	 * pages instead of (BITS_PER_HEAPBLOCK / BITS_PER_HEAPBLOCK_OLD) * N pages.
+	 */
+	while ((bytesRead = read(src_fd, buffer, BLCKSZ)) == BLCKSZ)
+	{
+		char		*old_cur;
+		char		*old_break;
+		char		*old_blkend;
+		PageHeaderData pageheader;
+		bool		old_lastblk = ((BLCKSZ * (new_blkno + 1)) == src_filesize);
+
+		/* Save the page header data */
+		memcpy(&pageheader, buffer, SizeOfPageHeaderData);

Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-10 Thread Robert Haas
On Mar 10, 2016, at 2:07 PM, Igal @ Lucee.org  wrote:

> (Side note:  This was my first, and hopefully my last, experience with Oracle 
> database, and it's been a real PITA.  If I had tried it out some 20 years ago 
> then the experience would have probably led me to sell the stock short, which 
> would have probably ended with my bankruptcy.  Go figure...)
> 
> (Side note: after wasting almost a full day setting up and connecting to the 
> DB2 server I realized why Oracle was so successful)

This email made me laugh.

...Robert



Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Eisentraut
On 3/10/16 9:38 PM, Peter Geoghegan wrote:
> Looked at your proposed patch. Will respond to your original mail on the 
> matter.
> 
> On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut  wrote:
>> I think clearing the error after a call is not necessary.  The API
>> clearly requires that you should clear the error queue before a call, so
>> clearing it afterwards does not accomplish anything, except maybe make
>> broken code work sometimes, for a while.
> 
> Uh, if it's so clear, then why haven't we been doing it all along?

The issue only happens if two interleaving trains of execution, one of
which is libpq, use OpenSSL.  Not many applications do that.  And you
also need to provoke the errors in a certain order.  And even then, in
some cases you might just see a false positive error, rather than a
crash.  So it's an edge case.

> Part of the problem is that various scripting language OpenSSL
> wrappers are only very thin wrappers. They effectively pass the buck
> on to PHP and Ruby devs. If we cannot get it right, what chance have
> they? I've personally experienced a big uptick in complaints about
> this recently. I think there are 3 separate groups within Heroku that
> regularly ask me how this patch is doing.

I think they have been getting away with it for so long for the same
reasons.

Arguably, if everyone followed "my" approach, this should be very easy
to fix everywhere.  Instead, reading through the PHP bug report, they
are coming up with a fairly complex solution for clearing the error
queue afterwards so as to not leave "landmines" for the next guy.  But
their code will (AFAICT) still be wrong because they are not clearing
the error *before* the API calls where it is required per documentation.
 So "everyone" (sample of 2) is scrambling to clean up for the next guy
instead of doing the straightforward fix of following the API
documentation and cleaning up before their own calls.

I also see the clean-up-afterwards approach in the Python ssl module.  I
fear there is de facto a second API specification that requires you to
clean up errors after yourself and gives an implicit guarantee that the
error queue is empty whenever you want to make any SSL calls.  I don't
think this actually works in all cases, but maybe if everyone else is
convinced of that (in plain violation of the published OpenSSL
documentation, AFAICT) we need to get on board with that for
interoperability.

>> Also, there is nothing that
>> says that an error produces exactly one entry in the error queue; it
>> could be multiple.  Or that errors couldn't arise at random times
>> between the reset and whatever happens next.
> 
> I think that it's kind of implied, since calling ERR_get_error() pops
> the stack. But even if that isn't so, it might be worth preventing bad
> things from happening to client applications only sometimes.

The lore on the internet suggests that multiple errors could definitely
happen.  So popping one error afterwards isn't going to fix it, it just
moves the edge case around.  At least what we should do is clear the
entire queue afterwards instead of just the first error.

> doesn't it give you pause? Heikki seemed to think that clearing our
> own queue was important when he looked at this a year ago:
> 
> http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com

I think that message suggests that we should clear the queue before each
call, not after.





Re: [HACKERS] Adjusting the API of pull_var_clause()

2016-03-10 Thread Ashutosh Bapat
> Now, I'm pretty sure that the last time we touched pull_var_clause's
> API, we intentionally set it up to force every call site to be visited
> when new behaviors were added.  But right at the moment that's looking
> like it was a bad call.
>
> An alternative API design could look like
>
> #define PVC_INCLUDE_AGGREGATES   0x0001 /* include Aggrefs in output list
> */
> #define PVC_RECURSE_AGGREGATES   0x0002 /* recurse into Aggref arguments */
> #define PVC_INCLUDE_PLACEHOLDERS 0x0004 /* include PlaceHolderVars in
> output list */
> #define PVC_RECURSE_PLACEHOLDERS 0x0008 /* recurse into PlaceHolderVar
> arguments */
>
>
> extern List *pull_var_clause(Node *node, int flags);
>
> with calls along the line of
>
> pull_var_clause(node, PVC_INCLUDE_AGGREGATES | PVC_RECURSE_PLACEHOLDERS);
>
> the default behavior if you specify no flag being "error if node type
> is seen".
>
> The attraction of this approach is that if we add another behavior
> to pull_var_clause, while we'd still likely need to run around and
> check every call site, it wouldn't be positively guaranteed that
> we'd need to edit every darn one of them.
>

That can be a problem for extensions or any code outside the PG repository
that uses pull_var_clause(). Right now they notice the change because
compilation fails. With the flags approach, those extensions wouldn't know
that a new option has been added and would be presented with the result of
the default behaviour. That may not be bad for window nodes, but I am not
sure that would be the case for other nodes.

The name of the function suggests that it should get all the Var nodes from a
given expression. Throwing an error when an unexpected node is encountered
seems to be a side effect. So RECURSE should be the default behaviour,
not REJECT.

I am not sure whether the REJECT behaviour is something of a sanity check
rather than the real thing. How many calls that specify REJECT_AGGREGATE
really expect an aggregate to be in the expression passed? Probably none. If
they really care, shouldn't there be a separate API for checking just that?
In fact, we may want to change the API to indicate where to stop recursing,
e.g. at aggregate nodes, placeholder nodes or window nodes. Since we are
thinking of changing the API, we may consider this change as well.
Although, I think this would have been OK if pull_var_clause were being
written afresh. Now that we have this API, I am not sure whether the
effort is worth the result.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Sun, Feb 14, 2016 at 8:01 PM, Peter Geoghegan  wrote:
> The query I'm testing is: "reindex index pgbench_accounts_pkey;"
>
> Now, with a maintenance_work_mem of 5MB, the most recent revision of
> my patch takes about 54.2 seconds to complete this, as compared to
> master's 44.4 seconds. So, clearly a noticeable regression there of
> just under 20%. I did not see a regression with a 5MB
> maintenance_work_mem when pgbench scale was 100, though.

I've fixed this regression, and possibly all regressions where workMem >
4MB. I've done so without resorting to making the heap structure
more complicated, or using a heap more often than when
replacement_sort_mem is exceeded by work_mem or maintenance_work_mem
(so replacement_sort_mem becomes something a bit different to what we
discussed, Robert -- more on that later). This seems like an
"everybody wins" situation, because in this revision the patch series
is now appreciably *faster* where the amount of memory available is
only a tiny fraction of the total input size.

Jeff Janes deserves a lot of credit for helping me to figure out how
to do this. I couldn't get over his complaint about the regression he
saw a few months back. He spoke of an "anti-sweetspot" in polyphase
merge, and how he suspected that to be the real culprit (after all,
most of his time was spent merging, with or without the patch
applied). He also said that reverting the memory batch/pool patch made
things go a bit faster, somewhat ameliorating his regression (when
just the quicksort patch was applied). This made no sense to me, since
I understood the memory batching patch to be orthogonal to the
quicksort thing, capable of being applied independently, and more or
less a strict improvement on master, no matter what the variables of
the sort are. Jeff's regressed case especially made no sense to me
(and, I gather, to him) given that the regression involved no
correlation, and so clearly wasn't reliant on generating far
fewer/longer runs than the patch (that's the issue we've discussed
more than any other now -- it's a red herring, it seems). As I
suspected out loud on February 14th, replacement selection mostly just
*masked* the real problem: the problem of palloc() fragmentation.
There doesn't seem to be much of an issue with the scheduling of
polyphase merging, once you fix palloc() fragmentation. I've created a
new revision, incorporating this new insight.

New Revision


Attached revision of patch series:

1. Creates a separate memory context for tuplesort's copies of
caller's tuples, which can be reset at key points, avoiding
fragmentation. Every SortTuple.tuple is allocated there (with trivial
exception); *everything else*, including the memtuples array, is
allocated in the existing tuplesort context, which becomes the parent
of this new "caller's tuples" context. Roughly speaking, that means
that about half of total memory for the sort is managed by each
context in common cases. Even with a high work_mem memory budget,
memory fragmentation could previously get so bad that tuplesort would
in effect claim a share of memory from the OS that is *significantly*
higher than the work_mem budget allotted to its sort. And with low
work_mem settings, fragmentation previously made palloc() thrash the
sort, especially during non-final merging. In this latest revision,
tuplesort now almost gets to use 100% of the memory that was requested
from the OS by palloc() in the cases tested.
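
Schematically, the arrangement is as follows (names assumed for
illustration; this is the idea, not the patch's actual code):

#include "postgres.h"
#include "utils/memutils.h"

static MemoryContext sortcontext;   /* existing context: memtuples array etc. */
static MemoryContext tuplecontext;  /* new child: only caller's tuple copies */

static void
tuplecontext_init(void)
{
    /* sortcontext is created elsewhere; the tuple context is its child */
    tuplecontext = AllocSetContextCreate(sortcontext,
                                         "Caller tuples",
                                         ALLOCSET_DEFAULT_MINSIZE,
                                         ALLOCSET_DEFAULT_INITSIZE,
                                         ALLOCSET_DEFAULT_MAXSIZE);
}

/* every SortTuple.tuple is allocated here (with trivial exception) */
static void *
copy_caller_tuple(Size len)
{
    return MemoryContextAlloc(tuplecontext, len);
}

/* at key points (e.g. between merge passes) one reset returns all of the
 * tuple memory at once, with no per-chunk pfree() churn or fragmentation */
static void
reset_caller_tuples(void)
{
    MemoryContextReset(tuplecontext);
}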

2. Loses the "quicksort with spillover" case entirely, making the
quicksort patch significantly simpler. A *lot* of code was thrown out.

This change is especially significant because it allowed me to remove
the cost model that Robert took issue with so vocally. "Quicksort with
spillover" was always far less important than the basic idea of using
quicksort for external sorts, so I'm not sad to see it go. And, I
thought that the cost model was pretty bad myself.

3. Fixes cost_sort(), making optimizer account for the fact that runs
are now about sort_mem-sized, not (sort_mem * 2)-sized.

While I was at it, I made cost_sort() more optimistic about the amount
of random I/O required relative to sequential I/O. This additional
change to cost_sort() was probably overdue.

4. Restores the ability of replacement selection to generate one run
and avoid any merging (previously, only one really long run and one
short run was possible, because at the time I conceptualized
replacement selection as being all about enabling "quicksort with
spillover", which quicksorted that second run in memory). This
only-one-run case is the case that Robert particularly cared about,
and it's fully restored when RS is in use (which can still only happen
for the first run, just never for the benefit of the now-axed
"quicksort with spillover" case).

5. Adds a new GUC, "replacement_sort_mem". The default setting is
16MB. Docs are included. If work_mem/maintenance_work_mem is less than
or equal to this, the first (and 

Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
Looked at your proposed patch. Will respond to your original mail on the matter.

On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut  wrote:
> I think clearing the error after a call is not necessary.  The API
> clearly requires that you should clear the error queue before a call, so
> clearing it afterwards does not accomplish anything, except maybe make
> broken code work sometimes, for a while.

Uh, if it's so clear, then why haven't we been doing it all along? The
API doesn't require you to take *any* specific practical measure (for
example, the specific practical measure of resetting the queue before
calling an I/O function). It simply says "this exact thing cannot be
allowed to happen; the consequences are undefined", with nothing in
the way of guidance on what that means in the real world. It's a
shockingly bad API, but that's the reality.

Part of the problem is that various scripting language OpenSSL
wrappers are only very thin wrappers. They effectively pass the buck
on to PHP and Ruby devs. If we cannot get it right, what chance have
they? I've personally experienced a big uptick in complaints about
this recently. I think there are 3 separate groups within Heroku that
regularly ask me how this patch is doing.

> Also, there is nothing that
> says that an error produces exactly one entry in the error queue; it
> could be multiple.  Or that errors couldn't arise at random times
> between the reset and whatever happens next.

I think that it's kind of implied, since calling ERR_get_error() pops
the stack. But even if that isn't so, it might still be worth preventing
bad things from happening to client applications at least some of the
time.
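
If we did want to be thorough about it, draining the whole queue is cheap
enough -- a sketch, on the assumption that one failure can leave several
entries behind:

   unsigned long ecode;

   while ((ecode = ERR_get_error()) != 0)
       fprintf(stderr, "discarded SSL error: %s\n",
               ERR_reason_error_string(ecode));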

> I think this is analogous to clearing errno before a C library call.
> You could clear it afterwards as well, to be nice to the next guy, but
> the next guy should really take care of that themselves, and we can't
> rely on what happens in between anyway.

It sounds like you're saying "well, we cannot be expected to bend over
backwards to make broken code work". But that broken code includes
every single version of libpq + OpenSSL currently distributed. Seems
like a very high standard. I'm not saying that that means we
definitely should clear the error queue reliably ourselves, but
doesn't it give you pause? Heikki seemed to think that clearing our
own queue was important when he looked at this a year ago:

http://www.postgresql.org/message-id/54edd30d.5050...@vmware.com

Again, not conclusive, but I would like to hear a rationale for why
you think it's okay to not consistently clear our own queue for the
benefit of others. Is this informed by a concern about some specific
downside to taking that extra precaution?

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 8:31 PM, Joel Jacobson  wrote:
> This is an excellent feature, thanks!
> But can we please keep the old boolean waiting column?
> I see no reason to break backward-compatibility. Or maybe I'm missing 
> something.

Well, this was discussed.  If we keep the Boolean "waiting" column, then either:

1. We make it true only for heavyweight lock waits, and false for
other kinds of waits.  That's pretty strange.
2. We make it true for all kinds of waits that we now know how to
report.  That still breaks compatibility.

I do understand that changing this is backward-incompatible and a lot
of people are going to have to update their monitoring tools.  But I
think that's the best alternative.  If we choose option #1, we're
going to be saddled with a weird backward-compatibility column
forever, and ten years from now we'll be explaining that even if
waiting = false you might still be waiting depending on the value of
some other column.  If we choose option #2, it won't be
backward-compatible and some people's queries will still break, just
less obviously.  Neither of those things seems very appealing.
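
For the record, monitoring queries that used the old column can be adapted
along these lines (an untested sketch, not official upgrade advice):

   -- old: SELECT ... FROM pg_stat_activity WHERE waiting;
   SELECT pid, wait_event_type, wait_event
     FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;          -- any kind of wait
   -- or: WHERE wait_event_type = 'Lock'   -- heavyweight locks only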

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relaxing SSL key permission checks

2016-03-10 Thread Peter Eisentraut
On 3/4/16 3:55 PM, Alvaro Herrera wrote:
> * it failed to check for S_IXUSR, so permissions 0700 were okay, in
> contradiction with what the error message indicates.  This is a
> preexisting bug actually.  Do we want to fix it by preventing a
> user-executable file (possibly breaking compability with existing
> executable key files), or do we want to document what the restriction
> really is?

I think we should not check for S_IXUSR.  There is no reason for doing that.

I can imagine that key files are sometimes copied around using USB
drives with FAT file systems or other means of that sort where
permissions can get scrambled.  While I hate gratuitous executable bits
as much as the next person, insisting here would just create annoyances
in practice.
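
In other words, I'd reduce the test to something like this (a sketch, not
the exact be-secure-openssl.c coding; "ssl_key_file" stands in for however
the path is held there):

   struct stat buf;

   if (stat(ssl_key_file, &buf) == 0 &&
       (buf.st_mode & (S_IRWXG | S_IRWXO)) != 0)  /* group/other access */
       ereport(FATAL,
               (errmsg("private key file \"%s\" has group or world access",
                       ssl_key_file)));
   /* note: no S_IXUSR test, so an owner-executable file is accepted */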





Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Eisentraut
On 3/10/16 6:10 PM, Peter Geoghegan wrote:
> On Thu, Mar 10, 2016 at 3:09 PM, Peter Geoghegan  wrote:
>> Getting to it very soon. Just really busy right this moment.
> 
> That said, I agree with Peter's remarks about doing this frontend and
> backend. So, while I'm not sure, I think we're in agreement on all
> issues. I would have no problem with Peter E following through with
> final steps + commit as Robert outlined, if that works for him.

My proposal is the attached patch.


From 697b4f75fccbb5eda530211f3ef58c2b226c5461 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 10 Mar 2016 20:59:30 -0500
Subject: [PATCH] Clear OpenSSL error queue before OpenSSL calls

OpenSSL requires that the error queue be cleared before certain OpenSSL
API calls, so that you can be sure that the error you are checking
afterwards actually comes from your own call and was not left over from
other activity.  We had never done that, which appears to have worked as
long as we were the only user of OpenSSL in the process.  But if a
process using libpq or a backend plugin uses OpenSSL as well, this can
lead to confusion and crashes.

see bug #12799 and https://bugs.php.net/bug.php?id=68276

based on patches by Dave Vitek and Peter Geoghegan
---
 src/backend/libpq/be-secure-openssl.c| 3 +++
 src/interfaces/libpq/fe-secure-openssl.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..be337f5 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -353,6 +353,7 @@ be_tls_open_server(Port *port)
 	port->ssl_in_use = true;
 
 aloop:
+	ERR_clear_error();
 	r = SSL_accept(port->ssl);
 	if (r <= 0)
 	{
@@ -501,6 +502,7 @@ be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 	int			err;
 
 	errno = 0;
+	ERR_clear_error();
 	n = SSL_read(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
@@ -558,6 +560,7 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
 	int			err;
 
 	errno = 0;
+	ERR_clear_error();
 	n = SSL_write(port->ssl, ptr, len);
 	err = SSL_get_error(port->ssl, n);
 	switch (err)
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 133546b..0535338 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -212,6 +212,7 @@ pgtls_read(PGconn *conn, void *ptr, size_t len)
 
 rloop:
 	SOCK_ERRNO_SET(0);
+	ERR_clear_error();
 	n = SSL_read(conn->ssl, ptr, len);
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
@@ -320,6 +321,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 	int			err;
 
 	SOCK_ERRNO_SET(0);
+	ERR_clear_error();
 	n = SSL_write(conn->ssl, ptr, len);
 	err = SSL_get_error(conn->ssl, n);
 	switch (err)
@@ -1327,6 +1329,7 @@ open_client_SSL(PGconn *conn)
 {
 	int			r;
 
+	ERR_clear_error();
 	r = SSL_connect(conn->ssl);
 	if (r <= 0)
 	{
-- 
2.7.2




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-10 Thread Amit Langote
On 2016/03/10 23:29, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 3:08 AM, Amit Langote
>  wrote:
>> Hi Vinayak,
>>
>> Thanks for the quick review!
> 
> Committed 0001 earlier this morning.

Thanks!

> On 0002:
> 
> +   /* total_index_blks */
> +   current_index_blks = (BlockNumber *) palloc(nindexes *
> sizeof(BlockNumber));
> +   total_index_blks = 0;
> +   for (i = 0; i < nindexes; i++)
> +   {
> +   BlockNumber nblocks =
> RelationGetNumberOfBlocks(Irel[i]);
> +
> +   current_index_blks[i] = nblocks;
> +   total_index_blks += nblocks;
> +   }
> +   pgstat_progress_update_param(PROG_PARAM_VAC_IDX_BLKS, 
> total_index_blks);
> 
> I think this is a bad idea.  The value calculated here isn't
> necessarily accurate, because the number of index blocks can change
> between the time this is calculated and the time the indexes are
> actually vacuumed.  If a client just wants the length of the indexes
> in round figures, that's already SQL-visible, and there's little
> reason to make VACUUM do it all the time whether anyone is looking at
> the progress information or not.  Note that I'm not complaining about
> the fact that you exposed the heap block count, because in that case
> you are exposing the actual value that VACUUM is using to guide its
> work.  The client can get the *current* length of the relation, but
> the value you are exposing gives you the number of blocks *this
> particular VACUUM intends to scan*.  That has some incremental value -
> but the index information doesn't have the same thing going for it.

So, from what I understand here, we should not put the total count of
index pages into st_progress_param; rather, we should have the client
(reading pg_stat_progress_vacuum) derive it using pg_indexes_size() (?),
as and when necessary.  However, only the server is able to tell the
current position within an index vacuuming round (that is, how many pages
into a given index vacuuming round we are), so report that using some
not-yet-existent mechanism.
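
That is, something the client can already compute on its own, e.g. (a
sketch; the table name is made up):

   SELECT pg_indexes_size('some_table'::regclass)
          / current_setting('block_size')::bigint AS approx_index_pages;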

> On 0003:
> 
> I think you should make this work for all AMs, not just btree, and
> then consolidate it with 0002.

OK.

Thanks,
Amit






Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Petr Jelinek

On 11/03/16 02:44, Dilip Kumar wrote:


On Fri, Mar 11, 2016 at 12:04 AM, Petr Jelinek wrote:

Thanks for looking..

Those look good. The patch looks good in general now. I am a bit
scared by the lockWaiters * 20, as it can result in relatively big
changes in rare corner cases when, for example, a lot of backends were
waiting for the lock on a relation and suddenly all try to extend it. I
wonder if we should clamp it to something sane (although what's sane
today might be small in a couple of years).


But in such a condition, when all are waiting on the lock, only one
waiter at a time will get the lock, and it can easily count the lock
waiters and extend by a multiple of that. And we also have the check
that when any waiter gets the lock, it will first check in the FSM
whether anybody else has already added blocks or not.



I am not talking about extension locks; the lock queue can be long
because there is concurrent DDL, for example, and then once the DDL
finishes, suddenly 100 connections that tried to insert into the table
will try to get the extension lock, and this will add 2000 new pages
when far fewer were actually needed. I guess that's fine, as it's a
corner case and it's only 16MB even in such an extreme case.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Optimizer questions

2016-03-10 Thread Tom Lane
I wrote:
> As far as that goes, it seems to me after thinking about it that
> non-sort-column tlist items containing SRFs should always be postponed,
> too.  Performing a SRF before sorting bloats the sort data vertically,
> rather than horizontally, but it's still bloat.  (Although as against
> that, when you have ORDER BY + LIMIT, postponing SRFs loses the ability
> to use a bounded sort.)  The killer point though is that unless the sort
> is stable, it might cause surprising changes in the order of the SRF
> output values.  Our sorts aren't stable; here's an example in HEAD:

> # select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
>  q1  | generate_series 
> -+-
>  123 |   2
>  123 |   3
>  123 |   4
>  123 |   5
>  123 |   6
>  123 |   7
>  123 |   1
> (7 rows)

> I think that's pretty surprising, and if we have an opportunity to
> provide more intuitive semantics here, we should do it.

Here's an updated patch (requires current HEAD) that takes care of
window functions correctly and also does something reasonable with
set-returning functions:

# explain verbose select q1, generate_series(1,9) from int8_tbl order by q1 
limit 7;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Limit  (cost=1.11..1.14 rows=7 width=12)
   Output: q1, (generate_series(1, 9))
   ->  Result  (cost=1.11..26.16 rows=5000 width=12)
 Output: q1, generate_series(1, 9)
 ->  Sort  (cost=1.11..1.12 rows=5 width=8)
   Output: q1
   Sort Key: int8_tbl.q1
   ->  Seq Scan on public.int8_tbl  (cost=0.00..1.05 rows=5 width=8)
 Output: q1
(9 rows)

# select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
 q1  | generate_series 
-+-
 123 |   1
 123 |   2
 123 |   3
 123 |   4
 123 |   5
 123 |   6
 123 |   7
(7 rows)

I added some user-facing documentation too.  I think this is committable,
though maybe we should add a regression test case or two.

regards, tom lane

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 44810f4..0520f2c 100644
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*** UNBOUNDED FOLLOWING
*** 993,998 
--- 993,1028 
  cases it is not possible to specify new names with AS;
  the output column names will be the same as the table columns' names.
 
+ 
+
+ According to the SQL standard, the expressions in the output list should
+ be computed before applying DISTINCT, ORDER
+ BY, or LIMIT.  This is obviously necessary
+ when using DISTINCT, since otherwise it's not clear
+ what values are being made distinct.  However, in many cases it is
+ convenient if output expressions are computed after ORDER
+ BY and LIMIT; particularly if the output list
+ contains any volatile or expensive functions.  With that behavior, the
+ order of function evaluations is more intuitive and there will not be
+ evaluations corresponding to rows that never appear in the output.
+ PostgreSQL will effectively evaluate output expressions
+ after sorting and limiting, so long as those expressions are not
+ referenced in DISTINCT, ORDER BY
+ or GROUP BY.  (As a counterexample, SELECT
+ f(x) FROM tab ORDER BY 1 clearly must evaluate f(x)
+ before sorting.)  Output expressions that contain set-returning functions
+ are effectively evaluated after sorting and before limiting, so
+ that LIMIT will act to cut off the output from a
+ set-returning function.
+
+ 
+
+ 
+  PostgreSQL versions before 9.6 did not provide any
+  guarantees about the timing of evaluation of output expressions versus
+  sorting and limiting; it depended on the form of the chosen query plan.
+ 
+

  

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a2cd6de..51f4ee4 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** static RelOptInfo *create_distinct_paths
*** 127,132 
--- 127,133 
  	  RelOptInfo *input_rel);
  static RelOptInfo *create_ordered_paths(PlannerInfo *root,
  	 RelOptInfo *input_rel,
+ 	 PathTarget *target,
  	 double limit_tuples);
  static PathTarget *make_group_input_target(PlannerInfo *root, List *tlist);
  static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
*** static PathTarget *make_window_input_tar
*** 135,140 
--- 136,144 
  		 List *tlist, List *activeWindows);
  static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
  		 List 

Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Dilip Kumar
On Fri, Mar 11, 2016 at 12:04 AM, Petr Jelinek  wrote:

Thanks for looking..

> Those look good. The patch looks good in general now. I am a bit scared
> by the lockWaiters * 20, as it can result in relatively big changes in
> rare corner cases when, for example, a lot of backends were waiting for
> the lock on a relation and suddenly all try to extend it. I wonder if we
> should clamp it to something sane (although what's sane today might be
> small in a couple of years).


But in such a condition, when all are waiting on the lock, only one
waiter at a time will get the lock, and it can easily count the lock
waiters and extend by a multiple of that. And we also have the check
that when any waiter gets the lock, it will first check in the FSM
whether anybody else has already added blocks or not.

And the other waiters will not get the lock until the first waiter has
extended all the blocks and released the lock.

One possible case is that, as soon as we extend the blocks, new requesters
find space directly in the FSM and never come for the lock, while an old
waiter, after getting the lock, finds nothing in the FSM. But IMHO in such
cases it is also good that the other waiter extends more blocks (because
this can happen when the request flow is very high).
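
To restate the flow in pseudo-C (my paraphrase, with helper names as I
remember them from the patch, not a verbatim excerpt):

   if (!ConditionalLockRelationForExtension(rel, ExclusiveLock))
   {
       LockRelationForExtension(rel, ExclusiveLock);

       /* while we slept, another waiter may have extended: re-check FSM */
       targetBlock = GetPageWithFreeSpace(rel, spaceNeeded);
       if (targetBlock != InvalidBlockNumber)
       {
           UnlockRelationForExtension(rel, ExclusiveLock);
           return targetBlock;
       }

       /* extend in proportion to the observed contention */
       extraBlocks = RelationExtensionLockWaiterCount(rel) * 20;
       /* ... add extraBlocks pages, recording each in the FSM ... */
   }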


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] checkpointer continuous flushing

2016-03-10 Thread Andres Freund
Hi,

I just pushed the two major remaining patches in this thread. Let's see
what the buildfarm has to say; I'd not be surprised if there's some
lingering portability problem in the flushing code.

There's one remaining issue we definitely want to resolve before the
next release:  Right now we always use one writeback context across all
tablespaces in a checkpoint, but Fabien's testing shows that that's
likely to hurt in a number of cases. I've some data suggesting the
contrary in others.

Things that'd be good:
* Some benchmarking. Right now controlled flushing is enabled by default
  on linux, but disabled by default on other operating systems. Somebody
  running benchmarks on e.g. freebsd or OSX might be good.
* If somebody has the energy to provide a Windows implementation for
  flush control, that might be worthwhile. There are several places that
  could benefit from that.
* The default values are basically based on benchmarking by me and Fabien.

Regards,

Andres




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Joel Jacobson
This is an excellent feature, thanks!
But can we please keep the old boolean waiting column?
I see no reason to break backward-compatibility. Or maybe I'm missing something.

I just had to commit this to make our system run locally on 9.6:

commit 2e189f85fa56724bec5c5cab2fcf0d2f3a4ce22a
Author: Joel Jacobson 
Date:   Fri Mar 11 08:19:52 2016 +0700

Make Have_Queries_Waiting() work with both <9.6 and >=9.6.

Apparently pg_stat_activity.waiting was removed by this commit:
  commit 53be0b1add7064ca5db3cd884302dfc3268d884e
  Author: Robert Haas 
  Date:   Thu Mar 10 12:44:09 2016 -0500

  Provide much better wait information in pg_stat_activity.

This forces us to do some ugly version checking to know which column
to use.  I for one think it would have been better to keep the old
boolean column, which is not entirely useless: sometimes you just want
to know whether something is waiting and don't care about the details,
and then it's convenient to have a boolean column instead of having to
write "wait_event IS NOT NULL".

Let's hope they will add back our dear waiting column so we can avoid this
ugly hack before upgrading to 9.6.

diff --git a/public/FUNCTIONS/have_queries_waiting.sql
b/public/FUNCTIONS/have_queries_waiting.sql
index d83e7c8..b54caf5 100644
--- a/public/FUNCTIONS/have_queries_waiting.sql
+++ b/public/FUNCTIONS/have_queries_waiting.sql
@@ -3,9 +3,16 @@ SET search_path TO 'public', pg_catalog;
 CREATE OR REPLACE FUNCTION have_queries_waiting() RETURNS boolean
 SECURITY DEFINER
 SET search_path TO public, pg_temp
-LANGUAGE sql
+LANGUAGE plpgsql
 AS $$
-SELECT EXISTS (SELECT 1 FROM pg_stat_activity WHERE waiting)
+DECLARE
+BEGIN
+IF version() ~ '^PostgreSQL 9\.[1-5]' THEN
+RETURN EXISTS (SELECT 1 FROM pg_stat_activity WHERE waiting);
+ELSE
+RETURN EXISTS (SELECT 1 FROM pg_stat_activity WHERE wait_event IS
NOT NULL);
+END IF;
+END;
 $$;

On Fri, Mar 11, 2016 at 6:17 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Or ... maybe this is intentional behavior?  Now that I think about it,
>> doesn't each backend cache this info the first time its transaction
>> reads the data?
>
> Your view of pg_stat_activity is supposed to hold still within a
> transaction, yes.  Otherwise it'd be really painful to do any complicated
> joins.  I think there may be a function to explicitly flush the cache,
> if you really need to see intratransaction changes.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-03-11 00:23:56 +0100, Fabien COELHO wrote:
> As you wish. I thought that understanding the underlying performance model,
> with sequential writes written in chunks, is important for the admin, and as
> this GUC has an impact on performance, it should be hinted at, including the
> limits of its effect, where large bases will converge to random I/O
> performance. But maybe that is not the right place.

I do agree that that's something interesting to document somewhere. But
I don't think any of the current places in the documentation are a good
fit, and it's a topic much more general than the feature we're debating
here.  I'm not volunteering, but a good discussion of storage and the
interactions with postgres surely would be a significant improvement to
the postgres docs.


- Andres




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Fabien COELHO


[...]


If the default is in pages, maybe you could state it and afterwards
translate it in size.


Hm, I think that's more complicated for users than it's worth.


As you wish. I liked the number of pages you used initially because it
really gives a hint of how many random I/Os are avoided when they are
contiguous, and I do not have the same intuition with sizes. Also, it
is related to the I/O queue length managed by the OS.
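
For instance, with the default 8kB block size (my arithmetic, assuming
the 256kB Linux default):

   256kB threshold / 8kB per page = 32 pages per writeback call

i.e. up to 32 contiguous writes that the OS can potentially merge into
far fewer physical I/Os.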



The text could say something about sequential write performance, because
pages are sorted..., but also that this benefit is lost for large bases
and/or short checkpoints?


I think that's an implementation detail.


As you wish. I thought that understanding the underlying performance model 
with sequential writes written in chunks is important for the admin, and 
as this guc would have an impact on performance it should be hinted about, 
including the limits of its effect where large bases will converge to 
random io performance. But maybe that is not the right place.


--
Fabien




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Tom Lane
Robert Haas  writes:
> Or ... maybe this is intentional behavior?  Now that I think about it,
> doesn't each backend cache this info the first time its transaction
> reads the data?

Your view of pg_stat_activity is supposed to hold still within a
transaction, yes.  Otherwise it'd be really painful to do any complicated
joins.  I think there may be a function to explicitly flush the cache,
if you really need to see intratransaction changes.
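
If memory serves, something like:

   SELECT pg_stat_clear_snapshot();  -- discard stats cached in this transaction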

regards, tom lane




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Fabien COELHO


Hello Andres,


I'm not sure I've seen these performance figures... If you have hard
evidence, please feel free to share it.


Man, are you intentionally trying to be hard to work with?


Sorry, I do not understand this remark.

You were referring to some latency measures in your answer, and I was just
stating that I was interested in seeing the figures that were used to
justify your choice to keep a shared writeback context.


I did not intend this wish to be an issue, I was expressing an interest.


To quote the email you responded to:

My current plan is to commit this with the current behaviour (as in 
this week[end]), and then do some actual benchmarking on this specific 
part. It's imo a relatively minor detail.


Good.

From the evidence in the thread, I would have given preference to the
per-tablespace context, but this is just a personal opinion and I agree
that it can work the other way around.


I look forward to seeing these benchmarks later on, when you have them.

So all is well, and hopefully will be even better later on.

--
Fabien.




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 3:09 PM, Peter Geoghegan  wrote:
> Getting to it very soon. Just really busy right this moment.

That said, I agree with Peter's remarks about doing this frontend and
backend. So, while I'm not sure, I think we're in agreement on all
issues. I would have no problem with Peter E following through with
final steps + commit as Robert outlined, if that works for him.


-- 
Peter Geoghegan




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 2:50 PM, Robert Haas  wrote:
> So what's the next step here?  Peter G, are you planning to update the
> patch based on this review from Peter E?  If not, Peter E, do you want
> to update the patch and commit?  If neither, I'm going to mark this
> Returned with Feedback in the CF and move on, which seems a bit of a
> shame since this appears to be a bona fide bug, but if nobody's
> willing to work on it, it ain't gettin' fixed.

Getting to it very soon. Just really busy right this moment.


-- 
Peter Geoghegan




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-03-10 23:43:46 +0100, Fabien COELHO wrote:
> 
> >   
> >Whenever more than bgwriter_flush_after bytes have
> >been written by the bgwriter, attempt to force the OS to issue these
> >writes to the underlying storage.  Doing so will limit the amount of
> >dirty data in the kernel's page cache, reducing the likelihood of
> >stalls when an fsync is issued at the end of a checkpoint, or when
> >the OS writes data back  in larger batches in the background.  Often
> >that will result in greatly reduced transaction latency, but there
> >also are some cases, especially with workloads that are bigger than
> >, but smaller than the OS's page
> >cache, where performance might degrade.  This setting may have no
> >effect on some platforms.  0 disables controlled
> >writeback. The default is 256Kb on Linux, 0
> >otherwise. This parameter can only be set in the
> >postgresql.conf file or on the server command line.
> >   
> >
> >(plus adjustments for the other gucs)

> What about the maximum value?

Added.

  
   bgwriter_flush_after (int)

 Whenever more than bgwriter_flush_after bytes have
 been written by the bgwriter, attempt to force the OS to issue these
 writes to the underlying storage.  Doing so will limit the amount of
 dirty data in the kernel's page cache, reducing the likelihood of
 stalls when an fsync is issued at the end of a checkpoint, or when
 the OS writes data back in larger batches in the background.  Often
 that will result in greatly reduced transaction latency, but there
 also are some cases, especially with workloads that are bigger than
 , but smaller than the OS's page
 cache, where performance might degrade.  This setting may have no
 effect on some platforms.  The valid range is between
 0, which disables controlled writeback, and
 2MB.  The default is 256Kb on Linux,
 0 elsewhere.  (Non-default values of
 BLCKSZ change the default and maximum.)
 This parameter can only be set in the postgresql.conf
 file or on the server command line.
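
For concreteness, in postgresql.conf this might look like (values purely
illustrative, not recommendations):

   bgwriter_flush_after = 512kB      # 0 disables controlled writeback
   checkpoint_flush_after = 256kB
   backend_flush_after = 0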

   
  
 


> If the default is in pages, maybe you could state it and afterwards
> translate it in size.

Hm, I think that's more complicated for users than it's worth.


> The text could say something about sequential write performance, because
> pages are sorted..., but also that this benefit is lost for large bases
> and/or short checkpoints?

I think that's an implementation detail.


- Andres




Re: [HACKERS] Fix for OpenSSL error queue bug

2016-03-10 Thread Robert Haas
On Thu, Mar 3, 2016 at 7:15 PM, Peter Eisentraut  wrote:
> On 2/5/16 5:04 AM, Peter Geoghegan wrote:
>> As Heikki goes into on that thread, the appropriate action seems to be
>> to constantly reset the error queue, and to make sure that we
>> ourselves clear the queue consistently. (Note that we might not have
>> consistently called ERR_get_error() in the event of an OOM within
>> SSLerrmessage(), for example). I have not changed backend code in the
>> patch, though; I felt that we had enough control of the general
>> situation there for it to be unnecessary to lock everything down.
>
> I think clearing the error after a call is not necessary.  The API
> clearly requires that you should clear the error queue before a call, so
> clearing it afterwards does not accomplish anything, except maybe make
> broken code work sometimes, for a while.  Also, there is nothing that
> says that an error produces exactly one entry in the error queue; it
> could be multiple.  Or that errors couldn't arise at random times
> between the reset and whatever happens next.
>
> I think this is analogous to clearing errno before a C library call.
> You could clear it afterwards as well, to be nice to the next guy, but
> the next guy should really take care of that themselves, and we can't
> rely on what happens in between anyway.
>
> The places that you identified for change look correct as far as libpq
> goes.  I do think that the backend should be updated in the same way,
> because it's a) correct, b) easy enough, and c) there could well be
> interactions with postgres_fdw, plproxy, plperl, or who knows what.

So what's the next step here?  Peter G, are you planning to update the
patch based on this review from Peter E?  If not, Peter E, do you want
to update the patch and commit?  If neither, I'm going to mark this
Returned with Feedback in the CF and move on, which seems a bit of a
shame since this appears to be a bona fide bug, but if nobody's
willing to work on it, it ain't gettin' fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Fabien COELHO



   
Whenever more than bgwriter_flush_after bytes have
been written by the bgwriter, attempt to force the OS to issue these
writes to the underlying storage.  Doing so will limit the amount of
dirty data in the kernel's page cache, reducing the likelihood of
stalls when an fsync is issued at the end of a checkpoint, or when
the OS writes data back  in larger batches in the background.  Often
that will result in greatly reduced transaction latency, but there
also are some cases, especially with workloads that are bigger than
, but smaller than the OS's page
cache, where performance might degrade.  This setting may have no
effect on some platforms.  0 disables controlled
writeback. The default is 256Kb on Linux, 0
otherwise. This parameter can only be set in the
postgresql.conf file or on the server command line.
   

(plus adjustments for the other gucs)


Some suggestions:

What about the maximum value?

If the default is in pages, maybe you could state it and afterwards 
translate it in size.


"The default is 64 pages on Linux (usually 256Kb)..."

The text could say something about sequential write performance, because
pages are sorted..., but also that this benefit is lost for large bases
and/or short checkpoints?


--
Fabien.




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-03-10 23:38:38 +0100, Fabien COELHO wrote:
> I'm not sure I've seen these performance figures... If you have hard
> evidence, please feel free to share it.

Man, are you intentionally trying to be hard to work with?  To quote the
email you responded to:

> My current plan is to commit this with the current behaviour (as in this
> week[end]), and then do some actual benchmarking on this specific
> part. It's imo a relatively minor detail.




Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-10 Thread Robert Haas
On Wed, Mar 2, 2016 at 9:48 PM, Peter Eisentraut  wrote:
> On 2/11/16 9:30 PM, Michael Paquier wrote:
>>> Well, Yury was saying upthread that some MSVC versions have a problem
>>> with the existing coding, which would be a reason to back-patch ...
>>> but I'd like to see a failing buildfarm member first.  Don't particularly
>>> want to promise to support compilers not represented in the farm.
>>
>> Grmbl. Forgot to attach the rebased patch upthread. Here is it now.
>>
>> As of now the only complain has been related to MS2015 and MS2013. If
>> we follow the pattern of cec8394b and [1], support to compile on newer
>> versions of MSVC would be master and REL9_5_STABLE, but MS2013 is
>> supported down to 9.3. Based on this reason, we would want to
>> backpatch down to 9.3 the patch of this thread.
>
> After reviewing this thread and relevant internet lore, I think this
> might be the wrong way to address this problem.  It is in general not
> guaranteed in C that a Boolean-sounding function or macro returns 0 or
> 1.  Prime examples are ctype.h functions such as isupper().  This is
> normally not a problem because built-in conditionals such as if, &&, ||
> handle this just fine.  So code like
>
> -   Assert(!create || !!txn);
> +   Assert(!create || txn != NULL);
>
> is arguably silly either way.  There is no risk in writing just
>
> Assert(!create || txn);
>
> The problem only happens if you compare two "Boolean" values directly
> with each other; and so maybe you shouldn't do that, or at least place
> the extra care there instead, instead of fighting a permanent battle
> with the APIs you're using.  (This isn't an outrageous requirement: You
> can't compare floats or strings either without extra care.)
>
> A quick look through the code based on the provided patch shows that
> approximately the only place affected by this is
>
> if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
> elog(ERROR, "right sibling of GIN page is of different type");
>
> and that's not actually a problem because isLeaf and isData are earlier
> populated by the same macros.  It would still be worth fixing, but a
> localized fix seems better.
>
>
> Now on the matter of stdbool, I tried putting an #include <stdbool.h>
> near the top of c.h and compiling that to see what would happen.  This is
> the first warning I see:
>
> ginlogic.c: In function 'shimTriConsistentFn':
> ginlogic.c:171:24: error: comparison of constant '2' with boolean
> expression is always false [-Werror=bool-compare]
>if (key->entryRes[i] == GIN_MAYBE)
> ^
>
> and then later on something related:
>
> ../../../../src/include/tsearch/ts_utils.h:107:13: note: expected '_Bool
> (*)(void *, QueryOperand *) {aka _Bool (*)(void *, struct 
> *)}' but argument is of type 'GinTernaryValue (*)(void *, QueryOperand
> *) {aka char (*)(void *, struct  *)}'
>
> So the compiler is actually potentially helpful, but as it stands,
> PostgreSQL code is liable to break if you end up with stdbool.h somehow.
>
> (plperl also fails to compile because of a hot-potato game about who is
> actually responsible for defining bool.)
>
> So one idea would be to actually get ahead of the game, include
> stdbool.h if available, fix the mentioned issues, and maybe get more
> robust code that way.
>
> But the lore on the internet casts some doubt on that: There is no
> guarantee that bool is 1 byte, that bool can be passed around like char,
> or even that bool arrays are laid out like char arrays.  Maybe this all
> works out okay, just like it has worked out so far that int is 4 bytes,
> but we don't know enough about it.  We could probably add some configure
> tests around that.
>
> We could also go the other way and forcibly undefine an existing bool
> type (since stdbool.h is supposed to use macros, not typedefs).  But
> that might not work well if a header that is included later pulls in
> stdbool.h on top of that.
>
>
> My proposal on this particular patch is to do nothing.  The stdbool
> issues should be looked into, for the sake of Windows and other
> future-proofness.  But that will likely be an entirely different patch.

We need to decide what to do about this.  I disagree with Peter: I
think that regardless of stdbool, what we've got right now is sloppy
coding - bad style if nothing else.  Furthermore, I think that while C
lets you use any non-zero value to represent true, our bool type is
supposed to contain only one of those two values.  Therefore, I think
we should commit the full patch, back-patch it as far as somebody has
the energy for, and move on.  But regardless, this patch can't keep
sitting in the CommitFest - we either have to take it or reject it,
and soon.
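
To spell out the hazard with a toy example (hypothetical flag values, not
the actual GIN macros):

   #define PAGE_LEAF 0x04
   #define PageIsLeaf(p)  ((p)->flags & PAGE_LEAF)   /* yields 0 or 0x04 */

   bool        isLeaf = PageIsLeaf(page);   /* our bool is char: stores 0x04 */
   bool        sloppy = (isLeaf == true);   /* false even for a leaf page! */
   bool        safe   = (PageIsLeaf(page) != 0);  /* normalized to 0 or 1 */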

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Fabien COELHO


[...]

I had originally kept it with one context per tablespace after 
refactoring this, but found that it gave worse results in rate limited 
loads even over only two tablespaces. That's on SSDs though.


That might just mean that a smaller context size is better on SSDs, and
it could still be better per tablespace.



The number of pages still in writeback (i.e. for which sync_file_range
has been issued, but which haven't finished running yet) at the end of
the checkpoint matters for the latency hit incurred by the fsync()s from
smgrsync(); at least by my measurement.


I'm not sure I've seen these performance figures... If you have hard
evidence, please feel free to share it.


--
Fabien.




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-03-10 17:33:33 -0500, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 5:24 PM, Andres Freund  wrote:
> > On 2016-02-21 09:49:53 +0530, Robert Haas wrote:
> >> I think there might be a semantic distinction between these two terms.
> >> Doesn't writeback mean writing pages to disk, and flushing mean making
> >> sure that they are durably on disk?  So for example when the Linux
> >> kernel thinks there is too much dirty data, it initiates writeback,
> >> not a flush; on the other hand, at transaction commit, we initiate a
> >> flush, not writeback.
> >
> > I don't think terminology is sufficiently clear to make such a
> > distinction. Take e.g. our FlushBuffer()...
> 
> Well then we should clarify it!

Trying that as we speak, err, write. How about:

 Whenever more than bgwriter_flush_after bytes have
 been written by the bgwriter, attempt to force the OS to issue these
 writes to the underlying storage.  Doing so will limit the amount of
 dirty data in the kernel's page cache, reducing the likelihood of
 stalls when an fsync is issued at the end of a checkpoint, or when
 the OS writes data back  in larger batches in the background.  Often
 that will result in greatly reduced transaction latency, but there
 also are some cases, especially with workloads that are bigger than
 , but smaller than the OS's page
 cache, where performance might degrade.  This setting may have no
 effect on some platforms.  0 disables controlled
 writeback. The default is 256Kb on Linux, 0
 otherwise. This parameter can only be set in the
 postgresql.conf file or on the server command line.


(plus adjustments for the other gucs)




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 5:24 PM, Andres Freund  wrote:
> On 2016-02-21 09:49:53 +0530, Robert Haas wrote:
>> I think there might be a semantic distinction between these two terms.
>> Doesn't writeback mean writing pages to disk, and flushing mean making
>> sure that they are durably on disk?  So for example when the Linux
>> kernel thinks there is too much dirty data, it initiates writeback,
>> not a flush; on the other hand, at transaction commit, we initiate a
>> flush, not writeback.
>
> I don't think terminology is sufficiently clear to make such a
> distinction. Take e.g. our FlushBuffer()...

Well then we should clarify it!

:-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-02-21 09:49:53 +0530, Robert Haas wrote:
> I think there might be a semantic distinction between these two terms.
> Doesn't writeback mean writing pages to disk, and flushing mean making
> sure that they are durably on disk?  So for example when the Linux
> kernel thinks there is too much dirty data, it initiates writeback,
> not a flush; on the other hand, at transaction commit, we initiate a
> flush, not writeback.

I don't think terminology is sufficiently clear to make such a
distinction. Take e.g. our FlushBuffer()...




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 5:07 PM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 5:05 PM, Robert Haas  wrote:
>> On Thu, Mar 10, 2016 at 4:51 PM, Pavel Stehule  
>> wrote:
>>> Maybe it will be clearer from the attached text file
>>
>> Uh, yikes, that looks messed up, but I wouldn't have thought this
>> commit would have changed anything there one way or the other.  The
>> current query is reported by pgstat_report_activity(), which I didn't
>> touch.  I think.
>
> I just tried this on 9.5 - changing the query only to "select pid,
> state, query from pg_stat_activity"  and doing everything else the
> same - and I see the same behavior there.  So it looks like this is a
> preexisting bug.

Or ... maybe this is intentional behavior?  Now that I think about it,
doesn't each backend cache this info the first time its transaction
reads the data?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 5:05 PM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 4:51 PM, Pavel Stehule  
> wrote:
>> Maybe it be clear from attached text file
>
> Uh, yikes, that looks messed up, but I wouldn't have thought this
> commit would have changed anything there one way or the other.  The
> current query is reported by pgstat_report_activity(), which I didn't
> touch.  I think.

I just tried this on 9.5 - changing the query only to "select pid,
state, query from pg_stat_activity"  and doing everything else the
same - and I see the same behavior there.  So it looks like this is a
preexisting bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 4:51 PM, Pavel Stehule  wrote:
> Maybe it will be clearer from the attached text file

Uh, yikes, that looks messed up, but I wouldn't have thought this
commit would have changed anything there one way or the other.  The
current query is reported by pgstat_report_activity(), which I didn't
touch.  I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Pavel Stehule
2016-03-10 22:24 GMT+01:00 Robert Haas :

>
> rhaas=# select query, state, wait_event, wait_event_type from pg_stat_activity;
>                                   query                                   | state  |  wait_event   | wait_event_type
> --------------------------------------------------------------------------+--------+---------------+-----------------
>  select query, state, wait_event, wait_event_type from pg_stat_activity; | active |               |
>  select * from foo for update;                                            | active | transactionid | Lock
> (2 rows)
>
> ...which looks right to me.
>
> > session two:
> > rollback; begin; select * from foo where a = 10 for update;
> > session two is waiting again
>
> I don't see how you can do this here - the session is blocked.
>
> There could well be a bug here, but I need a little more help to find it.
>

Maybe it will be clearer from the attached text file

Regards



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
=== <=== FIRST SESSION
postgres=# begin;
BEGIN
Time: 0.498 ms
postgres=# select * from foo for update;
┌────┐
│ a  │
╞════╡
│ 10 │
└────┘
(1 row)

Time: 1.152 ms
==

** <=== SECOND SESSION
postgres=# begin;
BEGIN
Time: 0.426 ms
postgres=# select * from foo for update;
**

==
postgres=# select pid, wait_event, state, query from pg_stat_activity ;
┌───────┬───────────────┬────────┬──────────────────────────────────────────────────────────────┐
│  pid  │  wait_event   │ state  │                            query                             │
╞═══════╪═══════════════╪════════╪══════════════════════════════════════════════════════════════╡
│ 22870 │ transactionid │ active │ select * from foo for update;                                │
│ 22874 │ ( null )      │ active │ select pid, wait_event, state, query from pg_stat_activity ; │
└───────┴───────────────┴────────┴──────────────────────────────────────────────────────────────┘
(2 rows)

Time: 1.666 ms --- OK
=

*
^CCancel request sent
ERROR:  57014: canceling statement due to user request
CONTEXT:  while locking tuple (0,1) in relation "foo"
LOCATION:  ProcessInterrupts, postgres.c:2977
Time: 121895.558 ms
postgres=# rollback;
ROLLBACK
Time: 0.648 ms
postgres=# begin;
BEGIN
Time: 0.461 ms
postgres=# select * from foo where a = 10 for update;
*

=
┌───────┬───────────────┬────────┬──────────────────────────────────────────────────────────────┐
│  pid  │  wait_event   │ state  │                            query                             │
╞═══════╪═══════════════╪════════╪══════════════════════════════════════════════════════════════╡
│ 22870 │ transactionid │ active │ select * from foo for update;                                │ <=== expecting select * from foo where a = 10 for update
│ 22874 │ ( null )      │ active │ select pid, wait_event, state, query from pg_stat_activity ; │
└───────┴───────────────┴────────┴──────────────────────────────────────────────────────────────┘
(2 rows)

Time: 1.421 ms




Re: [HACKERS] Is there a way around function search_path killing SQL function inlining? - and backup / restore issue

2016-03-10 Thread Regina Obe

> Hmm.  The meaning of funcs.inline depends on the search_path, not just during
> dump restoration but all the time.  So anything that uses it under a different
> search_path setting than the normal one will have this kind of problem; not
> just dump/restore.

> I don't have a very good idea what to do about that.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

I wasn't suggesting it was a restore only issue, but it's most felt when your 
data doesn't come back.  It affects any extension that relies on another 
extension.

Take, for example: I have tiger geocoder, which relies on fuzzystrmatch.  I have
no idea where someone installs fuzzystrmatch, so I can't schema-qualify those
calls.  I use that dependent function to build indexes on tables.

The indexes don't come back.  What I was trying to suggest (side topic;
forget about the inline issue) is that pg_dump should have a switch to allow
users to tack on extra schemas, so that the search_path the dump restores
looks like:

Set search_path=my_data_schema, pg_catalog, whatever_other_schemas_I_have_for_db

People can choose to use that switch or not.  That way, if people do have
database search_paths they normally run with, their data will come back.

Am I missing something here in this suggestion?  It's one of the most common
complaints I hear about PostgreSQL in general, and people do crazy things to
get around the issue, like doing plain-text dumps and parsing the dump.

Thanks,
Regina






Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-10 Thread Andres Freund
On 2016-03-08 09:28:15 +0100, Fabien COELHO wrote:
> 
> >>>Now I cannot see how having one context per table space would have a
> >>>significant negative performance impact.
> >>
> >>The 'dirty data' etc. limits are global, not per block device. By having
> >>several contexts with unflushed dirty data the total amount of dirty
> >>data in the kernel increases.
> >
> >Possibly, but how much?  Do you have experimental data to back up that
> >this is really an issue?
> >
> >We are talking about 32 (context size) * #table spaces * 8KB buffers = 4MB
> >of dirty buffers to manage for 16 table spaces, I do not see that as a
> >major issue for the kernel.

We flush in those increments, that doesn't mean there's only that much
dirty data. I regularly see one order of magnitude more being dirty.


I had originally kept it with one context per tablespace after
refactoring this, but found that it gave worse results in rate limited
loads even over only two tablespaces. That's on SSDs though.


> To complete the argument, the 4MB is just a worst case scenario, in reality
> flushing the different context would be randomized over time, so the
> frequency of flushing a context would be exactly the same in both cases
> (shared or per table space context) if the checkpoints are the same size,
> just that with shared table space each flushing potentially targets all
> tablespace with a few pages, while with the other version each flushing
> targets one table space only.

The number of pages still in writeback (i.e. for which sync_file_range
has been issued, but which haven't finished running yet) at the end of
the checkpoint matters for the latency hit incurred by the fsync()s from
smgrsync(); at least by my measurement.
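
For anyone following along, "issuing writeback" means roughly this
(Linux-specific sketch):

   /* start asynchronous writeback of the range; does not wait for it */
   rc = sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);

so pages can still be under writeback when smgrsync() later calls fsync(),
which then has to wait for them.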


My current plan is to commit this with the current behaviour (as in this
week[end]), and then do some actual benchmarking on this specific
part. It's imo a relatively minor detail.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 3:44 PM, Pavel Stehule  wrote:
> I am trying to test this feature, and there I see data that is not
> current. Maybe this behavior is not related to this patch:
>
> create table foo(a int);
> insert into foo values(10);
>
> session one:
>
> begin; select * from foo for update;
>
> session two:
>
> begin; select * from foo for update;
> session two is waiting
>
> session one:
> select * from pg_stat_activity -- I don't see correct information about
> session two

At this point, I get:

rhaas=# select query, state, wait_event, wait_event_type from pg_stat_activity;
                                  query                                   | state  |  wait_event   | wait_event_type
--------------------------------------------------------------------------+--------+---------------+-----------------
 select query, state, wait_event, wait_event_type from pg_stat_activity; | active |               |
 select * from foo for update;                                            | active | transactionid | Lock
(2 rows)

...which looks right to me.

> session two:
> rollback; begin; select * from foo where a = 10 for update;
> session two is waiting again

I don't see how you can do this here - the session is blocked.

There could well be a bug here, but I need a little more help to find it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 1:41 PM, Masahiko Sawada  wrote:
> On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas  wrote:
>> This 001 patch looks so little like what I was expecting that I
>> decided to start over from scratch.  The new version I wrote is
>> attached here.  I don't understand why your version tinkers with the
>> logic for setting the all-frozen bit; I thought that what I already
>> committed dealt with that already, and in any case, your version
>> doesn't even compile against latest sources.  Your version also leaves
>> the scan_all terminology intact even though it's not accurate any
>> more, and I am not very convinced that the updates to the
>> page-skipping logic are actually correct.  Please have a look over
>> this version and see what you think.
>
> Thank you for your advice.
> Sorry, the optimising logic in the previous patch was stale by mistake.
> The attached latest patch incorporates your suggestions with a little
> revising.

Thanks.  I adopted some of your suggestions, rejected others, fixed a
few minor things that I missed previously, and committed this.  If you
think any of the changes that I rejected still have merit, please
resubmit those changes as separate patches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] auto_explain sample rate

2016-03-10 Thread Petr Jelinek

On 10/03/16 20:59, Julien Rouhaud wrote:

On 10/03/2016 04:37, Petr Jelinek wrote:

On 17/02/16 01:17, Julien Rouhaud wrote:


Agreed, it's too obscure. Attached v4 fixes as you said.



Seems to be a simple enough patch, and it works. However, I would like the
documentation to say that the range is 0 to 1 and represents the fraction
of queries sampled, because right now both the GUC description and the
documentation say it's a percent, but that's not really true, as a percent
is 0 to 100.



Agreed. v5 attached fixes that.



Great, I will test it once more (just because when I don't, bugs suddenly 
appear out of nowhere) and mark it ready for committer.
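
For reference, enabling the sampling would look roughly like this (GUC
name as in the v5 patch; the values are only illustrative, and changing
the ratio requires superuser privileges since it is PGC_SUSET):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log plans for all sampled queries
    SET auto_explain.sample_ratio = 0.1;    -- sample roughly 10% of top-level queries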


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-10 Thread Simon Riggs
On 10 March 2016 at 20:36, Thomas Munro 
wrote:

> On Fri, Mar 11, 2016 at 8:50 AM, Simon Riggs 
> wrote:
> > On 3 February 2016 at 23:12, Thomas Munro  >
> > wrote:
> >
> >>
> >>  It quacks suspiciously like a bug.
> >
> >
> > Agreed
> >
> > What's more important is that it is very publicly a bug in the eyes of
> others
> > and should be fixed and backpatched soon.
> >
> > We have a maintenance release coming in a couple of weeks and I'd like to
> > see this in there.
>
> As I understand it, the approach I've taken here can't be backpatched
> because it changes the aminsert_function interface (it needs the
> current snapshot when inserting), so I was proposing this as an
> improvement for 9.6.  I guess there are other ways to get the right
> snapshot into btinsert (and thence _bt_check_unique), but I didn't
> think it would be very classy to introduce a 'current snapshot' global
> variable to smuggle it in.
>

But this is a Serializable transaction, so it only has one snapshot...

This is where adding comments on patch theory would help.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Gavin Flower

On 11/03/16 08:48, Igal @ Lucee.org wrote:

On 3/10/2016 11:44 AM, Robert Haas wrote:
On Thu, Mar 10, 2016 at 2:35 PM, Simon Riggs  
wrote:

But I still don't know what "meh" means.

Maybe this helps?

https://en.wikipedia.org/wiki/Meh


LOL
https://en.wikipedia.org/wiki/LOL



I'm pretending to work, so will you and Robert stop making that pretence 
more difficult!!!  :-)





Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-10 Thread Thomas Munro
On Fri, Mar 11, 2016 at 8:50 AM, Simon Riggs  wrote:
> On 3 February 2016 at 23:12, Thomas Munro 
> wrote:
>
>>
>>  It quacks suspiciously like a bug.
>
>
> Agreed
>
> What's more important is that it is very publicly a bug in the eyes of others
> and should be fixed and backpatched soon.
>
> We have a maintenance release coming in a couple of weeks and I'd like to
> see this in there.

As I understand it, the approach I've taken here can't be backpatched
because it changes the aminsert_function interface (it needs the
current snapshot when inserting), so I was proposing this as an
improvement for 9.6.  I guess there are other ways to get the right
snapshot into btinsert (and thence _bt_check_unique), but I didn't
think it would be very classy to introduce a 'current snapshot' global
variable to smuggle it in.

> Let's set good standards for responsiveness and correctness.
>
>
> I'd also like to see some theory in comments and an explanation of why we're
> doing this (code).

Will do.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread David G. Johnston
On Thu, Mar 10, 2016 at 11:45 AM, Robert Haas  wrote:

> On Thu, Mar 10, 2016 at 11:33 AM, David G. Johnston
>  wrote:
> > I tend to think we err toward this too much.  This seems like development
> > concerns trumping usability.  Consider that anything someone took the
> time
> > to write and polish to make committable to core was obviously genuinely
> > useful to them - and for every person capable of actually taking things
> that
> > far there are likely many more like myself who cannot but have
> encountered
> > the, albeit minor, usability annoyance that this kind of function seeks
> to
> > remove.
>
> Sure, an individual function like this has almost no negative impact.
> On the other hand, working around its absence is also trivial.  You
> can create a wrapper function that does exactly this in a couple of
> lines of SQL.  In my opinion, saying that people should do that if
> they need it has some advantages over shipping it to everyone.  If you
> don't have it and you want it, you can easily get it.  But what if you
> have it and you don't want it, for example because what you really
> want is a minimal postgres installation?


I'd rather cater to the people who are content that everything in
PostgreSQL is of high quality, ignore the things they have no immediate
need for, and are then happy to find out, when a new need arises, that
PostgreSQL already provides a well-thought-out and tested tool they can
use.

We purport to be highly extensible but that doesn't preclude us from being
well-stocked at the same time.

And by not including these kinds of things in core, the broader ecosystem is
harmed, since not every provider of PostgreSQL services is willing or
able to allow random third-party extensions; nor is adding 20
"important and generally useful to me but an annoyance to the project"
functions to every database I create a trivial exercise.  But it is trivial
to ignore something that exists but that I have no need for.

You can't take anything in
> core back out again, or at least not easily.  Everything about core is
> expanding very randomly - code size, shared memory footprint, all of
> it.  If you think that has no downside for users, I respectfully
> disagree.


Attempts to limit that expansion seemingly happen "very randomly" as well.

If there is a goal and demand for a "minimalist" installation then we don't
seem to have anything going on that is actively trying to make it a
reality.  I'm likely being a bit myopic here but at the same time the
increased footprint from JSON, parallel queries, and the like dwarfs the
contribution a handful of C-language functions would add.

I do understand why more trivial features are scrutinized the way they are
but I still hold the opinion that features such as this should generally be
given the benefit of inclusion unless there are concrete technical concerns.

David J.


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-03-10 Thread Tom Lane
Gilles Darold  writes:
> Then, should I have to use an alternate file to store the information or
> implement a bidirectional communication with the syslogger?

I'd just define a new single-purpose file $PGDATA/log_file_name
or some such.
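
A minimal sketch of that approach (illustrative only; the file name
follows the suggestion above and the helper is hypothetical): write to a
temporary file and rename() it into place, so readers never observe a
half-written name:

    #include <stdio.h>

    static void
    publish_logfile_name(const char *logfname)
    {
        FILE   *fh = fopen("log_file_name.tmp", "w");

        if (fh == NULL)
            return;
        fprintf(fh, "%s\n", logfname);
        fclose(fh);

        /* rename() is atomic, so readers see the old or new name, never a mix */
        rename("log_file_name.tmp", "log_file_name");
    }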

regards, tom lane




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Tom Lane
Andres Freund  writes:
> On 2016-03-10 14:16:03 -0500, Tom Lane wrote:
>> I don't deny that you *could* continue to do things that way, but
>> I dispute that it's a good idea.  Why can't you generate a Path tree
>> and then ask create_plan() to convert it?

> Primarily because create_plan(), and/or its children, have to know about
> what you're doing; you can hide some, but not all, things below
> CustomScan nodes.

And which of those things does e.g. setrefs.c not need to know about?

> ISTM, that there's good enough reasons to go either way; I don't see
> what we're gaining by making these private. That just encourages
> copy-paste coding.

What it encourages is having module boundaries that actually mean
something, as well as code that can be refactored without having
to worry about which extensions will complain about it.

I will yield on this point because it's not worth my time to argue about
it, but I continue to say that it's a bad decision you will regret.

Which functions did you need exactly?  I'm not exporting more than
I have to.

regards, tom lane




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Magnus Hagander
On Thu, Mar 10, 2016 at 8:45 PM, Robert Haas  wrote:

> On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund  wrote:
> > ISTM, that there's good enough reasons to go either way; I don't see
> > what we're gaining by making these private. That just encourages
> > copy-paste coding.
>
> +1.  Frustrating Citus's attempt to open-source their stuff is not in
> the project's interest.
>
>
Agreed. And it's not like we're very restrictive with this in a lot of
other parts of the code. While we shouldn't go out of our way for
forks/extensions, this seems like a very reasonable tradeoff.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] auto_explain sample rate

2016-03-10 Thread Julien Rouhaud
On 10/03/2016 04:37, Petr Jelinek wrote:
> On 17/02/16 01:17, Julien Rouhaud wrote:
>>
>> Agreed, it's too obscure. Attached v4 fixes as you said.
>>
> 
> Seems to be a simple enough patch, and it works. However, I would like the
> documentation to say that the range is 0 to 1 and represents the fraction
> of queries sampled, because right now both the GUC description and the
> documentation say it's a percent, but that's not really true, as a percent
> is 0 to 100.
> 

Agreed. v5 attached fixes that.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0950e18..2755fd1 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -29,6 +29,7 @@ static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
+static double auto_explain_sample_ratio = 1;
 
 static const struct config_enum_entry format_options[] = {
 	{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -47,10 +48,14 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
 static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
 static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 
+/* Is the current query sampled, per backend */
+static bool current_query_sampled = true;
+
 #define auto_explain_enabled() \
 	(auto_explain_log_min_duration >= 0 && \
 	 (nesting_level == 0 || auto_explain_log_nested_statements))
 
+
 void		_PG_init(void);
 void		_PG_fini(void);
 
@@ -62,6 +67,7 @@ static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
 
+
 /*
  * Module load callback
  */
@@ -159,6 +165,19 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomRealVariable("auto_explain.sample_ratio",
+			 "Fraction of queries to process.",
+			NULL,
+			&auto_explain_sample_ratio,
+			1.0,
+			0.0,
+			1.0,
+			PGC_SUSET,
+			0,
+			NULL,
+			NULL,
+			NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");
 
 	/* Install hooks. */
@@ -191,7 +210,15 @@ _PG_fini(void)
 static void
 explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 {
-	if (auto_explain_enabled())
+	/*
+	 * For ratio sampling, randomly choose top-level statement. Either
+	 * all nested statements will be explained or none will.
+	 */
+	if (auto_explain_log_min_duration >= 0 && nesting_level == 0)
+		current_query_sampled = (random() < auto_explain_sample_ratio *
+MAX_RANDOM_VALUE);
+
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/* Enable per-node instrumentation iff log_analyze is required. */
 		if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
@@ -210,7 +237,7 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	else
 		standard_ExecutorStart(queryDesc, eflags);
 
-	if (auto_explain_enabled())
+	if (auto_explain_enabled() && current_query_sampled)
 	{
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -280,7 +307,7 @@ explain_ExecutorFinish(QueryDesc *queryDesc)
 static void
 explain_ExecutorEnd(QueryDesc *queryDesc)
 {
-	if (queryDesc->totaltime && auto_explain_enabled())
+	if (queryDesc->totaltime && auto_explain_enabled() && current_query_sampled)
 	{
 		double		msec;
 
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index d527208..978ef35 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -203,6 +203,24 @@ LOAD 'auto_explain';
  
 

+
+   
+
+ auto_explain.sample_ratio (real)
+ 
+  auto_explain.sample_ratio configuration parameter
+ 
+
+
+ 
+  auto_explain.sample_ratio (floating point)
+  causes auto_explain to explain only a fraction of the statements in each
+  session.  The default is 1, meaning that all queries are explained.  In case
+  of nested statements, either all will be explained or none. Only
+  superusers can change this setting.
+ 
+
+   
   
 
   



Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 11:45 AM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund  wrote:
>> ISTM, that there's good enough reasons to go either way; I don't see
>> what we're gaining by making these private. That just encourages
>> copy-paste coding.
>
> +1.  Frustrating Citus's attempt to open-source their stuff is not in
> the project's interest.

I agree.


-- 
Peter Geoghegan




Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-10 Thread Simon Riggs
On 3 February 2016 at 23:12, Thomas Munro 
wrote:


>  It quacks suspiciously like a bug.
>

Agreed

What's more important is that it is very publicly a bug in the eyes of others
and should be fixed and backpatched soon.

We have a maintenance release coming in a couple of weeks and I'd like to
see this in there.

Let's set good standards for responsiveness and correctness.


I'd also like to see some theory in comments and an explanation of why
we're doing this (code).

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Igal @ Lucee.org

On 3/10/2016 11:44 AM, Robert Haas wrote:

On Thu, Mar 10, 2016 at 2:35 PM, Simon Riggs  wrote:

But I still don't know what "meh" means.

Maybe this helps?

https://en.wikipedia.org/wiki/Meh


LOL
https://en.wikipedia.org/wiki/LOL





Re: [HACKERS] proposal: function parse_ident

2016-03-10 Thread Pavel Stehule
2016-03-10 15:34 GMT+01:00 Teodor Sigaev :

> select
>>
>> parse_ident(E'X\rXX');
>>
>>
>> I am sending updated patch - I used json function for correct escaping -
>> the
>> escaping behave is same.
>>
>
> Hmm, it doesn't look so:
> % select parse_ident(E'_\005');
> ERROR:  identifier contains disallowed characters: "_\u0005"
> % select parse_ident(E'\005');
> ERROR:  missing identifier: "\u0005"
>
> but
>
> # select parse_ident(E'"\005"');
>  parse_ident
> -
>  {\x05}
>
> Error messages above point wrong character wrongly.
>
> One more inconsistence:
> # select parse_ident(E'"\005"') as "\005";
>   \005
> 
>  {\x05}
>
> Display outputs of actual identifier and parse_indent are differ.
>
> Actually, I can live with both but are any other opinions? Seems, at least
> difference of actual identifier and output of parse_indent should be
> pointed in docs.


I'm afraid I cannot fix this inconsistency (if it is an inconsistency -
the binary values are the same). The function's parameter is a raw string
with the escape codes already processed, and I have no information about
the original escape sequences. When you enter an octal value and I show it
as a hex value, there will be a difference, but I have no information
about your input (octal or hex). I have the original string of an SQL
identifier inside the parser and executor, but I do not have the original
string of a function parameter inside the function (not without pretty
complex and long code).

I have tried to describe it in the docs (I am sorry for my poor English) in
the new patch. Fixed the duplicated oid too.
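
For illustration, the strict-mode parameter behaves roughly like this
(the first example is from the patch's documentation; the second output
is an assumption based on its description of non-strict mode):

    SELECT parse_ident('"SomeSchema".someTable');
    -- {SomeSchema,sometable}

    SELECT parse_ident('pg_catalog.pg_class[]', false);
    -- {pg_catalog,pg_class}   (trailing characters ignored in non-strict mode)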

Regards

Pavel


>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 4b5ee81..05b3cf9
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1821,1826 
--- 1821,1849 

 
  
+  parse_ident
+ 
+ parse_ident(str text,
+[ strictmode boolean DEFAULT true ] )
+
+text[]
+Split a qualified identifier into an array of its parts.
+When strictmode is false, extra characters after the last identifier are ignored.
+This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+characters. By default, extra characters after the last identifier are considered an error.
+Note that this function
+does not truncate quoted identifiers. If you care about that, you should cast the result of this
+function to name[]. Non-printable characters (codes 0 to 31) are always displayed as hexadecimal
+codes, which can differ from PostgreSQL's internal processing of SQL identifiers, where the original
+escaped value is displayed.
+
+parse_ident('"SomeSchema".someTable')
+"SomeSchema,sometable"
+   
+ 
+   
+
+ 
   pg_client_encoding
  
  pg_client_encoding()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index abf9a70..38af138
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 971,973 
--- 971,980 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*** scanstr(const char *s)
*** 130,135 
--- 130,144 
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*** downcase_truncate_identifier(const char
*** 158,169 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..d11581a
*** 

Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-03-10 Thread Gilles Darold
Le 10/03/2016 16:26, Tom Lane a écrit :
> Robert Haas  writes:
>> On Wed, Mar 9, 2016 at 12:32 PM, Gilles Darold  
>> wrote:
>>> I choose to allow the log collector to write his current log file name
>>> into the lock file 'postmaster.pid'.
>> Gosh, why?  Piggybacking this on a file written for a specific purpose
>> by a different process seems like making life very hard for yourself,
>> and almost certainly a recipe for bugs.
> That's a *complete* nonstarter.  postmaster.pid has to be written by the
> postmaster process and nobody else.
>
> It's a particularly bad choice for the syslogger, which will exist
> fractionally longer than the postmaster, and thus might be trying to write
> into the file after the postmaster has removed it.

I was thinking that the lock file was removed last of all, and that the
postmaster waited for all children to die before removing it, but ok, now
I see why this was not my first choice - it was a very bad idea :-)
Then, should I use an alternate file to store the information, or
implement bidirectional communication with the syslogger? The latter
really looks like too much for this very simple feature. With an
alternate file, what is the best place for it to be written? All
places have their special use - the global/ directory?
 

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org





Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund  wrote:
> ISTM, that there's good enough reasons to go either way; I don't see
> what we're gaining by making these private. That just encourages
> copy-paste coding.

+1.  Frustrating Citus's attempt to open-source their stuff is not in
the project's interest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 2:35 PM, Simon Riggs  wrote:
> But I still don't know what "meh" means.

Maybe this helps?

https://en.wikipedia.org/wiki/Meh

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Andres Freund
On 2016-03-10 14:16:03 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > In Citus' case a full PlannedStmt is generated on the master node, to
> > combine the data generated on worker nodes (where the bog standard
> > postgres planner is used).  It's not the only way to do things, but I
> > don't see why the approach would be entirely invalidated by the
> > pathification work.
> 
> I don't deny that you *could* continue to do things that way, but
> I dispute that it's a good idea.  Why can't you generate a Path tree
> and then ask create_plan() to convert it?

Primarily because create_plan(), and/or its children, have to know about
what you're doing; you can hide some, but not all, things below
CustomScan nodes. Secondarily, as an extension you will often have to
support several major versions.

ISTM, that there's good enough reasons to go either way; I don't see
what we're gaining by making these private. That just encourages
copy-paste coding.

Andres




Re: [HACKERS] pg_rewind just doesn't fsync *anything*?

2016-03-10 Thread Andres Freund
On 2016-03-10 20:31:55 +0100, Michael Paquier wrote:
> On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund  wrote:
> > Having to backpatch a single system() invocation + find_other_exec()
> > call, and backporting a more general FRONTEND version of initdb's
> > fsync_pgdata() are wildly differing in complexity.
> 
> Talking about HEAD, wouldn't the dependency tree be cleaner if there
> is a common facility in src/common?

Yea, probably; but lets sort out the backbranches first.

Andres




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Simon Riggs
On 10 March 2016 at 18:45, Robert Haas  wrote:

> On Thu, Mar 10, 2016 at 11:33 AM, David G. Johnston
>  wrote:
> > I tend to think we err toward this too much.  This seems like development
> > concerns trumping usability.  Consider that anything someone took the
> time
> > to write and polish to make committable to core was obviously genuinely
> > useful to them - and for every person capable of actually taking things
> that
> > far there are likely many more like myself who cannot but have
> encountered
> > the, albeit minor, usability annoyance that this kind of function seeks
> to
> > remove.
>
> Sure, an individual function like this has almost no negative impact.
> On the other hand, working around its absence is also trivial.  You
> can create a wrapper function that does exactly this in a couple of
> lines of SQL.  In my opinion, saying that people should do that if
> they need it has some advantages over shipping it to everyone.  If you
> don't have it and you want it, you can easily get it.  But what if you
> have it and you don't want it, for example because what you really
> want is a minimal postgres installation?  You can't take anything in
> core back out again, or at least not easily.  Everything about core is
> expanding very randomly - code size, shared memory footprint, all of
> it.  If you think that has no downside for users, I respectfully
> disagree.


Not sure what y'all are discussing, but I should add that I would have
committed this based solely upon Vik's +1.

My objection was made, then overruled; that works because the objection
wasn't "it won't work", only a preference so I'm happy.

But I still don't know what "meh" means.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 10:39 AM, Greg Stark  wrote:
> I want to rerun these on a dedicated machine and with trace_sort
> enabled so that we can see how many merge passes were actually
> happening and how much I/O was actually happening.

Putting the results in context, by keeping trace_sort output with the
results is definitely a good idea here. Otherwise, it's almost
impossible to determine what happened after the fact. I have had
"trace_sort = on" in my dev postgresql.conf for some time now. :-)

When I produce my next revision, we should focus on regressions at the
low end, like the 4MB work_mem for multiple GB table size cases you
show here. So, I ask that any benchmarks that you or Tomas do look at
that first and foremost. It's clear that in high memory environments
the patch significantly improves performance, often by as much as
2.5x, and so that isn't really a concern anymore. I think we may be
able to comprehensively address Robert's concerns about regressions
with very little work_mem and lots of data by fixing a problem with
polyphase merge. More to come soon.

-- 
Peter Geoghegan




Re: [HACKERS] pg_rewind just doesn't fsync *anything*?

2016-03-10 Thread Michael Paquier
On Thu, Mar 10, 2016 at 7:52 PM, Andres Freund  wrote:
>> a target data folder should be stopped properly to be able to rewind,
>> and it is better to avoid dependencies between utilities if that's not
>> strictly necessary.  initdb is likely to be installed side-by-side
>> with pg_rewind in any distribution though.
>
> It's not like we don't have any other such dependencies, in other
> binaries. I'm not concerned.
>
> Having to backpatch a single system() invocation + find_other_exec()
> call, and backporting a more general FRONTEND version of initdb's
> fsync_pgdata() are wildly differing in complexity.

Talking about HEAD, wouldn't the dependency tree be cleaner if there
were a common facility in src/common? For back-branches, I won't argue
against simplicity; those solutions are more reliable.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 1:41 PM, Masahiko Sawada  wrote:
> On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas  wrote:
>> This 001 patch looks so little like what I was expecting that I
>> decided to start over from scratch.  The new version I wrote is
>> attached here.  I don't understand why your version tinkers with the
>> logic for setting the all-frozen bit; I thought that what I already
>> committed dealt with that already, and in any case, your version
>> doesn't even compile against latest sources.  Your version also leaves
>> the scan_all terminology intact even though it's not accurate any
>> more, and I am not very convinced that the updates to the
>> page-skipping logic are actually correct.  Please have a look over
>> this version and see what you think.
>
> Thank you for your advice.
> Sorry, the optimising logic in the previous patch was from an old version by mistake.
> The attached latest patch incorporates your suggestions with a little revising.

OK, I'll have a look.  Thanks.

>> I think that's kind of pointless.  We need to test that this
>> conversion code works, but once it does, I don't think we should make
>> everybody pay the overhead of retesting that.  Anyway, the test code
>> could have bugs, too.
>>
>> Here's an updated version of your patch with that code removed and
>> some cosmetic cleanups like fixing typos and stuff like that.  I think
>> this is mostly ready to commit, but I noticed one problem: your
>> conversion code always produces two output pages for each input page
>> even if one of them would be empty.  In particular, if you have a
>> large number of small relations and run pg_upgrade, all of their
>> visibility maps will go from 8kB to 16kB.  That isn't the end of the
>> world, maybe, but I think you should see if you can't fix it
>> somehow
>
> Thank you for updating the patch.
> To deal with this problem, I've changed it so that pg_upgrade checks
> the file size before conversion, and if the fork file does not exist or
> its size is 0 (empty), it is ignored.
> Attached is the latest patch.

I think what I really want is some logic so that if we have a 1-page
visibility map in the old cluster and the second half of that page is
all zeroes, we only create a 1-page visibility map in the new cluster
rather than a 2-page visibility map.

Or more generally, if the old VM is N pages, but the last half of the
last page is empty, then let the output VM be 2*N-1 pages instead of
2*N pages.
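
In other words, something like this sizing rule (a sketch only; the
function and names are illustrative, not committed code):

    /*
     * Each old visibility map page (1 bit per heap page) expands to two
     * new pages (2 bits per heap page).  If the second half of the last
     * old page is all zeroes, the trailing new page would be all-zero
     * and can be omitted.
     */
    static int
    new_vm_page_count(int old_pages, bool last_half_empty)
    {
        return last_half_empty ? 2 * old_pages - 1 : 2 * old_pages;
    }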

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Tom Lane
Andres Freund  writes:
> On 2016-03-10 13:48:31 -0500, Tom Lane wrote:
>> That was intentional: in my opinion, nothing outside createplan.c ought
>> to be making Plan nodes anymore.  The expectation is that you make a
>> Path describing what you want.  Can you explain why, in the new planner
>> structure, it would be sane to have external callers of these functions?

> In Citus' case a full PlannedStmt is generated on the master node, to
> combine the data generated on worker nodes (where the bog standard
> postgres planner is used).  It's not the only way to do things, but I
> don't see why the approach would be entirely invalidated by the
> pathification work.

I don't deny that you *could* continue to do things that way, but
I dispute that it's a good idea.  Why can't you generate a Path tree
and then ask create_plan() to convert it?  Otherwise you're buying
into knowing a whole lot about the internals of createplan.c, and having
to track changes therein.

regards, tom lane




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-10 Thread Igal @ Lucee.org

On 3/8/2016 4:42 PM, Craig Ringer wrote:
On 9 March 2016 at 05:40, Igal @ Lucee.org wrote:


I will try to gather more information about the other DBMSs and
drivers and will post my findings here when I have them.


Thanks. I know that's not the most fun thing to do in the world, but 
it's often needed when implementing something where part of the goal 
is being compatible with other vendors, etc.


It seems that the implementations vary by the driver, and not the 
server, as evidenced by the Microsoft SQL Server drivers -- I tested 
both the official MS driver and the open sourced jTDS driver.


I noticed that you usually don't put html in the emails here, but I 
think that it's appropriate here to show the information in a clear way 
(also, according to my computer it's 2016).  I hope that it will be 
rendered properly:



MySQL:
  Returned Type: SET
  Column Name:   GENERATED_KEY
  Column Type:   Unknown (numeric)
  Value:         each inserted value to the identity column
  Example:       (1), (2)

DB2:
  Returned Type: SET
  Column Name:   [name of identity col]
  Column Type:   integer
  Value:         each inserted value to the identity column
  Example:       (1), (2)

SQL Server (MS):
  Returned Type: ROW
  Column Name:   GENERATED_KEYS
  Column Type:   numeric
  Value:         last inserted value to the identity column
  Example:       (2)

SQL Server (jTDS):
  Returned Type: ROW
  Column Name:   ID
  Column Type:   numeric
  Value:         last inserted value to the identity column
  Example:       (2)

Oracle:
  Returned Type: ROW
  Column Name:   ROWID
  Column Type:   ROWID
  Value:         internal address location that does not change on UPDATE
  Example:       AAAE5nAABAAALCxAAM


Some notes and observations:

It's the Wild West!  Each implementation does something completely 
different.  Even when something looks similar, e.g. the returned column 
name from MySQL and SQL Server (MS), it's not:  notice the plural in SQL 
Server's column name, which is ironic as they only return a single 
value, as opposed to MySQL which returns a SET.


This has been an "interesting experience" as it was my first exposure to 
some of those DBMSs. It only reinforced my decision to choose PostgreSQL 
moving forward, over the alternatives (after using SQL Server for about 
20 years).


More notes on the different DBMSs:

The first thing that I tested was against *MySQL*:

CREATE TABLE IF NOT EXISTS test_jdbc(name VARCHAR(64), id SERIAL);

An insert to that table via JDBC, with the int flag RETURN_GENERATED_KEYS, 
returns a result set with a column named "GENERATED_KEY" of type 
"UNKNOWN" (as per ResultSetMetaData's getColumnTypeName()); each row in 
the result set corresponds to an inserted record, so for example:


INSERT INTO test_jdbc(name) VALUES ('JDBC'), ('PostgreSQL');

returned two rows with the value of the "id" column for the inserted row 
in each, e.g.


GENERATED_KEY
-
7
8

Trying to add multiple SERIAL columns to a table results in an error:

CREATE TABLE IF NOT EXISTS jdbc(j_name VARCHAR(64), j_id SERIAL, 
id2 SERIAL)


Error Code: 1075. Incorrect table definition; there can be only one auto 
column and it must be defined as a key



*SQL Server*: via the Microsoft driver

Created table with the command:

CREATE TABLE dbo.jdbc (
j_name varchar(64) NOT NULL,
j_id int IDENTITY(1,1) NOT NULL
)

Generated Keys return a single row with a column named "GENERATED_KEYS" 
of type numeric, and the value is the last inserted id (i.e. sequence).  
This is different from MySQL which returns a row with the id for each 
inserted record.



*SQL Server*: via the jTDS driver

Generated Keys return a single row with a column named "ID" of type 
numeric, and the value is the last inserted id (i.e. sequence).  The 
behavior is similar to the Microsoft driver, but the column name is 
different.



*Oracle*:

Oracle returns the column ROWID which is of type ROWID as well:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm

This seems to be similar to PostgreSQL's ctid, but unlike ctid -- when I 
UPDATE the record the ROWID remains unchanged.


In my test I got the value "AAAE5nAABAAALCxAAM", and when I later ran:

SELECT * FROM jdbc WHERE ROWID='AAAE5nAABAAALCxAAM';

I got the information back from that row.  Updating that row does not 
change its ROWID.


When I tried to insert multiple values with RETURN_GENERATED_KEYS I got 
an error:  java.sql.SQLSyntaxErrorException: ORA-00900: invalid SQL 
statement


INSERT INTO jdbc(j_name) SELECT 'PG 9.5.0' FROM DUAL UNION SELECT 
'PG 9.5.1' FROM DUAL


The rows are, however, inserted into the table.  Running the same INSERT 
command without RETURN_GENERATED_KEYS works without error.


(Side note:  This was my first, and hopefully my last, experience with 
Oracle database, and it's been a real PITA.  If I had tried it out some 
20 years ago then the experience would have probably led me to sell the 
stock short, which would have probably ended with my bankruptcy.  Go 
figure...)



*IBM DB2*:

CREATE TABLE jdbc(j_name VARCHAR(64), j_id INT NOT NULL GENERATED 
ALWAYS AS IDENTITY)



Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Andres Freund
On 2016-03-10 13:48:31 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I see that you made a lot of formerly externally visible make_* routines
> > static. The Citus extension (which will be open source in a few days)
> > uses several of these (make_agg, make_sort_from_sortclauses, make_limit
> > at least).
> 
> > Can we please re-consider making these static?
> 
> That was intentional: in my opinion, nothing outside createplan.c ought
> to be making Plan nodes anymore.  The expectation is that you make a
> Path describing what you want.  Can you explain why, in the new planner
> structure, it would be sane to have external callers of these functions?

In Citus' case a full PlannedStmt is generated on the master node, to
combine the data generated on worker nodes (where the bog standard
postgres planner is used).  It's not the only way to do things, but I
don't see why the approach would be entirely invalidated by the
pathification work.

We do provide the planner_hook, so restricting the toolbox for that to
do something useful, doesn't seem like a necessarily good idea.

- Andres




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 12:18 AM, Amit Kapila  wrote:
> On Wed, Mar 9, 2016 at 7:17 PM, Robert Haas  wrote:
>>
>> On Wed, Mar 9, 2016 at 8:31 AM, Amit Kapila 
>> wrote:
>> > Thanks for the suggestion.  I have updated the patch to include
>> > wait_event_type information in the wait_event table.
>>
>> I think we should remove "a server process is" from all of these entries.
>>
>> Also, I think this kind of thing should be tightened up:
>>
>> + A server process is waiting on any one of the
>> commit_timestamp
>> + buffer locks to read or write the commit_timestamp page in the
>> + pg_commit_ts subdirectory.
>>
>> I'd just write: Waiting to read or write a commit timestamp buffer.
>>
>
> Okay, changed as per the suggestion and fixed the morerows issue pointed
> out by Thom.

Committed with some further editing.  In particular, the way you
determined whether we could safely access the tranche information for
any given ID was wrong; please check over what I did and make sure
that isn't also wrong.

Whew, this was a long process, but we got there.  Some initial pgbench
testing shows that by far the most common wait event observed on that
workload is WALWriteLock, which is pretty interesting: perf -e cs and
LWLOCK_STATS let you measure the most *frequent* wait events, but that
ignores duration.  Sampling pg_stat_activity tells you which things
you're spending the most *time* waiting for, which is awfully neat.
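
For example, polling repeatedly with a query like this and accumulating
the counts gives a time-weighted picture (one possible query, not part
of the commit):

    SELECT wait_event_type, wait_event, count(*)
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL
     GROUP BY wait_event_type, wait_event
     ORDER BY count(*) DESC;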

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_rewind just doesn't fsync *anything*?

2016-03-10 Thread Andres Freund
Hi,

On 2016-03-10 08:47:16 +0100, Michael Paquier wrote:
> Still, I think that we had better fsync only entries that are modified
> by pg_rewind, and files that got updated, and not the whole directory

Why? If any files in there are dirty, they need to be fsynced. If
they're not dirty, fsync's free.


> a target data folder should be stopped properly to be able to rewind,
> and it is better to avoid dependencies between utilities if that's not
> strictly necessary.  initdb is likely to be installed side-by-side
> with pg_rewind in any distribution though.

It's not like we don't have any other such dependencies, in other
binaries. I'm not concerned.

Having to backpatch a single system() invocation + find_other_exec()
call, and backporting a more general FRONTEND version of initdb's
fsync_pgdata() are wildly differing in complexity.


- Andres




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Tom Lane
Andres Freund  writes:
> I see that you made a lot of formerly externally visible make_* routines
> static. The Citus extension (which will be open source in a few days)
> uses several of these (make_agg, make_sort_from_sortclauses, make_limit
> at least).

> Can we please re-consider making these static?

That was intentional: in my opinion, nothing outside createplan.c ought
to be making Plan nodes anymore.  The expectation is that you make a
Path describing what you want.  Can you explain why, in the new planner
structure, it would be sane to have external callers of these functions?

regards, tom lane




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Robert Haas
On Thu, Mar 10, 2016 at 11:33 AM, David G. Johnston
 wrote:
> I tend to think we err toward this too much.  This seems like development
> concerns trumping usability.  Consider that anything someone took the time
> to write and polish to make committable to core was obviously genuinely
> useful to them - and for every person capable of actually taking things that
> far there are likely many more like myself who cannot but have encountered
> the, albeit minor, usability annoyance that this kind of function seeks to
> remove.

Sure, an individual function like this has almost no negative impact.
On the other hand, working around its absence is also trivial.  You
can create a wrapper function that does exactly this in a couple of
lines of SQL.  In my opinion, saying that people should do that if
they need it has some advantages over shipping it to everyone.  If you
don't have it and you want it, you can easily get it.  But what if you
have it and you don't want it, for example because what you really
want is a minimal postgres installation?  You can't take anything in
core back out again, or at least not easily.  Everything about core is
expanding very randomly - code size, shared memory footprint, all of
it.  If you think that has no downside for users, I respectfully
disagree.
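
For the record, such a wrapper might look like this (a sketch that casts
through timestamp; not shipped anywhere):

    CREATE FUNCTION generate_series(date, date)
    RETURNS SETOF date LANGUAGE sql IMMUTABLE AS $$
        SELECT d::date
        FROM generate_series($1::timestamp, $2::timestamp, '1 day') AS g(d)
    $$;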

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-10 Thread Masahiko Sawada
On Fri, Mar 11, 2016 at 1:03 AM, Robert Haas  wrote:
> This 001 patch looks so little like what I was expecting that I
> decided to start over from scratch.  The new version I wrote is
> attached here.  I don't understand why your version tinkers with the
> logic for setting the all-frozen bit; I thought that what I already
> committed dealt with that already, and in any case, your version
> doesn't even compile against latest sources.  Your version also leaves
> the scan_all terminology intact even though it's not accurate any
> more, and I am not very convinced that the updates to the
> page-skipping logic are actually correct.  Please have a look over
> this version and see what you think.

Thank you for your advice.
Sorry, the optimising logic in the previous patch was from an old version by mistake.
The attached latest patch incorporates your suggestions with a little revising.

>
> I think that's kind of pointless.  We need to test that this
> conversion code works, but once it does, I don't think we should make
> everybody pay the overhead of retesting that.  Anyway, the test code
> could have bugs, too.
>
> Here's an updated version of your patch with that code removed and
> some cosmetic cleanups like fixing typos and stuff like that.  I think
> this is mostly ready to commit, but I noticed one problem: your
> conversion code always produces two output pages for each input page
> even if one of them would be empty.  In particular, if you have a
> large number of small relations and run pg_upgrade, all of their
> visibility maps will go from 8kB to 16kB.  That isn't the end of the
> world, maybe, but I think you should see if you can't fix it
> somehow

Thank you for updating the patch.
To deal with this problem, I've changed it so that pg_upgrade checks
the file size before conversion, and if the fork file does not exist or
its size is 0 (empty), it is ignored.
Attached is the latest patch.

Regards,

--
Masahiko Sawada


001_optimize_vacuum_by_frozen_bit_v40.patch
Description: Binary data


000_pgupgrade_rewrite_vm_v42.patch
Description: Binary data



Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Greg Stark
On Thu, Mar 10, 2016 at 1:40 PM, Tomas Vondra 
wrote:
> I was thinking about running some benchmarks on this patch, but the
> thread is pretty huge so I want to make sure I'm not missing something
> and this is indeed the most recent version.

I also ran some preliminary benchmarks just before FOSDEM and intend to get
back to it after running different benchmarks. These are preliminary
because it was only a single run and on a machine that wasn't dedicated for
benchmarks. These were comparing the quicksort-all-runs patch against HEAD
at the time without the memory management optimizations which I think are
independent of the sort algorithm.

It looks to me like the interesting space to test is on fairly small
work_mem compared to the data size. There's a general slowdown on 4MB-8MB
work_mem when the data set is more than a gigabyte, but even in the
worst case it's only a 30% slowdown and the speedup in the more realistic
scenarios looks at least as big.




I want to rerun these on a dedicated machine and with trace_sort enabled so
that we can see how many merge passes were actually happening and how much
I/O was actually happening.

-- 
greg


Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 5:40 AM, Tomas Vondra
 wrote:
> I was thinking about running some benchmarks on this patch, but the
> thread is pretty huge so I want to make sure I'm not missing something
> and this is indeed the most recent version.

Wait 24 - 48 hours, please. Big update coming.


-- 
Peter Geoghegan




Re: [HACKERS] Relation extension scalability

2016-03-10 Thread Petr Jelinek

On 10/03/16 09:57, Dilip Kumar wrote:


On Thu, Mar 10, 2016 at 7:55 AM, Petr Jelinek wrote:

Thanks for the comments..

Hmm, why did you remove the comment above the call to
UnlockRelationForExtension?

While re factoring I lose this comment.. Fixed it

It still seems relevant, maybe with some minor modification?

Also there is a bit of whitespace mess inside the conditional lock
block.

Fixed

I got the result of 10 mins run so posting it..
Note: Base code results are copied from up thread...

Results For 10 Mins run of COPY 1 records of size 4 bytes script and
configuration are same as used in up thread


Client    Base    Patch
     1     105      111
     2     217      246
     4     210      428
     8     166      653
    16     145      808
    32     124      988
    64     ---      974


Results For 10 Mins run of INSERT 1000 records of size 1024 bytes(data
don't fits in shared buffer)
--

Client    Base    Patch
     1     117      120
     2     111      126
     4      51      130
     8      43      147
    16      40      209
    32     ---      254
    64     ---      205



Those look good. The patch looks good in general now. I am a bit scared by 
the lockWaiters * 20, as it can result in relatively big changes in rare 
corner cases - when, for example, a lot of backends were waiting for the 
lock on a relation and suddenly all try to extend it. I wonder if we 
should clamp it to something sane (although what's sane today might be 
small in a couple of years).
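
Something along these lines, say (a sketch; the 20x multiplier is from
the patch, while the function name and the 512-block cap are illustrative):

    static int
    extra_blocks_to_add(int lockWaiters)
    {
        int     extraBlocks = lockWaiters * 20;

        return (extraBlocks > 512) ? 512 : extraBlocks;
    }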


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP: Upper planner pathification

2016-03-10 Thread Andres Freund
Hi Tom,

On 2016-02-28 15:03:28 -0500, Tom Lane wrote:
> diff --git a/src/include/optimizer/planmain.h 
> b/src/include/optimizer/planmain.h
> index eaa642b..cd7338a 100644
> *** a/src/include/optimizer/planmain.h
> --- b/src/include/optimizer/planmain.h
> *** extern RelOptInfo *query_planner(Planner
> *** 43,102 
>* prototypes for plan/planagg.c
>*/
>   extern void preprocess_minmax_aggregates(PlannerInfo *root, List *tlist);
> - extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
> -const AggClauseCosts 
> *aggcosts, Path *best_path);
>   
>   /*
>* prototypes for plan/createplan.c
>*/
>   extern Plan *create_plan(PlannerInfo *root, Path *best_path);
> - extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
> -   Index scanrelid, Plan *subplan);
>   extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual,
>Index scanrelid, List *fdw_exprs, List 
> *fdw_private,
>List *fdw_scan_tlist, List *fdw_recheck_quals,
>Plan *outer_plan);
> - extern Append *make_append(List *appendplans, List *tlist);
> - extern RecursiveUnion *make_recursive_union(List *tlist,
> -  Plan *lefttree, Plan *righttree, int 
> wtParam,
> -  List *distinctList, long numGroups);
> - extern Sort *make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree,
> - List *pathkeys, double 
> limit_tuples);
> - extern Sort *make_sort_from_sortclauses(PlannerInfo *root, List *sortcls,
> -Plan *lefttree);
> - extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls,
> -  AttrNumber *grpColIdx, Plan 
> *lefttree);
> - extern Agg *make_agg(PlannerInfo *root, List *tlist, List *qual,
> -  AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
> -  int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
> -  List *groupingSets, long numGroups, bool combineStates,
> -  bool finalizeAggs, Plan *lefttree);
> - extern WindowAgg *make_windowagg(PlannerInfo *root, List *tlist,
> -List *windowFuncs, Index winref,
> -int partNumCols, AttrNumber *partColIdx, Oid 
> *partOperators,
> -int ordNumCols, AttrNumber *ordColIdx, Oid 
> *ordOperators,
> -int frameOptions, Node *startOffset, Node *endOffset,
> -Plan *lefttree);
> - extern Group *make_group(PlannerInfo *root, List *tlist, List *qual,
> -int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators,
> -double numGroups,
> -Plan *lefttree);
>   extern Plan *materialize_finished_plan(Plan *subplan);
> ! extern Unique *make_unique(Plan *lefttree, List *distinctList);
> ! extern LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int 
> epqParam);
> ! extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node 
> *limitCount,
> !int64 offset_est, int64 count_est);
> ! extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan 
> *lefttree,
> !List *distinctList, AttrNumber flagColIdx, int firstFlag,
> !long numGroups, double outputRows);
> ! extern Result *make_result(PlannerInfo *root, List *tlist,
> ! Node *resconstantqual, Plan *subplan);
> ! extern ModifyTable *make_modifytable(PlannerInfo *root,
> !  CmdType operation, bool canSetTag,
> !  Index nominalRelation,
> !  List *resultRelations, List *subplans,
> !  List *withCheckOptionLists, List 
> *returningLists,
> !  List *rowMarks, OnConflictExpr *onconflict, 
> int epqParam);
>   extern bool is_projection_capable_plan(Plan *plan);


I see that you made a lot of formerly externally visible make_* routines
static. The Citus extension (which will be open source in a few days)
uses several of these (make_agg, make_sort_from_sortclauses, make_limit
at least).

Can we please re-consider making these static?

It's fine if their parameter list changes from release to release,
that's easy enough to adjust to, and it's easily visible. But having to
copy all of these into extension code is more than a bit painful;
especially make_sort_from_sortclauses.

Greetings,

Andres Freund




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-10 Thread Corey Huinker
removed leftover development comment

On Thu, Mar 10, 2016 at 11:02 AM, Corey Huinker 
wrote:

> On Thu, Mar 10, 2016 at 10:58 AM, Robert Haas 
> wrote:
>
>> On Thu, Mar 10, 2016 at 10:30 AM, Simon Riggs 
>> wrote:
>> > On 10 March 2016 at 06:53, Michael Paquier 
>> > wrote:
>> >>
>> >> On Wed, Mar 9, 2016 at 12:13 AM, Alvaro Herrera
>> >>  wrote:
>> >> > Robert Haas wrote:
>> >> >> I'm pretty meh about the whole idea of this function, though,
>> >> >> actually, and I don't see a single clear +1 vote for this
>> >> >> functionality upthread.  (Apologies if I've missed one.)  In the
>> >> >> absence of a few of those, I recommend we reject this.
>> >> >
>> >> > +1
>> >>
>> >> I'm meh for this patch.
>> >
>> >
>> > "meh" == +1
>> >
>> > I thought it meant -1
>>
>> In my case it meant, like, -0.5.  I don't really like adding lots of
>> utility functions like this to the default install, because I'm not
>> sure how widely they get used and it gradually increases the size of
>> the code, system catalogs, etc.  But I also don't want to block
>> genuinely useful things.  So basically, I'm not excited about this
>> patch, but I don't want to fight about it either.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
> New patch for Alvaro's consideration.
>
> Very minor changes since the last time, the explanations below are
> literally longer than the changes:
> - Rebased, though I don't think any of the files had changed in the mean
> time
> - Removed infinity checks/errors and the test cases to match
> - Amended documentation to add 'days' after 'step' as suggested
> - Did not add a period as suggested, to remain consistent with other
> descriptions in the same sgml table
> - Altered test case and documentation of 7 day step example so that the
> generated dates do not land evenly on the end date. A reader might
> incorrectly infer that the end date must be in the result set, when it
> doesn't have to be.
> - Removed unnecessary indentation that existed purely due to following of
> other generate_series implementations
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..0a8c280 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14700,6 +14700,26 @@ AND
   
  
 
+ 
+  generate_series(start, stop)
+  date
+  setof date
+  
+   Generate a series of values, from start to stop
+   with a step size of one day
+  
+ 
+
+ 
+  generate_series(start, stop, step integer)
+  date
+  setof date
+  
+   Generate a series of values, from start to stop
+   with a step size of step days
+  
+ 
+
 

   
@@ -14764,6 +14784,26 @@ SELECT * FROM generate_series('2008-03-01 00:00'::timestamp,
  2008-03-03 22:00:00
  2008-03-04 08:00:00
 (9 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-01'::date) as d(date_val);
+  date_val  
+------------
+ 1991-09-24
+ 1991-09-25
+ 1991-09-26
+ 1991-09-27
+ 1991-09-28
+ 1991-09-29
+ 1991-09-30
+ 1991-10-01
+(8 rows)
+
+SELECT d.date_val FROM generate_series('1991-09-24'::date,'1991-10-02'::date,7) as d(date_val);
+  date_val  
+------------
+ 1991-09-24
+ 1991-10-01
+(2 rows)
 
   
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 332db7e..af4000d 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -30,6 +30,7 @@
 #include "utils/datetime.h"
 #include "utils/nabstime.h"
 #include "utils/sortsupport.h"
+#include "funcapi.h"
 
 /*
  * gcc's -ffast-math switch breaks routines that expect exact results from
@@ -2811,3 +2812,92 @@ timetz_izone(PG_FUNCTION_ARGS)
 
PG_RETURN_TIMETZADT_P(result);
 }
+
+typedef struct
+{
+   DateADT current;
+   DateADT stop;
+   int32   step;
+} generate_series_date_fctx;
+
+
+/* generate_series_date()
+ * Generate the set of dates from start to stop by step
+ */
+Datum
+generate_series_date(PG_FUNCTION_ARGS)
+{
+   return generate_series_step_date(fcinfo);
+}
+
+Datum
+generate_series_step_date(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   generate_series_date_fctx *fctx;
+   DateADT result;
+
+   /* stuff done only on the first call of the function */
+   if (SRF_IS_FIRSTCALL())
+   {
+   DateADT start = PG_GETARG_DATEADT(0);
+   DateADT stop = PG_GETARG_DATEADT(1);
+   int32   step = 1;
+   MemoryContext oldcontext;
+
+   /* see if we were given an explicit step size */
+   if (PG_NARGS() == 3)
+   {
+   step = PG_GETARG_INT32(2);
+   if (step == 0)
+   ereport(ERROR,
+   

[HACKERS] Adjusting the API of pull_var_clause()

2016-03-10 Thread Tom Lane
Over in the "Optimizer questions" thread, it's become apparent that
we need to fix pull_var_clause() to offer multiple behaviors for
WindowFunc nodes that are parallel to the ones it has for Aggrefs
(viz, reject, recurse, or include in result).  This should've been
done when window functions were introduced, likely; but we've escaped
the need for it so far because the planner hasn't done any real
analysis of post-WindowAgg targetlists.

The straightforward way to do this would be to add another enum type
similar to PVCAggregateBehavior and a fourth argument to pull_var_clause,
plus tedious updates of all twenty-or-so existing call sites, almost all
of which should choose PVC_REJECT_WINDOWFUNCS because they'd not expect
to get invoked on expressions that could contain WindowFuncs.
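
For concreteness, that straightforward version would look roughly like
this (the new enum's name and comments are illustrative only):

typedef enum
{
	PVC_REJECT_WINDOWFUNCS,		/* throw error if WindowFunc found */
	PVC_RECURSE_WINDOWFUNCS,	/* recurse into WindowFunc arguments */
	PVC_INCLUDE_WINDOWFUNCS		/* include WindowFuncs in result list */
} PVCWindowFuncBehavior;

extern List *pull_var_clause(Node *node,
				 PVCAggregateBehavior aggbehavior,
				 PVCWindowFuncBehavior winbehavior,
				 PVCPlaceHolderBehavior phbehavior);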

Now, I'm pretty sure that the last time we touched pull_var_clause's
API, we intentionally set it up to force every call site to be visited
when new behaviors were added.  But right at the moment that's looking
like it was a bad call.

An alternative API design could look like

#define PVC_INCLUDE_AGGREGATES   0x0001 /* include Aggrefs in output list */
#define PVC_RECURSE_AGGREGATES   0x0002 /* recurse into Aggref arguments */
#define PVC_INCLUDE_PLACEHOLDERS 0x0004 /* include PlaceHolderVars in output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0008 /* recurse into PlaceHolderVar arguments */

extern List *pull_var_clause(Node *node, int flags);

with calls along the line of

pull_var_clause(node, PVC_INCLUDE_AGGREGATES | PVC_RECURSE_PLACEHOLDERS);

the default behavior if you specify no flag being "error if node type
is seen".

The attraction of this approach is that if we add another behavior
to pull_var_clause, while we'd still likely need to run around and
check every call site, it wouldn't be positively guaranteed that
we'd need to edit every darn one of them.

This might all be moot of course.  Either way, we'll have to touch every
call site today; and there is nothing on the horizon suggesting that we'll
need to make another change in pull_var_clause in the foreseeable future.

I'm undecided which way to fix it.  Anybody else have an opinion?

regards, tom lane




Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Alexey Grishchenko
Tom Lane  wrote:

> Alexey Grishchenko writes:
> > No, my fix handles this well.
> > In fact, with the first function call you allocate global variables
> > representing Python function input parameters, call the function and
> > receive iterator over the function results. Then in a series of Postgres
> > calls to PL/Python handler you just fetch next value from the iterator,
> you
> > are not calling the Python function anymore. When the iterator reaches
> the
> > end, PL/Python call handler deallocates the global variable representing
> > function input parameter.
>
> > Regardless of the number of parallel invocations of the same function,
> each
> > of them in my patch would set its own input parameters to the Python
> > function, call the function and receive separate iterators. When the
> first
> > function's result iterator would reach its end, it would deallocate the
> > input global variable. But it won't affect other functions as they no
> > longer need to invoke any Python code.
>
> Well, if you think that works, why not undo the global-dictionary changes
> at the end of the first call, rather than later?  Then there's certainly
> no overlap in their lifespan.
>
> regards, tom lane
>

Could you elaborate more on this? In general, a stack-like solution would
work: if, before the function call, there is a global variable whose name
matches the input variable name, push its value onto the stack and pop it
back after the function execution. I'll implement it tomorrow and see how
it works.
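
Roughly this shape, as a minimal self-contained sketch with the Python C
API (the function and its name are illustrative, not the actual PL/Python
code):

#include <Python.h>

/*
 * Call a Python function while temporarily binding one named global,
 * saving and restoring any pre-existing binding so that a nested
 * invocation cannot clobber it.
 */
static PyObject *
call_with_saved_global(PyObject *func, PyObject *globals,
					   const char *argname, PyObject *value)
{
	PyObject *old = PyDict_GetItemString(globals, argname); /* borrowed */
	PyObject *result;

	Py_XINCREF(old);			/* "push": keep any old binding alive */
	PyDict_SetItemString(globals, argname, value);

	result = PyObject_CallObject(func, NULL);

	if (old != NULL)
	{
		PyDict_SetItemString(globals, argname, old);	/* "pop": restore */
		Py_DECREF(old);
	}
	else
		PyDict_DelItemString(globals, argname);

	return result;
}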


-- 

Sent from handheld device


Re: [HACKERS] WAL log only necessary part of 2PC GID

2016-03-10 Thread Petr Jelinek

On 10/03/16 13:43, Pavan Deolasee wrote:

On Wed, Mar 9, 2016 at 7:56 PM, Petr Jelinek wrote:

Hi,

I wonder why you define gidlen as uint32 when it would fit into a
uint8, which given the current TwoPhaseFileHeader struct layout should
save 8 bytes of padding (on 64-bit). I think that's worth considering,
given that this patch aims to reduce the size of the data.


Hi Petr,

That sounds like a good idea; I didn't think about that. I would like to
make it uint16 though, just in case we decide to increase GIDSIZE from
200 to something more than 256 (Postgres-XL does that already). That
still fits in the same aligned width, on both 32 as well as 64-bit
machines. New version attached.


Correct, and I see Simon committed it like this in the meantime, thanks.
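
For anyone curious about the arithmetic, here is a self-contained toy
showing the padding effect on a typical LP64 machine (the fields only
loosely mimic the real header, so the exact numbers are illustrative):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
	int64_t		prepared_at;	/* 8-byte member: struct alignment is 8 */
	int32_t		counts[3];		/* ends at offset 20 */
	bool		initfileinval;	/* offset 20 */
	uint32_t	gidlen;			/* 3 pad bytes, then offsets 24..27 */
} HdrU32;						/* tail-padded to sizeof == 32 */

typedef struct
{
	int64_t		prepared_at;
	int32_t		counts[3];
	bool		initfileinval;	/* offset 20 */
	uint16_t	gidlen;			/* 1 pad byte, then offsets 22..23 */
} HdrU16;						/* sizeof == 24: an 8-byte saving */

int
main(void)
{
	printf("uint32 gidlen: %zu bytes\n", sizeof(HdrU32));
	printf("uint16 gidlen: %zu bytes\n", sizeof(HdrU16));
	return 0;
}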

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Tsvector editing functions

2016-03-10 Thread Teodor Sigaev

Thanks! Fixed and added tests.

Thank you!

I did some patch cleanup/fixes, but I have some doubts about the function names:

1 to_tsvector:
# \df to_tsvector
                             List of functions
   Schema   |    Name     | Result data type | Argument data types |  Type  
------------+-------------+------------------+---------------------+--------
 pg_catalog | to_tsvector | tsvector         | regconfig, text     | normal
 pg_catalog | to_tsvector | tsvector         | text                | normal
 pg_catalog | to_tsvector | tsvector         | text[]              | normal

The first two variants of to_tsvector do morphological processing; the last
one doesn't.

2 to_array
# \df *to_array
                                  List of functions
   Schema   |         Name          | Result data type | Argument data types |  Type  
------------+-----------------------+------------------+---------------------+--------
 pg_catalog | regexp_split_to_array | text[]           | text, text          | normal
 pg_catalog | regexp_split_to_array | text[]           | text, text, text    | normal
 pg_catalog | string_to_array       | text[]           | text, text          | normal
 pg_catalog | string_to_array       | text[]           | text, text, text    | normal
 pg_catalog | to_array              | text[]           | tsvector            | normal


It seems to_array is not the right name compared to the other *_to_array
functions.

I would like to suggest renaming both functions to array_to_tsvector and
tsvector_to_array, for consistent naming. Later we could add
to_tsvector([regconfig, ], text[]) with morphological processing.


Thoughts?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..ed0b6be 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9211,16 +9211,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
  
   setweight
  
- setweight(tsvector, "char")
+ setweight(vector tsvector, weight "char")
 
 tsvector
-assign weight to each element of tsvector
+assign weight to each element of vector
 setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A')
 'cat':3A 'fat':2A,4A 'rat':5A


 
  
+  setweight
+  setweight by filter
+ 
+ setweight(vector tsvector, weight "char", lexemes "text"[])
+
+tsvector
+assign weight to elements of vector that are listed in lexemes array
+setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A', '{cat,rat}')
+'cat':3A 'fat':2,4 'rat':5A
+   
+   
+
+ 
   strip
  
  strip(tsvector)
@@ -9233,6 +9246,81 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  delete
+  delete lexeme
+ 
+ delete(vector tsvector, lexeme text)
+
+tsvector
+remove given lexeme from vector
+delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat')
+'cat':3 'rat':5A
+   
+   
+
+ 
+  delete
+  delete lexemes array
+ 
+ delete(vector tsvector, lexemes text[])
+
+tsvector
+remove any occurrence of lexemes in lexemes array from vector
+delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat'])
+'cat':3
+   
+   
+
+ 
+  unnest
+ 
+ unnest(tsvector, OUT lexeme text, OUT positions smallint[], OUT weights text)
+
+setof record
+expand a tsvector to a set of rows
+unnest('fat:2,4 cat:3 rat:5A'::tsvector)
+(cat,{3},{D}) ...
+   
+   
+
+ 
+  to_array
+ 
+ to_array(tsvector)
+
+text[]
+convert tsvector to array of lexemes
+to_array('fat:2,4 cat:3 rat:5A'::tsvector)
+{cat,fat,rat}
+   
+   
+
+ 
+  to_tsvector
+  array to tsvector
+ 
+ to_tsvector(text[])
+
+tsvector
+convert array of lexemes to tsvector
+to_tsvector('{fat,cat,rat}'::text[])
+'fat' 'cat' 'rat'
+   
+   
+
+ 
+  filter
+ 
+ filter(vector tsvector, weights "char"[])
+
+tsvector
+Select only elements with given weights from vector
+filter('fat:2,4 cat:3b rat:5A'::tsvector, '{a,b}')
+'cat':3B 'rat':5A
+   
+   
+
+ 
   to_tsquery
  
  to_tsquery( config regconfig ,  query text)
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index ff99976..ea3abc9 100644
--- 

Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-03-10 Thread Tom Lane
Alexey Grishchenko  writes:
> No, my fix handles this well.
> In fact, with the first function call you allocate global variables
> representing Python function input parameters, call the function and
> receive iterator over the function results. Then in a series of Postgres
> calls to PL/Python handler you just fetch next value from the iterator, you
> are not calling the Python function anymore. When the iterator reaches the
> end, PL/Python call handler deallocates the global variable representing
> function input parameter.

> Regardless of the number of parallel invocations of the same function, each
> of them in my patch would set its own input parameters to the Python
> function, call the function and receive separate iterators. When the first
> function's result iterator would reach its end, it would deallocate the
> input global variable. But it won't affect other functions as they no
> longer need to invoke any Python code.

Well, if you think that works, why not undo the global-dictionary changes
at the end of the first call, rather than later?  Then there's certainly
no overlap in their lifespan.

regards, tom lane



