Re: [HACKERS] timeout of pg_receivexlog --status-interval
On Wed, Jul 16, 2014 at 2:29 AM, Sawada Masahiko wrote: > On Tue, Jul 15, 2014 at 7:38 PM, Fujii Masao wrote: >> On Thu, Jul 10, 2014 at 11:10 PM, Sawada Masahiko >> wrote: >>> Hi, >>> >>> At line 1047 of receivelog.c:CopyStreamPoll(), we set the timeoutptr >>> variable to NULL. If timeoutptr is set to NULL then the process will >>> wait until the socket becomes readable in select(), as follows.
>>>
>>> if (timeout_ms < 0)
>>>     timeoutptr = NULL;
>>> else
>>> {
>>>     timeout.tv_sec = timeout_ms / 1000L;
>>>     timeout.tv_usec = (timeout_ms % 1000L) * 1000L;
>>>     timeoutptr = &timeout;
>>> }
>>>
>>> ret = select(PQsocket(conn) + 1, &input_mask, NULL, NULL, timeoutptr);
>>>
>>> But line 1047 of receivelog.c is never executed, because timeout_ms is never allowed to be less than 0 in CopyStreamReceive(), the only function that calls CopyStreamPoll(). In the current code, if we specify -s 0 then CopyStreamPoll() is never called, and pg_receivexlog executes PQgetCopyData() and fails, over and over. >> >> Thanks for reporting this! Yep, this is a problem. >> >>> I think that this is a contradiction, and we should execute select() with NULL as the fifth argument. The attached patch allows select() to be executed with NULL, i.e., pg_receivexlog will wait until the socket becomes readable, with no timeout, if -s is set to 0. >> >> Your patch changed the code so that CopyStreamPoll is called even >> when the timeout is 0. I don't agree with this change because >> timeout=0 basically means that the caller doesn't request to block and >> there is no need to call CopyStreamPoll in this case. So I'm thinking to >> apply the attached patch. Thoughts? >> > > Thank you for the response. I think this is better. > > One other point about the select() function: I think the behavior is the same whether the fifth argument is NULL or 0 (i.e. zero seconds). No, per the manpage of select(2), select(2) with timeout=0 behaves differently from select(2) with timeout=NULL. So I don't think your suggestion is acceptable... Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
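The select(2) distinction at issue is easy to demonstrate outside PostgreSQL: a NULL timeout pointer blocks indefinitely, while a pointer to a zeroed struct timeval makes select() a non-blocking poll. A minimal standalone sketch (not pg_receivexlog code, just an illustration against stdin):

    #include <stdio.h>
    #include <sys/select.h>

    int
    main(void)
    {
        fd_set          input_mask;
        struct timeval  zero = {0, 0};
        int             ret;

        FD_ZERO(&input_mask);
        FD_SET(0, &input_mask);

        /* timeout = &zero: a pure poll; returns 0 at once if stdin isn't readable */
        ret = select(1, &input_mask, NULL, NULL, &zero);
        printf("timeout=0:    select() returned %d\n", ret);

        FD_ZERO(&input_mask);
        FD_SET(0, &input_mask);

        /* timeout = NULL: blocks until stdin becomes readable, no timeout at all */
        ret = select(1, &input_mask, NULL, NULL, NULL);
        printf("timeout=NULL: select() returned %d\n", ret);

        return 0;
    }

This is why timeout_ms < 0 (mapped to a NULL pointer) and timeout_ms == 0 (an immediate poll) are distinct cases in CopyStreamPoll().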
Re: [HACKERS] date-time library
Hi 2014-07-24 3:49 GMT+02:00 Charlie Holleran: > Postgres has fantastic date-time parsing. I've tried some Java libraries > that advertise amazing time parsing. But nothing seems to match Postgres's > time parsing. I started peeking through the source to find a reference to > a library that Postgres might be using for time parsing. No luck. Where > would one look in the source code for the date-time parsing code or library > reference? > > The code is in the file https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/datetime.c Regards Pavel
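To make the pointer concrete: parsing in that file happens in two passes, and the backend's input functions drive them directly. This is roughly what timestamp_in() does (a simplified fragment of backend code; ParseDateTime(), DecodeDateTime() and DateTimeParseError() are the real entry points in datetime.c, declarations per utils/datetime.h):

    char        workbuf[MAXDATELEN + MAXDATEFIELDS];
    char       *field[MAXDATEFIELDS];
    int         ftype[MAXDATEFIELDS];
    int         nf, dtype, tz, dterr;
    fsec_t      fsec;
    struct pg_tm tt, *tm = &tt;

    /* pass 1: split the string into tokens (numbers, words, delimiters) */
    dterr = ParseDateTime(str, workbuf, sizeof(workbuf),
                          field, ftype, MAXDATEFIELDS, &nf);
    /* pass 2: decide what the tokens mean (date, time, timezone, specials) */
    if (dterr == 0)
        dterr = DecodeDateTime(field, ftype, nf, &dtype, tm, &fsec, &tz);
    if (dterr != 0)
        DateTimeParseError(dterr, str, "timestamp");

The flexibility Charlie noticed comes from DecodeDateTime()'s per-token heuristics and keyword tables, not from any external library.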
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Wed, Jul 23, 2014 at 8:52 PM, Amit Kapila wrote: > As such there is no problem in saying the way you have mentioned, but > I feel it would be better if we can mention the mechanism of _bt_search() > as quoted by you upthread in the first line. > "> In more concrete terms, _bt_search() releases and only then acquires >> read locks during a descent of the tree (by calling >> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just >> fine." I guess I could say that too. > One more point: why do you think it is important to add this new text > on top? I think adding new text after the "Lehman and Yao don't require read > locks, ..." paragraph is okay. I've added it to the top because it's really the most important point on Lehman and Yao. It's the _whole_ point. Consider how it's introduced here, for example: http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html Why should I "bury the lead"? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
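For readers without nbtsearch.c at hand, the behavior being described - release the parent's read lock, only then take the child's - looks roughly like this in _bt_search()'s descent loop (a paraphrase: the function names are real, but argument lists are abbreviated and surrounding scaffolding is simplified):

    /* descend one level at a time toward the leaf */
    for (;;)
    {
        Page            page = BufferGetPage(buf);
        BTPageOpaque    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
        OffsetNumber    offnum;
        IndexTuple      itup;
        BlockNumber     child;

        if (P_ISLEAF(opaque))
            break;                          /* reached the leaf level */

        /* pick the downlink to follow while holding this page's read lock */
        offnum = _bt_binsrch(rel, buf, keysz, scankey, nextkey);
        itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
        child = ItemPointerGetBlockNumber(&(itup->t_tid));

        /*
         * Release the parent's pin and read lock, and only then read-lock
         * the child; at no point are two read locks held concurrently.
         */
        buf = _bt_relandgetbuf(rel, buf, child, BT_READ);

        /*
         * The child could have split while no lock was held.  Lehman &
         * Yao's right-links and high keys make that recoverable: move
         * right until we reach the page covering our scan key.
         */
        buf = _bt_moveright(rel, buf, keysz, scankey, nextkey, BT_READ);
    }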
Re: [HACKERS] postgresql.auto.conf and reload
On Wed, Jul 23, 2014 at 11:02 PM, Alvaro Herrera wrote: > Tom Lane wrote: > > Alvaro Herrera writes: > > > Tom Lane wrote: > > >> No, ALTER SYSTEM is there now and it needs to work right in its first > > >> release. I will go fix this if nobody else does. > > > > > Just checking -- you didn't get around to dealing with this, right? > > > > Not yet... do you want it? > > I might give it a try, but not just yet. I'll let you know if I attempt > anything. I am not sure whether you have noticed, but I have posted a patch upthread to address this issue. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Wed, Jul 23, 2014 at 11:19 AM, Peter Geoghegan wrote: > On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila wrote: > > Okay, but how does this justify adding the below new text to the README? > > + Even with these read locks, Lehman and Yao's approach obviates the > > + need of earlier schemes to hold multiple read locks concurrently when > > + descending the tree as part of servicing index scans (pessimistic lock > > + coupling). > > > > Actually I think putting it can lead to inconsistency in the README. > > Currently it indicates that our algorithm is different from L&Y w.r.t taking > > Read Locks and has given an explanation for same. > > Not really. Firstly, that sentence acknowledges that there are read > locks where L&Y assume there will not be. "Even with these read locks" > references the first paragraph, where it is stated that Postgres > B-Trees still acquire read locks while descending the tree. I think here you want to state that the difference in Postgres is "as we are using the L&Y approach, it doesn't need to hold *multiple* read locks concurrently", and the L&Y approach that obviates this need is explained in the second line (which indicates the importance of maintaining right-links and high-keys to detect and recover from page splits). As such there is no problem in saying the way you have mentioned, but I feel it would be better if we can mention the mechanism of _bt_search() as quoted by you upthread in the first line. "> In more concrete terms, _bt_search() releases and only then acquires > read locks during a descent of the tree (by calling > _bt_relandgetbuf()), and, perhaps counterintuitively, that's just > fine." One more point: why do you think it is important to add this new text on top? I think adding new text after the "Lehman and Yao don't require read locks, ..." paragraph is okay. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] date-time library
Postgres has fantastic date-time parsing. I've tried some Java libraries that advertise amazing time parsing. But nothing seems to match Postgres's time parsing. I started peeking through the source to find a reference to a library that Postgres might be using for time parsing. No luck. Where would one look in the source code for the date-time parsing code or library reference?
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas wrote: > I'd suggest something like: > > UPSERT table SET col = value [, ...] WHERE predicate; I think I see another problem with this. The UPDATE must provide a WHERE clause Var on a unique indexed column (let's say it's constrained to provide a "(Var [unique-indexed opclass support function 3 op] Const)" qual during parse analysis because you asked for UPSERT). But it can also affect 0 rows for other reasons, since this UPDATE qual can have arbitrary other expressions. So in general you have any number of reasons for not updating, which implies that you must insert, which might not be possible because there might even still be a duplicate key violation from a not-yet-visible-to-you row (even just in the unique index you intended to merge on). Whereas, when inserting, there is exactly one reason (per row) to go and update - a conflict (i.e. a would-be duplicate violation). And this is a conflict that is meaningful to every transaction, regardless of their snapshot, since it represents an objective fact about the physical presence of an index tuple. So, do you make the UPDATE differentiate between different reasons for the UPDATE failing, only inserting when an UPSERT would have happened had you omitted the extra stuff in your UPSERT predicate? Can you really differentiate anyway? And isn't the choice to do the insert based on what your MVCC snapshot happens to see - rather than something that there is necessarily objective truth on, the physical presence of a duplicate tuple in an index - rather arbitrary? It's a different situation to my implementation, where you do an insert-like thing, and then find a conflict row to lock and then update, because at that point the row is successfully locked, and the WHERE clause may be evaluated against rejects and the *most recent version* of the locked, conflict-causing row. The most recent, locked version is not arbitrary at all. That's the version you ought to evaluate a predicate against before finally deciding to UPDATE. I assume you agree with my view that UPSERT should always insert or update. This kind of business sounds closer to SQL MERGE, where an insert may not drive things, and you accept these kinds of anomalies in exchange for a lot more flexibility in not having inserts drive things. Did you happen to read my post on that? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
Noah Misch writes: > On Tue, Jul 22, 2014 at 11:51:18AM -0400, Tom Lane wrote: >> Hmm. I'm pretty sure it is not considered good style to drop AC_DEFUN >> blocks right into configure.in; at least, we have never done that before. >> PGAC_LDAP_SAFE should get defined somewhere in config/*.m4, instead. >> I am not real sure however which existing config/ file is appropriate, >> or whether we should create a new one. Peter, any opinion? > I don't mind reinstating the absence of AC_DEFUN from configure.in. Add > config/misc-libraries.m4, perhaps? Alternately, stretch programs.m4; it > already has two readline library checks. programs.m4 sounds like the place then. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
On Tue, Jul 22, 2014 at 11:51:18AM -0400, Tom Lane wrote: > Noah Misch writes: > > Diagnose incompatible OpenLDAP versions during build and test. > > Hmm. I'm pretty sure it is not considered good style to drop AC_DEFUN > blocks right into configure.in; at least, we have never done that before. > PGAC_LDAP_SAFE should get defined somewhere in config/*.m4, instead. > I am not real sure however which existing config/ file is appropriate, > or whether we should create a new one. Peter, any opinion? I don't mind reinstating the absence of AC_DEFUN from configure.in. Add config/misc-libraries.m4, perhaps? Alternately, stretch programs.m4; it already has two readline library checks. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
Magnus Hagander wrote: > I think it would be very useful to have. Given how long it takes to > build, running it not all the time but every couple of days or weekly > or so would be quite useful. Then we'd catch things earlier and not > have to spend as much time trying to figure out exactly what broke... I have a half-setup buildfarm job on the machine that powers guaibasaurus that's intended to build HTML docs every five minutes or so. I guess I could see about finishing that, and having it do the PDF build once a week or so. -- Álvaro Herrera      http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 3:27 PM, Robert Haas wrote: > To the last question, yes. To the first point, I'm not bent on this > particular syntax, but I am +1 for the idea that the command is > something specifically upsert-like rather than something more generic > like an ON CONFLICT SELECT that lets you do arbitrary things with the > returned rows. FWIW I wasn't proposing that you'd be able to do arbitrary things. There'd be a few restrictions, such as forbidding unexpected DML commands, and requiring that only a special update reference the special rejected-projecting CTE. They just wouldn't be arbitrary restrictions. But that's not that interesting to me anymore; you and Andres want a dedicated DML command with the update part built in, that isn't as flexible. Okay, fine. I don't think it makes the implementation any easier, but that's what I'll do. > It's certainly arguable whether you should INSERT and then turn failures > into an update or try to UPDATE and then turn failures into an INSERT; > we might even want to have both options available, though that smells > a little like airing too much of our dirty laundry. But I think I > generally favor optimizing for the UPDATE case for more or less the > same reasons Kevin articulated. I don't see the connection between this and Kevin's remarks. And FWIW, I don't see a reason to favor inserts or updates. Fortunately, what I have balances both cases very well, and doesn't cause bloat. The work of descending the index to lock it isn't wasted if an update is required. My implementation decides to either insert or update at literally the latest possible moment. >> For one thing we cannot use any kind of scan unless there is a new >> mechanism for seeing not-yet-visible tuples, like some kind of >> non-MVCC snapshot that those scan nodes can selectively use. Now, I >> guess that you intend that in this scenario you go straight to 5, and >> then your insert finds the not-yet-visible conflict. But it's not >> clear that when 5 fails, and we pick up from 1 that we then get a new >> MVCC Snapshot or something else that gives us a hope of succeeding >> this time around. And it seems quite wasteful to not use knowledge of >> conflicts from the insert at all - that's one thing I'm trying to >> address here; removing unnecessary index scans when we actually know >> what our conflict TIDs are. > > Here you seem to be suggesting that I intended to propose your existing > design rather than something else, which I didn't. In this design, > you find the conflict (at most one) by scanning for the tuple to be > updated. Yes, but what if you don't see a conflict because it isn't visible to your snapshot, and then you insert, and only then (step 5), presumably with a dirty snapshot, you find a conflict? How does the loop terminate if that brings you back to step 1 with the same MVCC snapshot feeding the update? >>> 2. If you find more than one tuple that is visible to your scan, error. >> >> This point seems to concern making the UPSERT UPDATE only affect zero >> or one rows. Is that right? If so, why do you think that's a useful >> constraint to impose? > > Because nobody wants an operation to either insert 1 tuple or update > n>=1 tuples. The intention is that the predicate should probably be > something like WHERE unique_key = 'some_value', but you can use > something else. So it's kinda like saying which index you care about > for a particular operation, except without having to explicitly name > an index. But in any event you should use a predicate that uniquely > identifies the tuple you want to update. I agree that you want to uniquely identify each tuple. What I meant was, why should we not be able to upsert multiple rows in a single command? What's wrong with that? >> Point 4b ("if there is more than one such tuple...") seems like it >> needs some practical or theoretical justification - do you just not >> want to follow an update chain? If so, why? What would the error >> message actually be? And, to repeat myself: Why should an MVCC >> snapshot see more than one such tuple in the ordinary course of >> scanning a relation, since there is by definition a unique constraint >> that prevents this? Or maybe I'm wrong; maybe you don't even require >> that. That isn't clear. > > In case (4b), like in case (2), you've done something like UPSERT tab > SET ... WHERE non_unique_index = 'value_appearing_more_than_once'. > You shouldn't do that, because you have only one replacement tuple > (embodied in the SET clause). Oh, I totally agree that we should throw an error if any one row is affected more than once by the same command. Indeed, SQL MERGE requires this, since to do otherwise would leave the final value of the row unspecified. It seemed to me from your original explanation of point 4b that you were saying something much stronger. But if that's all you meant, then I agree. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
Robert Haas wrote: > On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan wrote: > > On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas wrote: > >> 2. If you find more than one tuple that is visible to your scan, error. > > > > This point seems to concern making the UPSERT UPDATE only affect zero > > or one rows. Is that right? If so, why do you think that's a useful > > constraint to impose? > > Because nobody wants an operation to either insert 1 tuple or update > n>=1 tuples. The intention is that the predicate should probably be > something like WHERE unique_key = 'some_value', but you can use > something else. So it's kinda like saying which index you care about > for a particular operation, except without having to explicitly name > an index. But in any event you should use a predicate that uniquely > identifies the tuple you want to update. This seemed a nice idea when I first read it earlier today, but now I'm not so sure. Are you saying that it wouldn't be allowed to use an UPSERT with some sort of join, such that each joined row would produce either one insert or one update? To clarify: suppose I import some external data into a temp table, then run UPSERT "USING" that table so that the rows end up in a permanent table; some of the rows might be already in the permanent table, some others might not. I would hope that by the time UPSERT is done, all the rows are in the permanent table. Would that raise an error, with your proposed design? -- Álvaro Herrera      http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Peter Geoghegan wrote: > Maybe that would be marginally better than classic Levenshtein > distance, but I doubt it would pay for itself. It's just more code to > maintain. Are we really expecting to not get the best possible > suggestion due to some number of transposition errors very frequently? > You still have to have a worse suggestion spuriously get ahead of > yours, and typically there just aren't that many to begin with. I'm > not targeting spelling errors so much as thinkos around plurals and > whether or not an underscore was used. Damerau-Levenshtein seems like > an algorithm with fairly specialized applications. Yes, it's for typos. I guess it's an infrequent scenario to have both a typoed column and a column that's missing the plural declension, which is the case in which Damerau-Levenshtein would be a win. -- Álvaro Herrera      http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Thu, Jul 24, 2014 at 1:09 AM, Tom Lane wrote: > Robert Haas writes: > > There are several possible methods of doing that, but I think the best > > one is just to leave the SQL-callable C functions in fuzzystrmatch and > > move only the underlying code that supports into core. > > I hadn't been paying close attention to this thread, but I'd just assumed > that that would be the approach. > > It might be worth introducing new differently-named pg_proc entries for > the same functions in core, but only if we can agree that there are better > names for them than what the extension uses. > Yes, that's a point I raised upthread as well. What about renaming those functions as string_distance and string_distance_less_than? Then have only fuzzystrmatch do some DirectFunctionCall using the in-core functions? -- Michael
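The shim Michael is describing would be small. A hedged sketch of what the fuzzystrmatch side could look like, assuming the hypothetical in-core rename to string_distance (the PG_FUNCTION_INFO_V1/DirectFunctionCall2 pattern is real fmgr API; string_distance is only the name proposed above, not an existing function):

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(levenshtein);

    /*
     * Keep the old SQL-callable name in fuzzystrmatch, delegating to the
     * (hypothetical) in-core implementation.
     */
    Datum
    levenshtein(PG_FUNCTION_ARGS)
    {
        return DirectFunctionCall2(string_distance,
                                   PG_GETARG_DATUM(0),
                                   PG_GETARG_DATUM(1));
    }

The extension's pg_proc entries and SQL names would stay unchanged, so existing users of fuzzystrmatch would notice nothing.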
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 3:01 PM, Kevin Grittner wrote: > Could you clarify that? Does this mean that you feel that we > should write to the heap before reading the index to see if the row > will be a duplicate? If so, I think that is a bad idea, since this > will sometimes be used to apply a new data set which hasn't changed > much from the old, and that approach will perform poorly for this > use case, causing a lot of bloat. It certainly would work well for > the case that most of the rows are expected to be INSERTs rather > than DELETEs, but I'm not sure that's justification for causing > extreme bloat in the other cases. No, I think we should stagger ordinary index insertion in a way that locks indexes - we lock indexes, then if successful insert a heap tuple, before finally inserting index tuples using the existing heavyweight page-level index locks. My design doesn't cause bloat under any circumstances. Heikki's design, which he sketched with an actual POC implementation, involved possible bloating in the event of a conflict. He also had to go and delete the promise tuple (from within ExecInsert()) in the event of the conflict before row locking, in order to prevent unprincipled deadlocking. Andres wanted to do something else similarly involving "promise tuples", where the xid of the inserter was stored in indexes with a special flag. That could also cause bloat. I think that could be particularly bad when conflicts necessitate visiting indexes one by one to kill promise tuples, as opposed to just killing one heap tuple as in the case of Heikki's design. Anyway, both of those designs and my own are insert-driven. The main difference between the design that Heikki sketched and my own is that mine does not cause bloat, but is more invasive to the nbtree code (though less invasive to a lot of other places needed to make the deadlocking-ultimately-conflicting tuple killing work). But I believe that Heikki's design is identical to my own in terms of user-visible semantics. That said, his design was just a sketch, and it wouldn't be fair to hold him to it. > Also, just a reminder that I'm going to squawk loudly if the > implementation does not do something fairly predictable and sane > for the case that the table has more than one UNIQUE index and you > attempt to UPSERT a row that is a duplicate of one row on one of > the indexes and a different row on a different index. Duly noted. :-) I think that it's going to have to support that one way or the other. It might be the case that I'll want to make the choice of unique index optionally "implicit", but it's clear that we want to be able to specify a specific unique index in one form or another. Actually, I've already added that. It's just optional right now. I haven't found a better way than just specifying the name of the unique index in DML, which is ugly - which is the main reason I want to make it optional. Perhaps we can overcome this. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
Peter Geoghegan wrote >> with semantics like this: >> >> 1. Search the table, using any type of scan you like, for a row >> matching the given predicate. > > Perhaps I've misunderstood, but this is fundamentally different to > what I'd always thought would need to happen. I always understood that > the UPSERT should be insert-driven, and so not really have an initial > scan at all in the same sense that every Insert lacks one. Moreover, > I've always thought that everyone agreed on that. We go to insert, and > then in the course of inserting index tuples do something with dirty > snapshots. That's how we get information on conflicts. SQL standard notwithstanding, there really is no way to pick one implementation and not have it be sub-optimal in some situations. Unless there is a high barrier, why not introduce syntax - SCAN FIRST; INSERT FIRST - that allows the user to specify the behavior that they expect would be most efficient given their existing/new data ratio? >> Having said all that, I believe the INSERT ON CONFLICT syntax is more >> easily comprehensible than previous proposals. But I still tend to >> agree with Andres that an explicit UPSERT syntax or something like it, >> that captures all of the MVCC games inside itself, is likely >> preferable from a user standpoint, whatever the implementation ends up >> looking like. > > Okay then. If you both feel that way, I will come up with something > closer to what you sketch. But for now I still strongly feel it ought > to be driven by an insert. Perhaps I've misunderstood you entirely, > though. Getting a little syntax crazy here, but having all of:

UPSERT [SCAN|INSERT] FIRST
INSERT ON CONFLICT UPDATE - same as INSERT FIRST
UPDATE ON MISSING INSERT - same as SCAN FIRST

with the corresponding 2 implementations would make the user interface slightly more complicated, but able to conform to the actual data that the user has. You could basically perform a two-phase pass where you run the user-requested algorithm and then for all failures attempt the alternate algorithm, and then error if both fail. I am not at all fluent on the concurrency issues here, and the MVCC violations and re-tries that might be considered, but at a high level there is disagreement here simply because both answers are "correct" and ideally both can be provided to the user. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Making-joins-involving-ctid-work-for-the-benefit-of-UPSERT-tp5811919p5812640.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] System shutdown signal on Windows (was Re: [GENERAL])
On 23 July 2014 22:16, Tom Lane wrote: > Krystian Bigaj writes: > > - when pg_console_handler receives CTRL_SHUTDOWN_EVENT from OS, then it > > calls pg_queue_signal(SIGINT). > > > Problems: > > - when OS is in shutdown path, then it sends CTRL_SHUTDOWN_EVENT, and *all* > > Postgres processes (main and sub/forked) will call pg_queue_signal(SIGINT) > > - so main and sub processes will start to shutdown independently? Can this > > have any bad consequences? > > Hm. We ought to have that sending SIGTERM instead, so as to mimic the > situation when Unix "init" is trying to shut down the system. It might be > that SIGINT will more or less work, but the postmaster logic is designed > to work with global SIGTERM as being the clean-up-ASAP trigger. As an > example, backends servicing remote applications (which will *not* have > already gotten killed) would not exit in response to SIGINT. > > > I think that CTRL_SHUTDOWN_EVENT should be removed from pg_console_handler, > > That does not sound like a good idea, at least not if Windows has the > same behavior as "init" does of proceeding to hard kills after some > grace period. > > regards, tom lane

I'm not really familiar with Unix and its SIG-commands; I know about SIGINT/SIGTERM only from the Postgres documentation. However, from what I see, when Postgres is run by pg_ctl as a service, it will receive SIGINT independently, and in general in unspecified order:
- *each* postgres.exe process will queue itself a SIGINT (because of CTRL_SHUTDOWN_EVENT),
- pg_ctl will send SIGINT to the main postmaster process (and possibly it will pass that command to sub-processes)

So there are two independent paths by which SIGINT is sent, and pg_ctl doesn't really have control over when postgres.exe receives SIGINT. This CTRL_SHUTDOWN_EVENT is not used when postgres.exe is run in a *user session* - so removing it won't change anything there. I see only two cases where CTRL_SHUTDOWN_EVENT might be needed (both where postgres.exe is run in a service session):
- postgres.exe run by pg_ctl.exe, but the pg_ctl service process was terminated/killed, and then the system was shut down
- someone starts postgres.exe from their own service, but doesn't send a SIGINT/SIGTERM command to postgres.exe on service system shutdown (though he must for service stop)

As I previously wrote, I have a workaround for it, so if you think that this change would break compatibility and don't want to change it, then I'm really fine with it.

However, I've probably found something in pg_ctl.c regarding shutdown, and maybe the cause of that suspicious postgres.exe process termination on Windows.

1) shutdownEvent is signaled in pgwin32_ServiceMain by SERVICE_CONTROL_STOP/SERVICE_CONTROL_SHUTDOWN in pgwin32_ServiceHandler. There dwWaitHint = 10000.

2)
	...
	/* Wait for quit... */
	ret = WaitForMultipleObjects(2, shutdownHandles, FALSE, INFINITE);

	pgwin32_SetServiceStatus(SERVICE_STOP_PENDING);
	switch (ret)
	{
		case WAIT_OBJECT_0:		/* shutdown event */
			kill(postmasterPID, SIGINT);

			/*
			 * Increment the checkpoint and try again. Abort after 12
			 * checkpoints as the postmaster has probably hung
			 */
			while (WaitForSingleObject(postmasterProcess, 5000) == WAIT_TIMEOUT
				   && status.dwCheckPoint < 12)
				status.dwCheckPoint++;

			<--- missing call to pgwin32_SetServiceStatus(SERVICE_STOP_PENDING) or SetServiceStatus(hStatus, (LPSERVICE_STATUS) &status);

			break;
	...

dwCheckPoint is incremented every 5000ms, but that status is never reported to the SCM (the pgwin32_SetServiceStatus/SetServiceStatus call is missing), so after 10s (dwWaitHint = 10000) the SCM will not have received an incremented dwCheckPoint, and it's allowed to kill that process (because the service didn't respond within dwWaitHint). It kills pg_ctl.exe, and when all services have stopped, it simply terminates the other remaining processes - in this case postgres.exe.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms685996(v=vs.85).aspx dwWaitHint: ... If the amount of time specified by dwWaitHint passes, and dwCheckPoint has not been incremented or dwCurrentState has not changed, the service control manager or service control program can assume that an error has occurred and the service should be stopped. ...

So if the main postgres.exe (run by the pg_ctl.exe service process) won't shut down within 10s (for any reason), it might be terminated/killed by Windows/SCM.

Best regards, Krystian Bigaj
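The shape of the fix Krystian is pointing at would be to report each checkpoint increment back to the SCM, so dwWaitHint never elapses without a status update. A hedged sketch (pgwin32_SetServiceStatus is pg_ctl.c's existing helper; untested):

			while (WaitForSingleObject(postmasterProcess, 5000) == WAIT_TIMEOUT
				   && status.dwCheckPoint < 12)
			{
				status.dwCheckPoint++;
				/* tell the SCM we're still making progress on the stop */
				pgwin32_SetServiceStatus(SERVICE_STOP_PENDING);
			}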
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 4:05 PM, Peter Geoghegan wrote: > On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas wrote: >> This all seems completely to one side of Andres's point. I think what >> he's saying is: don't implement an SQL syntax of the form INSERT ON >> CONFLICT and let people use that to implement upsert. Instead, >> directly implement an SQL command called UPSERT or similar. That's >> long been my intuition as well. I think generality here is not your >> friend. > > Sure, I was just making one fairly narrow point in relation to Andres' > concern about executor structuring of the UPSERT. > >> I'd suggest something like: >> >> UPSERT table SET col = value [, ...] WHERE predicate; > > Without dismissing the overall concern shared by you and Andres, that > particular update-driven syntax doesn't strike me as something that > should be pursued. I would like to be able to insert or update more > than a single row at a time, for one thing. For another, what exactly > appears in the predicate? Is it more or less the same as an UPDATE's > predicate? To the last question, yes. To the first point, I'm not bent on this particular syntax, but I am +1 for the idea that the command is something specifically upsert-like rather than something more generic like an ON CONFLICT SELECT that lets you do arbitrary things with the returned rows. >> with semantics like this: >> >> 1. Search the table, using any type of scan you like, for a row >> matching the given predicate. > > Perhaps I've misunderstood, but this is fundamentally different to > what I'd always thought would need to happen. I always understood that > the UPSERT should be insert-driven, and so not really have an initial > scan at all in the same sense that every Insert lacks one. Moreover, > I've always thought that everyone agreed on that. We go to insert, and > then in the course of inserting index tuples do something with dirty > snapshots. That's how we get information on conflicts. It's certainly arguable whether you should INSERT and then turn failures into an update or try to UPDATE and then turn failures into an INSERT; we might even want to have both options available, though that smells a little like airing too much of our dirty laundry. But I think I generally favor optimizing for the UPDATE case for more or less the same reasons Kevin articulated. > For one thing we cannot use any kind of scan unless there is a new > mechanism for seeing not-yet-visible tuples, like some kind of > non-MVCC snapshot that those scan nodes can selectively use. Now, I > guess that you intend that in this scenario you go straight to 5, and > then your insert finds the not-yet-visible conflict. But it's not > clear that when 5 fails, and we pick up from 1 that we then get a new > MVCC Snapshot or something else that gives us a hope of succeeding > this time around. And it seems quite wasteful to not use knowledge of > conflicts from the insert at all - that's one thing I'm trying to > address here; removing unnecessary index scans when we actually know > what our conflict TIDs are. Here you seem to be suggesting that I intended to propose your existing design rather than something else, which I didn't. In this design, you find the conflict (at most one) by scanning for the tuple to be updated. >> 2. If you find more than one tuple that is visible to your scan, error. > This point seems to concern making the UPSERT UPDATE only affect zero > or one rows. Is that right? If so, why do you think that's a useful > constraint to impose? Because nobody wants an operation to either insert 1 tuple or update n>=1 tuples. The intention is that the predicate should probably be something like WHERE unique_key = 'some_value', but you can use something else. So it's kinda like saying which index you care about for a particular operation, except without having to explicitly name an index. But in any event you should use a predicate that uniquely identifies the tuple you want to update. >> 3. If you find exactly one tuple that is visible to your scan, update it. >> Stop. >> 4. If you find no tuple that is visible to your scan, but you do find >> at least one tuple whose xmin has committed and whose xmax has not >> committed, then (4a) if the isolation level is REPEATABLE READ or >> higher, error; (4b) if there is more than one such tuple, error; else >> (4b) update it, in violation of normal MVCC visibility rules. > > Point 4b ("if there is more than one such tuple...") seems like it > needs some practical or theoretical justification - do you just not > want to follow an update chain? If so, why? What would the error > message actually be? And, to repeat myself: Why should an MVCC > snapshot see more than one such tuple in the ordinary course of > scanning a relation, since there is by definition a unique constraint > that prevents this? Or maybe I'm wrong; maybe you don't even require > that. That isn't clear. In case (4b), like in case (2), you've done something like UPSERT tab SET ... WHERE non_unique_index = 'value_appearing_more_than_once'. You shouldn't do that, because you have only one replacement tuple (embodied in the SET clause).
Re: [HACKERS] Shapes on the regression test for polygon
On Wed, Jul 23, 2014 at 4:06 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jul 21, 2014 at 10:52 AM, Emre Hasegeli wrote: >>> The first two shapes on src/test/regress/sql/polygon.sql do not make >>> sense to me. > >> Well, I think the number of tabs that makes them look best depends on >> your tab-stop setting. At present, I find that with 8-space tabs >> things seem to line up pretty well, whereas with your patch, 4-space >> tabs line up well. > > Perhaps we should expand tabs to spaces in those ascii-art diagrams? > >> Personally, though, my vote would be to just leave it the way it is. >> I don't think there's really a problem here in need of solving. > > I concur with doubting that changing the actual regression test cases > is a good thing to do, at least not without more analysis. But "the > pictures make no sense with the wrong tab setting" is a pretty simple > issue, and one that I can see biting people. AFAICT, the pictures make no sense with the right tab setting, either. The possibility that someone might use the wrong tab setting is just icing on the cake. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
Peter Geoghegan wrote: > I still strongly feel it ought to be driven by an insert Could you clarify that? Does this mean that you feel that we should write to the heap before reading the index to see if the row will be a duplicate? If so, I think that is a bad idea, since this will sometimes be used to apply a new data set which hasn't changed much from the old, and that approach will perform poorly for this use case, causing a lot of bloat. It certainly would work well for the case that most of the rows are expected to be INSERTs rather than UPDATEs, but I'm not sure that's justification for causing extreme bloat in the other cases. Also, just a reminder that I'm going to squawk loudly if the implementation does not do something fairly predictable and sane for the case that the table has more than one UNIQUE index and you attempt to UPSERT a row that is a duplicate of one row on one of the indexes and a different row on a different index. The example discussed during your PGCon talk was something like a city table with two columns, each with a UNIQUE constraint, containing:

 city_id | city_name
---------+-----------
       1 | Toronto
       2 | Ottawa

... and an UPSERT comes through for (1, 'Ottawa'). We would all like for that never to happen, but it will. There must be sane and documented behavior in that case. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 5:58 PM, Robert Haas wrote: > I'd suggest something like: > > UPSERT table SET col = value [, ...] WHERE predicate; I don't think this is actually good enough. Typical use cases are things like "increment this counter or insert a new one starting at 0". -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 9:15 PM, Michael Paquier wrote: > > On Fri, Apr 18, 2014 at 4:29 AM, Fabrízio de Royes Mello > wrote: > > > > On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian wrote: > >> > >> On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: > >> > Bruce Momjian writes: > >> > > The idea is that we only need quotes when there are odd characters in > >> > > the identifier. We do that right now in some places, though I can't > >> > > find them in pg_dump. I know psql does that, see quote_ident(). > >> > > >> > I think our general style rule is that identifiers embedded in messages > >> > are always double-quoted. There's an exception for type names, but > >> > not otherwise. You're confusing the message case with printing SQL. > >> > >> OK. I was unclear if a status _display_ was a message like an error > >> message. > >> > > > > The attached patch fixes the missing double-quoting in the "dumping contents of > > table.." message and adds the schema name to other messages: > > - "reading indexes for table \"%s\".\"%s\"\n" > > - "reading foreign key constraints for table \"%s\".\"%s\"\n" > > - "reading triggers for table \"%s\".\"%s\"\n" > > > > - "finding the columns and types of table \"%s\".\"%s\"\n" > > - "finding default expressions of table \"%s\".\"%s\"\n" > > - "finding check constraints for table \"%s\".\"%s\"\n" > Cool additions. There may be a more elegant way to check if namespace > is NULL, but I couldn't come up with one myself. So patch may be fine. > Hi all, I think this small patch was lost. Is there something wrong? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
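For context, the pattern at issue in pg_dump's verbose output is roughly this (a hedged sketch of one message site, not the patch itself; write_msg(), g_verbose and the TableInfo/DumpableObject fields are real pg_dump names):

	if (g_verbose)
	{
		/* schema-qualify the table name when we have a namespace */
		if (tbinfo->dobj.namespace != NULL)
			write_msg(NULL, "reading indexes for table \"%s\".\"%s\"\n",
					  tbinfo->dobj.namespace->dobj.name,
					  tbinfo->dobj.name);
		else
			write_msg(NULL, "reading indexes for table \"%s\"\n",
					  tbinfo->dobj.name);
	}

The "more elegant way" Michael alludes to would presumably factor the NULL check into a small helper rather than repeating the if/else at each of the six message sites.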
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Tue, Jul 22, 2014 at 3:29 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote: > On Tue, Jul 22, 2014 at 12:01 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote: > > On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund wrote: > > > > That means should I "FlushRelationBuffers(rel)" before changing the relpersistence? > > > > > > I don't think that'd help. I think what this means is that you simply cannot change the relpersistence of the old relation before the heap swap is successful. So I guess it has to be something like (pseudocode):
> > >
> > > OIDNewHeap = make_new_heap(..);
> > > newrel = heap_open(OIDNewHeap, AEL);
> > >
> > > /*
> > >  * Change the temporary relation to be unlogged/logged. We have to do
> > >  * that here so buffers for the new relfilenode will have the right
> > >  * persistency set while the original filenode's buffers won't get read
> > >  * in with the wrong (i.e. new) persistency setting. Otherwise a
> > >  * rollback after the rewrite would possibly result with buffers for the
> > >  * original filenode having the wrong persistency setting.
> > >  *
> > >  * NB: This relies on swap_relation_files() also swapping the
> > >  * persistency. That wouldn't work for pg_class, but that can't be
> > >  * unlogged anyway.
> > >  */
> > > AlterTableChangeCatalogToLoggedOrUnlogged(newrel);
> > > FlushRelationBuffers(newrel);
> > > /* copy heap data into newrel */
> > > finish_heap_swap();
> > >
> > > And then in swap_relation_files() also copy the persistency. > > > > > > That's the best I can come up with right now at least. > > > > Isn't it better if we can set the relpersistence as an argument to "make_new_heap"? > > > > I'm thinking to change make_new_heap:
> >
> > From:
> >
> > make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
> >               LOCKMODE lockmode)
> >
> > To:
> >
> > make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
> >               LOCKMODE lockmode)
> >
> > That way we can create the new heap with the appropriate relpersistence, so swap_relation_files also copies the persistency, of course. > > > > And after copying the heap data to the new table (ATRewriteTable), change the relpersistence of the OldHeap's indexes, because in "finish_heap_swap" they'll be rebuilt. > > > > Thoughts? > > The attached patch implements my previous idea based on Andres' thoughts.

I didn't like the last version of the patch, especially this part:

+	/* check if SetUnlogged or SetLogged exists in subcmds */
+	for (pass = 0; pass < AT_NUM_PASSES; pass++)
+	{
+		List	   *subcmds = tab->subcmds[pass];
+		ListCell   *lcmd;
+
+		if (subcmds == NIL)
+			continue;
+
+		foreach(lcmd, subcmds)
+		{
+			AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
+
+			if (cmd->subtype == AT_SetUnLogged || cmd->subtype == AT_SetLogged)
+			{
+				/*
+				 * Change the temporary relation to be unlogged/logged. We have to do
+				 * that here so buffers for the new relfilenode will have the right
+				 * persistency set while the original filenode's buffers won't get read
+				 * in with the wrong (i.e. new) persistency setting. Otherwise a
+				 * rollback after the rewrite would possibly result with buffers for the
+				 * original filenode having the wrong persistency setting.
+				 *
+				 * NB: This relies on swap_relation_files() also swapping the
+				 * persistency. That wouldn't work for pg_class, but that can't be
+				 * unlogged anyway.
+				 */
+				if (cmd->subtype == AT_SetUnLogged)
+					newrelpersistence = RELPERSISTENCE_UNLOGGED;
+
+				isSetLoggedUnlogged = true;
+			}
+		}
+	}

So I did a refactoring, adding new items to AlteredTableInfo to pass the information through the phases.

Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] name
     ENABLE ALWAYS RULE rewrite_rule_name
     CLUSTER ON index_name
     SET WITHOUT CLUSTER
+    SET {LOGGED | UNLOGGED}
     SET WITH OIDS
     SET WITHOUT OIDS
     SET ( storage_parameter =
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Jul 23, 2014 at 1:10 PM, Alvaro Herrera wrote: > I had two thoughts: > > 1. Should we consider making levenshtein available to frontend programs > as well as backend? I don't think so. Why would that be useful? > 2. Would it provide better matching to use Damerau-Levenshtein[1] instead > of raw Levenshtein? Maybe that would be marginally better than classic Levenshtein distance, but I doubt it would pay for itself. It's just more code to maintain. Are we really expecting to not get the best possible suggestion due to some number of transposition errors very frequently? You still have to have a worse suggestion spuriously get ahead of yours, and typically there just aren't that many to begin with. I'm not targeting spelling errors so much as thinkos around plurals and whether or not an underscore was used. Damerau-Levenshtein seems like an algorithm with fairly specialized applications. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
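For concreteness, the difference between the two algorithms is a single extra case in the inner loop. A self-contained sketch of the restricted Damerau-Levenshtein ("optimal string alignment") distance, with unit costs and a small fixed buffer - none of fuzzystrmatch's cost parameters or MAX_LEVENSHTEIN_STRLEN handling:

    #include <stdio.h>
    #include <string.h>

    static int
    dl_distance(const char *s, const char *t)
    {
        int     m = (int) strlen(s);
        int     n = (int) strlen(t);
        int     d[64][64];
        int     i, j;

        if (m >= 64 || n >= 64)
            return -1;          /* out of scope for this sketch */

        for (i = 0; i <= m; i++)
            d[i][0] = i;
        for (j = 0; j <= n; j++)
            d[0][j] = j;

        for (i = 1; i <= m; i++)
        {
            for (j = 1; j <= n; j++)
            {
                int     cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
                int     best = d[i - 1][j] + 1;         /* deletion */

                if (d[i][j - 1] + 1 < best)
                    best = d[i][j - 1] + 1;             /* insertion */
                if (d[i - 1][j - 1] + cost < best)
                    best = d[i - 1][j - 1] + cost;      /* substitution */

                /* the Damerau extra: a swap of two adjacent characters */
                if (i > 1 && j > 1 &&
                    s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1] &&
                    d[i - 2][j - 2] + 1 < best)
                    best = d[i - 2][j - 2] + 1;

                d[i][j] = best;
            }
        }
        return d[m][n];
    }

    int
    main(void)
    {
        /* one adjacent transposition: prints 1 here, but 2 under plain Levenshtein */
        printf("%d\n", dl_distance("relname", "relnmae"));
        return 0;
    }

Dropping the final if-block turns this back into classic Levenshtein, which is the sense in which the transposition handling is the only thing at stake in this subthread.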
Re: [HACKERS] PDF builds broken again
On Wed, Jul 23, 2014 at 10:15 PM, Andrew Dunstan wrote: > > On 07/23/2014 06:31 AM, Magnus Hagander wrote: >> >> >> >> Do we actually have any buildfarm boxes building the PDFs? And if so, >> any idea why they didn't catch it? > > > > AFAIK, nobody's ever asked for such a thing. The docs optional step just > builds the default docs target, which is simply the HTML docs. > > It should be possible, just a SMOC. I think it would be very useful to have. Given how long it takes to build, running it not all the time but every couple of days or weekly or so would be quite useful. Then we'd catch things earlier and not have to spend as much time trying to figure out exactly what broke... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] System shutdown signal on Windows (was Re: [GENERAL])
Krystian Bigaj writes: > - when pg_console_handler receives CTRL_SHUTDOWN_EVENT from OS, then it > calls pg_queue_signal(SIGINT). > Problems: > - when OS is in shutdown path, then it sends CTRL_SHUTDOWN_EVENT, and *all* > Postgres processes (main and sub/forked) will call pg_queue_signal(SIGINT) > - so main and sub processes will start to shutdown independently? Can this > have any bad consequences? Hm. We ought to have that sending SIGTERM instead, so as to mimic the situation when Unix "init" is trying to shut down the system. It might be that SIGINT will more or less work, but the postmaster logic is designed to work with global SIGTERM as being the clean-up-ASAP trigger. As an example, backends servicing remote applications (which will *not* have already gotten killed) would not exit in response to SIGINT. > I think that CTRL_SHUTDOWN_EVENT should be removed from pg_console_handler, That does not sound like a good idea, at least not if Windows has the same behavior as "init" does of proceeding to hard kills after some grace period. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
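Tom's suggestion would be a one-line change to the backend's Windows console handler. A hedged sketch of src/backend/port/win32/signal.c's pg_console_handler with CTRL_SHUTDOWN_EVENT remapped (the event names and pg_queue_signal() are real; treat the exact control flow as an illustration, not the committed code):

    static BOOL WINAPI
    pg_console_handler(DWORD dwCtrlType)
    {
        if (dwCtrlType == CTRL_C_EVENT ||
            dwCtrlType == CTRL_BREAK_EVENT ||
            dwCtrlType == CTRL_CLOSE_EVENT)
        {
            pg_queue_signal(SIGINT);
            return TRUE;
        }
        if (dwCtrlType == CTRL_SHUTDOWN_EVENT)
        {
            /* mimic Unix init: a system shutdown is SIGTERM, not SIGINT */
            pg_queue_signal(SIGTERM);
            return TRUE;
        }
        return FALSE;
    }

As Tom notes, the payoff is that the postmaster's existing SIGTERM ("smart shutdown") machinery, including backends serving remote clients, handles the system-shutdown grace period the same way it does under init.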
Re: [HACKERS] PDF builds broken again
On 07/23/2014 06:31 AM, Magnus Hagander wrote: Do we actually have any buildfarm boxes building the PDFs? And if so, any idea why they didn't catch it? AFAIK, nobody's ever asked for such a thing. The docs optional step just builds the default docs target, which is simply the HTML docs. It should be possible, just a SMOC. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Peter Geoghegan wrote: > For some reason I thought that that was what Michael was proposing - a > more comprehensive move of code into core than the structuring that I > proposed. I actually thought about a Levenshtein distance operator at > one point months ago, before I entirely gave up on that. The > MAX_LEVENSHTEIN_STRLEN limitation made me think that the Levenshtein > distance functions are not suitable for core as is (although that > doesn't matter for my purposes, since all I need is something that > accommodates NAMEDATALEN sized strings). MAX_LEVENSHTEIN_STRLEN is a > considerable limitation for an in-core feature. I didn't get around to > forming an opinion on how and if that should be fixed. I had two thoughts: 1. Should we consider making levenshtein available to frontend programs as well as backend? 2. Would it provide better matching to use Damerau-Levenshtein[1] instead of raw Levenshtein? .oO(Would anyone be so bold as to attempt to implement bitap[2] using bitmapsets ...) [1] http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance [2] http://en.wikipedia.org/wiki/Bitap_algorithm -- Álvaro Herrera      http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shapes on the regression test for polygon
Robert Haas writes: > On Mon, Jul 21, 2014 at 10:52 AM, Emre Hasegeli wrote: >> The first two shapes on src/test/regress/sql/polygon.sql do not make >> sense to me. > Well, I think the number of tabs that makes them look best depends on > your tab-stop setting. At present, I find that with 8-space tabs > things seem to line up pretty well, whereas with your patch, 4-space > tabs line up well. Perhaps we should expand tabs to spaces in those ascii-art diagrams? > Personally, though, my vote would be to just leave it the way it is. > I don't think there's really a problem here in need of solving. I concur with doubting that changing the actual regression test cases is a good thing to do, at least not without more analysis. But "the pictures make no sense with the wrong tab setting" is a pretty simple issue, and one that I can see biting people. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas wrote: > This all seems completely to one side of Andres's point. I think what > he's saying is: don't implement an SQL syntax of the form INSERT ON > CONFLICT and let people use that to implement upsert. Instead, > directly implement an SQL command called UPSERT or similar. That's > long been my intuition as well. I think generality here is not your > friend. Sure, I was just making one fairly narrow point in relation to Andres' concern about executor structuring of the UPSERT. > I'd suggest something like: > > UPSERT table SET col = value [, ...] WHERE predicate; Without dismissing the overall concern shared by you and Andres, that particular update-driven syntax doesn't strike me as something that should be pursued. I would like to be able to insert or update more than a single row at a time, for one thing. For another, what exactly appears in the predicate? Is it more or less the same as an UPDATE's predicate? > with semantics like this: > > 1. Search the table, using any type of scan you like, for a row > matching the given predicate. Perhaps I've misunderstood, but this is fundamentally different to what I'd always thought would need to happen. I always understood that the UPSERT should be insert-driven, and so not really have an initial scan at all in the same sense that every Insert lacks one. Moreover, I've always thought that everyone agreed on that. We go to insert, and then in the course of inserting index tuples do something with dirty snapshots. That's how we get information on conflicts. For one thing we cannot use any kind of scan unless there is a new mechanism for seeing not-yet-visible tuples, like some kind of non-MVCC snapshot that those scan nodes can selectively use. Now, I guess that you intend that in this scenario you go straight to 5, and then your insert finds the not-yet-visible conflict. But it's not clear that when 5 fails, and we pick up from 1 that we then get a new MVCC Snapshot or something else that gives us a hope of succeeding this time around. And it seems quite wasteful to not use knowledge of conflicts from the insert at all - that's one thing I'm trying to address here; removing unnecessary index scans when we actually know what our conflict TIDs are. > 2. If you find more than one tuple that is visible to your scan, error. This point seems to concern making the UPSERT UPDATE only affect zero or one rows. Is that right? If so, why do you think that's a useful constraint to impose? > 3. If you find exactly one tuple that is visible to your scan, update it. > Stop. > 4. If you find no tuple that is visible to your scan, but you do find > at least one tuple whose xmin has committed and whose xmax has not > committed, then (4a) if the isolation level is REPEATABLE READ or > higher, error; (4b) if there is more than one such tuple, error; else > (4b) update it, in violation of normal MVCC visibility rules. Point 4b ("if there is more than one such tuple...") seems like it needs some practical or theoretical justification - do you just not want to follow an update chain? If so, why? What would the error message actually be? And, to repeat myself: Why should an MVCC snapshot see more than one such tuple in the ordinary course of scanning a relation, since there is by definition a unique constraint that prevents this? Or maybe I'm wrong; maybe you don't even require that. That isn't clear. As you know, I've always opposed any type of READ COMMITTED serialization failure. If you're worried about lock starvation hazards - although I don't think strong concerns here are justified - we can always put in a fail-safe number of retries before throwing an error. I'm comfortable with that because I think, based on experimentation, that it's going to be virtually impossible to hit. Apparently other snapshot isolation databases acquire a new snapshot and re-do the command, rather than using an EvalPlanQual()-like mechanism and thereby violating snapshot isolation. Those other systems have just such a hard limit on retries before erroring, and it seems to work out okay for them. > 5. Construct a new tuple using the col/value pairs from the SET clause > and try to insert it. If this would fail with a unique index > violation, back out and go back to step 1. Again, I can't see why you'd do this step last and not first, since this is naturally the place to initially "see into the future" with a dirty snapshot. > Having said all that, I believe the INSERT ON CONFLICT syntax is more > easily comprehensible than previous proposals. But I still tend to > agree with Andres that an explicit UPSERT syntax or something like it, > that captures all of the MVCC games inside itself, is likely > preferable from a user standpoint, whatever the implementation ends up > looking like. Okay then. If you both feel that way, I will come up with something closer to what you sketch. But for now I still strongly feel it ought to be driven by an insert. Perhaps I've misunderstood you entirely, though.
Re: [HACKERS] Least Active Transaction ID function
Hi All, On Wed, Jul 23, 2014 at 5:01 PM, Rohit Goyal wrote: > Hi All, > > I am doing programming with postgresql source code. I want to find out the > function which can give me the least active transaction ID currently in the > system. > > Is there any function which can give me that? > > Regards, > Rohit Goyal > 1> I know that somewhere there is an active transaction list in postgresql. At any point in time, I want to get the smallest transaction present in this active tx list, or the transaction ID such that all transactions smaller than it are committed/aborted. Is there any function which can give me this value? 2> I found a function, *GetStableLatestTransactionId()*; please tell me what this function returns. I was not able to understand the description given above it. Thanks for your support in advance :)!! Regards, Rohit Goyal
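For anyone following along in the source, procarray.c already exposes functions close to what is being asked for here. A minimal sketch - the signatures shown are from the 9.4-era tree and may differ in other versions:

	#include "postgres.h"
	#include "storage/procarray.h"

	/*
	 * Sketch: two ways to approximate the "least active transaction ID".
	 * GetOldestActiveTransactionId() returns the oldest XID still
	 * running; GetOldestXmin() returns a cutoff XID below which all
	 * transactions are known committed or aborted, which is what
	 * question 1 above asks for.
	 */
	static void
	show_oldest_xids(void)
	{
		TransactionId oldest_running = GetOldestActiveTransactionId();
		TransactionId oldest_xmin = GetOldestXmin(NULL, true);

		elog(LOG, "oldest running xid: %u, oldest xmin: %u",
			 oldest_running, oldest_xmin);
	}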
Re: [HACKERS] Shapes on the regression test for polygon
On Mon, Jul 21, 2014 at 10:52 AM, Emre Hasegeli wrote: > The first two shapes on src/test/regress/sql/polygon.sql do not make > sense to me. They look more like polygons with some more tabs, > but still did not match the coordinates. I changed them to make > consistent with the shapes. I believe this was the intention of > the original author. Patch attached. Well, I think the number of tabs that makes them look best depends on your tab-stop setting. At present, I find that with 8-space tabs things seem to line up pretty well, whereas with your patch, 4-space tabs line up well. Either way, I have no idea what the picture is supposed to mean, because it looks to me like the original picture has circles at (3,3) and (2,1) and plus signs at (2,2), (3,1), and (4,0). With your patch applied, the circle at (2,1) has moved to (2,0.5). I can't correlate any of this to the test cases you modified (and which I see no reason to modify, whether they match the picture or not). And really, if the diagram is confusing, let's just remove it. We don't really need our regression test comments to illustrate what a triangle looks like, or how to do bad ASCII art. Personally, though, my vote would be to just leave it the way it is. I don't think there's really a problem here in need of solving. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checkpointer crashes on slave in 9.4 on windows
On Mon, Jul 21, 2014 at 4:16 AM, Amit Kapila wrote: > During internal tests, we observed that the checkpointer > crashes on the slave with the below log, on > Windows: > > LOG: checkpointer process (PID 4040) was terminated by exception 0xC005 > HINT: See C include file "ntstatus.h" for a description of the hexadecimal > value. > LOG: terminating any other active server processes > > I debugged and found that it happens when the checkpointer > tries to update the shared memory config; below is the > call stack. > >> postgres.exe!LWLockAcquireCommon(LWLock * l=0x, LWLockMode >> mode=LW_EXCLUSIVE, unsigned __int64 * valptr=0x0020, unsigned >> __int64 val=18446744073709551615) Line 579 + 0x14 bytes C > postgres.exe!LWLockAcquireWithVar(LWLock * l=0x, unsigned > __int64 * valptr=0x0020, unsigned __int64 > val=18446744073709551615) Line 510 C > postgres.exe!WALInsertLockAcquireExclusive() Line 1627 C > postgres.exe!UpdateFullPageWrites() Line 9037 C > postgres.exe!UpdateSharedMemoryConfig() Line 1364 C > postgres.exe!CheckpointerMain() Line 359 C > postgres.exe!AuxiliaryProcessMain(int argc=2, char * * > argv=0x007d2180) Line 427 C > postgres.exe!SubPostmasterMain(int argc=4, char * * > argv=0x007d2170) Line 4635 C > postgres.exe!main(int argc=4, char * * argv=0x007d2170) Line 207 > C > > Basically, the issue is that during startup, when the > checkpointer tries to acquire the WAL insertion locks to > update the value of fullPageWrites, it crashes because > they are still not initialized. They will be initialized in > InitXLOGAccess(), which will get called via RecoveryInProgress() > (in case recovery is in progress) before doing the actual checkpoint. > However, we are trying to access them before that, which leads to > the crash. > > I think the reason it occurs only on Windows is that > on Linux, fork() ensures that the WAL insertion locks are > initialized with the same values as in the postmaster. > > To fix this issue, we need to ensure that the WAL insertion > locks get initialized before we use them. So one way is to > call InitXLOGAccess() before calling > CheckpointerMain(), as I have done in the attached patch; another > could be to call RecoveryInProgress() much earlier in the path > than now. So, this problem was introduced by Heikki's commit, 68a2e52bbaf98f136a96b3a0d734ca52ca440a95, to replace XLogInsert slots with regular LWLocks. I think the problem is that the initialization code really doesn't belong in InitXLOGAccess at all: 1. I think WALInsertLocks is just another global variable that needs to be saved and restored in EXEC_BACKEND mode and that it therefore ought to participate in the save_backend_variables() mechanism instead of having its own special-purpose mechanism to save and restore the value. 2. And I think that the LWLockRegisterTranche call belongs in XLOGShmemInit(), so that it's parallel to the other call in CreateLWLocks. I think that would be more robust, because while your fix will definitely work, we could easily reintroduce a similar platform-specific bug for some other auxiliary process. Using the mechanisms described above will mean that this is set up properly for everything that's attached to shared memory at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
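To make Robert's point 1 concrete: in EXEC_BACKEND builds, postmaster.c snapshots a set of global variables into a BackendParameters struct that the child process reads back after exec(). A hypothetical sketch of what including WALInsertLocks in that mechanism might look like - field placement and the elided argument lists are illustrative, not the committed fix:

	/* postmaster.c (sketch) */
	typedef struct
	{
		/* ... existing fields such as UsedShmemSegAddr, MainLWLockArray ... */
		WALInsertLockPadded *WALInsertLocks;	/* new: pointer into shmem */
	} BackendParameters;

	static bool
	save_backend_variables(BackendParameters *param /* , ... */ )
	{
		/* ... existing assignments ... */
		param->WALInsertLocks = WALInsertLocks;		/* saved for the child */
		return true;
	}

	static void
	restore_backend_variables(BackendParameters *param /* , ... */ )
	{
		/* ... existing assignments ... */
		WALInsertLocks = param->WALInsertLocks;		/* restored before use */
	}

Note that WALInsertLocks is currently static to xlog.c, so it would need to be exported, or given accessor functions, for this to compile.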
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Wed, Jul 23, 2014 at 8:57 AM, Robert Haas wrote: > There are several possible methods of doing that, but I think the best > one is just to leave the SQL-callable C functions in fuzzystrmatch and > move only the underlying code that supports them into core. For some reason I thought that that was what Michael was proposing - a more comprehensive move of code into core than the structuring that I proposed. I actually thought about a Levenshtein distance operator at one point months ago, before I entirely gave up on that. The MAX_LEVENSHTEIN_STRLEN limitation made me think that the Levenshtein distance functions are not suitable for core as is (although that doesn't matter for my purposes, since all I need is something that accommodates NAMEDATALEN-sized strings). MAX_LEVENSHTEIN_STRLEN is a considerable limitation for an in-core feature. I didn't get around to forming an opinion on whether and how that should be fixed. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IS NOT DISTINCT FROM + Indexing
Jonathan S. Katz wrote: > Well that definitely answers "how hard would it be." - before > embarking on something laborious (as even just indexing is > nontrivial), I think it would be good to figure out how people are > using IS [NOT] DISTINCT FROM and if there is interest in having it be > indexable, let alone used in a JOIN optimization. It could become a > handy tool to simplify the SQL in queries that are returning a lot of > NULL / NOT NULL data mixed together. FWIW my project (abandoned for now, but I intend to get back to it someday) to catalog NOT NULL constraints in pg_constraint had me rewriting them into some kind of IS DISTINCT FROM NULL expression. (It was IS NOT NULL initially until somebody pointed out that that wouldn't work for composite type columns). I'm not sure how that interacts with what you're doing here, maybe not at all; I thought I'd mention it. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
I wrote: > Oh ... the TUG page now has a recipe for finding the problem less > painfully, which I don't recall having seen before: > http://ftp.tug.org/mail-archives/pdftex/2002-February/002216.html > In short, you can add a "draft" option that lets PDF output get generated > anyway, and then you can actually see exactly what link is getting > split. Unfortunately, that recipe seems to have little to do with current reality :-(. Perhaps there's a \usepackage{hyperref} somewhere in the style files JadeTeX provides, but it's sure not in the tex-pdf files we generate, so editing it isn't a convenient solution. What I did to locate the problem was to add some garbage text in the para you identified, so that the A4 PDF would build again, and then look at the output. The problem turns out to be in the *next* para, which in A4 format breaks like this: pg_relation_size accepts the OID or name of a table, index or toast table, and returns the on- disk size in bytes. Specifying 'main' or leaving out the second argument returns the size of the main data fork of the relation. Specifying 'fsm' returns the size of the Free Space Map (see Section 54.3) associated with the relation. Specifying 'vm' returns the size of the Visibility Map (see Sec- tion 54.4) associated with the relation. Note that this function shows the size of only one fork; for most purposes it is more convenient to use the higher-level functions pg_total_relation_size or pg_table_size. So both the FSM and VM section-reference hyperlinks break across lines, and one or the other is falling across a page boundary in the problematic build. I think we'd better rearrange this para to avoid problems in future, especially in case somebody decides to invent more forks. I remember thinking that the possible options would look better if broken out as an itemizedlist anyway. Let me try that ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IS NOT DISTINCT FROM + Indexing
Jonathan S. Katz wrote: > before embarking on something laborious (as even just indexing > is nontrivial), I think it would be good to figure out how people > are using IS [NOT] DISTINCT FROM and if there is interest in > having it be indexable, let alone used in a JOIN optimization. > It could become a handy tool to simplify the SQL in queries that > are returning a lot of NULL / NOT NULL data mixed together. To prevent subtle inconsistencies, I think we would need to limit support to data types with a btree opclass which uses "=" as the equality operator on indexes using that opclass (either by default or explicitly). That limitation would still allow a lot of useful cases. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
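A sketch of the kind of check Kevin describes - given an index column's opclass, verify that its btree equality member really is the operator spelled "=". get_opclass_family() and get_opfamily_member() are existing lsyscache helpers; the wrapper itself is hypothetical:

	#include "postgres.h"
	#include "access/skey.h"		/* BTEqualStrategyNumber */
	#include "utils/lsyscache.h"

	/*
	 * Hypothetical helper: does this btree opclass use eqop as its
	 * equality operator for the given type?
	 */
	static bool
	opclass_equality_matches(Oid opclass, Oid typid, Oid eqop)
	{
		Oid			opfamily = get_opclass_family(opclass);

		return OidIsValid(opfamily) &&
			get_opfamily_member(opfamily, typid, typid,
								BTEqualStrategyNumber) == eqop;
	}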
Re: [HACKERS] postgresql.auto.conf and reload
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> No, ALTER SYSTEM is there now and it needs to work right in its first > >> release. I will go fix this if nobody else does. > > > Just checking -- you didn't get around to dealing with this, right? > > Not yet... do you want it? I might give it a try, but not just yet. I'll let you know if I attempt anything. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for multicolumn GIST
> That seems like a nonstarter :-(. Index-only scans don't have a license > to return approximations. If we drop the behavior for circles, how much > functionality do we have left? It should work with exact operator classes: box_ops, point_ops, range_ops, inet_ops. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
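The reason "exact" operator classes are the interesting subset: their compress step is lossless, so handing the stored leaf key back is enough to satisfy an index-only scan. A hypothetical identity "fetch" support function could be as small as this - the function name and calling convention are illustrative, since no such GiST interface exists yet at this point:

	#include "postgres.h"
	#include "access/gist.h"

	/*
	 * Sketch of an identity fetch function for an exact opclass such as
	 * box_ops: the leaf key is the original datum, so just hand it back.
	 * A lossy opclass (e.g. circles stored as bounding boxes) cannot do
	 * this, which is why such opclasses would have to be excluded.
	 */
	Datum
	gist_identity_fetch(PG_FUNCTION_ARGS)
	{
		GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);

		PG_RETURN_POINTER(entry);
	}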
Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT
On Sat, Jul 19, 2014 at 10:04 PM, Peter Geoghegan wrote: > On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund > wrote: >> Meh. An understandable syntax wouldn't require the pullups with a special >> scan node and such. > > Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed > to match the qual in the query. Unless you're willing to give up on > the idea of having a qual on the update (other than something like an > ON CONFLICTS qual, obviously) I think you'll probably end up with some > kind of pseudo-scan node anyway, if only for consistency with how > things already work, to give you somewhere to show the Filter in > explain output and so on. ExecScan() probably needs to ExecQual(). > Inserts don't have scan nodes, but updates do, and so it seems pretty > odd to make the "Insert" node (a ModifyTable node) pullup from a > result node (as always), and the "Update" node (also a ModifyTable > node) treat the first ModifyTable node as its own pseudo-scan node, > when in fact we are scanning the heap in a manner not unlike a > conventional update. Or do you envision a second result node where an > update qual might be evaluated? I am not enthused about either > possibility. > > The idea of not having the ability to put a predicate on the update at > all, much like the MySQL thing isn't completely outrageous, but it > certainly isn't going to go down well with those that have already > expressed concerns about how much of a foot gun this could be. This all seems completely to one side of Andres's point. I think what he's saying is: don't implement an SQL syntax of the form INSERT ON CONFLICT and let people use that to implement upsert. Instead, directly implement an SQL command called UPSERT or similar. That's long been my intuition as well. I think generality here is not your friend. I'd suggest something like: UPSERT table SET col = value [, ...] WHERE predicate; with semantics like this: 1. Search the table, using any type of scan you like, for a row matching the given predicate. 2. If you find more than one tuple that is visible to your scan, error. 3. If you find exactly one tuple that is visible to your scan, update it. Stop. 4. If you find no tuple that is visible to your scan, but you do find at least one tuple whose xmin has committed and whose xmax has not committed, then (4a) if the isolation level is REPEATABLE READ or higher, error; (4b) if there is more than one such tuple, error; else (4c) update it, in violation of normal MVCC visibility rules. 5. Construct a new tuple using the col/value pairs from the SET clause and try to insert it. If this would fail with a unique index violation, back out and go back to step 1. Having said all that, I believe the INSERT ON CONFLICT syntax is more easily comprehensible than previous proposals. But I still tend to agree with Andres that an explicit UPSERT syntax or something like it, that captures all of the MVCC games inside itself, is likely preferable from a user standpoint, whatever the implementation ends up looking like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
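Expressed as control flow, the five steps above look roughly like this - a sketch only, with invented helper names:

	/* Sketch of the proposed UPSERT semantics; helpers are invented names. */
	for (;;)
	{
		int			nvisible;
		int			ninflight;

		scan_for_matches(rel, predicate, &nvisible, &ninflight);	/* step 1 */

		if (nvisible > 1)
			elog(ERROR, "predicate matched more than one row");		/* step 2 */
		if (nvisible == 1)
		{
			update_matched_tuple(rel, setlist);						/* step 3 */
			break;
		}
		if (ninflight > 0)											/* step 4 */
		{
			if (IsolationUsesXactSnapshot() || ninflight > 1)		/* 4a, 4b */
				elog(ERROR, "could not serialize access due to concurrent update");
			update_inflight_tuple(rel, setlist);	/* 4c: bends MVCC rules */
			break;
		}
		if (try_insert(rel, setlist))								/* step 5 */
			break;
		/* unique index violation: back out and retry from step 1 */
	}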
Re: [HACKERS] gaussian distribution pgbench -- splits v4
Hello Robert, Some review comments: Thanks a lot for your review. Please find attached two new parts of the patch (A for setrandom extension, B for pgbench embedded test case extension). 1. I suggest that getExponentialrand and getGaussianrand be renamed to getExponentialRand and getGaussianRand. Done. It was named like that because "getrand" was used for the uniform case. 2. I suggest that the code be changed so that the branch currently labeled as /* uniform with extra argument */ become a hard error instead of a warning. 3. Similarly, I suggest that the use of gaussian or uniform be an error when argc < 6 OR argc > 6. I also suggest that the parenthesized distribution type be dropped from the error message in all cases. I wish to agree, but my interpretation of the previous code is that they were ignored before, so ISTM that we are stuck with keeping the same unfortunate behavior. 4. This question mark seems like it should be a period: + * value fails the test? To be on the safe side, let us try over. Indeed. 5. With regards to the following paragraph: + The default random distribution is uniform, that is all values in the + range are drawn with equal probability. The gaussian and exponential + options allow to change this default. The mandatory + threshold double value controls the actual distribution + with gaussian or exponential. + This paragraph needs a bit of copy-editing. Here's an attempt: "By default, all values in the range are drawn with equal probability. The gaussian and exponential options modify this behavior; each requires a mandatory threshold which determines the precise shape of the distribution." The following paragraph should be changed to begin with "For a Gaussian distribution" and the one after "For an exponential distribution". Ok. I've kept "uniform" in the first sentence, because this is both an option name and it is significant in terms of probabilities. 6. Overall, I think the documentation here looks much better now, but I suggest adding one or two examples to the Gaussian section. Like this: for example, if threshold is 2.0, 68% of the values will fall in the middle third of the interval; with a threshold of 3.0, 99.7% of the values will fall in the middle third of the interval. These numbers are fabricated, and the middle third of the interval might not be the best part to talk about, but you get the idea (I hope). Done, with threshold value 4.0, so I have a "middle quarter" and a "middle half". 
-- Fabien.

diff --git a/contrib/pgbench/README b/contrib/pgbench/README
new file mode 100644
index 000..6881256
--- /dev/null
+++ b/contrib/pgbench/README
@@ -0,0 +1,5 @@
+# gaussian and exponential tests
+# with XXX as "expo" or "gauss"
+psql test < test-init.sql
+./pgbench -M prepared -f test-XXX-run.sql -t 100 -P 1 -n test
+psql test < test-XXX-check.sql
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 4aa8a50..e07206a 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -98,6 +98,8 @@ static int pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS 5 /* seconds between log messages */
 #define DEFAULT_NXACTS 10 /* default nxacts */
+#define MIN_GAUSSIAN_THRESHOLD 2.0 /* minimum threshold for gauss */
+
 int nxacts = 0; /* number of transactions per client */
 int duration = 0; /* duration in seconds */
@@ -471,6 +473,76 @@ getrand(TState *thread, int64 min, int64 max)
 	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
 }
+/*
+ * random number generator: exponential distribution from min to max inclusive.
+ * the threshold is so that the density of probability for the last cut-off max
+ * value is exp(-threshold).
+ */
+static int64
+getExponentialRand(TState *thread, int64 min, int64 max, double threshold)
+{
+	double cut, uniform, rand;
+	Assert(threshold > 0.0);
+	cut = exp(-threshold);
+	/* erand in [0, 1), uniform in (0, 1] */
+	uniform = 1.0 - pg_erand48(thread->random_state);
+	/*
+	 * inner expression in (cut, 1] (if threshold > 0),
+	 * rand in [0, 1)
+	 */
+	Assert((1.0 - cut) != 0.0);
+	rand = - log(cut + (1.0 - cut) * uniform) / threshold;
+	/* return int64 random number between min and max */
+	return min + (int64)((max - min + 1) * rand);
+}
+
+/* random number generator: gaussian distribution from min to max inclusive */
+static int64
+getGaussianRand(TState *thread, int64 min, int64 max, double threshold)
+{
+	double stdev;
+	double rand;
+
+	/*
+	 * Get user specified random number from this loop, with
+	 * -threshold < stdev <= threshold
+	 *
+	 * This loop is executed until the number is in the expected range.
+	 *
+	 * As the minimum threshold is 2.0, the probability of looping is low:
+	 * sqrt(-2 ln(r)) <= 2 => r >= e^{-2} ~ 0.135, then when taking the average
+	 * sine multiplier as 2/pi, we have an 8.6% looping probability in the
+	 * worst case. For a 5.0
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
From: "Tom Lane" This seems like a pretty unsafe suggestion, because the smgr level doesn't know or care whether relations are temp; files are files. In any case it would only paper over one specific instance of whatever problem you're worried about, because sinval messages definitely do need to be sent in general. I'm sorry I don't show the exact problem yet. Apart from that, I understood that you insist it's not appropriate for smgr to be aware of whether the file is a temporary relation, in terms of module layering. However, it doesn't seem so in the current implementation. md.c, which is a layer under or part of smgr, uses SmgrIsTemp(). In addition, as the name suggests, SmgrIsTemp() is a function of smgr, which is defined in smgr.h. So, it's not inappropriate for smgr to use it. What I wanted to ask is whether and why sinval messages are really necessary for session-private objects like temp relations. I thought shared inval is, as the name indicates, for objects "shared" among sessions. If so, sinval for session-private objects doesn't seem to match the concept. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
Robert Haas writes: > There are several possible methods of doing that, but I think the best > one is just to leave the SQL-callable C functions in fuzzystrmatch and > move only the underlying code that supports them into core. I hadn't been paying close attention to this thread, but I'd just assumed that that would be the approach. It might be worth introducing new differently-named pg_proc entries for the same functions in core, but only if we can agree that there are better names for them than what the extension uses. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
On Thu, Jul 17, 2014 at 9:34 AM, Michael Paquier wrote: > Patch 1 does a couple of things: > - fuzzystrmatch is bumped to 1.1, as Levenshtein functions are not part of > it anymore, and moved to core. > - Removal of the LESS_EQUAL flag that made the original submission patch > harder to understand. All the Levenshtein functions wrap a single common > function. > - Documentation is moved, and regression tests for Levenshtein functions are > added. > - Functions with costs are renamed with a suffix _with_costs. > After hacking this feature, I came up with the conclusion that it would be > better for the user experience to move directly into backend code all the > Levenshtein functions, instead of only moving in the common wrapper as Peter > did in his original patches. This is done this way to avoid keeping portions > of the same feature in two different places of the code (backend with common > routine, fuzzystrmatch with levenshtein functions) and concentrate all the > logic in a single place. Now, we may as well consider renaming the > levenshtein functions into smarter names, like str_distance, and keep > fuzzystrmatch at 1.0, having the functions levenshtein_* call only the > str_distance functions. This is not cool. Anyone who is running a 9.4 or earlier database using fuzzystrmatch and upgrades, either via dump-and-restore or pg_upgrade, to a version with this patch applied will have a broken database. They will still have the catalog entries for the 1.0 definitions, but those definitions won't be resolvable inside the new cluster's .so file. The user will get a fairly unfriendly error message that won't go away until they upgrade the extension, which may involve dealing with dependency hell since the new definitions are in a different place than the old definitions, and there may be dependencies on the old definitions. One of the great advantages of extension packaging is that this kind of problem is quite easily avoidable, so let's avoid it. There are several possible methods of doing that, but I think the best one is just to leave the SQL-callable C functions in fuzzystrmatch and move only the underlying code that supports them into core. Then, the whole thing will be completely transparent to users. They won't need to upgrade their fuzzystrmatch definitions at all, and everything will just work; under the covers, the fuzzystrmatch code will now be calling into core code rather than code located in that same module, but the user doesn't need to know or care about that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
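The shape Robert suggests, sketched out: the extension keeps its pg_proc entries and SQL-callable C functions, which become thin wrappers over a new core routine. The core function name used here, varstr_levenshtein(), is one plausible choice; treat it and its signature as assumptions:

	/* contrib/fuzzystrmatch/fuzzystrmatch.c (sketch) */
	#include "postgres.h"
	#include "fmgr.h"
	#include "utils/builtins.h"

	PG_FUNCTION_INFO_V1(levenshtein);

	Datum
	levenshtein(PG_FUNCTION_ARGS)
	{
		text	   *src = PG_GETARG_TEXT_PP(0);
		text	   *dst = PG_GETARG_TEXT_PP(1);

		/*
		 * All the real work now happens in core; the extension's
		 * SQL-visible definition is unchanged, so pg_upgrade'd clusters
		 * keep working.  Costs of 1/1/1 match the historical default.
		 */
		PG_RETURN_INT32(varstr_levenshtein(VARDATA_ANY(src),
										   VARSIZE_ANY_EXHDR(src),
										   VARDATA_ANY(dst),
										   VARSIZE_ANY_EXHDR(dst),
										   1, 1, 1));
	}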
Re: [HACKERS] gaussian distribution pgbench -- part 1/2
On Thu, Jul 17, 2014 at 12:09 AM, Fabien COELHO wrote: > pgbench with gaussian & exponential, part 1 of 2. > > This patch is a subset of the previous patch which only adds the two > new \setrandom gaussian and exponential variants, but not the > adapted pgbench test cases, as suggested by Fujii Masao. > There is no new code nor code changes. > > The corresponding documentation has been yet again extended wrt > the initial patch, so that what is achieved is hopefully unambiguous > (there are two mathematical formulas, tasty!), in answer to Andres Freund > comments, and partly to Robert Haas comments as well. > > This patch also provides several sql/pgbench scripts and a README, so > that the feature can be tested. I do not know whether these scripts > should make it to postgresql. I would say yes, otherwise there is no way > to test... > > part 2, which provides adapted pgbench test cases, will come later. Some review comments: 1. I suggest that getExponentialrand and getGaussianrand be renamed to getExponentialRand and getGaussianRand. 2. I suggest that the code be changed so that the branch currently labeled as /* uniform with extra argument */ become a hard error instead of a warning. 3. Similarly, I suggest that the use of gaussian or uniform be an error when argc < 6 OR argc > 6. I also suggest that the parenthesized distribution type be dropped from the error message in all cases. 4. This question mark seems like it should be a period: +* value fails the test? To be on the safe side, let us try over. 5. With regards to the following paragraph: + The default random distribution is uniform, that is all values in the + range are drawn with equal probability. The gaussian and exponential + options allow to change this default. The mandatory + threshold double value controls the actual distribution + with gaussian or exponential. + This paragraph needs a bit of copy-editing. Here's an attempt: "By default, all values in the range are drawn with equal probability. The gaussian and exponential options modify this behavior; each requires a mandatory threshold which determines the precise shape of the distribution." The following paragraph should be changed to begin with "For a Gaussian distribution" and the one after "For an exponential distribution". 6. Overall, I think the documentation here looks much better now, but I suggest adding one or two examples to the Gaussian section. Like this: for example, if threshold is 2.0, 68% of the values will fall in the middle third of the interval; with a threshold of 3.0, 99.7% of the values will fall in the middle third of the interval. These numbers are fabricated, and the middle third of the interval might not be the best part to talk about, but you get the idea (I hope). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
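For reference, the mapping the patch implements for the exponential case, reconstructed from the comments in getExponentialRand() (with $u$ uniform on $(0,1]$ and threshold $t > 0$):

\[
  \mathrm{rand} = -\,\frac{\ln\!\bigl(e^{-t} + (1 - e^{-t})\,u\bigr)}{t} \in [0, 1)
\]

and, for the Gaussian case, the middle-interval probabilities behind Fabien's choice of threshold 4.0 (values span roughly $(-4, 4]$ standard deviations, so, ignoring the tiny truncation correction):

\[
  P(\text{middle quarter}) \approx \Phi(1)-\Phi(-1) \approx 68.3\%, \qquad
  P(\text{middle half}) \approx \Phi(2)-\Phi(-2) \approx 95.4\%
\]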
Re: [HACKERS] Production block comparison facility
On 23 July 2014 15:14, Michael Paquier wrote: > I have spent some time digging more into this idea and finished with the > patch attached Thank you for investigating the idea. I'll review by Monday. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
"MauMau" writes: > Looking at smgrtruncate(), the sinval message is sent even when the > truncated relation is a temporary relation. However, I think the sinval > message is not necessary for temp relations, because each session doesn't > see the temp relations of other sessions. This seems like a pretty unsafe suggestion, because the smgr level doesn't know or care whether relations are temp; files are files. In any case it would only paper over one specific instance of whatever problem you're worried about, because sinval messages definitely do need to be sent in general. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Least Active Transaction ID function
Hi All, I am doing programming with postgresql source code. I want to find out the function which can give me the least active transaction ID currently in the system. Is there any function which can give me that? Regards, Rohit Goyal
Re: [HACKERS] PDF builds broken again
Magnus Hagander writes: > Have you figured out any way to actually track down which para has the > problem itself, or is it all manual work? Oh ... the TUG page now has a recipe for finding the problem less painfully, which I don't recall having seen before: http://ftp.tug.org/mail-archives/pdftex/2002-February/002216.html In short, you can add a "draft" option that lets PDF output get generated anyway, and then you can actually see exactly what link is getting split. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
Magnus Hagander writes: > On Wed, Jul 23, 2014 at 4:06 PM, Tom Lane wrote: >> A more robust fix would be to identify the para where the problem actually >> is and re-word it so that the link doesn't cross a *line* boundary (in >> either US or A4). That makes it safe as long as that particular para >> doesn't get reworded in future; whereas with what you did, addition or >> subtraction of a line anywhere in a pretty broad range could resurrect >> the issue. > Hmm. Good point. OTOH it only showed up in the backbranch (and only in > one of them), so I figured we might get away with it. > Have you figured out any way to actually track down which para has the > problem itself, or is it all manual work? My recollection is that it takes a bit of detective work but you can generally figure it out by eyeballing the TeX input file around the complained-of line number. The first trick is that our makefiles think the TeX input file is temp and delete it on failure, so you need to ask for it to be built, not the PDF file. The second trick is that the line number is not spot-on; the error seems to come out only when TeX decides where it's going to break the page, which it won't do until it has absorbed a few more lines than actually end up on the page. I think I've posted more details in some past thread on the issue ... oh, here: http://www.postgresql.org/message-id/9473.1296172...@sss.pgh.pa.us >> Of course, it would be a lot better if the toolchain didn't have this >> limitation (or at least managed to report it more usefully). I'm not >> holding my breath for that to happen though. > Yeah, they would probably have done it years ago if they were going to at > all... IIRC there actually was a patch that purported to fix this a year or so ago, but it didn't help :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Hello, I'm investigating a mysterious hang problem on PostgreSQL 9.2.8. If many sessions use temporary tables whose rows are deleted on commit, the hang occurs. I'd like to show you the stack trace, but I'm trying to figure out how to reproduce the problem. IIRC, the stack trace was as follows. The standby server was running normally.

...
SyncRepWaitForLSN
CommitTransaction
CommitTransactionCommand
ProcessCatchupEvent
HandleCatchupInterrupt
procsignal_sigusr1_handler
recv
...
ReadCommand
PostgresMain
...

Looking at smgrtruncate(), the sinval message is sent even when the truncated relation is a temporary relation. However, I think the sinval message is not necessary for temp relations, because each session doesn't see the temp relations of other sessions. So, the following code seems better. This avoids sinval queue overflow which leads to SIGUSR1. Is this correct?

if (SmgrIsTemp(reln))
    /* Do this on behalf of the sinval message handler */
    smgrclosenode(reln->smgr_rnode);
else
    CacheInvalidateSmgr(reln->smgr_rnode);

Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inconsistencies of service failure handling on Windows
On Wed, Jul 23, 2014 at 11:22 PM, Tom Lane wrote: > Was that a backend that you directly killed? Or the postmaster? The > subsequent connection failures suggest it was the postmaster. Killing > the postmaster is not a supported operation, not on Windows and not > anywhere else either. It's in the category of "doctor, it hurts when > I do this". > The headshot was done on random backends. Perhaps in some of those tests the postmaster was taken down though :) I didn't check postmaster.pid all the time. -- Michael
Re: [HACKERS] Inconsistencies of service failure handling on Windows
Michael Paquier writes: > While playing on Windows with services, I noticed an inconsistent behavior > in the way failures are handled when using a service for a Postgres > instance. > ... > However when a backend process is directly killed something different > happens. Was that a backend that you directly killed? Or the postmaster? The subsequent connection failures suggest it was the postmaster. Killing the postmaster is not a supported operation, not on Windows and not anywhere else either. It's in the category of "doctor, it hurts when I do this". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Production block comparison facility
On Tue, Jul 22, 2014 at 4:49 PM, Michael Paquier wrote: > Then, looking at the code, we would need to tweak XLogInsert for the > WAL record construction to always do a FPW and to update > XLogCheckBufferNeedsBackup. Then for the redo part, we would need to > do some extra operations in the area of > RestoreBackupBlock/RestoreBackupBlockContents, including masking > operations before comparing the content of the FPW and the current > page. > > Does that sound right? > I have spent some time digging more into this idea and finished with the patch attached, doing the following: addition of a consistency check when a FPW is restored and applied on a given page. The consistency check is made of two phases: - Apply a mask on the FPW and the current page to eliminate potential conflicts like hint bits for example. - Check that the FPW is consistent with the current page, i.e. that the current page does not contain any new information that the FPW that was taken does not. This is done by checking the masked portions of the FPW and the current page. Also some more details: - If an inconsistency is found, a WARNING is simply logged. - The consistency check is done if the current page is not empty, and if the database has reached a consistent state. - The page masking API is taken from the WAL replay patch that was submitted in CF1 and plugged in as an independent set of APIs. - In the masking code, to facilitate checking whether a page is used by a sequence relation, SEQ_MAGIC as well as its opaque data structure are renamed and moved into sequence.h. - To facilitate debugging and comparison, the masked FPW and current page are also converted into hex. Things could be refactored and improved for sure, but this patch is already useful as-is so I am going to add it to the next commit fest. Comments are welcome. Regards, -- Michael

From 5a05a3751ae278ba8eb7b79f50a4f7b652a1179c Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 23 Jul 2014 23:11:10 +0900
Subject: [PATCH] Add facility to check FPW consistency at WAL replay

The consistency check on FPW is made of two phases each time a FPW is restored in place of an existing page:
- Apply a mask on the FPW and the current page to eliminate potential conflicts like hint bits for example.
- Check that the FPW is consistent with the current page, i.e. that the current page does not contain any new information that the FPW that was taken does not. This is done by checking the masked portions of the FPW and the current page.
- If an inconsistency is found, a WARNING is simply logged.
This check is done only if the current page is not empty and if the database has reached a consistent state. 
---
 src/backend/access/transam/xlog.c    | 104 
 src/backend/commands/sequence.c      |  34 ++--
 src/backend/storage/buffer/Makefile  |   2 +-
 src/backend/storage/buffer/bufmask.c | 301 +++
 src/include/commands/sequence.h      |  13 ++
 src/include/storage/bufmask.h        |  21 +++
 6 files changed, 452 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/storage/buffer/bufmask.c
 create mode 100644 src/include/storage/bufmask.h

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4f9595..a383126 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -47,6 +47,7 @@
 #include "replication/walsender.h"
 #include "storage/barrier.h"
 #include "storage/bufmgr.h"
+#include "storage/bufmask.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/large_object.h"
@@ -4065,6 +4066,109 @@ RestoreBackupBlockContents(XLogRecPtr lsn, BkpBlock bkpb, char *blk,
 	page = (Page) BufferGetPage(buffer);
 
+	/*
+	 * Before saving the new contents of the backup block ensure that
+	 * it is consistent with the existing one. Apply masking to it and
+	 * then perform a comparison with what is going to be added. Do
+	 * only this operation once a consistent state has been reached by
+	 * server and only if the page that is being rewritten is not empty
+	 * to have a clean base for comparison.
+	 */
+	if (reachedConsistency &&
+		!PageIsEmpty(page))
+	{
+		RelFileNode rnode;
+		ForkNumber forknum;
+		BlockNumber blkno;
+		char *norm_new_page, *norm_old_page;
+		char *blk_data = (char *) palloc(BLCKSZ);
+		char old_buf[BLCKSZ * 2];
+		char new_buf[BLCKSZ * 2];
+		int j = 0;
+		int i;
+		bool inconsistent = false;
+
+		/*
+		 * Block data needs to be reformatted correctly, especially if
+		 * it has a hole. This is the same processing as below but
+		 * replicating this code saves memcpy calls if consistency
+		 * checks cannot be done on this backup block.
+		 */
+		if (bkpb.h
Re: [HACKERS] PDF builds broken again
On Wed, Jul 23, 2014 at 4:06 PM, Tom Lane wrote: > Magnus Hagander writes: >> On Wed, Jul 23, 2014 at 12:31 PM, Magnus Hagander >> wrote: >>> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than >>> \pd > >> Additional point of info - the -US pdf's do build on this version, >> just not the -A4. > >> And with even more of those entries about overfull hbox, so clearly >> that was not the actual breakage. > > Yeah. What this actually is is the symptom of text crossing a page > boundary. The patch you made did not fix the problem (because there's no > hyperlink anywhere in that para); you just moved the problematic line > pair, which must be somewhere below here, up or down so it didn't fall > across a page break. Right - it fixed the symptoms only. (And now that you mention it I do remember the thing about ). > A more robust fix would be to identify the para where the problem actually > is and re-word it so that the link doesn't cross a *line* boundary (in > either US or A4). That makes it safe as long as that particular para > doesn't get reworded in future; whereas with what you did, addition or > subtraction of a line anywhere in a pretty broad range could resurrect > the issue. Hmm. Good point. OTOH it only showed up in the backbranch (and only in one of them), so I figured we might get away with it. Have you figured out any way to actually track down which para has the problem itself, or is it all manual work? > Of course, it would be a lot better if the toolchain didn't have this > limitation (or at least managed to report it more usefully). I'm not > holding my breath for that to happen though. Yeah, they would probably have done it years ago if they were going to at all... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
Magnus Hagander writes: > On Wed, Jul 23, 2014 at 12:31 PM, Magnus Hagander wrote: >> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than >> \pd > Additional point of info - the -US pdf's do build on this version, > just not the -A4. > And with even more of those entries about overfull hbox, so clearly > that was not the actual breakage. Yeah. What this actually is is the symptom of text crossing a page boundary. The patch you made did not fix the problem (because there's no hyperlink anywhere in that para); you just moved the problematic line pair, which must be somewhere below here, up or down so it didn't fall across a page break. A more robust fix would be to identify the para where the problem actually is and re-word it so that the link doesn't cross a *line* boundary (in either US or A4). That makes it safe as long as that particular para doesn't get reworded in future; whereas with what you did, addition or subtraction of a line anywhere in a pretty broad range could resurrect the issue. Of course, it would be a lot better if the toolchain didn't have this limitation (or at least managed to report it more usefully). I'm not holding my breath for that to happen though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
On Wed, Jul 23, 2014 at 3:15 PM, Magnus Hagander wrote: > On Wed, Jul 23, 2014 at 1:31 PM, Magnus Hagander wrote: >> On Wed, Jul 23, 2014 at 12:31 PM, Magnus Hagander >> wrote: >>> It seems at least the 9.0 PDFs are broken (trying to build for the release): >>> >>> Lots of errors/warnings (and AFAIK no way to see which is which in the >>> output), but It hink this is the telltale as usual: >>> >>> Overfull \hbox (7.12454pt too wide) in paragraph at lines 88092--88092 >>> []\T1/pcr/m/n/9 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT >>> $1.f1' >>> LANGUAGE SQL;[] >>> [] >>> >>> and many more like it until >>> >>> Overfull \hbox (1.5pt too wide) in alignment at lines 241488--241741 >>> [] [] [] >>> [] >>> >>> [256.0.1 >>> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than >>> \pd >>> fstartlink. >>> \AtBegShi@Output ...ipout \box \AtBeginShipoutBox >>> \fi \fi >>> l.241875 ...char95{}stat\char95{}file('filename'); >>> >>> >>> >>> Here is how much of TeX's memory you used: >>> 22467 strings out of 482156 >>> 171125 string characters out of 3785924 >>> 308594 words of memory out of 3085000 >>> 27304 multiletter control sequences out of 15000+50 >>> 80861 words of font info for 131 fonts, out of 300 for 9000 >>> 14 hyphenation exceptions out of 8191 >>> 30i,12n,43p,307b,1338s stack positions out of >>> 1500i,500n,1500p,20b,5s >>> ! ==> Fatal error occurred, no output PDF file produced! >>> >>> >>> >>> >>> Do we actually have any buildfarm boxes building the PDFs? And if so, >>> any idea why they didn't catch it? >>> >>> Do we have a reasonable way to figure out which commit actually broke >>> it, other than manually testing backing out each of the 11 commits >>> since 9.0.17? >> >> Additional point of info - the -US pdf's do build on this version, >> just not the -A4. >> >> And with even more of those entries about overfull hbox, so clearly >> that was not the actual breakage. > > And with some dissecting, the offending patch is > 6b2a1445ec8a631060c4cbff3f172bf31d3379b9. I ended up splitting the paragraph in two in order to get the PDFs to build. I've applied a patch for this to 9.0 only so we can keep building PDFs. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PDF builds broken again
On Wed, Jul 23, 2014 at 1:31 PM, Magnus Hagander wrote: > On Wed, Jul 23, 2014 at 12:31 PM, Magnus Hagander wrote: >> It seems at least the 9.0 PDFs are broken (trying to build for the release): >> >> Lots of errors/warnings (and AFAIK no way to see which is which in the >> output), but It hink this is the telltale as usual: >> >> Overfull \hbox (7.12454pt too wide) in paragraph at lines 88092--88092 >> []\T1/pcr/m/n/9 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT >> $1.f1' >> LANGUAGE SQL;[] >> [] >> >> and many more like it until >> >> Overfull \hbox (1.5pt too wide) in alignment at lines 241488--241741 >> [] [] [] >> [] >> >> [256.0.1 >> ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than >> \pd >> fstartlink. >> \AtBegShi@Output ...ipout \box \AtBeginShipoutBox >> \fi \fi >> l.241875 ...char95{}stat\char95{}file('filename'); >> >> >> >> Here is how much of TeX's memory you used: >> 22467 strings out of 482156 >> 171125 string characters out of 3785924 >> 308594 words of memory out of 3085000 >> 27304 multiletter control sequences out of 15000+50 >> 80861 words of font info for 131 fonts, out of 300 for 9000 >> 14 hyphenation exceptions out of 8191 >> 30i,12n,43p,307b,1338s stack positions out of >> 1500i,500n,1500p,20b,5s >> ! ==> Fatal error occurred, no output PDF file produced! >> >> >> >> >> Do we actually have any buildfarm boxes building the PDFs? And if so, >> any idea why they didn't catch it? >> >> Do we have a reasonable way to figure out which commit actually broke >> it, other than manually testing backing out each of the 11 commits >> since 9.0.17? > > Additional point of info - the -US pdf's do build on this version, > just not the -A4. > > And with even more of those entries about overfull hbox, so clearly > that was not the actual breakage. And with some dissecting, the offending patch is 6b2a1445ec8a631060c4cbff3f172bf31d3379b9. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SKIP LOCKED DATA (work in progress)
On Sun, Jun 29, 2014 at 9:01 PM, Thomas Munro wrote: > > Please find attached a rebased version of my SKIP LOCKED > patch (formerly SKIP LOCKED DATA), updated to support only the > Oracle-like syntax. > > Hi Thomas, Apologies for taking this long to get to reviewing this; I'd gotten a bit sidetracked with my own patches during this commitfest. I'm really glad to see this patch is back again. I think it will be very useful for processing queues. I could have made good use of it in my previous work, using it for sending unsent emails which were "queued" up in a table in the database. I've so far read over the patch and done only some brief tests of the functionality. Here's what I've picked up on so far: * In heap_lock_tuple() there are a few places where you test the wait_policy, but you're not always doing this in the same order. The previous code did if (nolock) first each time, but since there are now 3 values I think if (wait_policy == LockWaitError) should be first each time, as likely this is the most common case. * The following small whitespace change can be removed in gram.y: @@ -119,7 +119,6 @@ typedef struct PrivTarget #define CAS_NOT_VALID 0x10 #define CAS_NO_INHERIT 0x20 - #define parser_yyerror(msg) scanner_yyerror(msg, yyscanner) #define parser_errposition(pos) scanner_errposition(pos, yyscanner) * analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy); I'm not quite sure I understand this yet, but I'm guessing the original code was there so that something like: SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT; would give: ERROR: could not obtain lock on row in relation "a" So it seems that, with the patch, as you've defined it by the order of the enum values in LockClauseWaitPolicy, SKIP LOCKED overrides FOR UPDATE. I'm wondering if this is dangerous. Should the following really skip locked tuples? SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1; But on the other hand perhaps I've missed a discussion on this; if so, then I think the following comment should be updated to explain it all: * We also consider that NOWAIT wins if it's specified both ways. This * is a bit more debatable but raising an error doesn't seem helpful. * (Consider for instance SELECT FOR UPDATE NOWAIT from a view that * internally contains a plain FOR UPDATE spec.) * * plannodes.h -> RowWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA options */ Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed from the syntax. * nodeLockRows.c has an extra space in an if condition: else if (erm->waitPolicy == RWP_SKIP ) I'm also wondering about this block of code in general: if (erm->waitPolicy == RWP_WAIT) wait_policy = LockWaitBlock; else if (erm->waitPolicy == RWP_SKIP ) wait_policy = LockWaitSkip; else /* erm->waitPolicy == RWP_NOWAIT */ wait_policy = LockWaitError; Just after this code heap_lock_tuple() is called, and if that returns HeapTupleWouldBlock, the code does a goto lnext, which then will repeat that whole block again. I'm wondering why there are 2 enums for the same thing. If you just had 1, then you'd not need this block of code at all; you could just pass erm->waitPolicy to heap_lock_tuple(). 
* parsenodes.h comment does not meet project standards ( http://www.postgresql.org/docs/devel/static/source-format.html) typedef enum LockClauseWaitPolicy { /* order is important (see applyLockingClause which takes the greatest value when several wait policies have been specified), and values must match RowWaitPolicy from plannodes.h */ * parsenodes.h: remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA */ I have noticed the /* TODO -- work out what needs to be released here */ comments in heap_lock_tuple(), and perhaps the lack of cleaning up is what is causing the following:

create table skiptest (id int primary key);
insert into skiptest (id) select x.x from generate_series(1,100) x(x);

Session 1: begin work; select * from skiptest for update limit 99;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

Yet if I do:

session 1: begin work; select * from skiptest where id > 1 for update;
session 2: select * from skiptest for update skip locked limit 1;
id 1 (1 row)

That test makes me think your todo comments are in the right place; something is missing there for sure! * plays about with patch for a bit * I don't quite understand how heap_lock_tuple works, as, if I debug session 2 from the above set of queries, the first call to ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd have thought it would fail, since session 1 should be locking this tuple. Later at line 4527, on the line if (!ConditionalXactLockTableWait(xwait)), it fails to grab the lock and returns HeapTupleWouldBlock. The above query seems to work ok if
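To illustrate the "one enum" suggestion from the review above: if parser, planner and executor shared a single wait-policy type - sketched below, with a hypothetical placement - the translation block in nodeLockRows.c disappears and erm->waitPolicy can be passed straight through, assuming heap_lock_tuple()'s nowait boolean argument is replaced by the policy:

	/* sketch: one shared enum instead of LockClauseWaitPolicy + RowWaitPolicy */
	typedef enum LockWaitPolicy
	{
		LockWaitBlock,				/* wait for the lock (default) */
		LockWaitSkip,				/* SKIP LOCKED: skip locked rows */
		LockWaitError				/* NOWAIT: error out on a locked row */
	} LockWaitPolicy;

	/* nodeLockRows.c then needs no mapping at all: */
	test = heap_lock_tuple(erm->relation, &tuple,
						   estate->es_output_cid,
						   lockmode, erm->waitPolicy,
						   true /* follow_updates */ ,
						   &buffer, &hufd);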
Re: [HACKERS] PDF builds broken again
On Wed, Jul 23, 2014 at 12:31 PM, Magnus Hagander wrote: > It seems at least the 9.0 PDFs are broken (trying to build for the release): > > Lots of errors/warnings (and AFAIK no way to see which is which in the > output), but It hink this is the telltale as usual: > > Overfull \hbox (7.12454pt too wide) in paragraph at lines 88092--88092 > []\T1/pcr/m/n/9 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT > $1.f1' > LANGUAGE SQL;[] > [] > > and many more like it until > > Overfull \hbox (1.5pt too wide) in alignment at lines 241488--241741 > [] [] [] > [] > > [256.0.1 > ! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than > \pd > fstartlink. > \AtBegShi@Output ...ipout \box \AtBeginShipoutBox > \fi \fi > l.241875 ...char95{}stat\char95{}file('filename'); > > > > Here is how much of TeX's memory you used: > 22467 strings out of 482156 > 171125 string characters out of 3785924 > 308594 words of memory out of 3085000 > 27304 multiletter control sequences out of 15000+50 > 80861 words of font info for 131 fonts, out of 300 for 9000 > 14 hyphenation exceptions out of 8191 > 30i,12n,43p,307b,1338s stack positions out of 1500i,500n,1500p,20b,5s > ! ==> Fatal error occurred, no output PDF file produced! > > > > > Do we actually have any buildfarm boxes building the PDFs? And if so, > any idea why they didn't catch it? > > Do we have a reasonable way to figure out which commit actually broke > it, other than manually testing backing out each of the 11 commits > since 9.0.17? Additional point of info - the -US pdf's do build on this version, just not the -A4. And with even more of those entries about overfull hbox, so clearly that was not the actual breakage. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PDF builds broken again
It seems at least the 9.0 PDFs are broken (trying to build for the release): Lots of errors/warnings (and AFAIK no way to see which is which in the output), but I think this is the telltale as usual:

Overfull \hbox (7.12454pt too wide) in paragraph at lines 88092--88092
[]\T1/pcr/m/n/9 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1'
LANGUAGE SQL;[]
[]

and many more like it until

Overfull \hbox (1.5pt too wide) in alignment at lines 241488--241741
[] [] []
[]

[256.0.1
! pdfTeX error (ext4): \pdfendlink ended up in different nesting level than \pd
fstartlink.
\AtBegShi@Output ...ipout \box \AtBeginShipoutBox
\fi \fi
l.241875 ...char95{}stat\char95{}file('filename');

Here is how much of TeX's memory you used:
22467 strings out of 482156
171125 string characters out of 3785924
308594 words of memory out of 3085000
27304 multiletter control sequences out of 15000+50
80861 words of font info for 131 fonts, out of 300 for 9000
14 hyphenation exceptions out of 8191
30i,12n,43p,307b,1338s stack positions out of 1500i,500n,1500p,20b,5s
! ==> Fatal error occurred, no output PDF file produced!

Do we actually have any buildfarm boxes building the PDFs? And if so, any idea why they didn't catch it? Do we have a reasonable way to figure out which commit actually broke it, other than manually backing out and testing each of the 11 commits since 9.0.17? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [TODO] Process pg_hba.conf keywords as case-insensitive
Re: Viswanatham kirankumar 2014-07-23 > 3) A host name is not a SQL object, so it will be treated as case-sensitive, except that all, samehost, and samenet are considered keywords. > For these, users need to use quotes to differentiate between host names and keywords. DNS is case-insensitive, though most of the time case-preserving (nothing guarantees that it won't down-up-whatever-case the answer you get). (FTR, I'll retract my original complaint, the idea of using SQL-like case folding is nice.) Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
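For illustration, a small pg_hba.conf sketch of the quoting rule under discussion. The addresses and users here are invented, and treating ALL the same as all is what the patch proposes, not necessarily released behavior:

    # TYPE  DATABASE  USER      ADDRESS        METHOD
    host    all       postgres  192.0.2.0/24   md5   # keyword: matches any database
    host    "all"     postgres  192.0.2.0/24   md5   # quoted: a database literally named all
    host    ALL       postgres  samenet        md5   # also the keyword, if folded case-insensitively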
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
I'm trying to understand what it would take to get this patch into an acceptable form before the next commitfest. Both Abhijit and Andres have done extensive reviews of the patch and have given many useful suggestions to Rahila. While she has incorporated most of them, I feel we are still some distance away from having something that can be committed. Here are my observations based on the discussion on this thread so far.

1. Need for compressing full-page backups: There are a good number of benchmarks done by various people on this list which clearly show the need for the feature. Many people have already voiced their agreement on having this in core, even as a configurable parameter. There had been some requests for more benchmarks, such as response times immediately after a checkpoint or CPU consumption, which I'm not entirely sure have been done yet.

2. Need for different compression algorithms: There were requests for comparing different compression algorithms such as LZ4 and snappy. Based on the numbers that Rahila has posted, I can see LZ4 has the best compression ratio, at least for the TPC-C benchmarks she tried. Having said that, I was hoping to see more numbers in terms of CPU utilization, which would demonstrate the trade-off, if any. Anyway, there were also apprehensions expressed about whether to have a pluggable algorithm in the final patch that gets committed. If we do decide to support more compression algorithms, I like what Andres had done before, i.e. store the compression algorithm information in the varlena header. So basically, we should have an abstract API which takes a buffer and the desired algorithm and returns the compressed data along with a header encoding that information (a rough sketch follows this message). ISTM that the patch Andres had posted earlier was focused primarily on toast data, but I think we can make it more generic so that both toast and FPW can use it. Having said that, IMHO we should go one step at a time. We have been using pglz for compressing toast data for a long time, so we can continue to use the same for compressing full-page images. We can simultaneously work on adding more algorithms to core and choose the right candidate for different scenarios such as toast or FPW based on test evidence. But that work can happen independently of this patch.

3. Compressing one block vs. all blocks: Andres suggested that compressing all backup blocks in one go may give us a better compression ratio. This is worth trying. I'm wondering what would be the best way to do so with minimal changes to the xlog insertion code. Today, we add more rdata items for the backup block header(s) and the backup blocks themselves (2 per backup block if there is a "hole") beyond what the caller has supplied. If we have to compress all the backup blocks together, then one approach is to copy the backup block headers and the blocks to a temp buffer, compress that, and replace the previously added rdata entries with a single rdata (see the second sketch below). Is there a better way to handle multiple blocks in one go? We still need a way to tell the restore path that the WAL data is compressed. One way is to always add a varlena header irrespective of whether the blocks are compressed or not, but that looks like overkill. Another way is to add a new field to XLogRecord to record this information. It looks like we can do this without increasing the size of the header, since there are 2 bytes of padding after the xl_rmid field (see the third sketch below).

4. Handling holes in backup blocks: I think if we address (3), then this can be done easily. Alternatively, we can also memzero the "hole" and then compress the entire page; the compression algorithm should handle that well (see the last sketch below).

Thoughts/comments?

Thanks, Pavan
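A minimal sketch of the abstract compression API from point 2, assuming invented names (WalCompressAlgo, WalCompressHeader, wal_compress_buffer) and a placeholder compressor standing in for pglz or LZ4; this is not the actual PostgreSQL API:

    #include <stdint.h>
    #include <string.h>

    typedef enum
    {
        WAL_COMPRESS_NONE = 0,
        WAL_COMPRESS_PGLZ = 1,
        WAL_COMPRESS_LZ4 = 2
    } WalCompressAlgo;

    typedef struct
    {
        uint8_t  algo;     /* which WalCompressAlgo produced the payload */
        uint32_t rawsize;  /* uncompressed length, for the restore path */
    } WalCompressHeader;

    /*
     * Placeholder for a real compressor such as pglz or LZ4.  It only
     * copies, so wal_compress_buffer below falls back to "store raw".
     */
    static uint32_t
    toy_compress(const char *src, uint32_t srclen, char *dst)
    {
        memcpy(dst, src, srclen);
        return srclen;
    }

    /*
     * Compress src (srclen bytes) into dst, which must have room for
     * srclen + sizeof(WalCompressHeader) bytes.  Returns the total
     * number of bytes written (header + payload), or -1 if compression
     * did not save space and the caller should store the data raw.
     */
    static int
    wal_compress_buffer(const char *src, uint32_t srclen, char *dst,
                        WalCompressAlgo algo)
    {
        WalCompressHeader hdr;
        uint32_t    complen;

        if (algo == WAL_COMPRESS_NONE)
            return -1;

        complen = toy_compress(src, srclen, dst + sizeof(hdr));
        if (complen >= srclen)
            return -1;          /* incompressible, store raw */

        hdr.algo = (uint8_t) algo;
        hdr.rawsize = srclen;
        memcpy(dst, &hdr, sizeof(hdr));
        return (int) (sizeof(hdr) + complen);
    }

The restore path would read the header first, dispatch on algo, and decompress into a rawsize-byte buffer; whether the header is a full varlena or a smaller fixed struct like this one is an open design choice.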
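For point 3, a hedged sketch of the copy-to-a-temp-buffer approach; BkpBlockSketch and SKETCH_BLCKSZ are stand-ins for the real BkpBlock and BLCKSZ, and the rdata bookkeeping is omitted:

    #include <stdint.h>
    #include <string.h>

    #define SKETCH_BLCKSZ 8192       /* stand-in for BLCKSZ */

    typedef struct                   /* stand-in for BkpBlock */
    {
        uint32_t hole_offset;        /* start of the "hole" in the page */
        uint32_t hole_length;        /* length of the hole */
    } BkpBlockSketch;

    /*
     * Copy every backup block header plus its page image (with the hole
     * squeezed out) into one contiguous scratch buffer.  The result can
     * then be compressed with a single call and attached as a single
     * rdata entry in place of the per-block entries.  Returns the number
     * of bytes written to scratch.
     */
    static uint32_t
    gather_backup_blocks(const BkpBlockSketch *hdrs, char *const pages[],
                         int nblocks, char *scratch)
    {
        uint32_t    off = 0;
        int         i;

        for (i = 0; i < nblocks; i++)
        {
            uint32_t    tail = hdrs[i].hole_offset + hdrs[i].hole_length;

            memcpy(scratch + off, &hdrs[i], sizeof(BkpBlockSketch));
            off += sizeof(BkpBlockSketch);

            /* data before the hole, then data after it */
            memcpy(scratch + off, pages[i], hdrs[i].hole_offset);
            off += hdrs[i].hole_offset;
            memcpy(scratch + off, pages[i] + tail, SKETCH_BLCKSZ - tail);
            off += SKETCH_BLCKSZ - tail;
        }
        return off;                 /* compress scratch[0..off) in one go */
    }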
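For the header question in point 3, a simplified stand-in for XLogRecord showing how a one-byte algorithm field could live in the existing padding after xl_rmid; the field types and layout here are illustrative, not the real definition:

    #include <stdint.h>

    typedef struct
    {
        uint32_t xl_tot_len;    /* total length of the record */
        uint32_t xl_xid;        /* xact id */
        uint32_t xl_len;        /* length of rmgr data */
        uint8_t  xl_info;       /* flag bits */
        uint8_t  xl_rmid;       /* resource manager */
        uint8_t  xl_compress;   /* NEW: compression algorithm; this byte
                                 * was previously alignment padding, so
                                 * the header does not grow */
        uint8_t  xl_pad;        /* remaining padding byte */
        uint64_t xl_prev;       /* ptr to previous record */
        uint32_t xl_crc;        /* CRC of the record */
    } XLogRecordSketch;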
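And for point 4, the memzero alternative, a sketch assuming an in-memory copy of the page can be scribbled on before compression:

    #include <stdint.h>
    #include <string.h>

    /*
     * Keep the full page but overwrite the hole with zeros before
     * compressing.  Any reasonable compressor collapses the run of
     * zeros, and the restore path no longer has to reassemble the page
     * around a removed hole.
     */
    static void
    zero_page_hole(char *page, uint32_t hole_offset, uint32_t hole_length)
    {
        memset(page + hole_offset, 0, hole_length);
    }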
Re: [HACKERS] proposal (9.5) : psql unicode border line styles
2014-07-23 8:38 GMT+02:00 Tomas Vondra: > On 23 July 2014, 7:36, Pavel Stehule wrote: >> updated version is in attachment > OK, thanks. The new version seems OK to me. Thank you. Pavel > Tomas