[HACKERS] Where is it documented what role executes constraint triggers?

2017-11-03 Thread Chapman Flack
From a little experimenting in 9.5, it seems that a referential
integrity trigger is executed with the identity of the referencED
table's owner, but I have not been able to find this covered in
the docs. Is this a documentation oversight, or is it explained
somewhere I didn't look (or may have skimmed right over it)?

The question came up at $work after the departure of $colleague,
who had created some tables as himself and not changed their
ownership. His role had the superuser bit at the time, so
RI checks involving those tables never incurred 'permission denied'
errors until he left. Then his role was not dropped, only disabled
for login and stripped of its superuser bit, and that's when the RI
checks started failing with 'permission denied'.
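
For anyone wanting to reproduce it, here is a minimal sketch (role and
table names invented) of the scenario as I understand it. The RI check
runs its query against the referenced table as that table's owner, so
the owner's own privileges matter even for other roles' inserts:

  CREATE ROLE departed LOGIN SUPERUSER;
  SET ROLE departed;
  CREATE TABLE parent (id int PRIMARY KEY);
  REVOKE SELECT ON parent FROM departed; -- an owner can self-revoke
  RESET ROLE;
  CREATE TABLE child (parent_id int REFERENCES parent (id));
  INSERT INTO parent VALUES (1);
  INSERT INTO child VALUES (1);  -- fine while departed is superuser
  ALTER ROLE departed NOSUPERUSER NOLOGIN;
  INSERT INTO child VALUES (1);  -- now: permission denied for relation parent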

-Chap


Re: [HACKERS] Faster methods for getting SPI results

2017-09-12 Thread Chapman Flack
On 09/12/17 17:00, Tom Lane wrote:

> I did not see any reason given in the thread why we should need that.
> If you want to accumulate tuples ten at a time before you do something
> with them, you can do that now, by calling ExecutorRun with count=10.

Ah, that sounds easy enough. I'll withdraw the more-complicated suggestion.
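
For the archives, a sketch of what that looks like (assuming the v10
ExecutorRun signature; the batching bookkeeping is mine, not code from
the tree):

  uint64      before;

  do
  {
      before = queryDesc->estate->es_processed;
      ExecutorRun(queryDesc, ForwardScanDirection, 10, false);
      /* hand the up-to-ten tuples the DestReceiver buffered
       * to the PL interpreter here ... */
  } while (queryDesc->estate->es_processed - before == 10);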

-Chap


Re: [HACKERS] Faster methods for getting SPI results

2017-09-12 Thread Chapman Flack
On 09/12/2017 03:41 PM, Tom Lane wrote:
> So the conclusion at the end of the last commitfest was that this patch
> should be marked Returned With Feedback, and no new work appears to have
> been done on it since then.  Why is it in this fest at all?  There
> certainly doesn't seem to be any reason to review it again.

I'm not sure how to read the history of the CF entry. Could it
have rolled over to 2017-09 by default if its status was simply
never changed to Returned with Feedback as intended in the last
one? The history doesn't seem to show anything since 2017-03-19.

I would still advocate for a fast-callback/slow-callback distinction,
as in
https://www.postgresql.org/message-id/59813946.40508%40anastigmatix.net
if that does not seem overcomplicated to more experienced hands.

-Chap


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Chapman Flack
On 09/06/17 18:33, Omar Kilani wrote:

> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?
> 
> We have several TB of data still using float datetimes and I'm trying
> to figure out how we can move to 10 (currently on 9.6.x) without
> massive amounts of $ in duplicated hardware or downtime.

I ran into an analogous issue with endianness of PL/Java-defined datatypes,
and ended up devising a procedure [1] for treating the type one way on
output and another way on input, then constructing an UPDATE query
that just assigns the column to itself using a function that's essentially
the identity, but forces the output-input conversion (and doesn't look
like a simple identity function to the query optimizer, which otherwise
might optimize the whole update away).
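
In outline, the trick looks like this (a hypothetical sketch, not the
PL/Java-specific procedure in [1]: the round trip through text makes
the output and input functions really run, and PL/pgSQL keeps the
planner from simplifying the call away):

  CREATE FUNCTION force_reconvert(timestamptz) RETURNS timestamptz
    LANGUAGE plpgsql AS $$ BEGIN RETURN $1::text::timestamptz; END $$;

  UPDATE big_table SET ts_col = force_reconvert(ts_col);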

While the details of that were specific to PL/Java and byte order,
something similar might work in your case if you can afford some period,
either just before or just after migration, when your normal workload
is blocked, long enough to run such updates on your several TB of data.

Or if that's too big a chunk of downtime, you might be able to add
a column in advance, of some user-defined type the same width as an
integer datetime, populate that in advance, migrate, then do a quick
catalog switcheroo of the column names and types. I've never done that,
so someone else might have comments on its feasibility or the best way
of doing it.

> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if

That does seem important to get the details on. If the integer datetime
column can't represent your values (and the too-large or too-small ones
genuinely need representing), then some schema change might be necessary.
Or, if the outlying values were sentinel values of some kind (I wonder,
how else would you even have datetimes outside of the int64 range?),
maybe they can be replaced with others that fit.

-Chap

[1]: https://tada.github.io/pljava/use/byteordermigrate.html


Re: [HACKERS] Faster methods for getting SPI results

2017-08-01 Thread Chapman Flack
On 12/20/16 23:14, Jim Nasby wrote:
> I'm guessing one issue might be that
> we don't want to call an external interpreter while potentially holding page
> pins, but even then couldn't we just copy a single tuple at a time and save
> a huge amount of palloc overhead?

On 04/06/17 03:38, Craig Ringer wrote:
> Also, what rules apply in terms of what you can/cannot do from within
> a callback? Presumably it's unsafe to perform additional SPI calls,
> perform transactions, call into the executor, change the current
> snapshot, etc, but I would consider that reasonably obvious. Are there
> any specific things to avoid?


Confessing, right up front, that I'm not very familiar with the executor
or DestReceiver code, but thinking of issues that might be expected with
PLs, I wonder if there could be a design where the per-tuple callback
could sometimes return a status HAVE_SLOW_STUFF_TO_DO.

If it does, the executor could release some pins or locks, stack some
state, whatever allows it to (as far as practicable) relax restrictions
on what the callback would be allowed to do, then reinvoke the callback,
not with another tuple, but with OK_GO_DO_YOUR_SLOW_STUFF.

On return from that call, the executor could reacquire its stacked
state/locks/pins and resume generating tuples.

That way, a callback could, say, return normally 9 out of 10 times, just
quickly buffering up 10 tuples, and every 10th time return SLOW_STUFF_TO_DO
and get a chance to jump into the PL interpreter and deal with those 10 ...
(a) minimizing the restrictions on what the PL routine may do, and (b)
allowing any costs of state-stacking/lock-releasing-reacquiring, and control
transfer to the interpreter, to be amortized over some number of tuples.
How many tuples that should be might be an empirical question for any given
PL, but with a protocol like this, the callback has an easy way to control
it.
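
To make the shape concrete, a hypothetical sketch (the enum values and
the two-phase contract are inventions of this message, and
buffer_tuple/run_interpreter are stand-ins, not existing functions):

  typedef enum
  {
      CALLBACK_OK,             /* tuple buffered; keep them coming */
      HAVE_SLOW_STUFF_TO_DO    /* release pins/locks, then reinvoke me */
  } CallbackStatus;

  CallbackStatus
  receive_tuple(TupleTableSlot *slot, MyReceiver *recv)
  {
      if (slot != NULL)        /* ordinary call: just buffer, cheaply */
      {
          buffer_tuple(recv, slot);
          return (++recv->nbuffered % 10 == 0)
              ? HAVE_SLOW_STUFF_TO_DO : CALLBACK_OK;
      }

      /* reinvoked with OK_GO_DO_YOUR_SLOW_STUFF (modeled here as a
       * null slot): restrictions relaxed, jump into the interpreter */
      run_interpreter(recv);
      recv->nbuffered = 0;
      return CALLBACK_OK;
  }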

Or would that be overcomplicated?

-Chap


[HACKERS] Re: [PATCH] Pattern based listeners for asynchronous messaging (LISTEN/NOTIFY)

2017-07-31 Thread Chapman Flack
On 07/31/17 16:30, Peter Eisentraut wrote:

> I would think about specifying an operator somewhere in the syntax, like
> you have with LISTEN SIMILAR TO.  It would even be nice if a
> non-built-in operator could be used for matching names.

Hmm ... as I was reading through the email thread, I saw a suggestion
was once made to look at ltree, and then I noticed the patch as presented
had simply gone with regular expressions instead. I wonder if there'd be
a way to work it so an operator can be specified, and allow you to treat
the names as (say) ltree labels if that suited your application.
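
Something like this, say (the first form is the patch's; the second
spelling is invented here purely to illustrate naming an operator,
e.g. ltree's lquery match):

  LISTEN SIMILAR TO 'jobs_%';
  LISTEN MATCHING OPERATOR(public.~) 'Top.*.Child';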

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-26 Thread Chapman Flack
On 07/17/17 11:29, Michael Paquier wrote:

> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Is that an aesthetic 'rather', or is there a technical advantage you
have in mind?

I also began by looking at how to stop AdvanceXLInsertBuffer()
initializing headers and taking locks when neither is needed.
But Heikki's just-rezero-them suggestion has a definite simplicity
advantage. It can be implemented entirely with a tight group of
lines added to CopyXLogRecordToWAL, as opposed to modifying
AdvanceXLInsertBuffer in a few distinct places, adding a parameter,
and changing its call sites.

There's a technical appeal to making the changes in AdvanceXLInsertBuffer
(who wants to do unnecessary initialization and locking?), but the amount
of unnecessary work that can be avoided is proportional to the number of
unused pages at switch time, meaning it is largest when the system
is least busy, and may be of little practical concern.

Moreover, optimizing AdvanceXLInsertBuffer would reveal one more
complication: some of the empty pages about to be written out may
have been initialized opportunistically in earlier calls to
AdvanceXLInsertBuffer, so those already have populated headers, and
would need rezeroing anyway. And not necessarily just an insignificant
few of them: if XLOGChooseNumBuffers chose the maximum, it could even
be all of them.

That might also be handled by yet another conditional within
AdvanceXLInsertBuffer. But with all of that in view, maybe it is
simpler to have one loop in CopyXLogRecordToWAL just zero them all,
and leave AdvanceXLInsertBuffer alone, so no complexity is added when it
is called from other sites that are arguably hotter.

Zeroing SizeOfXLogShortPHD bytes doesn't cost a whole lot.
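
Concretely, the tight group of lines I have in mind is roughly this
(a sketch against CopyXLogRecordToWAL's existing locals, not a tested
patch):

  while (CurrPos < EndPos)
  {
      WALInsertLockUpdateInsertingAt(CurrPos);
      AdvanceXLInsertBuffer(CurrPos, false);
      /* re-zero the header AdvanceXLInsertBuffer just initialized */
      currpos = GetXLogBuffer(CurrPos);
      MemSet(currpos, 0, SizeOfXLogShortPHD);
      CurrPos += XLOG_BLCKSZ;
  }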

-Chap


Re: [HACKERS] SCRAM auth and Pgpool-II

2017-07-14 Thread Chapman Flack
On 07/13/2017 10:46 PM, Chapman Flack wrote:

> Neither is suitable on an unencrypted channel (as has been repeatedly

Please forgive my thinko about md5. I had overlooked the second,
salted md5 used in the protocol; it must have been some years ago
that I looked for one in the code and came away sure there wasn't one.
It's been there since 2001, so I simply overlooked it somehow. Not sure
how. I've had an unnecessarily jaundiced view of "md5" auth for years
as a result.

I feel much better now. Sorry for the noise.

-Chap


Re: [HACKERS] SCRAM auth and Pgpool-II

2017-07-13 Thread Chapman Flack
On 07/13/17 21:54, Tatsuo Ishii wrote:
>>> The comment in pg_hba.conf.sample seem to prefer md5 over clear text
>>> password.
>>>
>>> # Note that "password" sends passwords in clear text; "md5" or
>>> # "scram-sha-256" are preferred since they send encrypted passwords.
>>
>> Should that be reworded to eliminate "md5"? I'd consider "scram-sha-256"
>> suitable over a clear channel, but I've never recommended "md5" for that.
> 
> I don't think so unless clear text password is superior than md5.

Neither is suitable on an unencrypted channel (as has been repeatedly
observed back to 2005 at least [1], so I guess I'm not spilling the beans).
At last, scram-sha-256 is an option that is believable for that use.

So, allowing that neither "password" nor "md5" should ever be used on
an unencrypted channel, as long as the channel is encrypted they are both
protected (by the channel encryption) from eavesdropping, and on that
dimension they tie. For a tiebreaker, look at the consequences of
revealing rolpassword from pg_authid: with "md5" you have revealed a
password-equivalent, while with "password" you have not [2], so on that
dimension "password" indeed is superior to "md5".

-Chap

[1]: https://www.postgresql.org/message-id/8764ygc7i6.fsf%40stark.xeocode.com
[2]:
https://www.postgresql.org/message-id/20050421190637.GF29028%40ns.snowman.net


Re: [HACKERS] SCRAM auth and Pgpool-II

2017-07-13 Thread Chapman Flack
On 07/13/17 20:09, Tatsuo Ishii wrote:

> The comment in pg_hba.conf.sample seem to prefer md5 over clear text
> password.
> 
> # Note that "password" sends passwords in clear text; "md5" or
> # "scram-sha-256" are preferred since they send encrypted passwords.

Should that be reworded to eliminate "md5"? I'd consider "scram-sha-256"
suitable over a clear channel, but I've never recommended "md5" for that.

-Chap


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-07-12 Thread Chapman Flack
On 07/12/17 08:38, Robert Haas wrote:

> another protocol message.  I feel like the usefulness of this for
> connection pooling software is pretty obvious: it's a lot easier for
> the pooler to disallow a certain protocol message than a certain SQL
> command.

I assume you mean easier than disallowing an SQL command that has to be
disallowed (with all the complexity of parsing and recognizing all the
forms it could take) or else the client could abuse it—in other words,
the current state of affairs, without a cookie'd SQL command.

Once the comparison is not to the current state, but to a proposed
cookie mechanism for the SQL command, I don't think I see either idea
as strikingly easier or more effective.

But a protocol extension becomes another thing (like SASL channel
binding) that you can add to the server but don't really have
until all of the protocol driver projects catch up. A mechanism in SQL
is ready for everybody as soon as it's there.

> variable can't later be changed from SQL.  So now you don't even need
> the cookie, and the client can't try to guess the cookie.

Again, the trouble of needing a cookie and that of supporting a special
protocol message don't seem that different to me, and with, say, a 64-bit
cookie built from a good source of randomness, the risk of a client
guessing it seems negligible.

One other reason I'm slow to warm to a protocol extension is that
things done that way tend to make second-class citizens of code that
runs in the backend.

For an example, think of how warnings are handled. If client code uses
JDBC, it should be able to call getWarnings() on a ResultSet and find out
what warnings might have been raised. If you move the same code to the
backend, in PL/Java, it still uses JDBC and there's still a getWarnings()
method but it's (currently) useless. elog makes catching errors easy,
but warnings get shipped straight to libpq and out to the client. (For
this example, that's also an encapsulation breach and possible information
leak, the client getting warnings from internal SQL queries by backend
routines).

LISTEN/NOTIFY is another example of a mechanism that's not there for
backend code, because the notification mechanism is purely a message
over pq. A less interesting issue than the warnings, perhaps (once the
code is in the backend, why be notified by a trigger when it could simply
BE a trigger?) ... but it could be a bit surprising to someone accustomed
to having it available for client code.

So, even if I don't this instant have an example of why some backend
code or extension in an arbitrary PL might want to be able to lock down
a particular GUC, I can imagine it might happen, and if the mechanism is
just SQL with a cookie, all the PLs have it for free, but if it is
tied up with the fe/be protocol, it's hard.

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-03 Thread Chapman Flack
On 07/03/2017 09:39 AM, Heikki Linnakangas wrote:

> Hmm. That's not the problem, though. Imagine that instead of the loop
> above, you do just:
> 
> WALInsertLockUpdateInsertingAt(CurrPos);
> AdvanceXLInsertBuffer(EndPos, false);
> 
> AdvanceXLInsertBuffer() will call XLogWrite(), to flush out any pages
> before EndPos, to make room in the wal_buffers for the new pages. Before
> doing that, it will call WaitXLogInsertionsToFinish()

Although it's moot in the straightforward approach of re-zeroing in
the loop, it would still help my understanding of the system to know
if there is some subtlety that would have broken what I proposed
earlier, which was an extra flag to AdvanceXLInsertBuffer() that
would tell it not only to skip initializing headers, but also to
skip the WaitXLogInsertionsToFinish() check ... because I have
the entire region reserved and I hold all the writer slots
at that moment, it seems safe to assure AdvanceXLInsertBuffer()
that there are no outstanding writes to wait for.

I suppose it's true there's not much performance to gain; it would
save a few pairs of lock operations per empty page to be written,
but then, the more empty pages there are at the time of a log switch,
the less busy the system is, so the less it matters.

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-07-03 Thread Chapman Flack
On 07/03/2017 09:39 AM, Heikki Linnakangas wrote:

> The most straightforward solution would be to just clear each page with
> memset() in the loop. It's a bit wasteful to clear the page again, just
> after AdvanceXLInsertBuffer() has initialized it, but this isn't
> performance-critical.

And in that straightforward approach, I imagine it would suffice to
memset just the length of a (short) page header; the page content
is already zeroed, and there isn't going to be a switch at the very
start of a segment, so a long header won't be encountered ... will it?

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-29 Thread Chapman Flack
On 06/25/17 21:20, Chapman Flack wrote:
> I want to make sure I understand what the deadlock potential is
> in this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
> ...
> Does not the fact we hold all the insertion slots exclude the possibility
> that any dirty buffer (preceding the one we're touching) needs to be checked
> for in-flight insertions? [in the filling-out-the-log-tail case only]

Anyone?

Or have I not even achieved 'wrong' yet?

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-25 Thread Chapman Flack
I notice CopyXLogRecordToWAL contains this loop (in the case where
the record being copied is a switch):

while (CurrPos < EndPos)
{
    /* initialize the next page (if not initialized already) */
    WALInsertLockUpdateInsertingAt(CurrPos);
    AdvanceXLInsertBuffer(CurrPos, false);
    CurrPos += XLOG_BLCKSZ;
}

in which it calls, one page at a time, AdvanceXLInsertBuffer, which contains
its own loop able to do a sequence of pages. A comment explains why:

/*
 * We do this one page at a time, to make sure we don't deadlock
 * against ourselves if wal_buffers < XLOG_SEG_SIZE.
 */

I want to make sure I understand what the deadlock potential is
in this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
before writing any dirty buffer, and we do hold insertion slot locks
(all of 'em, in the case of a log switch, because that makes
XlogInsertRecord call WALInsertLockAcquireExclusive instead of just
WALInsertLockAcquire for other record types).

Does not the fact we hold all the insertion slots exclude the possibility
that any dirty buffer (preceding the one we're touching) needs to be checked
for in-flight insertions?

I've been thinking along the lines of another parameter to
AdvanceXLInsertBuffer to indicate when the caller is exactly this loop
filling out the tail after a log switch (originally, to avoid filling
in page headers). It now seems to me that, if AdvanceXLInsertBuffer
has that information, it could also be safe for it to skip the
WaitXLogInsertionsToFinish in that case. Would that eliminate the
deadlock potential, and allow the loop in CopyXLogRecordToWAL to be
replaced with a single call to AdvanceXLInsertBuffer and a single
WALInsertLockUpdateInsertingAt ?

Or have I overlooked some other subtlety?

-Chap


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-22 Thread Chapman Flack
On 06/21/17 04:51, Heikki Linnakangas wrote:
> (I'm cleaning up my inbox, hence the delayed reply)

I had almost completely forgotten ever bringing it up. :)

> When I wrote that code, I don't remember if I realized that we're
> initializing the page headers, or if I thought that it's good enough even if
> we do. I guess I didn't realize it, because a comment would've been in order
> if it was intentional.
> 
> So +1 on fixing that, a patch would be welcome.

Ok, that sounds like something I could take a whack at. Overall, xlog.c
is a bit daunting, but this particular detail seems fairly approachable.

> In the meanwhile, have you
> tried using a different compression program? Something else than gzip might
> do a better job at the almost zero pages.

Well, gzip was doing pretty well; it could get a 16 MB segment file down
to under 27 kB, or less than 14 bytes for each of 2000 pages, when a page
header is what, 20 bytes, it looks like? I'm not sure how much better
I'd expect a (non-custom) compression scheme to do. The real difference
comes between compressing (even well) a large unchanged area, versus being
able to recognize (again with a non-custom tool) that the whole area is
unchanged.

-Chap


Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread J Chapman Flack
On 06/20/2017 12:33 PM, Merlin Moncure wrote:

> postgres=# create table ints (n int);
> CREATE TABLE
> postgres=# insert into ints values (1);
> INSERT 0 1
> postgres=# insert into ints values (2);
> INSERT 0 1
> 
> T1: BEGIN
> T1: UPDATE ints SET n = n + 1;
> T2: BEGIN
> T2: DELETE FROM ints where n = 2; -- blocks
> T1: COMMIT; -- T2 frees
> T2: SELECT * FROM ints;  -- both rows 2 and 3 visible
> T2: COMMIT:

For me (in PG 9.5 at $work), at the instant of the commit in T1,
T2 says:
ERROR:  could not serialize access due to concurrent update

Is it stated anywhere what PG version Michael Malis is using?
Is it clear that transaction_isolation was set to serializable?

I don't actually see that claim in the linked post. I see the
example (about halfway down, under "Skipped Modification"), but
it doesn't claim that transaction_isolation was set to serializable
at the time, unless I skimmed over it somehow. It seems more of a
demonstration of what can happen under a different isolation setting.
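
(For anyone re-running the example, the level can be set per
transaction:

  BEGIN ISOLATION LEVEL SERIALIZABLE;  -- or REPEATABLE READ, etc.
  SHOW transaction_isolation;

and the default comes from default_transaction_isolation.)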

-Chap


Re: [HACKERS] postgresql transactons not fully isolated

2017-06-20 Thread Chapman Flack
On 06/20/2017 03:08 PM, Chapman Flack wrote:

> For me (in PG 9.5 at $work), at the instant of the commit in T1,
> T2 says:
> ERROR:  could not serialize access due to concurrent update

I get that result in 'serializable' and in 'repeatable read'.

I get the reported result (DELETE 0 and a table containing 2 and 3)
in both 'read committed' and 'read uncommitted'.

-Chap


Re: [BUGS] [HACKERS] Re: Postgresql bug report - unexpected behavior of suppress_redundant_updates_trigger

2017-06-19 Thread Chapman Flack
On 06/19/2017 05:19 PM, David G. Johnston wrote:

> At first glance I think I'd rather have it do the correct thing all of
> the time, even if it takes longer, so that my only trade-off decision
> is whether to improve performance by fixing the application.
> 
> Ideally if the input tuple wouldn't require compression we wouldn't
> bother to decompress the stored tuple.

That looks like one reasonable elimination check.

I wonder how much closer it could get with some changes that wouldn't
necessarily use many more cycles.

One might be a less_easy queue: march through the tuple
comparing fields, and if one is found to be TOASTed, throw it
on the queue and march on. Only if all the easy ones matched
is there any point in looking at the queue.

At that point, there could be a tunable for how much effort
to expend. Perhaps I'm willing to decompress an inline value,
but not retrieve an out-of-line one? For the TOAST compression
algorithm I'm not sure of the balance between compression
and decompression effort; I know gzip decompression is pretty cheap.
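
In rough pseudocode (the att_* helpers are invented for illustration;
the point is only the ordering of cheap and costly comparisons):

  int   deferred[MaxHeapAttributeNumber];
  int   ndeferred = 0;
  int   i;

  for (i = 0; i < natts; i++)
  {
      if (att_is_toasted(oldtup, newtup, i))
          deferred[ndeferred++] = i;    /* less_easy: look at it later */
      else if (!att_equal(oldtup, newtup, i))
          return false;                 /* differs: update not redundant */
  }

  /* only now is any detoasting effort worth spending */
  for (i = 0; i < ndeferred; i++)
      if (!att_equal_detoasted(oldtup, newtup, deferred[i]))
          return false;                 /* the tunable effort goes here */

  return true;                          /* redundant: suppress the update */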

-Chap


[HACKERS] Re: Postgresql bug report - unexpected behavior of suppress_redundant_updates_trigger

2017-06-19 Thread J Chapman Flack
On 06/19/2017 11:40 AM, Dilip Kumar wrote:
> ... Artus de benque ... wrote:
>> ...=# UPDATE test_table SET field = rpad('', 2001, 'a') WHERE id = 1;
>
> Seems like in "suppress_redundant_updates_trigger"  we are comparing
> toasted tuple with the new tuple and that is the cause of the bug.

Something still puzzles me about this, though, maybe only because
I don't know enough about TOAST.

The size of 'field' ends up 2001, or just over the threshold where
TOASTing will be attempted at all. The report doesn't mention changing
the strategy from the default EXTENDED, so won't the first thing
attempted be compression? Won't that succeed spectacularly, since the
test string is a single character 2001 times, probably producing
a compressed string a handful of bytes long, well under the threshold,
obviating any need to go further with TOAST pointers?

Is the compression algorithm nondeterministic? Is there some way
that compressing the same 2001*'a' on two occasions would produce
compressed strings that don't match?

What exactly is s_r_u_t() comparing, in the case where the TOASTed
value has been compressed, but not out-of-lined?

-Chap


[HACKERS] oidin / oidout and InvalidOid

2017-06-12 Thread Chapman Flack
Hi,

I was recently guilty of writing a less-than-clear SQL example
because I plain forgot whether InvalidOid was 0 (correct) or -1
(my bad).

Would there be any sense in letting oidin_subr accept the string
InvalidOid for 0? I understand that changing oidout could break
existing code outside of the tree. But what if oidout were to be
conservative in what it does, and oidin liberal in what it accepts?
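
That is (hypothetical behavior, not what oidin accepts today):

  SELECT 'InvalidOid'::oid;  -- would yield 0; today it's an error:
                             -- invalid input syntax for type oid
  SELECT 0::oid;             -- oidout stays conservative: still '0'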

-Chap


[HACKERS] regproc and when to schema-qualify

2017-06-11 Thread Chapman Flack
I was idly following along in GSoC 2017: Foreign Key Arrays
when I noticed this:

=# select * from pg_amproc where amprocfamily = 2745;
 amprocfamily | amproclefttype | amprocrighttype | amprocnum |           amproc
--------------+----------------+-----------------+-----------+----------------------------
         2745 |           2277 |            2277 |         2 | pg_catalog.ginarrayextract
         2745 |           2277 |            2277 |         3 | ginqueryarrayextract
...

where only ginarrayextract is schema-qualified. It seems to be
regproc's output procedure doing it:

=# select 2743::regproc, 2774::regproc;
          regproc           |       regproc
----------------------------+----------------------
 pg_catalog.ginarrayextract | ginqueryarrayextract


The manual says regproc "will display schema-qualified names on output
if the object would not be found in the current search path without
being qualified."

Is regproc displaying the schema in this case because there are two
overloaded flavors of ginarrayextract, though both are in pg_catalog?
Could it be searching for the object by name, ignoring the argument
signature, and just detecting that it hit one with a different OID first?

-Chap


Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-07 Thread Chapman Flack
On 06/02/17 15:51, Chapman Flack wrote:
> But what it buys you is then if your MyExtraPGNode has PostgresNode
> as a base, the familiar idiom
> 
>   MyExtraPGNode->get_new_node('foo');
> 
> works, as it inserts the class as the first argument.
> 
> As a bonus, you then don't need to complicate get_new_node
> with a test for (not ($node->isa("PostgresNode"))) because
> if it weren't, it wouldn't have inherited get_new_node

Any takers if I propose this amendment in the form of a patch?

Relying on the perl idiom instead of a $node->isa() test shortens
the patch; does that ameliorate at all the concern about complicating
core for the benefit of modules?

I'm not fully persuaded that just re-blessing a PostgresNode suffices
as a workaround ... if the subclass expects to have additional state
set up by its own constructor, the deception seems likely to be exposed.
So I think there are more than just cosmetic grounds for allowing this.

-Chap
--- src/test/perl/PostgresNode.pm	2017-06-07 20:15:52.827639829 -0400
+++ src/test/perl/PostgresNode.pm	2017-06-07 20:57:49.205145761 -0400
@@ -853,20 +853,27 @@
 
 =pod
 
-=item get_new_node(node_name)
+=item get_new_node(node_name) I<or> PostgresNode->get_new_node(node_name)
 
-Build a new PostgresNode object, assigning a free port number. Standalone
-function that's automatically imported.
+Build a new PostgresNode object, assigning a free port number. This can be
+called either as a standalone function that's automatically imported, or as
+a class method on PostgresNode or any subclass.
 
 Remembers the node, to prevent its port number from being reused for another
 node, and to ensure that it gets shut down when the test script exits.
 
 You should generally use this instead of PostgresNode::new(...).
 
+If you have a subclass of PostgresNode you want created, use the class-method
+form of the call, as in C<< MySubclass->get_new_node(node_name) >>.
+The standalone-function form creates an instance of PostgresNode.
+
 =cut
 
 sub get_new_node
 {
+	my $class = 'PostgresNode';
+	$class = shift if 1 < scalar @_;
 	my $name  = shift;
 	my $found = 0;
 	my $port  = $last_port_assigned;
@@ -911,7 +918,7 @@
 	print "# Found free port $port\n";
 
 	# Lock port number found by creating a new node
-	my $node = new PostgresNode($name, $test_pghost, $port);
+	my $node = $class->new($name, $test_pghost, $port);
 
 	# Add node to list of nodes
 	push(@all_nodes, $node);
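
For clarity, the intended usage would look like this (MyExtraPGNode
being any hypothetical subclass):

  package MyExtraPGNode;
  use parent -norequire, 'PostgresNode';

  package main;
  my $sub   = MyExtraPGNode->get_new_node('foo');  # class-method form
  my $plain = get_new_node('bar');                 # standalone form, as before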

Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-06-02 Thread Chapman Flack
On 06/02/2017 12:50 PM, Robert Haas wrote:
> On Thu, Jun 1, 2017 at 3:36 AM, Michael Paquier
>>
>> +$pgnclass = 'PostgresNode' unless defined $pgnclass;
>> I'd rather leave any code of this kind for the module maintainers,
> 
> Craig's proposal is a standard Perl idiom, though.

Would it not be even more idiomatic to have the class, if
present, be the first argument? That is:

  my $pgnclass = 'PostgresNode';
  $pgnclass = shift if 1 < scalar @_;
  my $name = shift;

That part's a little weird (an optional FIRST argument?) in order
to preserve compatibility with callers who don't pass a class.

But what it buys you is then if your MyExtraPGNode has PostgresNode
as a base, the familiar idiom

  MyExtraPGNode->get_new_node('foo');

works, as it inserts the class as the first argument.

As a bonus, you then don't need to complicate get_new_node
with a test for (not ($node->isa("PostgresNode"))) because
if it weren't, it wouldn't have inherited get_new_node
so MyExtraPGNode->get_new_node('foo') would have failed.

-Chap


Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-06-01 Thread Chapman Flack
On 06/01/17 17:41, Tom Lane wrote:
> 12169 warnings generated by -Wconversion
> 4106 warnings generated by -Wconversion -Wno-sign-conversion
> ...
> So it's better with -Wno-sign-conversion, but I'd say we're still not
> going there anytime soon.

On an optimistic note, there might not turn out to be anywhere near
as many distinct causes; there's typically a lot of amplification.
The one patch I sent in eliminated screens upon screens of warning
output from the PL/Java build (I made no effort to count them, I just
listened to the noise in my speakers until I heard the scrolling stop).

It might be fun to see how big a chunk of the 4106 would vanish just
with the first tweak to one of the causes that's mentioned in a lot of
them. (Unless your figures were already after culling to distinct causes,
which would sound like a more-than-casual effort.)

Trouble is, after that first taste of success beyond expectation,
it gets like a drug.

-Chap


Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-05-31 Thread Chapman Flack
On 05/31/2017 11:36 AM, Tom Lane wrote:

> However, I grant your point that some extensions may have outside
> constraints that mandate using -Wconversion, so to the extent that
> we can keep key headers like postgres.h from triggering those warnings,
> it's probably worth doing.  I suspect you're still seeing a lot of them
> though --- experiments with some contrib modules suggest that a lot of
> our other headers also contain code that would trigger them.  I do not
> think I'd be on board with trying to silence them generally.

That was actually the only one PL/Java gets, outside of /sign/
conversions, a special subset of conversion warnings that can be
separately turned off with -Wno-sign-conversion. There are gobs
of those; I looked into them briefly last year and realized they'd
only be fixed by a bunch of cluttery, unenlightening casts in pg
headers, which I did not propose, suspecting that's how you would
feel about them. I added a Maven build profile that adds
-Wno-sign-conversion (though actually getting Maven to select that
profile automatically when the compiler is gcc is one of those things
you'd expect to be easy and then be surprised by).

But as long as there was previously no extraneous noise outside of
/sign/ conversions, I figured this one new appearance of a warning
outside that category would be worth silencing, before the day
comes when there are three and it seems like an organization,
or fifty and it seems like a movement.

-Chap


Re: [HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-05-30 Thread Chapman Flack
On 05/31/17 01:26, Tom Lane wrote:
> Hm.  I think it would be better to use DatumGetInt32 here.  Arguably,
> direct use of GET_4_BYTES and its siblings should only appear in
> DatumGetFoo macros.

Like so? These are the 4 sites where {GET,SET}_n_BYTES got introduced
in 14cca1b (for consistency, though only the GET_4 case produces warnings).

-Chap
--- src/include/postgres.h	2017-05-31 01:36:16.621829183 -0400
+++ src/include/postgres.h	2017-05-31 01:45:51.303427115 -0400
@@ -679,7 +679,7 @@
 		float4		retval;
 	}			myunion;
 
-	myunion.value = GET_4_BYTES(X);
+	myunion.value = DatumGetInt32(X);
 	return myunion.retval;
 }
 #else
@@ -704,7 +704,7 @@
 	}			myunion;
 
 	myunion.value = X;
-	return SET_4_BYTES(myunion.retval);
+	return Int32GetDatum(myunion.retval);
 }
 #else
 extern Datum Float4GetDatum(float4 X);
@@ -727,7 +727,7 @@
 		float8		retval;
 	}			myunion;
 
-	myunion.value = GET_8_BYTES(X);
+	myunion.value = DatumGetInt64(X);
 	return myunion.retval;
 }
 #else
@@ -753,7 +753,7 @@
 	}			myunion;
 
 	myunion.value = X;
-	return SET_8_BYTES(myunion.retval);
+	return Int64GetDatum(myunion.retval);
 }
 #else
 extern Datum Float8GetDatum(float8 X);

[HACKERS] [PATCH] quiet conversion warning in DatumGetFloat4

2017-05-30 Thread Chapman Flack
It seems that 14cca1b (use static inline functions for float <-> Datum
conversions) has an implicit narrowing conversion in one of those
functions.

If building an extension with gcc's -Wconversion warning enabled
(*cough* pljava *cough* ... the Maven plugin that runs the compiler
enables the warning by default), this makes for a noisy build.
The warning is harmless, but repeated everywhere postgres.h is
included. An explicit cast is enough to suppress it.

-Chap
--- src/include/postgres.h	2017-05-15 17:20:59.0 -0400
+++ src/include/postgres.h	2017-05-31 00:21:27.329393499 -0400
@@ -679,7 +679,7 @@
 		float4		retval;
 	}			myunion;
 
-	myunion.value = GET_4_BYTES(X);
+	myunion.value = (int32)GET_4_BYTES(X);
 	return myunion.retval;
 }
 #else

Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

2017-05-22 Thread Chapman Flack
On 05/22/17 18:39, Alvaro Herrera wrote:
> Chapman Flack wrote:
>> CREATE INDEX ON foo USING btree ( bar, baz ALSO quux );
>
> INCLUDING:
> https://www.postgresql.org/message-id/56168952.4010...@postgrespro.ru

I'd buy that.

-Chap


Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

2017-05-22 Thread Chapman Flack
On 05/22/2017 05:16 AM, Amit Kapila wrote:
> Agreed, but even if we have any such syntax, making it work for hash
> indexes is tricky, because we currently store the hash code in the
> index, not the original hash index key.

That was what gave me the idea in the first place, which then
I realized could be more generally useful. If I could say
something like

CREATE INDEX ON foo USING btree ( bar, baz ALSO quux );

so that only bar and baz are compared in insertion and search,
but quux is along for the ride and available to index-only scans,
then the (admittedly weird) syntax

CREATE INDEX ON foo USING hash ( bar ALSO bar );

could be taken to mean that the value of bar as well as its hash
is wanted in the index. I was first thinking of that to save
unique-insert from running to the heap to check hash collisions,
though on second thought if collisions are common enough for that
to be a win, maybe the hash function's just wrong. It could still
be useful for index-only scans.

-Chap



Re: [HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

2017-05-21 Thread Chapman Flack
On 05/19/17 11:41, Tom Lane wrote:

> No, nobody's done anything about allowing hash indexes to support
> uniqueness AFAIK.  I don't have a clear picture of how much work
> it would be, but it would likely be more than trivial effort;

I see what you mean. Because of the way hash values are ordered
(to allow binary search) within a page, but not between pages of
a bucket, insertion as it stands now is able to stop as soon as
it finds any page with room for the entry, but a unique-insertion
will have to check every page of the bucket for matching hashes,
and then (because only the hash and tid are in the index) chase
any of those to the heap to compare the value.

Maybe both hash collisions and overflow pages are rare enough
in practice with reasonable data that the performance impact
of that would be small, but still the possibility has to be
accounted for, and the locking may get hairier (do you now keep
the lock you have on the page where room was found for the entry,
and take another lock to walk the remaining pages until you're sure
there's no duplicate?).

At least I see that interest in UNIQUE for hash indexes has been
shown on -hackers several times over the years, and is on the TODO.
Neil Conway seems to have had an idea [1] for making the locking work,
14 years ago (however relevant that might be to today's code).

... and one inquiry last year [2] did seem to get tabled because of the
lack of WAL logging, which is now a non-blocker.
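
(For reference, the attempt still fails up front today:

  =# CREATE UNIQUE INDEX ON t USING hash (k);
  ERROR:  access method "hash" does not support unique indexes

so nothing has actually changed on this front yet.)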

I haven't seen much discussion of /why/ one would want hash-based UNIQUE.
I know my own reasons, but I'm not sure how persuasive they are in light
of the implementation realities, so maybe that makes such a discussion
worthwhile. I can start; these are the two reasons I had:

1. To a naive intuition (especially one raised more on in-memory data
   structures than the guts of databases), it just seems natural:
   hashing seems like the canonical approach to uniqueness testing
   where there's no need for ordering, intuition suggests a performance
   advantage, and so the least-astonishment principle suffers upon finding
   it isn't supported.

2. When developing a custom data type, it feels like tedious
   busy-work to have to bang out a full set of ordering operators
   for a btree operator class if there is no meaningful order for
   the type.

Maybe the intuitions behind (1) are just misinformed, the performance
ones at least, in light of Craig Ringer's low opinion of whether "hash
indexes are better than btree for anything" [3], and André Barbosa's
more recent performance comparison [4] (which does show some advantages
for hash in some circumstances, but mostly not large. The only large
advantage was in initial creation; would that be hashsort.c at work?).

But then, both [3] and [4] predate the recent projects on hash indexes
that have "made them crash safe and are on the way to making them
performant" [5], so maybe an updated comparison would be timely, or some
addition to the docs to better characterize the circumstances where hash
could be good. (Every index method newer than btree and hash has its own
part VII Internals chapter; for completeness, might it make sense to have
those for btree and hash also, even if only to broadly discuss
the conditions under which they perform especially well or poorly?)

For all sorts of indexes, would there be any use for some CREATE INDEX
syntax for a multicolumn index to say that some of its rightmost columns
aren't there to participate in the indexing scheme, but only to benefit
index-only scans? Applied to a hash index, that might offer another useful
kind of multicolumn support, which otherwise seems limited to queries
where you have the exact values of all indexed columns.

Anyway, even if my performance assumption behind (1) was too optimistic,
the astonishment when a new user finds a hash can't support uniqueness
still seems real. A related astonishment is that a hash opclass can't
support DISTINCT, and that seems like something that could be changed
with much less work than making hash indexes amcanunique. Apparently,
array comparison already looks for a btree opclass but can fall back to
a hash one, if present, for equality comparisons. Would it be difficult
to do the same for DISTINCT?

As for my reason (2), the tedium of having to bang out btree operators
for a new type with no meaningful order (which, as I've just remembered,
is necessary not just for UNIQUE constraints but even just to make
SELECT DISTINCT work), maybe there's a solution that simply reduces
the tedium. After all, if a new type has no meaningful notion of order,
an arbitrary one imposed by copy/paste of some existing opclass
for a type with the same internallength might often be good enough.
Could there be some syntactic sugar for that, say,

CREATE OPERATOR CLASS btree_foo_ops
FOR TYPE foo USING btree LIKE int4_ops;

? A transformOpclassLike function could verify that foo and the opcintype
of int4_ops have the same typlen and typbyval, and that the 

[HACKERS] PG10 Crash-safe and replicable Hash Indexes and UNIQUE

2017-05-19 Thread Chapman Flack
Hi,

The item on hash indexes reminded me of an old comment from years
ago that I put in the code of the first custom PG datatype I ever
built at $work:

COMMENT ON OPERATOR CLASS puid_ops USING btree IS
'As puids are only identifiers, there is no obvious reason to define
ordering operators or support btree indexing. But for some curious
reason PostgreSQL 8.4 does not allow a hash index to support UNIQUE
constraints (this may be because, per the manual, hash index "operations
are not presently WAL-logged" so it could be risky to base constraints
on them). Therefore, the whole set of ordering operators must be
implemented to provide an operator class for the btree index method.';

Was my guess about the reason right? Does this PG10 announcement
also mean it will be possible to use UNIQUE constraints with some
pure-identifier, no-natural-ordering type that supports only hashing?

-Chap


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-05-10 Thread Chapman Flack
On 05/10/2017 03:56 AM, Craig Ringer wrote:
> On 10 May 2017 10:44 am, "Chapman Flack" <c...@anastigmatix.net> wrote:
>> On 05/09/17 18:48, Mark Dilger wrote: 
>>> SET SESSION ON BEHALF OF 'joe user'
> 
> No need to do anything they custom and specific. No need for new syntax
> either.
> SET myapp.appuser = 'joe'

We seem to be agreed here ... I was claiming that Mark's new syntax
and simply setting a custom variable are different ways of "spelling"
the same idea, but the custom variable approach is less invasive and
can probably be done all within an extension.

>> The other bit of my proposal was to prevent Mallory from spoofing
>> his appident info by managing to inject some SQL through your app
> 
> If your attacker gets that far you're kind of screwed anyway.

Sure, but part of defense-in-depth still applies then, either to
collect more info on who's screwing you or to throw tables and chairs
to make them climb over.

> But that's where something like 'secure variables' or package variables
> come in. See the mailing list discussion on that topic a couple of months
> ago.

My hasty search for a mailing list discussion didn't find one (got
a link?), but I did find a wiki page; is that what you have in mind?

https://wiki.postgresql.org/wiki/Variable_Design

>> That's where it might be handy to have a
>> way to associate the info with the current thread or current request
>> in a way that doesn't need any support in third party layers in the middle,
>> but can be retrieved by the driver (or a thin wrapper around it, down
>> at the bottom of the stack) and turned into the proper SET commands.
> 
> I don't see how postgres can do anything about this. PgJDBC maybe. ...

We're agreed here too; that's why I described it as a separable,
future-work idea. I was just sketching how it might be possible in
the client-side stack to build on whatever basic function (secure
variables or a GUCs-with-cookie extension) postgres could provide.

Any uptake of that idea would have to happen in PgJDBC *and* in
psycopg2 *and* in DBD::Pg *and* in you-name-it for programming
languages/environments x, y, and z. So it's clearly science fiction.
(Actually, a more practical way to do it might be to write thin
wrappers that implement the DB driver API in each language, that
could be used *in place of* PgJDBC or psycopg2 or DBD::Pg and just
delegate everything to the real driver, except the 'connect'
operation would delegate to the real driver but then issue one
query for the cookie, and there'd be some additional non-standard
API for higher layers to call and set values, to be passed along
via SET commands and the cookie ahead of later queries.)

The design for any such thing would want to be idiomatic and
sensitive to the typical usages in each language or software
stack, so I don't remotely propose to do or even design any
for now. Just wanted to put rough ideas into the collective
consciousness to soak for a while.

> The main part I would like is a generic mech[an]ism to inject the value of a
> GUC into the logs.
> 
> For csvlog, it'd be a list of GUC names, each to be emitted as a
> separate field if set, or empty field if unset.
> 
> For normal log, it'd be available in log_line_prefix as something like
> 
> %(myapp.user)g 
> 
> I can see this being plenty useful for all sorts of work, and nicely
> flexible.

Thank you for bringing that back in, since logging was my motive when
I started this thread, and then my last message was all about other
things.

Yes, I think I'd like to see logging extended in exactly those ways
(modulo whatever exact spelling ends up preferred for %(gucname)g).
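
For instance (imagined configuration; neither the %(name)g escape nor
a csvlog GUC-list setting exists today):

  log_line_prefix = '%m [%p] user=%u app=%(myapp.user)g '
  # csvlog analogue, equally imagined:
  csvlog_extra_gucs = 'myapp.user, myapp.origin'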

Looked at that way, that's a logging extension of general utility
so it's also separable from the 'variables to store on-behalf-of
identities' extension proposed here.

... with the caveat that if the variables-to-store-identities idea
were to end up going a different direction than an-extension-with-GUCs
(secure variables or whatever), then I'd want the logging extension
to also provide an escape syntax for logging whatever those are.

-Chap


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-05-09 Thread Chapman Flack
On 05/09/17 18:48, Mark Dilger wrote:

> I don't have any positive expectation that the postgres community will go
> along with any of this, but just from my point of view, the cleaner way to
> do what you are proposing is something like setting a session variable.
> 
> In your middle tier java application or whatever, you'd run something like
> 
> SET SESSION ON BEHALF OF 'joe user'
> 
> And then when you reuse that connection for the next web page you
> would do something like that again:
> 
> SET SESSION ON BEHALF OF "anonymous from 1.2.3.4"

I may have muddled my presentation by starting it with the more-speculative
ideas about building support into client-side drivers. At present, none of
that exists, and in the simple case where the application code has access
to the driver for sending arbitrary SQL, then yes, you just have it send

SET SESSION ON BEHALF OF 'joe user'
or
SET SESSION ON BEHALF OF "anonymous from 1.2.3.4"

or something equivalent but spelled differently, like

SET appident.user TO 'joe user'
or
SET appident.user TO 'anonymous'; SET appident.inet_addr TO '1.2.3.4'

Your spelling is nicer, but requires teaching the parser new syntax
and touching lots of places in PostgreSQL core, probably a non-starter.
In contrast, writing an 'appident' extension that defines some new GUCs
would be trivial and well-encapsulated. (What I'd think ideal would be
an extension that defines _whatever new GUCs make sense for your
application_ so if it would serve your purpose to have appident.locale
you just make it so. I wonder if an extension can define one GUC that
you set in the config file to a list of other GUCs it should also define
within its own namespace, or if that would run afoul of the order
in which initialization happens.)

The other bit of my proposal was to prevent Mallory from spoofing
his appident info by managing to inject some SQL through your app
like "21' && sET_/**/cONfiG('appident.user', 'alice', fA/**/lsE)".

That's where the appident.cookie() function comes in. You just
query it once at session establishment and remember the cookie.
That allows your code to say:

SET SESSION ON BEHALF OF 'joe user' BECAUSE I HAVE :cookie AND I SAY SO;

and Mallory can't inject that because he doesn't have :cookie and the
appident.cookie() function only succeeds the first time.

Without any new syntax, that could be spelled either:

SELECT appident.set('user', 'joe user', is_local => false,
  cookie => :cookie)
(where the cookie is simply verified by the SQL-callable function), or
SET appident.user TO :cookie || 'joe user'
(where it's verified by the check hook on the GUC, then stripped off).

That's pretty much the total extent of what I would propose the
extension to PostgreSQL proper would do. Just let you define some
new GUCs with meaning to your application, and not interpret them
or use them for anything, but provide a bit of protection so your
code controls them and arbitrary SQL queries can't.
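
The core of it could be as small as this sketch; DefineCustomStringVariable
is the real GUC API, while the cookie-enforcing check hook is the part
this proposal would add:

  static char *appident_user;

  void
  _PG_init(void)
  {
      DefineCustomStringVariable("appident.user",
                                 "application-level user identity",
                                 NULL,
                                 &appident_user,
                                 "",
                                 PGC_USERSET,
                                 0,
                                 appident_check_user, /* verifies, strips cookie */
                                 NULL,
                                 NULL);
  }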

The more science-fiction, client-side ideas I proposed were just
ruminations on what might be useful to application code sitting on top
of a taller stack of third-party code that might get in the way of just
sending your arbitrary SET command ahead of your query.

> and so forth.  You wouldn't have to worry about threads since this is
> being handled much like SET ROLE only instead of changing the role
> under which your database handle is operating, it only changes an
> opaque value that you can then use for whatever purpose.  I would
> not expect the database permissions logic to use the value in any
> way, but just to preserve it for you to query in logging or from a stored
> procedure or similar.  As long as your jdbc driver or similar does not
> prevent you from running non-standard sql statements, you should
> be able to pass this down the line without any of the third party
> software in the middle messing with it.

It's those more complex architectures I was thinking of with the client-
side ideas, where your code may be at the top of such a tall stack of
persistence/ORM/whatever layers that you're not sure you can just emit
an arbitrary SET command and have it come out in front of the right query
generated by the lower layers. That's where it might be handy to have a
way to associate the info with the current thread or current request
in a way that doesn't need any support in third party layers in the middle,
but can be retrieved by the driver (or a thin wrapper around it, down
at the bottom of the stack) and turned into the proper SET commands. That's
really a separable, less immediate, future-work idea.

> If this feature were implemented, I'd probably use it.  I might also be
> willing to write this with you in the unlikely event that it gets community
> approval.

I might just go ahead and try writing the extension part. Isn't that
the beauty of extensions? Approval? We don't need no stinkin' approval. :)

But critiques and better ideas are never unwelcome. :)

-Chap



Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-05-09 Thread Chapman Flack
On 05/09/2017 01:25 PM, Mark Dilger wrote:

> Consensus, no, but utility, yes.
> 
> In three tier architectures there is a general problem that the database
> role used by the middle tier to connect to the database does not entail
> information about the user who, such as a visitor to your website, made
> the request of the middle tier.  Chapman wants this information so he
> can include it in the logs, but stored procedures that work in support
> of the middle tier might want it for locale information, etc.  As things
> currently stand, there is no good way to get this passed all the way down
> into the database stored procedure that needs it, given that you are
> typically calling down through third party code that doesn't play along.

I like this framing.

Clearly a good part of the story is outside of PostgreSQL proper, and
has to be written elsewhere. There could be a picture like this:

 middle tier receiving request (webapp?) - knows user/origin info
   |
   V
 third-party code (rails? web2py? spring?) - doesn't play along
   |
   V
 PQ protocol driver (pg? psycopg2? pgjdbc?) - could offer support
   .
   .
   V
 PostgreSQL (what to do here?)


What to do on the client side of the . . > can only be suggested and
would have to be independently implemented by several drivers, but
I could imagine a driver offering some API to tuck a bit of
application-specific data into some form of thread-local storage.
In the picture above, the top layer, where the user/origin info
is known, would need a small modification to call that driver API
and provide that info. The request could then be processed on down
through the third-party layer(s) that don't play along. When
it reaches the driver, something magic will happen to forward
the thread-local preserved information on to PostgreSQL along with
the query.

That of course isn't enough if the intervening layers that don't
play along use thread pools, and the request could ultimately
reach the driver on a different thread. But for the simple case
it gives an idea.

As to how the driver then propagates the info to PostgreSQL, seems
to me it could generate a SET in front of the actual query. Most or
all of what would be needed in PostgreSQL might be possible in an
extension, which I could try my hand at writing. Here's the idea:

The extension would define one or more custom GUCs, with flags /
check hooks to enforce strict limits on when and how they can be set.

If the client stack is using a simple connection-per-request
approach, they could just be PGC_BACKEND, and the client part of
the picture could just be that the top layer supplies them as
options= in the conninfo string, which various drivers already
support.

But if connections may be pooled and re-used for different identities
and origins, that isn't enough. So the extension would provide
a function that can be called once in the session, returning
a random magic cookie. The driver itself would call this function
upon connecting, and save the cookie in a per-connection
private variable. Code above the driver in the stack would have
no access to it, as the function can't be called a second time,
and so could not spoof identities just by sending arbitrary SET
commands. The extension would reject any attempts to set or reset
those GUCs unless accompanied by the cookie.

Stored procedures could then look at those GUCs for locale / identity
/ origin information and trust that they haven't been spoofed by
injected commands.

If there were such a thing as a log_line_prefix_hook, then such an
extension could also support my original idea and add some new
escapes to log the added information. But there doesn't seem to be
such a hook at present. Or, if there were simply a %{name-of-GUC}
escape supported in log_line_prefix, nothing more would even be
needed.

Does this sound workable?

-Chap




[HACKERS] idea: custom log_line_prefix components besides application_name

2017-05-04 Thread Chapman Flack
Hi,

At $work I am often entertained by log entries like:

invalid input syntax for integer: "21' && 1=2)) Uni/**/ON
SEl/**/eCT 0x646665743166657274,0x646665743266657274,
0x646665743366657274 -- "

They're entertaining mostly because I know our web guy has heard
of SQL injection and doesn't write stuff where it would work. So
I'm mostly just entertained to see what wacky tactics the lowlifes
come up with.

But suppose I wanted to log more information about the origins
of the attempts. As is often the case with web apps, %r/%h/%u
won't give me anything useful, because %h will always be our
own server hosting the app, and %u is the hardcoded name the app
uses to connect. Obviously, each app itself would need to be tweaked
to pass along what I really care about, the actual remote host and
the user identity within the app.

That could be done today by having the app construct some 64-
character string out of the app name, remote IP, and identity,
and setting that as application_name, and I could log %a.

I just wonder if anybody thinks web apps, and therefore this
scenario, are common enough these days to maybe justify one
or two more GUCs with their own log_line_prefix escapes, such
as app_client_addr or app_user. Naturally they would only be
as reliable as the app setting them, and uninterpreted by
PostgreSQL itself, and so functionally no different from the
uninterpreted string already available as application_name.
The benefit is perhaps to be clearer than just overloading
application_name to carry two or three pieces of information
(and perhaps privacy, if you care about app user identities and
source IPs showing up in ps titles).

Worth considering, or is application_name Good Enough?

-Chap




Re: [HACKERS] If an extension library is loaded during pg_upgrade, can it tell?

2017-04-03 Thread Chapman Flack
On 04/03/17 22:08, Bruce Momjian wrote:
> On Mon, Apr  3, 2017 at 09:53:34PM -0400, Chapman Flack wrote:
>> Hi,
>>
>> Is there any distinctive state that could be checked by extension code
>> to determine that it has been loaded incidentally during the operation
>> of pg_upgrade rather than under normal conditions?
>> ... 
> You can check the backend global variable IsBinaryUpgrade to check if
> binary upgrade is being performed.  Does that help you?

That sounds like exactly what I'd hoped for. Thanks!
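
So the fix can be about this simple (IsBinaryUpgrade comes from
miscadmin.h; check_schema_version is just a stand-in name for
PL/Java's own check):

  #include "miscadmin.h"

  static void
  check_schema_version(void)
  {
      if (IsBinaryUpgrade)
          return;    /* pg_upgrade is mid-flight; don't insist on the schema yet */

      /* ... the normal check of the sqlj schema version ... */
  }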

-Chap




[HACKERS] If an extension library is loaded during pg_upgrade, can it tell?

2017-04-03 Thread Chapman Flack
Hi,

Is there any distinctive state that could be checked by extension code
to determine that it has been loaded incidentally during the operation
of pg_upgrade rather than under normal conditions?

PL/Java ordinarily checks what version of its schema is around, but
that may be premature while pg_upgrade is doing its thing and the schema
might not be all there yet.

-Chap




[HACKERS] who calls InitializeTimeouts() ?

2016-11-02 Thread Chapman Flack
Hi,

It looks like for about 3 years, PL/Java has been calling
InitializeTimeouts before calling RegisterTimeout. Looking over
the callers of InitializeTimeouts in core, though, it appears
that an extension like PL/Java should be able to assume it has
already been called, and in fact would be rude to call it again,
as it isn't idempotent and could conceivably clobber somebody
else's timeouts.
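
In other words, the pattern that looks right, assuming the postmaster
has already called InitializeTimeouts, would be just to register
(handler and function names here are stand-ins):

  #include "utils/timeout.h"

  static TimeoutId exit_timeout;

  static void
  exit_timeout_handler(void)
  {
      /* ... act on the timeout ... */
  }

  static void
  register_exit_timeout(void)
  {
      /* USER_TIMEOUT asks for the first free user timeout id */
      exit_timeout = RegisterTimeout(USER_TIMEOUT, exit_timeout_handler);
      enable_timeout_after(exit_timeout, 5000);    /* milliseconds */
  }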

As PL/Java only uses it for a timeout on process exit anyway,
perhaps this is a mistake that has just never had much chance
to cause a noticeable problem.

Am I right that it's a mistake, though?

Thanks,
-Chap




Re: [HACKERS] 9.6, background worker processes, and PL/Java

2016-10-26 Thread Chapman Flack
On 10/26/16 07:04, Amit Kapila wrote:
> No, parallel workers in parallel query doesn't have MyProcPort.

Ok ... it turns out I was using MyProcPort as a quick way to grab
database_name and user_name (very early in startup, for a purpose
analogous to setting a 'ps' process title), and that seemed more
lightweight than other methods of getting the database
and user Oids and mapping those to the corresponding names.

But I guess I can change that easily enough.
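
Something along these lines should do, assuming a connected backend so
that MyDatabaseId and GetUserId() are meaningful:

  #include "miscadmin.h"
  #include "commands/dbcommands.h"

  static void
  get_session_names(char **dbname, char **username)
  {
      *dbname = get_database_name(MyDatabaseId);
      *username = GetUserNameFromId(GetUserId(), false);
  }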

> ...
>> Are BGWs for parallel queries born fresh for each query, or do they
>> get pooled and reused?
>
> born fresh for each query.

Yikes. But ok, if there's ever a reason to try to make a "safe"
Java function, I see there is a parallel_setup_cost GUC that could
be used to inform the planner of the higher cost when BGWs have to
start JVMs, so it probably wouldn't make parallel plans often, but
still could if analysis showed a sufficient advantage.


On 10/26/16 07:15, Amit Kapila wrote:

> All the GUCs are synchronised between leader and worker backends.

Ah, thanks.  I have now found README.parallel, so I much better understand
what is synchronized, and what operations are allowed or not. :)

On 10/26/16 07:42, Craig Ringer wrote:
>
> For loaded in shared_preload_libraries, test
>
> IsPostmasterEnvironment && !IsUnderPostmaster

Hmm, IsUnderPostmaster is PGDLLIMPORTed but IsPostmasterEnvironment isn't,
so I'm out of luck on Windows. Is there another way I can check?

>> Do I detect I'm in a BGW by a non-null MyBgworkerEntry?
>
> Use IsBackgroundWorker, same place as above.

Also not PGDLLIMPORTed. MyBgworkerEntry is, though. It does appear to be
initialized to NULL. Can I get away with checking that, since I can't see
IsBackgroundWorker?

I now see what caused the reported crash. It was a parallel query that
did not make any use of PL/Java functions, but the group leader had used
them before so the library was loaded, so ParallelWorkerMain loaded it
in the worker process, so _PG_init got called and was going to refer to
stuff that wasn't set up yet, because the library loading comes pretty
early in ParallelWorkerMain.

I think I could easily fix that by having the library init code just bail
right after defining the custom GUCs, if InitializingParallelWorker
is true.

Alas, InitializingParallelWorker isn't PGDLLIMPORTed either. This isn't
my day. Is there a way I can successfully infer that on Windows?

I guess I can just bail from initialization early when in *any* kind
of background worker, and just leave the rest to be done when called
through the language handler, if ever.
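
Roughly this, on the platforms where InitializingParallelWorker (from
access/parallel.h) is linkable, and with define_guc_variables standing
in for the GUC-defining code:

  #include "access/parallel.h"

  void
  _PG_init(void)
  {
      define_guc_variables();    /* GUCs must exist before they get synced */

      if (InitializingParallelWorker)
          return;                /* defer the rest until really needed */

      /* ... the remainder of normal initialization ... */
  }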

This would be so much easier if Visual Studio were not a thing.

-Chap




Re: [HACKERS] 9.6, background worker processes, and PL/Java

2016-10-25 Thread Chapman Flack
On 10/25/16 18:56, Chapman Flack wrote:

> If pooled, and tied to the backend that started them, do they need
> to do anything special to detect when the leader has executed
> SET ROLE or SET SESSION AUTHORIZATION?

Let me guess ... such information is *not* synchronized across workers,
and that'd be why the manual says "functions must be marked PARALLEL
RESTRICTED if they access ... client connection state ..."?

That's probably a resounding 'no' for declaring any PL/Java function
SAFE, then.

And if changing "the transaction state even temporarily (e.g. a PL/pgsql
function which establishes an EXCEPTION block to catch errors)" is enough
to require UNSAFE, then it may be that RESTRICTED is off limits too, as
there are places PL/Java does that internally.

I take it that example refers not to just any use of PG_TRY/PG_CATCH,
but only to those uses where an internal subtransaction is used to
allow execution to continue?

If a person writes a function in some language (SQL, for example),
declares it PARALLEL SAFE but is lying because it calls another
function (in Java, say) that is PARALLEL UNSAFE or RESTRICTED,
does PostgreSQL detect or prevent that, or is it just considered
an unfortunate mistake by the goofball who declared the first
function safe?

And if that's not already prevented, could it be worth adding
code in the PL/Java call handler to detect such a situation and
make sure it ends in a meaningful ereport and not something worse?

-Chap




[HACKERS] 9.6, background worker processes, and PL/Java

2016-10-25 Thread Chapman Flack
Hi,

I have a report of a PL/Java crash in 9.6 where the stack trace
suggests it was trying to initialize in a background worker
process (not sure why that even happened, yet), and by my first
glance, it seems to have crashed dereferencing MyProcPort, which
I am guessing a BGW might not always have (?).

So, as I try to get up to speed on this PostgreSQL feature, it
seems to me that I have up to three different cases that I may
need to make PL/Java detect and respond appropriately to. (If
you see me veering into any misconceptions, please let me know.)

1. A worker explicitly created with Register... or RegisterDynamic...
   that has not called ...InitializeConnection... and so isn't
   any particular user or connected to any database.

2. A worker explicitly created that has called ...Initialize...
   and therefore is connected to some database as some user.
   (So, is there a MyProcPort in this case?)

3. A worker implicitly created for a parallel query plan (and therefore
   associated with a database and a user). Does this have a MyProcPort?


Case 1, I think I at most need to detect and ereport. It is hard to
imagine how it could even arise, as without a database connection
there's no pg_extension, pg_language, or pg_proc, but I suppose it
could happen if someone misguidedly puts libpljava in
shared_preload_libraries, or some other bgw code inexplicably loads
it. It's a non-useful case as PL/Java has nothing to do without
a database connection and sqlj schema.

Case 2 might be worth supporting, but I may need to account for
anything that differs in this environment from a normal connected
backend.

Case 3 seems most likely. It should only be possible by invoking
a declared Java function that somebody marked parallel-safe, right?
In the parallel-unsafe or -restricted cases, PL/Java can only find
itself invoked within the leader process?

Such a leader process can only be a normal backend? Or perhaps also
a case-2 explicitly created BGW that is executing a query?

My main question is, what state do I need to examine at startup
in order to distinguish these cases? Do I detect I'm in a BGW by
a non-null MyBgworkerEntry? If it's there, do I detect whether
I have a database and an identity by checking for a MyProcPort,
or some other way?

As for declaring functions parallel-unsafe, -restricted, or -safe,
I assume there should be no problems with PL/Java functions with
the default designation of unsafe. There should be no essential
problem if someone declares a function -restricted - provided PL/Java
itself can be audited to make sure it doesn't do any of the things
restricted functions can't do - as it will only be running in the
leader process anyway.

Even should somebody mark a PL/Java function safe, while hard to
imagine a good case for, shouldn't really break anything; as the
workers are separate processes, this should be safe. Any imagined
speed advantage of the parallel query is likely to evaporate while
the several processes load their own JVMs, but nothing should
outright break.

That leads me to:

Are BGWs for parallel queries born fresh for each query, or do they
get pooled and reused?

If pooled, can they be reused across backends/database connections/
identities, or only by the backend that created them?

If reusable across contexts, that's a dealbreaker and I'd have to
have PL/Java reject any parallel-safe declaration, but a pool tied
to a connection should be ok (and better yet, allow amortizing the
JVM startup cost).

If pooled, and tied to the backend that started them, do they need
to do anything special to detect when the leader has executed
SET ROLE or SET SESSION AUTHORIZATION?

If all of this is covered to death in some document I obviously
haven't read, please feel free to point me to it.

Thanks!
-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-08-02 Thread Chapman Flack
On 08/02/2016 02:33 PM, Bruce Momjian wrote:

> My guess is that the bytes are there to detect problems where
> a 512-byte disk sector is zeroed by a disk failure.

Does that seem plausible? (a) there is only one such header for
every 16 512-byte disk sectors, so it only affords a 6% chance of
detecting a zeroed sector, and (b) the header contains other
non-zero values in fields other than xlp_pageaddr, so the use
of a fixed value for _that field_ in zeroed tail blocks would
not prevent (or even reduce the 6% probability of) detecting
a sector zeroed by a defect.

-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-26 Thread Chapman Flack
On 07/26/16 20:01, Michael Paquier wrote:
> On Tue, Jul 26, 2016 at 9:48 PM, Amit Kapila  wrote:
>> Does any body else see the use case
>> reported by Chapman important enough that we try to have some solution
>> for it in-core?
> 
> The lack of updates in the pg_lesslog project is a sign that it is not
> that much used. I does not seem a good idea to bring in-core a tool
> not used that much by users.

Effectively, it already was brought in-core in commit 9a20a9b.
Only, that change had an unintended consequence that *limits*
compressibility - and it would not have that consequence, if
it were changed to simply set xlp_pageaddr to InvalidXLogRecPtr
in the dummy zero pages written to fill out a segment.

-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-26 Thread Chapman Flack
On 07/26/2016 04:21 PM, Robert Haas wrote:

> I'm kind of curious WHY you are using archiving and forcing regular
> segment switches rather than just using streaming replication.
> ... AFAIK, streaming replication
> essentially obsoleted that use case.  You can just dribble the
> individual bytes over the wire a few at a time to the standby or, with
> pg_receivexlog, to an archive location.  If it takes 6 months to fill
> up a WAL segment, you don't care: you'll always have all the bytes

Part of it is just the legacy situation: at the moment, the offsite
host is of a different architecture and hasn't got PostgreSQL
installed (but it's easily ssh'd to for delivering compressed WAL
segments).  We could change that down the road, and pg_receivexlog
would work for getting the bytes over there.

My focus for the moment was just on migrating a cluster to 9.5
without changing the surrounding arrangements all at once.
Seeing how much worse our compression ratio will be, though,
maybe I need to revisit that plan.

Even so, I'd be curious whether it would break anything to have
xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero
pages written to fill out a segment. At least until it's felt
that archive_timeout has been so decidedly obsoleted by streaming
replication that it is removed, and the log-tail zeroing code
with it.

That at least would eliminate the risk of anyone else repeating
my astonishment. :)  I had read that 9.4 added built-in log-zeroing
code, and my first reaction was "cool! that may make the compression
technique we're using unnecessary, but certainly can't make it worse"
only to discover that it did, by ~ 300x, becoming now 3x *worse* than
plain gzip, which itself is ~ 100x worse than what we had.

-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-26 Thread Chapman Flack
On 07/26/2016 08:48 AM, Amit Kapila wrote:

> general, if you have a very low WAL activity, then the final size of
> compressed WAL shouldn't be much even if you use gzip.  It seems your

9.5 pg_xlog, low activity test cluster (segment switches forced
only by checkpoint timeouts), compressed with gzip -9:

$ for i in 0*; do echo -n "$i  " && gzip -9 <$i | wc -c; done
000100010042  27072
000100010043  27075
000100010044  27077
000100010045  27073
000100010046  27075

Log from live pre-9.4 cluster, low-activity time of day, delta
compression using rsync:

2016-07-26 03:54:02 EDT (walship) INFO: using 2.39s user, 0.4s system,
9.11s on
wall:
231 byte 000100460029_000100460021_fwd
...
2016-07-26 04:54:01 EDT (walship) INFO: using 2.47s user, 0.4s system,
8.43s on
wall:
232 byte 00010046002A_000100460022_fwd
...
2016-07-26 05:54:02 EDT (walship) INFO: using 2.56s user, 0.29s system,
9.44s on
 wall:
230 byte 00010046002B_000100460023_fwd

So when I say "factor of 100", I'm understating slightly. (Those
timings, for the curious, include sending a copy offsite via ssh.)

> everything zero. Now, it might be possible to selectively initialize
> the fields that doesn't harm the methodology for archive you are using
> considering there is no other impact of same in code. However, it

Indeed, it is only the one header field that duplicates the low-
order part of the (hex) file name that breaks delta compression,
because it has always been incremented even when nothing else is
different, and it's scattered 2000 times through the file.
Would it break anything for *that* to be zero in dummy blocks?
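
(For reference, the header in question is XLogPageHeaderData, which in
9.5's access/xlog_internal.h looks like this:

  typedef struct XLogPageHeaderData
  {
      uint16      xlp_magic;      /* magic value for correctness checks */
      uint16      xlp_info;       /* flag bits */
      TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
      XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */
      uint32      xlp_rem_len;    /* total len of remaining data for record */
  } XLogPageHeaderData;

and xlp_pageaddr is the field that tracks the segment, so a dummy page
is all zeros except for these few bytes at each 8 kB boundary.)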

-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-25 Thread Chapman Flack
On 07/25/16 22:09, Michael Paquier wrote:

> This is over-complicating things for little gain. The new behavior of
> filling in with zeros the tail of a segment makes things far better
> when using gzip in archive_command.

Then how about filling with actual zeros, instead of with mostly-zeros
as is currently done?  That would work just as well for gzip, and would
not sacrifice the ability to do 100x better than gzip.

-Chap




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-25 Thread Chapman Flack
On 07/23/2016 08:25 AM, Amit Kapila wrote:
> On Sat, Jul 23, 2016 at 3:32 AM, Chapman Flack <c...@anastigmatix.net> wrote:
>>
>> Would it then be possible to go back to the old behavior (or make
>> it selectable) of not overwriting the full 16 MB every time?
>>
> 
> I don't see going back to old behaviour is an improvement, because as
> as you pointed out above that it helps to improve the compression
> ratio of WAL files for tools like gzip and it doesn't seem advisable
> to loose that capability.  I think providing an option to select that
> behaviour could be one choice, but use case seems narrow to me
> considering there are tools (pglesslog) to clear the tail.  Do you
> find any problems with that tool which makes you think that it is not
> reliable?

It was a year or so ago when I was surveying tools that attempted
to do that. I had found pg_clearxlogtail, and I'm sure I also found
pglesslog / pg_compresslog ... my notes from then simply refer to
"contrib efforts like pg_clearxlogtail" and observed either a dearth
of recent search results for them, or a predominance of results
of the form "how do I get this to compile for PG x.x?"

pg_compresslog is mentioned in a section, Compressed Archive Logs,
of the PG 9.1 manual:
https://www.postgresql.org/docs/9.1/static/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS

That section is absent in the docs any version > 9.1.

The impression that leaves is of tools that relied too heavily
on internal format knowledge to be viable outside of core, which
have had at least periods of incompatibility with newer PG versions,
and whose current status, if indeed any are current, isn't easy
to find out.

It seems a bit risky (to me, anyway) to base a backup strategy
on having a tool in the pipeline that depends so heavily on
internal format knowledge, can become uncompilable between PG
releases, and isn't part of core and officially supported.

And that, I assume, was also the motivation to put the zeroing
in AdvanceXLInsertBuffer, eliminating the need for one narrow,
specialized tool like pg{_clear,_compress,less}log{,tail}, so
the job can be done with ubiquitous, bog standard (and therefore
*very* exhaustively tested) tools like gzip.

So it's just kind of unfortunate that there used to be a *further*
factor of 100 (nothing to sneeze at) possible using rsync
(another non-PG-specific, ubiquitous, exhaustively tested tool)
but a trivial feature of the new behavior has broken that.

Factors of 100 are enough to change the sorts of things you think
about, like possibly retaining years-long unbroken histories of
transactions in WAL.

What would happen if the overwriting of the log tail were really
done with just zeros, as the git comment implied, rather than zeros
with initialized headers? Could the log-reading code handle that
gracefully? That would support all forms of non-PG-specific,
ubiquitous tools used for compression; it would not break the rsync
approach.

Even so, it still seems to me that a cheaper solution is a %e
substitution in archive_command: just *tell* the command where
the valid bytes end. Accomplishes the same thing as ~ 16 MB
of otherwise-unnecessary I/O at the time of archiving each
lightly-used segment.

Then the actual zeroing could be suppressed to save I/O, maybe
with a GUC variable, or maybe just when archive_command is seen
to contain a %e. Commands that don't have a %e continue to work
and compress effectively because of the zeroing.

-Chap




[HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2016-07-22 Thread Chapman Flack
Teaser: change made in 9.4 to simplify WAL segment compression
made it easier to compress a low-activity-period WAL segment
from 16 MB to about 27 kB ... but much harder to do better than
that, as I was previously doing (about two orders of magnitude
better).

At $work, we have a usually-low-activity PG database, so that almost
always the used fraction of each 16 MB WAL segment is far smaller
than 16 MB, and so it's a big win for archived-WAL storage space
if an archive-command can be written that compresses those files
effectively.

Our database was also running on a pre-9.4 version, and I'm
currently migrating to 9.5.3.  As I understand it, 9.4 was where
commit 9a20a9b landed, which changed what happens in the unwritten
'tail' of log segments.

In my understanding, before 9.4, the 'tail' of any log segment
on disk just wasn't written, and so (as segment recycling simply
involves renaming a file that held some earlier segment), the
remaining content was simply whatever had been there before
recycling. That was never a problem for recovery (which could
tell when it reached the end of real data), but was not well
compressible with a generic tool like gzip. Specialized tools
like pg_clearxlogtail existed, but had to know too much about
the internal format, and ended up unmaintained and therefore
difficult to trust.

The change in 9.4 included this, from the git comment:

  This has one user-visible change: switching to a new WAL segment
  with pg_switch_xlog() now fills the remaining unused portion of
  the segment with zeros.

... thus making the segments easily compressible with bog standard
tools. So I can just point gzip at one of our WAL segments from a
light-activity period and it goes from 16 MB down to about 27 kB.
Nice, right?

But why does it break my earlier approach, which was doing about
two orders of magnitude better, getting low-activity WAL segments
down to 200 to 300 *bytes*? (Seriously: my last solid year of
archived WAL is contained in a 613 MB zip file.)

That approach was based on using rsync (also bog standard) to
tease apart the changed and unchanged bits of the newly-archived
segment and the last-seen content of the file with the same
i-number. You would expect that to work just as well when the
tail is always zeros as it was working before, right?

And what's breaking it now is the tiny bit of fine
print that's in the code comment for AdvanceXLInsertBuffer but
not in the git comment above:

  * ... Any new pages are initialized to zeros, with pages headers
  * initialized properly.

That innocuous "headers initialized" means that the tail of the
file is *almost* all zeros, but every 8 kB there is a tiny header,
and in each tiny header, there is *one byte* that differs from
its value in the pre-recycle content at the same i-node, because
that one byte in each header reflects the WAL segment number.

Before the 9.4 change, I see there were still headers there,
and they did contain a byte matching the segment number, but in
the unwritten portion of course it matched the pre-recycle
segment number, and rsync easily detected the whole unchanged
tail of the file. Now there is one changed byte every 8 kB,
and the rsync output, instead of being 100x better than vanilla
gzip, is about 3x worse.

Taking a step back, isn't overwriting the whole unused tail of
each 16 MB segment really just an I/O intensive way of communicating
to the archive-command where the valid data ends?  Could that not
be done more efficiently by adding another code, say %e, in
archive-command, that would be substituted by the offset of the
end of the XLOG_SWITCH record? That way, however archive-command
is implemented, it could simply know how much of the file to
copy.

Would it then be possible to go back to the old behavior (or make
it selectable) of not overwriting the full 16 MB every time?
Or did the 9.4 changes also change enough other logic that stuff
would now break if that isn't done?

-Chap




Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-04-19 Thread Chapman Flack
On 04/18/2016 04:22 PM, Josh berkus wrote:
> 
> We should send the owner of the scariest patch something as a prize.
> Maybe a plastic skeleton or something ...

A mouse.

-Chap





Re: [HACKERS] Is a syscache tuple more like an on-disk tuple or a freshly made one?

2016-04-15 Thread Chapman Flack
On 04/15/16 18:03, Alvaro Herrera wrote:

> I suppose you could create a copy of the tuple (SysCacheSearchCopy) and
> use that for HeapTupleGetDatum.  The problem with the syscache tuple is
> that it can go away as soon as you do the ReleaseSysCache -- it lives in
> shared_buffers memory, so when it's released the buffer might get
> evicted.

Sure ... I wasn't going to call ReleaseSysCache until I was all done
with it anyway, should only take microseconds ... thought I'd be
clever and avoid making a copy, and pass it to existing code expecting
a Datum, but I guess that's more trouble than it's worth.

> A "syscache tuple" is definitely an on-disk tuple.

Got it. Thanks!

On 04/15/16 18:13, Tom Lane wrote:

> You could use heap_copy_tuple_as_datum().

Thanks, that looks like what the doctor ordered.
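
For the record, a sketch of that approach, with funcoid and a blessed
tupdesc for the rowtype assumed to come from elsewhere:

  HeapTuple tup;
  Datum     result;

  tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid));
  if (!HeapTupleIsValid(tup))
      elog(ERROR, "cache lookup failed for function %u", funcoid);

  /* flattens into a freshly palloc'd composite datum, safe to keep */
  result = heap_copy_tuple_as_datum(tup, tupdesc);
  ReleaseSysCache(tup);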

For pre-9.4, would the equivalent be basically
heap_form_tuple applied to the results of heap_deform_tuple ?

-Chap




[HACKERS] Is a syscache tuple more like an on-disk tuple or a freshly made one?

2016-04-15 Thread Chapman Flack
Please bear with a neophyte question ...

I am tempted to apply HeapTupleGetDatum to a tuple retrieved from
the syscache (as I already have code for processing a tuple presented
as a Datum).

But I see a comment on HeapTupleHeaderGetDatum: "This must *not* get
applied to an on-disk tuple; the tuple should be freshly made by
heap_form_tuple or some wrapper ..."

... and here I confess I'm unsure whether a tuple retrieved from
the syscache is more like an on-disk one, or a freshly-made one,
for purposes of the warning in that comment.

Is there a conventional proper way to pass a tuple retrieved from
syscache to code that accepts a tuple as a Datum? Or is there some
fundamental reason a smart person wouldn't do that?

Thanks,
-Chap




[HACKERS] Safe to apply HeapTupleHeaderGetDatum to a tuple from syscache?

2016-04-13 Thread Chapman Flack
I'm still learning, and looking at the HeapTupleHeaderGetDatum comment
that says

  This must *not* get applied to an on-disk tuple; the tuple should
  be freshly made by heap_form_tuple or some wrapper routine for it
  (such as BuildTupleFromCStrings).  Be sure also that the tupledesc
  used to build the tuple has a properly "blessed" rowtype.

... and I'm not 100% confident I know where a pg_proc or pg_language
tuple just retrieved from the syscache fits in that picture.

It would be convenient if safe, because I'd like to take a bit of
brittle C code in PL/Java that works out what to do from the proc
tuple, and reimplement that part in Java where it can be expressed
more concisely, and PL/Java already has infrastructure to take a
Datum representing a HeapTupleHeader and present it as an introspectable
readonly Java type. So if I can safely pass a pg_proc tuple from the cache
to HeapTupleGetDatum, then I'm only a couple lines of code away from
passing it into a Java method to do the remaining work there.

Or do I need to do something harder than that?

Thanks,
-Chap




Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-03-21 Thread Chapman Flack
On 03/21/2016 10:21 AM, Aleksander Alekseev wrote:

> Well in this case here is a patch that fixes "use of uninitialized
> value" reports by MemorySanitizer I managed to catch so far.

I'm new here so someone more experienced would have to weigh in,
but I would wonder a couple of things:

a. whether a braced struct assignment is supported in every
   C compiler that PostgreSQL still intends to support

b. whether such a struct assignment is guaranteed to initialize
   padding spaces as well as declared fields (in all supported
   C versions/compilers).

It's possible that memset() would be more convincing.
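
That is, rather than trusting a braced assignment to zero the holes,
write something like this (SomeStruct and its fields are stand-ins):

  SomeStruct s;

  memset(&s, 0, sizeof s);    /* zeroes padding bytes as well as fields */
  s.field_a = value_a;
  s.field_b = value_b;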

-Chap





[HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-03-21 Thread Chapman Flack
On 03/21/2016 06:08 AM, Aleksander Alekseev wrote:

> As you may see there are "holes" that were in fact not filled. Under
> normal conditions they will be filled with data previously stored on
> stack which could be anything including passwords and other private
> data. Afterwards this structure is written to disk where potentially
> someone who not supposed to see this data may see it.
> 
> I realize this is not a big problem in practice.

Well, the documentation already says to avoid it:

http://www.postgresql.org/docs/current/static/xfunc-c.html

   Another important point is to avoid leaving any uninitialized
   bits within data type values; for example, take care to zero out
   any alignment padding bytes that might be present in structs.

so I don't think what you're suggesting would be controversial
at all; it looks like what you've done is found a(t least one)
bug where the documented practice wasn't followed, and it's good
to find any such places.

-Chap




Re: [HACKERS] Make primnodes.h gender neutral

2016-03-19 Thread Chapman Flack
On 03/17/16 19:09, Robert Haas wrote:
> On Thu, Mar 17, 2016 at 6:34 PM, Chapman Flack <c...@anastigmatix.net> wrote:

> Not to pick on you in particular...
> Debating whether or not somebody is currently upset about this, and
> how upset they are, and what the value is of fixing it is missing the
> point.

Well, looking at my response in particular, you can see that I *began
it* with concrete constructive suggestions about fixing it, so that was
never part of the question. (I work pretty hard at non-obtrusively-gendered
language myself, I'm from a region/era where that was a thing to strive
for so it feels natural to me ... not that I never have something slip out
that someone might object to, but most obviously gendered usages already
sound funny to me, so by and large I avoid them.)

It was only in the later part of my comment that I asked for a link,
and even there I said nothing about "whether or not somebody is currently
upset", and certainly not "Is it ridiculous?" as in Joshua's response to
me. I just wanted a chance to read the comments in question, because
I hadn't been able to google them, and I *wanted the chance to get familiar
with the opinions and arguments of those vocally concerned in their own
words* ... because that's a thing I like to do.

So Joshua included the link, so I'll go read now :)

-Chap




Re: [HACKERS] Make primnodes.h gender neutral

2016-03-19 Thread Chapman Flack
On 03/17/16 17:29, Kevin Grittner wrote:
> On Thu, Mar 17, 2016 at 4:11 PM, Tom Lane  wrote:
> 
>> Sexist language is a distraction
>> for some, in-your-face non-sexism (such as made-up pronouns) is a
>> distraction for others, bad or awkward grammar is a distraction for yet
>> others.  It's not that easy to write prose that manages not to call
>> attention to itself in any of these ways.  But that's what we need to
>> do, and s/xxx/yyy/g editing that's only thinking about one of these
>> concerns is unlikely to get us there.
> 
> +1

^^^ I would have said that if I'd been fast enough.

> A grep with a quick skim of the results to exclude references to
> particular people who are mentioned by name and then referred to
> with a pronoun (which I assume we can leave alone), suggest there
> are about 70 lines in the 1346667 line C code base that need work.
> 
> Any word-smiths out there who want to volunteer to sort this out?

So that must be N affected files for some N <= 70 ...

what would you think of starting a wiki page with those N filenames
(so nobody has to repeat your grepping/skimming effort), and volunteers
can claim a file or five, marking them taken on that page, and wordsmith
away?

On 03/17/16 17:17, Gavin Flower wrote:
> Wanna bet? There is a very loud movement about this.

For those of us who are outside of the twitterverse sort of on purpose,
are there a few representative links you could post? Maybe this is such
fresh breaking news Google hasn't spidered it yet, but I didn't find
any reference to the primnodes language when I looked, and I really am
curious to see just exactly what kind of issue is being made around it.

-Chap




Re: [HACKERS] typmod is always -1

2016-03-19 Thread Chapman Flack
On 03/17/16 09:35, Tom Lane wrote:
> Chapman Flack <c...@anastigmatix.net> writes:
>> I'm in the same boat ... I have an input function I want to test, and so
>> far I have failed to think of *any* sql construct that causes it to be
>> invoked with other than -1 for the typmod.
> 
> COPY was the first case a quick grep came across.

Thanks, that does make a working test. Given a table with a typmod'd
column, COPY FROM exercises the 'input' function with a typmod != -1,
and COPY FROM (FORMAT BINARY) likewise exercises the 'receive' function.

While I'm here, I guess I should check the sense I am getting of what
can and can't be workable semantics for type modifiers.

It seems that a typmod can only be used to restrict the set of possible
values of the unmodified type (as clearly seen in the language "length
conversion cast", since certainly a typmod allowing { string | length < N }
is doing nothing but enforcing a subset of { string }). Each element of
the subset is still a valid element of the whole set (naturally, boring)
*and has to be represented the same way* (interesting): the representation
mustn't do clever things that you would need to know the typmod in order to
interpret, because most uses of a value are without access to the typmod.

So, the generalization of "length conversion cast" could be something like
"typmod application cast" and the only things a typmod application cast can
do to a value V are:

1. pass V unchanged if it is in the subset implied by the typmod
2. silently pass some V' that is in that subset and "close to" V
   in some sense (longest initial substring shorter than N, nearest
   numeric value with no more than N precision digits, etc.)
3. fail

with sometimes the choice of (2) or (3) depending on whether the cast
is explicit or not.

All in all, very like a domain, except a domain can only do (1) or (3),
not (2).
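
If I have that right, a "typmod application cast" for a hypothetical
type whose typmod is a maximum length reduces to this skeleton
(mytype_length and mytype_truncate are made-up helpers; the
three-argument signature is the one length-coercion casts use):

  extern int32 mytype_length(Datum value);               /* made up */
  extern Datum mytype_truncate(Datum value, int32 len);  /* made up */

  PG_FUNCTION_INFO_V1(mytype_scale);
  Datum
  mytype_scale(PG_FUNCTION_ARGS)
  {
      Datum value      = PG_GETARG_DATUM(0);
      int32 typmod     = PG_GETARG_INT32(1);
      bool  isExplicit = PG_GETARG_BOOL(2);
      int32 maxlen     = typmod - VARHDRSZ;    /* decoded as varchar does */

      if (typmod < 0 || mytype_length(value) <= maxlen)
          PG_RETURN_DATUM(value);              /* (1) already in the subset */

      if (isExplicit)
          PG_RETURN_DATUM(mytype_truncate(value, maxlen));    /* (2) */

      ereport(ERROR,                           /* (3) */
              (errcode(ERRCODE_STRING_DATA_RIGHT_TRUNCATION),
               errmsg("value too long for mytype(%d)", maxlen)));
      PG_RETURN_NULL();    /* keep compiler quiet */
  }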

Differences in representation, like short strings getting 1-byte headers,
are only possible as a consequence of a lower layer doing that consistently
to all values that happen to be short, and not as an effect of a typmod.

Am I getting it about right?

-Chap




[HACKERS] typmod is always -1

2016-03-18 Thread Chapman Flack
nothing like resurrecting a really old thread ...

> Pavel Stehule  writes:
>> I have a problem - every call of mvarcharin is with typmod = -1.

2009/3/17 Tom Lane :
> Also, there are a bunch of scenarios where we rely on a cast function to
> apply the typmod rather than passing it to the input function initially.
> I'm not sure if the particular case you're checking here falls into that
> category,

Is it possible to name any case that *does not* fall into that category?

I'm in the same boat ... I have an input function I want to test, and so
far I have failed to think of *any* sql construct that causes it to be
invoked with other than -1 for the typmod.

> but you definitely should have a "length conversion cast"
> function in pg_cast if you expect to do anything useful with typmod.

Ok, that's good to know (and I didn't until now). But back to the
input and recv functions, which are both documented to have 3-arg
forms that get typmods ... how would one test them?  Is there any
sql syntax that can be written to make them get passed a typmod?

If I just write them with assert(typmod == -1), will anyone ever
see a failure?

-Chap




[HACKERS] pstrdup(TextDatumGetCString(foo)) ?

2016-03-09 Thread Chapman Flack
I am encountering, here and there, an idiom like

  pstrdup(TextDatumGetCString(foo))

or a pre-8.4 version,

  pstrdup(DatumGetCString(DirectFunctionCall1(textout, foo)))

It's leading me to question my sanity because it appears to me
that both text_to_cstring (underlying TextDatumGetCString) and
textout already return a string palloc'd in the current context,
and that pstrdup (without any change of context) can't be accomplishing
anything. I'm probably missing something crucial, but what?
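
That is, if a string in the current context is all that's wanted,

  char *s = TextDatumGetCString(foo);    /* already palloc'd right here */

seems sufficient on its own, and if the intent was a copy in some
longer-lived context, naming the context would actually achieve it:

  char *t = MemoryContextStrdup(TopMemoryContext,
                                TextDatumGetCString(foo));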

-Chap




[HACKERS] Managing a long-held TupleDesc reference

2016-03-08 Thread Chapman Flack
When PL/Java is told to map a PostgreSQL composite type to a certain
Java class, on its first use of the type mapping it calls
lookup_rowtype_tupdesc_noerror and then creates a PL/Java UDT structure
that retains a reference to the TupleDesc. This seems to be what is leading
to a TupleDesc reference leak warning at completion of the transaction.

So I am wondering what a recommended way of managing these TupleDescs
would be. I could use lookup_rowtype_tupdesc_copy before saving it in
the UDT struct, and I assume that would silence the leak warning, but
would that be asking for trouble if the composite type gets altered
during the session and the saved TupleDesc is stale?

If that's an issue, what would be better? Should UDT not retain a
TupleDesc, but rather look it up with each use? (That would happen
in preparation for calling a Java function with a parameter or return
of that type, or when a Java function makes JDBC-SPI calls touching
that type.) Reading typcache.c, I get the impression that it may be
intended to be fast enough for such usage, and that it manages the
complexity of invalidating entries when something gets altered.
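
I.e., the look-it-up-each-time pattern would just be this, with typid
and typmod assumed to identify the composite type:

  #include "utils/typcache.h"

  static void
  use_composite_type(Oid typid, int32 typmod)
  {
      TupleDesc td = lookup_rowtype_tupdesc(typid, typmod);

      /* ... build or inspect tuples of the composite type ... */

      ReleaseTupleDesc(td);    /* pairs with the lookup's refcount pin */
  }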

Should there be some intermediate solution where UDT does retain a
TupleDesc reference within a transaction, but certain callbacks are
registered to know when to release or refresh it?

Is there a canonical, preferred approach?

Thanks,
-Chap




[HACKERS] character_not_in_repertoire vs. untranslatable_character

2016-03-06 Thread Chapman Flack
So there's an ISO error 22021 "character not in repertoire" and
a PostgreSQL error 22P05 "untranslatable character" that seem
very similar.

If I look in backend/utils/mb/wchar.c, it looks as if PostgreSQL
uses the first for the case of a corrupted encoding (bytes that
can't be decoded to a character at all), and the second for the
case of a valid character that isn't available in a conversion's
destination encoding.

Am I right about that? The names seem sort of confusable, and even
reading ISO 9075 drafts I haven't really found any additional detail
on what they wanted 22021 to mean, so I guess as long as I know what
it means in PG, that's as good as it gets. :)

-Chap




Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Chapman Flack
On 02/27/16 14:11, Álvaro Hernández Tortosa wrote:

> Why would it be so much work? Creating a function list, and maybe
> documenting those, doesn't sound like a daunting task.
> 
> I wouldn't mind volunteering for this work, but I guess I would need
> some help to understand and identify the candidate parts of the API.

I guess one daunting part is that the first approximation to "candidate
parts of the API" is something like "that which is useful to extensions"
and there are a lot of those, adding a really wide variety of capabilities,
and not all of their maintainers may be close followers of -hackers or
in a position to promptly answer if you asked "what are all the PostgreSQL
interfaces your extension relies on and why?".

My experience in working on PL/Java has been, sort of recurringly, that
I may appear on -hackers needing to advocate that PGDLLIMPORT be put on
some recently-added variable, or that there be some way to hook into
the extension dependency mechanism (to cite a couple recent examples)
and face initial questions on why such a need crops up in an extension.
So it takes some more explaining, and I don't think that reflects in
any way on the perspicacity of the -hackers readership; it's just that
any piece you're not personally immersed in is likely to have details
that won't have jumped out at you.

Such things probably lurk in the corners of most existing extensions,
of which there are a lot.

-Chap




Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Chapman Flack
On 02/27/16 13:51, Álvaro Hernández Tortosa wrote:

> ... I still lack a list of functions that might be callable (I
> understand not even those labeled with PGDLLIMPORT are all good candidates
> and some good candidates might not be labeled as such) from an extension
> point of view. Have you come across such a list over any of these threads?

On my best understanding, there isn't really such a thing exactly.
If the formulation by Andres is persuasive ("We have a (mostly) proper API.
Just not an internal/external API split"), then the good references for
hacking an extension will be essentially the same as the good references
for hacking PostgreSQL, such as the "Hacking PostgreSQL Resources" found
on the "So, you want to be a developer?" wiki page:

https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F

Also, the PostgreSQL code repository has a lot of README files in
subdirectories where important pieces of the architecture happen,
and they are very informative and worth reading, and also the comments
are often quite comprehensive in the .h or .c files pertaining to the
parts of the system you need to interact with.

The extra ingredients for being an *extension* author, in the absence
of any formalized "this is the extension API" documentation, seem to be
those unformalized qualities like taste or restraint, in looking over
the available interfaces and judging which ones seem to be fundamental,
useful, stable, less likely to be whacked around later, etc. Those
qualities also can be called "enlightened self-interest" because you
are not looking forward to fixing your busted extension when something
you have relied on changes.

Another piece of the puzzle seems to be participating on -hackers
so that you may see what changes are coming, or possibly advocate
for why a particular interface really is useful to your extension
and is worth committing to.

If there is some subspace of possible extensions where you are
interested in working, taking on some maintenance of an existing
extension in that space, thereby getting familiar with what interfaces
it relies on and why, seems to be an effective baptism-by-fire. :)
The danger to avoid would be then drawing overbroad conclusions about
what should or shouldn't be extension API, based on what is useful
for the subspace of imaginable extensions in which you are working.

-Chap




Re: [HACKERS] PostgreSQL extension API? Documentation?

2016-02-27 Thread Chapman Flack
On 02/27/16 08:37, Álvaro Hernández Tortosa wrote:
> In other words: what is the API surface exposed by PostgreSQL to
> extension developers? The assumption is that no PostgreSQL code should be
> modified, just adding your own and calling existing funcitons.

That's an excellent question that repeatedly comes up, in particular
because of the difference between the way the MSVC linker works on Windows,
and the way most other linkers work on other platforms.

The issue there is ... on most non-Windows platforms, there are only the
general C rules to think about: if a symbol is static (or auto of course)
it is not visible to extensions, but otherwise it is.

For MSVC, in contrast, symbols need to have a certain decoration
(look for PGDLLIMPORT in various PostgreSQL .h files) for an MSVC-built
extension to be able to see it, otherwise it isn't accessible.

Well, that's not quite right. It turns out (and it may have taken some
work on the build process to make it turn out this way) ... *functions*
are accessible from MSVC (as long as they would be accessible under
normal C rules) whether or not they have PGDLLIMPORT. It's just
data symbols/variables that have to have PGDLLIMPORT or they aren't
available on Windows/MSVC.
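
That is, in a core header the distinction looks like this (the names
here are illustrative, not actual PostgreSQL declarations):

  extern PGDLLIMPORT bool SomeBackendFlag;    /* data: MSVC needs the marking */

  extern void SomeBackendFunction(void);      /* function: links fine without it */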

And *that* arrangement is the result of a long thread in 2014 that
unfolded after discovering that what was really happening in MSVC
*before* that was that MSVC would silently pretend to link your
non-PGDLLIMPORT data symbols, and then give you the wrong data.

http://www.postgresql.org/message-id/flat/52fab90b.6020...@2ndquadrant.com

In that long thread, there are a few messages in the middle that probably
give the closest current answer to your API question. Craig Ringer has
consistently favored making other platforms work more like Windows/MSVC,
so that the PGDLLIMPORT business would serve to limit and more clearly
define the API surface:

http://www.postgresql.org/message-id/52ef1468.6080...@2ndquadrant.com

Andres Freund had the pragmatic reply:

http://www.postgresql.org/message-id/20140203103701.ga1...@awork2.anarazel.de

> I think that'd be an exercise in futility. ... We'd break countless
> extensions people have written. ... we'd need to have a really
> separate API layer ... doesn't seem likely to arrive anytime soon,
> if ever.

which was ultimately concurred in by Tom, and Craig too:

http://www.postgresql.org/message-id/29286.1391436...@sss.pgh.pa.us
http://www.postgresql.org/message-id/52efa654.8010...@2ndquadrant.com

Andres characterized it as "We have a (mostly) proper API. Just not
an internal/external API split."

http://www.postgresql.org/message-id/20140203142514.gd1...@awork2.anarazel.de

-Chap




Re: [HACKERS] about google summer of code 2016

2016-02-21 Thread Chapman Flack
On 02/21/16 23:10, Tom Lane wrote:

> Another variable is that your answers might depend on what format you
> assume the client is trying to convert from/to.  (It's presumably not
> text JSON, but then what is it?)

This connects tangentially to a question I've been meaning to ask
for a while, since I was looking at the representation of XML.

As far as I can tell, XML is simply stored in its character serialized
representation (very likely compressed, if large enough to TOAST), and
the text in/out methods simply deal in that representation. The 'binary'
send/recv methods seem to differ only in possibly using a different
character encoding on the wire.

Now, also as I understand it, there's no requirement that a type even
/have/ binary send/recv methods. Text in/out it always needs, but send/recv
only if they are interesting enough to buy you something. I'm not sure
the XML send/recv really do buy anything. It is not as if they present the
XML in any more structured or tokenized form. If they buy anything at all,
it may be only an extra transcoding that the other end will probably
immediately do in reverse.

So, if that's the situation, is there some other, really simple, choice
for what XML send/recv might usefully do, that would buy more than what
they do now?

Well, PGLZ is in libpgcommon now, right? What if xml send wrote a flag
to indicate compressed or not, and then if the value is compressed TOAST,
streamed it right out as is, with no expansion on the server? I could see
that being a worthwhile win, /without even having to devise some
XML-specific encoding/. (XML has a big expansion ratio.)
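
Just to give the idea a shape, a rough sketch (the one-byte flag is my
invention, the out-of-line TOAST case is only half handled, and the
client-encoding question is ignored entirely, so read it as an
illustration, not a patch):

  Datum
  xml_send_sketch(PG_FUNCTION_ARGS)
  {
      /* fetch the argument without detoasting it */
      struct varlena *raw = (struct varlena *) PG_GETARG_POINTER(0);
      StringInfoData buf;

      pq_begintypsend(&buf);
      if (VARATT_IS_COMPRESSED(raw))
      {
          /* inline-compressed: stream the PGLZ payload as-is */
          pq_sendbyte(&buf, 1);
          pq_sendint(&buf, VARRAWSIZE_4B_C(raw), 4);  /* raw size */
          pq_sendbytes(&buf, VARDATA_4B_C(raw),
                       VARSIZE(raw) - VARHDRSZ - sizeof(int32));
      }
      else
      {
          /* plain, or out-of-line (decompressed here, alas) */
          struct varlena *flat = pg_detoast_datum(raw);

          pq_sendbyte(&buf, 0);
          pq_sendbytes(&buf, VARDATA_ANY(flat), VARSIZE_ANY_EXHDR(flat));
      }
      PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
  }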

And, since that idea is not inherently XML-specific ... does the JSONB
representation have the same properties?  How about even text or bytea?




The XML question has a related, JDBC-specific part. JDBC presents XML
via interfaces that can deal in Source and Result objects, and these
come in different flavors (DOMSource, an all-in-memory tree, SAXSource
and StAXSource, both streaming tokenized forms, or StreamSource, a
streaming, character-serialized form). Client code can ask for one of
those forms explicitly, or use null to say it doesn't care. In the
doesn't-care case, the driver is expected to choose the form closest
to what it's got under the hood; the client can convert if necessary,
and if it had any other preference, it would have said so. For PGJDBC,
that choice would naturally be the character StreamSource, because that
/is/ the form it's got under the hood, but for reasons mysterious to me,
pgjdbc actually chooses DOMSource in the don't-care case, and then
expends the full effort of turning the serialized stream it does have
into a full in-memory DOM that the client hasn't asked for and might
not even want. I know this is more a PGJDBC question, but I mention it
here just because it's so much like the what-should-send/recv-do question,
repeated at another level.

-Chap




Re: [HACKERS] How are CREATE EXTENSION ... VERSION or ALTER EXTENSION ... UPDATE TO ... intended to work?

2016-02-21 Thread Chapman Flack
On 02/16/16 22:44, Tom Lane wrote:
> Chapman Flack <c...@anastigmatix.net> writes:
>> been several releases of an extension, and the extensions directory
>> is now populated with a bunch of files quux--1.0.sql, quux--1.1.sql,
>> quux--1.0--1.1.sql, quux--1.1--1.0.sql, ..., quux.control.
>> And somewhere in $libdir there are quux-1.0.so, quux-1.1.so.
> 
> Well, at least so far as the existing extensions in contrib are concerned,
> there are *not* version numbers in the .so filenames.  This means you
> can't have more than one version of the .so installed at once, but we've
> not really found a need for that.  It's usually feasible --- and desirable
> --- to keep ABI compatibility to the extent that the new .so can be
> swapped in for the old without needing to change the SQL function
> definitions immediately.

It's true enough that in PL/Java's case, the ABIs / interfaces between
the SQL function definitions and their implementations in the .so have
been quite stable for years, so there might be no immediate problem.
On the other hand, other details of the implementation (bugs come to mind)
do change ... letting a version-specific CREATE EXTENSION load an
unversioned .so could lead to surprises in that area, because it would
/appear to succeed/, meaning pg_extension.extversion would get changed
to what you /thought/ you had loaded (and what you would probably answer,
if asked "what version are you reporting this bug against?"), while what's
actually running could be different, and if I squint I see diagnostic
headaches lying not far around that bend.

> into the field.  Avoiding MODULE_PATHNAME in favor of writing out
> the versioned .so name in the .sql files is probably the path of
> least resistance.

Agreed ... for once maybe I'll follow it.

-Chap




[HACKERS] MinGW / Windows / printf format specifiers

2016-02-18 Thread Chapman Flack
So I announce a PL/Java 1.5.0-BETA1 with a comment that windows-MSVC
was well tested but that I hadn't found anyone to test windows-mingw,
and behold, what should happen but someone turns up with an interest
in testing the mingw build, so we might have that back in shape as
well for -BETA2.

Haven't run into any showstoppers, but gcc is issuing printf-format
warnings in three places where JNI values of type jint or jlong are
printed.

The cause seems to be that Windows conventions have int = long = int32
(even on 64-bit platforms) and only 'long long' = int64. The Java JNI
headers of course know this, so they type jlong as 'long long', while
jint they type as 'long' - curiously, because they could just call it
int and get the same width. Maybe a habit from a 16-bit C environment?

gcc warns about %ld used with a jlong argument (and it is right,
because for Windows that really should be %lld or %I64d).

It also complains about %d used with a jint argument (a bit spuriously,
because it wants to see %d paired with 'int' while 'jint' is typed as
'long', producing a warning _even though those have the same width_).

I'm familiar with a common way of handling this using a macro like
PRId64 that expands to the correct printf format code on each platform
(printf("My value is %10" PRId64 "\n", v)) ... which then becomes a
potential localization headache because if the format string is the
message key, it's now not the same between platforms.
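
For anyone who hasn't met it, the standard-C shape of that pattern is
below; the point for localization is that the format string itself, the
natural message key, is assembled differently on each platform:

  #include <inttypes.h>
  #include <stdio.h>

  int
  main(void)
  {
      int64_t v = INT64_C(9007199254740993);

      /* PRId64 expands to "ld", "lld", or "I64d" as the platform
       * requires, so the concatenated format string varies */
      printf("My value is %10" PRId64 "\n", v);
      return 0;
  }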

In my particular case, PL/Java has never had any localization effort
yet, and the only affected sites right now are two debug messages and
one error, not places where the urgency to localize burns hottest.

But if others here have already considered these issues and settled
on a good approach, I'd be happy to not spend time inventing another.

I found some of the printf format specifier differences, Windows to
other platforms, described well in this StackOverflow thread:

http://stackoverflow.com/questions/6299083/cross-platform-printing-of-64-bit-integers-with-printf

And the MinGW wiki has a directly relevant page:

https://sourceforge.net/p/mingw-w64/wiki2/gnu%20printf/

They provide their own printf that supports %lld (you can get it by
default by defining __USE_MINGW_ANSI_STDIO) ... and, to avoid
spurious compiler warnings, they also define a macro __MINGW_PRINTF_FORMAT
that can be used in __attribute__((__format__ ...))) so gcc's format
checker applies the right checks.
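
Put together, that suggests something along these lines (PG_PRINTF_ATTR
is a name I just made up for the sketch; __USE_MINGW_ANSI_STDIO and
__MINGW_PRINTF_FORMAT are the knobs from the wiki page):

  #define __USE_MINGW_ANSI_STDIO 1    /* must precede <stdio.h> */
  #include <stdio.h>

  #ifdef __MINGW_PRINTF_FORMAT
  #define PG_PRINTF_ATTR(f, a) \
      __attribute__((format(__MINGW_PRINTF_FORMAT, f, a)))
  #else
  #define PG_PRINTF_ATTR(f, a) \
      __attribute__((format(printf, f, a)))
  #endif

  /* %lld on a jlong now works, and gcc checks it correctly */
  extern void log_jlong(const char *fmt, ...) PG_PRINTF_ATTR(1, 2);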

Have issues like this been dealt with in PostgreSQL code before, and did
a favorite approach emerge?

-Chap




Re: [HACKERS] about google summer of code 2016

2016-02-18 Thread Chapman Flack
On 02/18/16 19:35, Amit Langote wrote:

> Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)
> 
> https://summerofcode.withgoogle.com/

For anybody finding that web site as anti-navigable as I did, here
are more direct links to the actual rules, and terms of agreement
for the various participants:

https://summerofcode.withgoogle.com/rules/
https://summerofcode.withgoogle.com/terms/org
https://summerofcode.withgoogle.com/terms/mentor
https://summerofcode.withgoogle.com/terms/student

Here is a question: does it ever happen that PostgreSQL acts as
the org for a project that is PostgreSQL-related but isn't
directly PGDG-led?

... there are definitely interesting and promising areas for further
development in PL/Java beyond what I would ever have time to tackle
solo, and I could easily enjoy mentoring someone through one or
another of them over a summer, which could also help reinvigorate
the project and get another developer familiar with it at a
non-superficial level.  While I could easily see myself mentoring,
I think it would feel like overkill to apply individually as a
one-trick 'organization'.

I see that there was a "based on PL/Java" GSoC'12 project, so maybe
there is some room for non-core ideas under the PostgreSQL ægis?

In any case, I am quite confident that I could *not* complete a
separate org application by tomorrow 2 pm EST. In reading the rules,
it looks possible that the Ideas List does not have to accompany
the org application, but would be needed shortly after acceptance?

If acceptance announcements are 29 February, I could have some
ideas drafted by then.

Is this a thinkable thought?

-Chap




[HACKERS] How are CREATE EXTENSION ... VERSION or ALTER EXTENSION ... UPDATE TO ... intended to work?

2016-02-16 Thread Chapman Flack
I've been looking a little more deeply at the extension mechanism,
trying to answer my own question about what happens once there have
been several releases of an extension, and the extensions directory
is now populated with a bunch of files quux--1.0.sql, quux--1.1.sql,
quux--1.0--1.1.sql, quux--1.1--1.0.sql, ..., quux.control.
And somewhere in $libdir there are quux-1.0.so, quux-1.1.so.

The .sql scripts pretty much all CREATE OR REPLACE the function quux()
AS 'MODULE_PATHNAME', 'quux', and module_pathname is set in quux.control.
That file was most recently written when quux-1.1 was installed, so it
defines module_pathname as $libdir/quux-1.1 and the default_version is 1.1.

So it's clear how a plain CREATE EXTENSION quux; works. No question there.
But what is intended to happen if I want to CREATE EXTENSION quux
VERSION 1.0; ?

If there is still a file quux--1.0.sql in extensions/ (which there
may very well be), it will be executed. But it may still contain
CREATE OR REPLACE FUNCTION quux() ... AS 'MODULE_PATHNAME', 'quux'
(which was, after all, correct back when that was the current release)
but now the definition in the current quux.control will cause
MODULE_PATHNAME to expand to quux-1.1.so, not 1.0 as expected.

At least, that's what would happen if each new quux release just ships
new extension files (a new quux.control, new main quux--1.x.sql, and a
few quux--1.y--1.x.sql update files), but does not overwrite the older
quux--1.y.sql files the site may already have in the extensions/ directory.

A workaround could be that each new quux release installer either removes
all pre-existing older quux--*.sql files (so then you would never have
the option to downgrade or create from an older version) ... or overwrites
them all with new versions that hardcode the older shared-object names
instead of using the magic MODULE_PATHNAME. A sort of 'freeze' operation.

That seems to lead to a situation where the *simple* part of the extension
build script is the part that actually builds the extension, along with the
.control file and the new main quux--1.x.sql file (which can just be a
boilerplate with the version injected in two places, and otherwise never
change between releases), while a steadily-growing part of the build script
will only be there to generate overwriting versions of older .sql files
that have their corresponding older module pathnames hardcoded in.

OR ... avoid using module_pathname and just generate every .sql file
with the correct pathname injected ... then older files don't have to
be overwritten when newer versions of the extension are installed, the
previously-installed ones can just be left in place, and they will always
refer to the correct version of the module. That would work, but leave
module_pathname rather useless.

It seems to me that maybe this scheme is missing something like a
%v substitution that can be specified in the .control file as part
of the module_pathname value:

module_pathname = '$libdir/quux-%v'

which the system would expand to the effective version string whenever
substituting MODULE_PATHNAME into any of the .sql scripts.

"Effective version string" would need careful attention; I think if the
upgrade planner picks a sequence of .sql scripts to execute, within each
script in turn, %v needs to expand to the 'destination' version of that
script, that is, the version string that appears last in that script's
filename.

I think with a change like that, there would be less danger that
extension build scripts grow to an unnecessary and awkward complexity
just to deal with curating the collection of .sql scripts associated
with past versions.

Maybe it should be a new keyword, like module_pathname_versioned, just
to avoid changing the meaning of anybody's current module_pathname that
might have a literal %v.

OR ... am I completely overlooking a better way of using the facility
as it now exists?

-Chap




Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Chapman Flack
On 02/15/16 20:03, Greg Stark wrote:
> On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <c...@anastigmatix.net> wrote:
>> If the calling process subsequently waits for its
>> children, and the process has no unwaited for children that were
>> transformed into zombie processes, it will block until all of its
>> children terminate, and wait(), wait3(), waitid() and waitpid() will
>> fail and set errno to [ECHILD].

> And actually looking at that documentation it's not clear to me why
> it's the case. I would have expected system to immediately call
> waitpid after the fork and unless the subprocess was very quick that
> should be sufficient to get the exit code. One might even imagine
> having system intentionally have some kind interlock to ensure that
> the parent has called waitpid before the child execs the shell.

Doesn't the wording suggest that even if the parent is fast enough
to call waitpid before the child exits, waitpid will only block until
the child terminates and then say ECHILD anyway?

I wouldn't be surprised if they specified it that way to avoid creating
a race condition where you would *sometimes* think it was doing what you
wanted.

Agreed that the language system(3) gives for ECHILD (the "status ...
is no longer available" description) doesn't clearly reflect that.

-Chap




Re: [HACKERS] A bit of PG archeology uncovers an interesting Linux/Unix factoid

2016-02-15 Thread Chapman Flack
On 02/15/16 13:42, Greg Stark wrote:

> (it returns error with errno ECHILD upon successful completion of 
> commands).
> This fix ignores an error from system() if errno == ECHILD.
> 
> It looks like Linux now behaves similarly,

It seems to be official, in the Single Unix Specification:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html

SA_NOCLDWAIT
If set, and sig equals SIGCHLD, child processes of the calling
processes will not be transformed into zombie processes when they
terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD]. Otherwise, terminating child
processes will be transformed into zombie processes, unless SIGCHLD
is set to SIG_IGN.

> So just in case anyone else wants to use system() in Postgres or
> indeed any other Unix application that twiddles with the SIGCHILD
> handler this is something to beware of. It's not entirely clear to me
> that the mention of SA_NOCLDWAIT is the only way to get this
> behaviour, at least one stackoverflow answer implied just setting
> SIG_IGN was enough.

Yup:

• If a process sets the action for the SIGCHLD signal to SIG_IGN, the
behaviour is unspecified, except as specified below. If the action
for the SIGCHLD signal is set to SIG_IGN, child processes of the
calling processes will not be transformed into zombie processes when
they terminate. If the calling process subsequently waits for its
children, and the process has no unwaited for children that were
transformed into zombie processes, it will block until all of its
children terminate, and wait(), wait3(), waitid() and waitpid() will
fail and set errno to [ECHILD].
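
A tiny program that exhibits exactly that (on Linux, at least): the
command runs fine, yet system() reports failure with ECHILD:

  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  int
  main(void)
  {
      /* with SIGCHLD ignored, system()'s internal wait can't reap */
      signal(SIGCHLD, SIG_IGN);
      errno = 0;
      if (system("exit 0") == -1 && errno == ECHILD)
          printf("command succeeded, but system() said ECHILD\n");
      return 0;
  }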

-Chap




Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Chapman Flack
On 02/10/16 17:19, Tom Lane wrote:

> I also verified that in UTF8-based locales, isspace() thinks that 0x85 and
> 0xA0, and no other high-bit-set values, are spaces.  Not sure exactly why

Unicode NEXT LINE (NEL) and NO-BREAK SPACE, respectively.

http://unicode.org/standard/reports/tr13/tr13-5.html
http://unicode.org/charts/PDF/U0080.pdf
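
A quick way to reproduce the check on a glibc system (the locale name
may need adjusting; the expected output, per the above, is just 0x85
and 0xA0):

  #include <ctype.h>
  #include <locale.h>
  #include <stdio.h>

  int
  main(void)
  {
      int c;

      if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
          return 1;
      for (c = 0x80; c <= 0xff; c++)
          if (isspace(c))
              printf("0x%02X\n", c);
      return 0;
  }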

-Chap




Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-02-10 Thread Chapman Flack
On 02/10/16 23:55, Tom Lane wrote:

> Yeah, I got that --- what seems squishier is that none of the other C1
> control characters are considered whitespace?

That seems to be exactly the case:

http://www.unicode.org/Public/5.2.0/ucd/PropList.txt

09..0D, 20, 85, and A0 are the only whitespace chars whose codepoints
fit in a byte.

-Chap




Re: [HACKERS] proposal: schema PL session variables

2016-02-08 Thread Chapman Flack
[resending because thunderbird helpfully defaulted my sender
address to the one that -isn't- subscribed to -hackers, sorry]


On 02/08/2016 03:16 AM, Pavel Stehule wrote:

> Only a owner of schema can edit functions inside schema

Can't anyone granted CREATE on the schema do that? Would
that be changed by this proposal?

-Chap





Re: [HACKERS] Implementing a new Scripting Language

2016-01-27 Thread Chapman Flack
On 01/27/16 16:31, Igal @ Lucee.org wrote:

> This can be a major thing.  I will open a ticket in
> https://github.com/tada/pljava -- or is it already on the roadmap?

Now that you mention it, it isn't officially in a ticket. Though it's
not like I was going to forget. :)  I can guarantee it won't be in 1.5...

Speaking of tickets, I should probably make actual tickets, for after
1.5.0, out of all the items now showing in the "known issues" section
of the draft release notes. More chores

-Chap




Re: [HACKERS] Implementing a new Scripting Language

2016-01-27 Thread Chapman Flack
On 01/27/16 22:26, Igal @ Lucee.org wrote:

> If I can help with anything with the pl/Java project I'd love to help.

Man, you do NOT know what you just walked into.  :D

The most imminent thing I see happening is s/1.5.0-SNAPSHOT/1.5.0-BETA1
which is not far off, so testing is always good. Maybe a good opportunity
for you to try it out, get some Lucee code running ... but by separately
compiling and installing jars at first, no JSR two-twenty-three for the
near future. See if it even does what you're hoping it'll do. Depending
on what's reported during beta, there could be delightful diagnostic
diversions too.

I guess getting some time in playing with PostgreSQL and installing
PL/Java would be the right way to start.  Discussion that's specific
to PL/Java and might not interest all of -hackers can also happen on
pljava-...@lists.pgfoundry.org.

Cheers,
-Chap




Re: [HACKERS] Implementing a new Scripting Language

2016-01-27 Thread Chapman Flack
On 01/27/2016 11:46 AM, Igal @ Lucee.org wrote:

>> Have you checked PL/Java?
> That seems like a good place to start, thanks.  Are there also any docs
> about the subject?

I just did a quick search on Lucee and what I found suggests that
it compiles to JVM bytecode and runs on a JVM. If that is the
case, and it can compile methods that will have the sort of
method signatures PL/Java expects, and you can put the .class
files in a jar and load it, your job should be just about done. :)

Or, you might end up writing thin wrappers in Java, probably
nothing more.

Another possibility: Java has pluggable script engine support
(java specification request 233, see the javax.script package).
Does Lucee have an existing JSR 233 engine implementation?

PL/Java does not _currently_ have JSR233 support, but it is
definitely being thought about ... the idea being, put a Lucee
JSR233 engine jar on the classpath, define it as a new PostgreSQL
language (PL/Java will handle the interfacing), and then actually
write stuff like

DO $$echo("Hello World");$$ LANGUAGE lucee;

As I said, in current PL/Java version, JSR 233 is still
science fiction ... but it's "hard science fiction", not the
fantasy stuff.

http://tada.github.io/pljava/

-Chap




Re: [HACKERS] Implementing a new Scripting Language

2016-01-27 Thread Chapman Flack
On 01/27/2016 01:17 PM, Igal @ Lucee.org wrote:

> the next version of Lucee (currently in Beta) does support JSR-223,
> which I actually mentioned as a viable solution in my first email in

Sorry, I jumped in late.

> this thread.  That would be awesome if PL/Java would support JSR-223.

Ok, if your 233 support is already in beta, you'll get there
before we do, but the paths should intersect eventually. :)

-Chap




Re: [HACKERS] Implementing a new Scripting Language

2016-01-27 Thread Chapman Flack
On 01/27/2016 02:06 PM, Igal @ Lucee.org wrote:

> two-twenty-three, not thirty ;)),

Thanks. :)  On occasions in the past I have written it
correctly ... there is evidence in the archives ...

> we will be able to use Javascript
> through that via the Nashorn project,

Yes, that's an attraction for me, as the JSR-two-twenty-three
engine for JS is now typically already included with the JRE.
(Technically, I keep reading that there is no two-twenty-three
engine that *needs* to be included and it's totally up to the
JRE packager ... but there have actually been some JavaScript
snippets in PL/Java's build process since July 2014 and no one
has yet reported they broke the build.)

-Chap




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-26 Thread Chapman Flack
On recent occasions, Pavel Stehule and Catalin Iacob have written:
...
>> But *inside* PLPython what I wrote is true, see this example for what I
>> mean:
>>
>> CREATE FUNCTION test()
>>   RETURNS int
>> AS $$
>>   def a_func():
>> raise plpy.Error('an error')
>>
>>   try:
>> a_func()
>>   except plpy.Error as exc:
>> plpy.info('have exc {} of type {}'.format(exc, type(exc)))
>> plpy.info('have spidata {}'.format(hasattr(exc, 'spidata')))
>> $$ LANGUAGE plpython3u;
>>
>> Running this produces:
>>
>> DROP FUNCTION
>> CREATE FUNCTION
>> psql:plpy_demo:16: INFO:  have exc an error of type <class 'plpy.Error'>
>> psql:plpy_demo:16: INFO:  have spidata False
>>
>>> Currently If I raise plpy.Error, then it is immediately trasformed to
>>> PostgreSQL, and and then to SPIError and successors.
>>
>> Depends how you define immediately. I would say it's really not
>> immediately, it's only at the Postgres boundary. Imagine in the
>> example above that a_func could call lots of other Python code and
>> somewhere deep down raise Error would be used. Within that whole
>> execution the error stays Error and can be caught as such, it will
>> have nothing to do with SPIError.
...
> I would like to reduce this patch and not try to touch the built-in
> exceptions. I hope there isn't any disagreement about that?
> 
> I have thought about it a lot, and I see the core of the problem in the
> orthogonally constructed exceptions of Python and Postgres. We work with
> the statements elog, ereport, RAISE EXCEPTION - and these statements are
> much more like templates - they can generate any possible exception.
> Python is based on working with instances of predefined exceptions. And
> that is the core of the problem. A template-like class cannot be the
> ancestor of instance-oriented classes. This is a task for somebody who
> knows OOP and meta-OOP in Python well - total ...

I've been following this discussion with great interest, because PL/Java
also is rather overdue for tackling the same issues: it has some partial
ability to catch things from PostgreSQL and even examine them in proper
detail, but very limited ability to throw things as information-rich as
is possible from C with ereport. And that work is not as far along as
you are with PL/Python, there is just a preliminary design/discussion
wiki document at
  https://github.com/tada/pljava/wiki/Thoughts-on-logging

I was unaware of this project in PL/Pythonu when I began it, then added
the reference when I saw this discussion.

In some ways designing for PL/Java might be easier because it is not
complete freedom to invent the whole design; for example in Java there
is already an SQLException hierarchy specified in JDBC 4, with
SQLNonTransientException, SQLTransientException, SQLRecoverableException,
and their several subclasses, with a defined mapping from the leading
two-character SQLSTATE category codes. So for Java at least one part
of the bikeshed is already painted. :)

But a lot of the rest of the design clearly involves asking the same
questions, such as what happens to an event as it bubbles up from a
PL function called by a PG query called by a PL function ... maybe all
the way out to an exception handler in front-end code. (That could be
a thinkable thought for Java because the standards specify JDBC as its
common database API both client-side and for server PL code, so it's
natural to think about the exception handling being similar both places.
It would be analogous to having the plpy database access functions designed
to present an interface consistent with a client-side API like psycopg.
Forgive me, I'm not familiar enough with PL/Pythonu to know if it *is*
like that or not.)

From following this thread, I have the impression that what is
currently under discussion is low-hanging fruit that may be committed
soon, with other questions remaining for further design and discussion.
I'll continue to be interested in how the deeper design issues continue
to be shaped. Maybe ideas can sort of cross-pollinate ...

-Chap




[HACKERS] is there a deep unyielding reason to limit U&'' literals to ASCII?

2016-01-23 Thread Chapman Flack
I see in the documentation (and confirm in practice) that a
Unicode character string literal U&'...' is only allowed to have
<Unicode escape value>s representing Unicode characters if the
server encoding is, exactly and only, UTF8.

Otherwise, it can still have <Unicode escape value>s, but they can only
be in the range \+01 to \+7f and can only represent ASCII characters
... and this isn't just for an ASCII server encoding but for _any server
encoding other than UTF8_.

I'm a newcomer here, so maybe there was an existing long conversation
where that was determined to be necessary for some deep reason, and I
just need to be pointed to it.

What I would have expected would be to allow <Unicode escape value>s
for any Unicode codepoint that's representable in the server encoding,
whatever encoding that is. Indeed, that's how I read the SQL standard
(or my scrounged 2006 draft of it, anyway). The standard even lets
you precede U& with _charsetname and have the escapes be allowed to
be any character representable in the specified charset. *That*, I assume,
would be tough to implement in PostgreSQL, since strings don't walk
around with their own personal charsets attached. But what's the reason
for not being able to mention characters available in the server encoding?

-Chap




Re: [HACKERS] Proposal: BSD Authentication support

2016-01-14 Thread Chapman Flack
Forgive my late comment ... I haven't used the PAM support in postgresql
either, or I'd know.  PAM (I know for sure), and I suppose similarly BSD
Authentication, models a generalized auth interaction where a given
authentication module can send a number of arbitrary prompts back to the
client (via callbacks so different protocols and UIs can be used), and
demand a number of arbitrary responses, so that a variety of authentication
schemes can easily be supported.
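
For reference, the callback in question (Linux-PAM flavor; the canned
answer string is obviously a placeholder) looks like this, and nothing
in it fixes num_msg at 1, which is the generality I'm asking about:

  #include <security/pam_appl.h>
  #include <stdlib.h>
  #include <string.h>

  static int
  my_conv(int num_msg, const struct pam_message **msg,
          struct pam_response **resp, void *appdata_ptr)
  {
      struct pam_response *replies = calloc(num_msg, sizeof *replies);
      int i;

      if (replies == NULL)
          return PAM_CONV_ERR;
      for (i = 0; i < num_msg; i++)
      {
          switch (msg[i]->msg_style)
          {
              case PAM_PROMPT_ECHO_OFF:   /* e.g. "Password:" */
              case PAM_PROMPT_ECHO_ON:    /* e.g. a challenge prompt */
                  replies[i].resp = strdup("answer from the client");
                  break;
              case PAM_TEXT_INFO:         /* display only */
              case PAM_ERROR_MSG:
                  replies[i].resp = NULL;
                  break;
          }
      }
      *resp = replies;
      return PAM_SUCCESS;
  }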

Is the PostgreSQL support (for either PAM or BSD Authentication) able to
handle that in its designed generality, or only for the case (number of
requested items = 1, item 1 = a password)?

Could the general form be handled with the existing fe/be protocol,
or would the protocol have to grow?

-Chap




[HACKERS] Re: Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-13 Thread Chapman Flack
Maybe more work, but:

Could a custom plan be cached /along with the values of
the parameters for which it was planned/?  Then it's kind of
a no-brainer to say "if the parameters are the same this time,
I'm probably looking at the best plan already."  Pretty simpleminded,
but at least it would catch the testcase where the plan flips to
a bad one even when literally the very same parameters are used.

Generalizing a little, how about caching a plan along with a
boolean expression tree over the parameters, evaluating to true
if the new parameters are "close enough" to the planned ones so
that the plan is probably still better than generic?

Version zero could just create the expression p1 = oldp1
AND p2 = oldp2 AND p3 = oldp3, and be simply the same as the
first suggestion. But then, how to generate more interesting
and useful validity-expressions for different plan types and
statistics could be a future-work area with some meat to it,
promising successive improvements. Maybe plugins could
supplement it for particular characterized workloads ...
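
To sketch the parameter comparison in code (hypothetical: ParamListInfo
and datumIsEqual are real, but the plancache integration and the
per-type byval/length lookup are hand-waved into two arrays):

  static bool
  params_unchanged(ParamListInfo cur, ParamListInfo old,
                   const bool *typbyval, const int16 *typlen)
  {
      int i;

      if (cur == NULL || old == NULL || cur->numParams != old->numParams)
          return false;
      for (i = 0; i < cur->numParams; i++)
      {
          ParamExternData *c = &cur->params[i];
          ParamExternData *o = &old->params[i];

          if (c->isnull != o->isnull)
              return false;
          if (!c->isnull &&
              !datumIsEqual(c->value, o->value, typbyval[i], typlen[i]))
              return false;
      }
      return true;      /* same values as planned: keep the custom plan */
  }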

-Chap


On 01/13/2016 08:28 AM, Marko Tiikkaja wrote:
> On 13/01/16 14:12, Pavel Stehule wrote:
>> so the strategy - if cost of generic plan is less than some MAGIC
>> CONSTANT
>> (can be specified by GUC), then use generic plan. Elsewhere use a custom
>> plan everytime.
>>
>> It allow to controll the plan reusing. When MAGIC CONSTANT = 0 .. use
>> custom plan everytime, When MAGIC CONSTANT = M, then use generic plan
>> always.
> 
> I don't think that would solve even the original problem without
> effectively disabling generic plans, despite the problem being
> relatively simple.  The generic plan appears to be really cheap, because
> the planner doesn't have the concept of a "worst case".
> 
> 
> .m
> 





Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2016-01-05 Thread Chapman Flack
On 01/05/2016 09:18 AM, Chapman Flack wrote:
> On 01/05/2016 12:53 AM, Michael Paquier wrote:
> 
>> That's not a mandatory fix, but I think we had better do it. As long
>> as dynloader.h is copied in include/, there is no need to keep the
> tweak of dfmgr.c to include the definitions of those routines.
> 
> Looking at what you changed, all becomes clear.  :)

Out of curiosity, what happens (or what is supposed to happen)
to port/dynloader/<platform>.c ?

-Chap





Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2016-01-05 Thread Chapman Flack
On 01/05/2016 12:53 AM, Michael Paquier wrote:

> That's not a mandatory fix, but I think we had better do it. As long
> as dynloader.h is copied in include/, there is no need to keep the
> tweak of dfmgr.c to include the definitions of those routines.

Looking at what you changed, all becomes clear.  :)

-Chap





Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2016-01-04 Thread Chapman Flack
On 01/05/16 00:18, Michael Paquier wrote:

> with MSVC what is used instead of dynloader.h is
> port/dynloader/win32.h.

Seems like a good catch - AFAICS, what happens with port/dynloader
is that for 12 different OSes, there's an .h file there to
be copied _renamed to dynloader.h_ into the build tree, and a .c
file expecting similar treatment, and that the problem that kicked
off this whole thread was just that the windows build process (and
only the windows build process) was neglecting to do that.

And so I was pretty sure the fix was to make the windows build do
that, and then it would be doing the same thing as the other eleven,
but I just looked at Bruce's patch more closely and it does seem to
be doing something else instead.

> Instead of this patch I would be inclined to
> remove the #define stuff with dynloader.h that use WIN32_ONLY_COMPILER
> (see for example dfmgr.c) and just copy the header in include/. This
> way we use the same header for all platforms.

But this part I'm not sure I follow (and maybe I don't need to follow
it, you guys know your code better than I do) ... in this whole thread
haven't we been talking about just making the windows build copy its
port/dynloader files the same way the other eleven platforms do, because
it wasn't, and that seemed to be an omission, and worth not continuing
to omit?

-Chap




Re: CurrentExtensionObject was Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-31 Thread Chapman Flack
On 12/31/15 16:13, Tom Lane wrote:
>> I see that 9.5.0 already adds PGDLLIMPORT on the global variable
>> creating_extension, but CurrentExtensionObject on the very next
>> line of extension.h still doesn't have it.
> 
> Why would you need to access that?

This returns to the earlier question about extensions whose purpose
is to enable other extensions. I'm thinking a bit ahead. At the moment,
I am only working on nudging PL/Java itself into the extension framework,
so you can install *it* with CREATE EXTENSION. For that, I can get along
without the extension Oid.

Down the road, it would be quite natural for PL/Java users to develop
functionality in Java, package it in a jar file, and want to install that
using CREATE EXTENSION. So they'd distribute their foo.jar file with a
foo.control file (requires = 'pljava'), and a very short foo--1.0.0.sql file
  SELECT sqlj.install_jar('file:foo.jar', 'foo', true);
and most of the right stuff will automagically happen ... the
associated system objects (created by the deployment script inside
the jar, executed by install_jar) will be captured as extension
members. But the jar itself, stashed by install_jar into a PL/Java
managed table that can't participate in pg_depend, somehow still
needs to be associated with the extension too.

I suppose there really won't be a way to do this with reliability
unless someday extensions can hook the dependency infrastructure,
as you mentioned in passing in an earlier message.

That sounds like a longer discussion. But I wondered if it might
not be too hard to put PGDLLIMPORT on CurrentExtensionObject as
a stopgap.

... though perhaps it doesn't matter that much, because I still
have to write a circuitous workaround anyway, and keep it in the
code until 9.1 through 9.4 all vanish from the earth.

-Chap




Re: [HACKERS] tracking owner of extension-managed objects

2015-12-31 Thread Chapman Flack
On 12/31/15 19:06, Jim Nasby wrote:
> Those columns are there to tell you what *kind* of objects are involved in
> the dependency. For example, a function will have two entries. classid will
> be 1255 (pg_proc) for both of them. refclassid will be 2612 (pg_language)
> for one and 2615 (pg_namespace) for the other.

Right, well, the classid for a function is 1255 because 1255 is the Oid
of a row in pg_class ... and that row happens to describe a relation
named pg_proc, which is the relation in which you find functions.

Now, if you go to that relation, and look for a row whose Oid is the
objid part of the address, that row is the specific function you're
looking for.

So, if I give you the object address (1255, 1397, 0), these are the two
steps you will use to learn what I'm talking about:


SELECT relname FROM pg_class WHERE oid = 1255;
 relname 
---------
 pg_proc


SELECT * FROM pg_proc WHERE oid = 1397;
 proname | ... | prosrc  | ...
---------+-...-+---------+-...
 abs     | ... | int4abs | ...



Ok, now what if I give you the object address (17001, 17270, 0) ?
Of course, these oids aren't predefined so they won't be the same
from one database to the next. But in the test database I'm logged
in to right now, you can follow the very same two steps:


SELECT relname FROM pg_class WHERE oid = 17001;
    relname     
----------------
 jar_repository


SELECT * FROM jar_repository WHERE oid = 17270;
 jarid | jarname  | jarowner |    jarmanifest     | ...
-------+----------+----------+--------------------+-...
     2 | examples | chap     | Manifest-Versio... | ...


Nothing new has happened here. The algorithm is unchanged.
The object address (17001, 17270, 0) means "the PL/Java examples jar",
in exactly the same way that (1255, 1397, 0) means "the int4abs
function".

(I had to cheat a little and ALTER TABLE jar_repository SET WITH OIDS
because in stock PL/Java it doesn't have them, but that was easy enough.)

The only thing that stops me at present from passing
an ObjectAddress like (17001, 17270, 0) to recordDependencyOn(...)
is that the *documentation* says the classid can't be the Oid
of just *any* old row in pg_class, it has to be the oid of a row
in pg_class *that describes a system catalog*.

As far as the actual code, if I tried that right now I don't actually
think anything would stop me from recording the dependency.
Things would break when I tried to drop something though, because
getObjectClass() would be called on my dependency, not recognize that
Oid among the ones it knows about, and throw an error.

How about having a default case there, saying "hmm, jar_repository,
not a table I recognize, it must belong to some extension. Well, I'll
just go and delete its row with oid = 17270 anyway, and let the extension
that owns it handle the DELETE trigger if any, and we're good to go." ?

To a first approximation, it's as easy as that.  All the rest of my
earlier message was about proposing rules to enforce reasonable
restrictions so the dependency jazz isn't opened up to every old user
table with triggers calling arbitrary code to get all tangled up in
PostgreSQL's DROP internals, but it is opened up in a controlled way
to extensions that create new classes of things to be managed.
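
A hypothetical sketch of that default case (delete_extension_managed_row
does not exist; it just names the row-deletion-with-triggers behavior
described above):

  default:
      /* classId is not a system catalog: assume an extension-managed
       * class, delete the addressed row, and let whatever DELETE
       * trigger the owning extension put on that table do the rest */
      delete_extension_managed_row(object->classId,   /* e.g. jar_repository */
                                   object->objectId); /* oid of the row */
      break;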

-Chap




CurrentExtensionObject was Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-31 Thread Chapman Flack
While on the subject of things that could make it or not into 9.5.?,
I see that 9.5.0 already adds PGDLLIMPORT on the global variable
creating_extension, but CurrentExtensionObject on the very next
line of extension.h still doesn't have it.

The simplest way I've come up with in Windows to identify the extension
being created is to create some temporary object, call
getExtensionOfObject() on it, then drop it. A bit circuitous when on
any other platform I can just look at CurrentExtensionObject ...

-Chap




Re: [HACKERS] tracking owner of extension-managed objects

2015-12-31 Thread Chapman Flack
On 12/23/15 15:02, Jim Nasby wrote:
> BTW, I've been pondering a very similar problem to this. I'm working on a
> metacoding framework, and it's inevitable that at some point it will want to
> know what objects it's created
> 
> I was planning on just making a best possible attempt and solving this in an
> extension via a combination of event triggers, reg* and other voodoo,

In the voodoo department, I guess PL/Java could accompany each loading of
a jar 'foo' with a 'CREATE TYPE sqlj.voodoo_foo AS ()' which is about as
lightweight as it gets, and an extra regtype column in the jar_repository
table could carry the Oid of that type. The type would be in pg_shdepend for
the owner, and its ACL could even be used to implement the SQL/JRT
requirement for jars to have an ACL (and be automatically represented
in pg_shdepend for any roles mentioned in the ACL). The type would also
end up in pg_depend for the owning extension, if the jar was installed
by an extension script. And (in 9.3+ anyway), I could have an sql_drop
event trigger to detect when the type goes away for any reason, and
remove the corresponding jar_repository entry.

How well will pg_dump/restore handle that voodoo? I suppose they'll
reliably recreate the types before loading the table with a regtype
column, and by typing the column as regtype, the dump will refer to
the type by name, and therefore work even if pg_dump is not given
the --oids option?

But that's all voodoo. What if we wanted to not need voodoo?

On 12/21/15 12:46, Tom Lane wrote:
> (I guess there are problems with extensions trying to do such things at
> all, since we don't provide a way for extensions to hook into the DROP
> mechanisms.  Perhaps that should be fixed.)


Ok, how numerous would be the problems with this:


- The classid and refclassid columns (in both pg_shdepend and pg_depend)
  are currently Oid columns referencing pg_class.oid.  The catalog
  definition would not preclude putting the oid of a non-system table
  there.  The *documentation* says it has to be the oid of a system
  catalog, and surely there is code that currently would be surprised by
  an entry that wasn't (for example, the default: elog(ERROR, "unrecognized
  object class...") case in getObjectClass).
  But even now, looking at recordDependencyOn or shdepAddDependency,
  I don't see that the code would prevent such an entry being added.

- It still would make no sense as a refclassid in pg_shdepend. All three
  other cases (classid in pg_shdepend, classid or refclassid in pg_depend)
  can make sense in a world of extension-managed objects.

- So, what would be needed to make those 3 cases supportable?  For starters,
  how about a strict rule for *when* a non-system-catalog classid or
  refclassid can be allowed into either of those tables:

  IF an ObjectAddress.classId IS NOT a system catalog
(currently detectable by getObjectClass(classId) throwing an error),
  THEN:
  - it MUST be the Oid of an existing (non-system) relation
  - that relation MUST be WITH OIDS (so the ObjectAddress.objectId can
identify a row in it)
... alternatively, must have an int unique key, and the objSubId can
be what identifies the row
  - that relation MUST have a DELETE FOR EACH ROW trigger that calls
pg_extension_check_depend, a system-provided trigger function to enforce
reference integrity for any pg_depend/pg_shdepend mentions of the row
  - that relation MAY have another DELETE trigger that was allowed to be
created by the restrictions on triggers below.

- The pg_extension_check_depend trigger function has two duties:
  a. to ereport(ERROR) and prevent deletion in some circumstances
 (for example, when the row to be deleted is mentioned on the
 classid side of an 'e' dependency, and the extension isn't being
 dropped, or on the refclassid side of a normal dependency, and
 the dependent object isn't being dropped),
  b. in all other cases, to allow the deletion, while also removing
 associated pg_depend/pg_shdepend entries.
  That's why no non-system table is allowed to be mentioned in pg_depend
  or pg_shdepend unless it has a trigger that calls this function.

- CREATE TRIGGER refuses to create a trigger that calls
  pg_extension_check_depend UNLESS:
  - creating_extension is true, AND
  - the trigger is being created on a table belonging to the current
extension, AND
  - no other DELETE trigger exists on the table, unless the next rule
would allow it.

- CREATE TRIGGER refuses to create any other DELETE trigger on a table
  that has a pg_extension_check_depend DELETE trigger, UNLESS:
  - creating_extension is true, AND
  - the table and the trigger function both belong to the current extension.


With that set of rules, extensions (and only extensions) are able to
invent and manage new kinds of dependency-managed objects, representing
them as rows in a table with appropriate triggers. When doDeletion, for
example, encounters a pg_depend record with a classid that is not a
system catalog, it knows to simply delete the addressed row and let the
table's own DELETE trigger handle the owning extension's cleanup.

Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-30 Thread Chapman Flack
On 12/30/15 20:40, Bruce Momjian wrote:

> your hack will have to be active until 9.4 gets to end-of-life, or
> perhaps 9.5 if we can't get this into 9.5.0.  People who are using 9.5
> Beta or RC will still not have the file, meaning 9.5 end-of-life might
> still be a requirement.

... by which time PL/Java will be sporting the javax.ai.selfaware
features of Java SE 11 or 12, and no one will even have to tell it to
drop the fallback directory.

Well, so it's one directory and one file and a few extra lines in pom.xml.
There are already more onerous backward compatibility hacks in PL/Java
than that. :)

-Chap




Re: [HACKERS] dynloader.h missing in prebuilt package for Windows?

2015-12-29 Thread Chapman Flack
On 12/29/15 14:08, Bruce Momjian wrote:

> This should fix the PL/Java Windows build problem.  I don't think I will
> get this patch into 9.5.0 but will put it in after 9.5.0 is released and
> it will appear in all the next minor version releases, including 9.5.1,
> which should happen in the next 60 days.

Thanks!  What I ended up doing in PL/Java was just to create a 'fallback'
directory with a copy of dynloader.h in it, and adding it at the very
end of the include path when building on Windows, so I think the build
will now work either way, using the real file if it is present, or the
frozen-in-time fallback copy if it isn't.  Then the fallback directory
can be axed as soon as 9.5.1 becomes the oldest thing in PL/Java's
rearview support mirror.

-Chap




Re: [HACKERS] missing "SPI_finish" that isn't missing

2015-12-28 Thread Chapman Flack
On 12/24/15 16:37, Tom Lane wrote:

> to make this coding pattern work is to set up a subtransaction, and either
> commit it in the success path or roll it back upon catching an error.
> (This is not terribly well documented, but the exception-block handling
> in plpgsql provides a working model to follow.)

Ok, I've just had a look at that in plpgsql, and I see what you mean about
the path of least resistance ... though it's not completely obvious to me
how many pieces of that are there for other plpgsql purposes, and how many
are inherently necessary just to be able to catch and recover from an
error thrown from SPI.

BeginInternalSubTransaction? CreateExecutorState? CreateExprContext?
xact callback? subxact callback?
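
For reference, stripping away what looks plpgsql-specific, the skeleton
in pl_exec.c seems to come down to this (my reading, adapted to the
mytable case, so treat it as a sketch rather than gospel):

  MemoryContext oldcontext = CurrentMemoryContext;
  ResourceOwner oldowner = CurrentResourceOwner;

  BeginInternalSubTransaction(NULL);
  MemoryContextSwitchTo(oldcontext);  /* run our code in our own context */
  PG_TRY();
  {
      /* ... the SPI calls that might throw ... */

      ReleaseCurrentSubTransaction();  /* success: commit the subxact */
      MemoryContextSwitchTo(oldcontext);
      CurrentResourceOwner = oldowner;
  }
  PG_CATCH();
  {
      ErrorData  *edata;

      MemoryContextSwitchTo(oldcontext);
      edata = CopyErrorData();         /* save the error before cleanup */
      FlushErrorState();

      RollbackAndReleaseCurrentSubTransaction();
      MemoryContextSwitchTo(oldcontext);
      CurrentResourceOwner = oldowner;

      if (edata->sqlerrcode != ERRCODE_UNDEFINED_TABLE)
          ReThrowError(edata);         /* unexpected: rethrow */
      FreeErrorData(edata);            /* expected: swallow */
  }
  PG_END_TRY();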

> 99% of the time, the path of least resistance at the C-code level is to
> avoid expected error conditions in the first place, rather than try to
> catch and clean up.  In this example, there are plenty of ways you might
> test whether "mytable" exists before risking the SPI_execute call.

I can think of:

- another SPI query
  "select 1 from pg_catalog.pg_class join pg_catalog.pg_namespace n
  on relnamespace = n.oid
  where nspname = 'myschema' and relname = 'mytable'"

- InvalidOid != get_relname_relid("mytable",
GetSysCacheOid1(NAMESPACENAME, CStringGetDatum("myschema")))

- ... other variations on those ...

The first one strikes me funny because it's an even heavier SQL query
to plan and execute just to find out if I can execute the original
trivial one. (Which doesn't really matter, admittedly, since I need to
do it exactly once.) The second one strikes me funny for the mixture of API
levels, undocumented low level to check existence followed by documented
SPI level for the original query. Almost tempts me to ditch SPI entirely
and learn how to use heap_* methods for the single trivial row retrieval
I need, only by the time I've figured out all that, I'll have forgotten
what I was trying to do in the first place.

Is there a particular, reasonably tidy idiom that has emerged as the usual,
customary approach to a task like this?

Thanks,
-Chap




[HACKERS] missing "SPI_finish" that isn't missing

2015-12-24 Thread Chapman Flack

I probably just need to learn the right coding pattern for this.
What is it?

What I want this code to do:

a) If there is a table 'mytable', set theSavedStuff to a copy of
   column s from one row (if present), pstrdup'd in TopMemoryContext.

b) In case of an error because there is no table 'mytable', just swallow
   the error and do nothing. That's an expected case.

c) In case of any other error (something unexpected), rethrow.

Case (a) works. Case (c) works. Case (b) works but produces this:

WARNING:  01000: transaction left non-empty SPI stack
HINT:  Check for missing "SPI_finish" calls.

I can verify that my SPI_finish call is being reached. In fact, in
case b, it returns SPI_ERROR_UNCONNECTED, as if even before my
PG_CATCH received control, the SPI "connection" got closed. But the
"SPI stack" still appears as if something is open?

In the context where this is called, there is no SPI connection already
open (if I try SPI_push_conditional at the start, it returns false).

I'm sure the only bug is in my understanding of how to approach this.
Can someone point out what I'm missing?

Thanks,
-Chap

{
    MemoryContext curr;

    SPI_connect();
    curr = CurrentMemoryContext;
    PG_TRY();
    {
        /* case (a): the table exists; save column s of the first row */
        if ( SPI_OK_SELECT == SPI_execute(
            "SELECT s FROM mytable", true, 1) && 1 == SPI_processed )
        {
            /* allocate the copy in TopMemoryContext so it outlives SPI */
            MemoryContextSwitchTo(TopMemoryContext);
            theSavedStuff = (char const *)SPI_getvalue(
                SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1);
            MemoryContextSwitchTo(curr);
        }
    }
    PG_CATCH();
    {
        /* case (c): anything unexpected gets rethrown */
        if ( ERRCODE_UNDEFINED_TABLE != geterrcode() )
            PG_RE_THROW();
        /* case (b): no such table; swallow the expected error */
        MemoryContextSwitchTo(curr);
        FlushErrorState();
    }
    PG_END_TRY();
    SPI_finish();   /* reached in all cases, yet the warning still appears */
}




Re: [HACKERS] tracking owner of extension-managed objects

2015-12-21 Thread Chapman Flack
On 12/21/2015 12:46 PM, Tom Lane wrote:

> BTW, any such ownership relationship really needs to be reflected into
> pg_shdepend, else someone might drop a role that still owns objects.
> (I guess there are problems with extensions trying to do such things at
> all, since we don't provide a way for extensions to hook into the DROP
> mechanisms.  Perhaps that should be fixed.)

That is literally *the very next* e-mail I was going to compose.

I was looking at pg_(sh)?depend, and it seems they both only allow
recording dependencies *of* things in system catalogs *on* things
in system catalogs. It doesn't seem to offer a way to record that
some row in my added, non-system table, does in fact depend on
some system object. I can probably cobble around this with some
combination of triggers on my own table ('cause that works) and
event triggers to grovel through the parse trees of commands that
could affect the system object, but I get tired just thinking about
it.

> But tell me: why do you need to record ownership?

Some fraction of the maybe unusually demanding things PL/Java tries
to do might just be chalked up to its being one of the few PLs
for which there's an existing standard. ISO 9075-13 says jars got
owners. So they got owners. (It also says they got ACLs, USAGE
anyway, which PL/Java's jars ain't got yet, but yeah, that's
another thing.)

Noah and I have had a side conversation about what 9075-13 says
about jar paths, and how that is and isn't similar to what
Thomas implemented in PL/Java; in the standard when you load
a jar you also get to say what other jars it depends on, which
requires you to own the dependent one and have USAGE on the
dependencies.

> Anything involving filesystem
> references really ought to be superuser-only, I'd think, and the
> ability
> to load arbitrary jarfiles even more so.

It's kind of subtle ... if you have a PL and you assume it exercises
enough control over code it executes to qualify as a trusted one,
then you want non-supers to be able to declare functions, and
somehow they have to be able to supply the code their functions
will run. It happens that for most PLs they supply it by stuffing
the code itself between the quote marks after AS.  In PL/Java
what you put there instead is a reference to a jar previously
loaded and given an internal name by install_jar(url, intname, ...)
(and that is straight outta the standard).

So your ability to call install_jar with some url is nothing more
than the PL/Java way of supplying the code for your functions,
and if non-superusers are allowed to supply their own code for
other PLs, this isn't a completely different game.

Now, where it gets different is that one possible scheme for a url
is file:, and currently in PL/Java if you call install_jar with a
file: url, you are telling it to read from the server's filesystem.
If the file exists and is a jar, you can then call code in it;
otherwise from the error you can deduce something about the file,
that it doesn't exist, isn't readable by postgres, isn't a jar ...

The standard does leave an implementation complete freedom to say
what urls work for install_jar, whether to forbid certain
urls or schemes entirely, or even to allow special schemes that
have no meaning outside the implementation.

So it would be perfectly standard-conformant to say only a superuser
gets to use a file: url with install_jar, or, non-superusers can
only use file: urls within file:/tmp/placetoputmyjars/. If the
user puts his jar up on his web server and calls install_jar with
an http: url, that should be no more of a security concern than
any other PL allowing the user to say whatever he wants between
the quote marks after AS. And if the implementation wanted to
define a special urlscheme pqcopy: where pqcopy:/path/to/file
refers to a jar on the client machine, all of that falls within
what the standard allows. (I haven't really looked at how \copy
works enough to know whether a scheme like pqcopy: can really be
implemented, initiated from the server side; just brainstorming.)
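
In the meantime, a site could approximate such a policy in SQL. A
rough sketch (the wrapper name is hypothetical; assumes the wrapper
is created by a superuser, so install_jar runs with the definer's
rights):

  REVOKE ALL ON FUNCTION sqlj.install_jar(varchar, varchar, boolean)
    FROM PUBLIC;

  CREATE FUNCTION install_jar_checked(url varchar, jarname varchar,
                                      deploy boolean)
  RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $$
  BEGIN
    -- allow remote schemes, and file: only under one blessed directory
    IF url !~ '^https?://' AND url !~ '^file:/tmp/placetoputmyjars/' THEN
      RAISE EXCEPTION 'url not permitted here: %', url;
    END IF;
    PERFORM sqlj.install_jar(url, jarname, deploy);
  END
  $$;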

Btw, the standard is silent on what install_jar actually does
with the jar, beyond that it gets a short name and an owner,
and no longer depends on the original url being accessible.
It could be stored in a table or tables (as PL/Java currently
does), a blob (in PG versions where blobs got owners, that could
simplify the owner dependency problem), or even some internally
managed filesystem area by the PL implementation itself; that
doesn't count as filesystem access by user code, any more than
it would if a trusted function requests a sort for which PG creates
a temp file behind the scenes. The JVM itself also creates and
manages temp files transparently for various internal purposes,
just as, for all I know, Python or R might.

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tracking owner of extension-managed objects

2015-12-21 Thread Chapman Flack
On 12/21/2015 02:30 PM, Chapman Flack wrote:
> On 12/21/2015 12:46 PM, Tom Lane wrote:

>> all, since we don't provide a way for extensions to hook into the DROP
>> mechanisms.  Perhaps that should be fixed.)
> 
> That is literally *the very next* e-mail I was going to compose.
> 
> I was looking at pg_(sh)?depend, ...
> I can probably cobble around this with some
> combination of triggers on my own table ('cause that works) and
> event triggers to grovel through the parse trees of commands that
> could affect the system object,

Right, I can't event-trigger on role commands either, can I?

What's the lightest-weight object I can create that has an owner,
and whose disappearance I can be notified of?
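
For ordinary objects, at least, an event trigger on sql_drop can
deliver the notification (roles, as above, are the exception). A
minimal sketch:

  CREATE FUNCTION report_drops() RETURNS event_trigger
  LANGUAGE plpgsql AS $$
  DECLARE r record;
  BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
      RAISE NOTICE 'dropped: % %', r.object_type, r.object_identity;
    END LOOP;
  END
  $$;

  CREATE EVENT TRIGGER watch_drops ON sql_drop
    EXECUTE PROCEDURE report_drops();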

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tracking owner of extension-managed objects

2015-12-21 Thread Chapman Flack
I'm looking for best-practice advice.

PL/Java is an extension that manages some objects (jar files, which
users can tell PL/Java to load, drop, or replace). The objects have
owners (have had since PL/Java 1.1.0 anyway).

When the owner tracking was added for 1.1.0, it recorded the owner oid.

In 2006, before 1.3.0, it was changed to keep the owner name instead
of the oid, in response to bug report 1506, which involved the wrong
owner name being shown after dump/restore into another db where the
user oids were different.

Neither approach seems perfect to me (in fact, they both strike me
as having complementary sides of the same weakness, which dump/restore
just happens to expose: the oid stays valid across a role rename but
not across dump/restore, while the name survives dump/restore but goes
stale after a rename). I am also wondering whether PL/Java ought to
create references, or a trigger, on pg_authid to clean up if the user
goes away; it currently doesn't.

Before I spend too many cycles on this, is there a most favored
design pattern that has already emerged for this kind of thing?

-Chap


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Threads in PostgreSQL

2015-12-20 Thread Chapman Flack
On 12/21/15 01:24, Atri Sharma wrote:
> On Mon, Dec 21, 2015, sri harsha <sriharsha9...@gmail.com> wrote:
> 
>> I am using threads in my
>> foreign data wrapper and i get the following error when i use the threads .
>>
>> *ERROR:  stack depth limit exceeded*
>> *HINT:  Increase the configuration parameter "max_stack_depth" (currently
>> 2048kB), after ensuring the platform's stack depth limit is adequate.*

> PostgreSQL is a process-per-backend model.

To elaborate on that: it is a process per backend, and most of the code
in that process has not been written for safe execution by multiple
threads. If your code starts other threads to do other work, and only
one thread ever touches PostgreSQL backend code, that can be OK.
This happens in PL/Java, but it uses several interrelated precautions
to make sure no thread ever enters PostgreSQL backend code unless every
other thread is known to be out.

Even if you are taking that precaution, if the backend code is entered
by a different thread than the one that last executed there, the stack
depth tests may end up comparing the last thread's stack base to the
current thread's stack pointer, which will naturally give a bogus
result. There is some API in miscadmin.h for manipulating the backend's
idea of the stack base, but there be dragons.

I am far short of the knowledgeable voices here, but in case you don't
hear from them right away, that is at least how I understand the matter.

Chapman Flack


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

