Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Pavel Stehule
2017-01-24 21:38 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks.  I just realized that this is doing more work than necessary --
>

?? I don't understand?


> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results.  So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is.  That code doesn't use the value-per-call interface
> anyway.
>

ok
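
For reference, a minimal sketch of the kind of tuplestore filling being
suggested (the row-source callback and the descriptor/value arrays are
placeholders of this example, not the patch's actual code):

    #include "postgres.h"
    #include "access/tupdesc.h"
    #include "miscadmin.h"
    #include "utils/tuplestore.h"

    /* Row source supplied by the caller; a placeholder for whatever
     * fetches the next row of the table expression. */
    typedef bool (*FetchRowFunc) (Datum *values, bool *nulls);

    static Tuplestorestate *
    tableexpr_build_tuplestore(TupleDesc tupdesc, FetchRowFunc fetch_next_row,
                               Datum *values, bool *nulls)
    {
        Tuplestorestate *tstore = tuplestore_begin_heap(true, false, work_mem);

        /* materialize all rows up front instead of value-per-call */
        while (fetch_next_row(values, nulls))
            tuplestore_putvalues(tstore, tupdesc, values, nulls);

        return tstore;
    }

Function evaluation can then just return the finished tuplestore, as suggested.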


> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly.  I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory).  I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.
>


ok.



>
> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>
> Please fix.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-25 Thread Tobias Oberstein

Hi Alvaro,

On 24.01.2017 at 19:36, Alvaro Herrera wrote:

Tobias Oberstein wrote:


I am benchmarking IOPS, and while doing so, it becomes apparent that at
these scales it does matter _how_ IO is done.

The most efficient way is libaio. I get 9.7 million/sec IOPS with low CPU
load. Using any synchronous IO engine is slower and produces higher load.

I do understand that switching to libaio isn't going to fly for PG
(completely different approach).


Maybe it is possible to write a new f_smgr implementation (parallel to
md.c) that uses libaio.  There is no "seek" in that interface, at least,
though the interface does assume that the implementation is blocking.



FWIW, I have now systematically compared the IO performance of different IO 
methods, normalized for the system load they induce.


I use the FIO ioengine terminology:

sync = lseek/read/write
psync = pread/pwrite

Here:

https://github.com/oberstet/scratchbox/raw/master/cruncher/engines-compared/normalized-iops.pdf

Conclusion:

psync has 1.15x the normalized IOPS compared to sync
libaio has up to 6.5x the normalized IOPS compared to sync
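
To make the sync/psync terminology above concrete, here is a minimal C sketch
of the two read patterns being compared (error handling and the choice of
file descriptor and offset are simplifications of the example):

    #include <unistd.h>
    #include <sys/types.h>

    /* "sync" engine style: two syscalls per random read */
    static ssize_t
    read_at_sync(int fd, void *buf, size_t len, off_t offset)
    {
        if (lseek(fd, offset, SEEK_SET) < 0)
            return -1;
        return read(fd, buf, len);          /* second syscall */
    }

    /* "psync" engine style: one syscall per random read */
    static ssize_t
    read_at_psync(int fd, void *buf, size_t len, off_t offset)
    {
        return pread(fd, buf, len, offset); /* offset passed directly */
    }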

---

These measurements were done on 16 NVMe block devices.

As mentioned, when Linux MD comes into the game, the difference between 
sync and psync is much higher - there is lock contention in MD.


The reason for that is: when MD comes into the game, even our massive 
CPU cannot hide the inefficiency of the double syscalls anymore.


This MD issue is our bigger problem (compared to PG using sync/psync). I 
am going to post to the linux-raid list about that, as advised by the 
FIO developers.


---

That being said, regarding getting maximum performance out of NVMes with 
minimal system load, the real deal probably isn't libaio either, but 
kernel bypass (hinted to me by the FIO devs):


http://www.spdk.io/

FIO has a plugin for SPDK, which I am going to explore to establish a 
final conclusive baseline for maximum IOPS normalized for load.


There are similar approaches in networking (BSD netmap, DPDK) to bypass 
the kernel altogether (zero copy to userland, no interrupts but polling 
etc). With hardware like this (NVMe, 100GbE etc), the kernel gets in the 
way ..


Anyway, this is now probably OT as far as PG is concerned ;)

Cheers,
/Tobias






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-25 Thread Tobias Oberstein

Hi Andres,


Using pread instead of lseek+read halves the syscalls.

I really don't understand what you are fighting here ..


Sure, there's some overhead. And as I said upthread, I'm much less
against this change than Tom.  What I'm saying is that your benchmarks
haven't shown a benefit in a meaningful way, so I don't think I can
agree with


"Well, my point remains that I see little value in messing with
long-established code if you can't demonstrate a benefit that's clearly
above the noise level."

I have done lots of benchmarking over the last days on a massive box, and I
can provide numbers that I think show that the impact can be significant.


since you've not actually shown that the impact is above the noise level
when measured with an actual postgres workload.


I can follow that.

So real proof cannot be done with FIO, but only with an "actual PG workload".

Synthetic PG workload or real world production workload?

Also: regarding the perf profiles from production that show lseek as the #1 syscall.

You said it wouldn't be proof either, because it only shows the number of 
syscalls, and though it is clear that millions of syscalls/sec do come 
with overhead, it is still not showing "above noise" level relevance 
(because PG is such a CPU hog in itself anyways;)


So how would I do a perf profile that would be acceptable as proof?

Maybe I can expand our

https://gist.github.com/oberstet/ca03d7ab49be4c8edb70ffa1a9fe160c

profiling function.

Cheers,
/Tobias



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-25 Thread Fabien COELHO



As it stands right now you haven't provided enough context for this patch
and only the social difficulty of actually marking a patch rejected has
prevented its demise in its current form - because while it has interesting
ideas, it's added maintenance burden for -core without any in-core usage.
But if you make it the first patch in a 3-patch series that implements the
per-spec tpc-b, the discussion moves away from these support functions and
into the broader framework in which they are made useful.


I think Fabien already did post something of the sort, or at least 
discussion towards it,


Yep.

and there was immediate objection as to whether his idea of TPC-B 
compliance was actually right. I remember complaining that he had a 
totally artificial idea of what "fetching a data value" requires.


Yep.

I think that the key misunderstanding is that you are honest and assume 
that other people are honest too. This is naïve: there is a long history 
of vendors creatively "cheating" to get better benchmark results than they 
deserve. Benchmark specifications try to prevent such behaviors by laying 
out careful requirements and procedures.


In this instance, you "know" that when pg has returned the result of the 
query the data is actually on the client side, so you considered it is 
fetched. That is fine for you, but from a benchmarking perspective with 
external auditors your belief is not good enough.


For instance, the vendor could implement a new version of the protocol 
where the data are only transferred on demand, and the result just tells 
that the data is indeed somewhere on the server (eg on "SELECT abalance" 
it could just check that the key exists, no need to actually fetch the 
data from the table, so no need to read the table, the index is 
enough...). That would be pretty stupid for real application performance, 
but the benchmark could get better tps by doing so.


Without even intentionally cheating, this could be part of a useful 
"streaming mode" protocol option which makes sense for very large results 
but would also be activated for a small result.


Another point is that decoding the message may be a little expensive, so 
that by not actually extracting the data into the client but just keeping 
it in the connection/OS one gets better performance.


Thus, TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance 
resulting from successful commit of the transaction.


Comment: It is the intent of this clause that the account balance in the 
database be returned to the driver, i.e., that the application retrieve 
the account balance."


For me the correct interpretation of "the APPLICATION retrieve the account 
balance" is that the client application code, pgbench in this context, did 
indeed get the value from the vendor code, here "libpq" which is handling 
the connection.


Having the value discarded from libpq by calling PQclear instead of 
PQntuples/PQgetvalue/... skips a key part of the client code that no real 
application would skip. This looks strange and is not representative of 
real client code: as a potential auditor, because of this I would not 
check the corresponding item in the audit check list:


 "11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.


Another trivial reason to be able to actually retrieve data is that for 
benchmarking purposes it is very easy to want to test a scenario where you 
do different things based on the data received, which implies that the 
data can be manipulated somehow on the benchmarking client side, which is 
currently not possible.
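
To illustrate the difference being discussed, here is a minimal libpq sketch
of a driver that actually retrieves the Account_Balance instead of discarding
the result with PQclear alone (the query text and error handling are
simplifications of this example):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /* Fetch the balance so the client code really touches the returned value. */
    static long
    fetch_abalance(PGconn *conn, long aid)
    {
        char        query[128];
        PGresult   *res;
        long        abalance;

        snprintf(query, sizeof(query),
                 "SELECT abalance FROM pgbench_accounts WHERE aid = %ld", aid);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) != 1)
        {
            PQclear(res);
            return -1;              /* simplified error handling */
        }
        abalance = atol(PQgetvalue(res, 0, 0));     /* the actual retrieval step */
        PQclear(res);
        return abalance;
    }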


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-01-25 Thread Tom Lane
Nikita Glukhov  writes:
> On 25.01.2017 23:58, Tom Lane wrote:
>> I think you need to take a second look at the code you're producing
>> and realize that it's not so clean either.  This extract from
>> populate_record_field, for example, is pretty horrid:

> But what if we introduce some helper macros like this:

> #define JsValueIsNull(jsv) \
>  ((jsv)->is_json ? !(jsv)->val.json.str \
>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)

> #define JsValueIsString(jsv) \
>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)

Yeah, I was wondering about that too.  I'm not sure that you can make
a reasonable set of helper macros that will fix this, but if you want
to try, go for it.

BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
to go back to the manual every darn time to convince myself whether
that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
the reader (... or the author) having memorized C's precedence rules
in quite that much detail.  Extra parens help.
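
For illustration only, the two readings in question (in C, || binds more
tightly than ?:, so the un-parenthesized form means the second variant):

    static int
    precedence_demo(int a, int b, int c, int d)
    {
        int     v1 = a ? b : c || d;        /* parsed as a ? b : (c || d) ... */
        int     v2 = a ? b : (c || d);      /* ... i.e. the same as this      */

        return v1 == v2;                    /* always 1 */
    }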

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Pavel Stehule
2017-01-25 23:33 GMT+01:00 Andres Freund :

> On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> > 2017-01-25 22:40 GMT+01:00 Andres Freund :
> > > > I'm afraid that if I cannot reuse the SRF infrastructure, I have to
> > > > reimplement it partially :( - mainly for usage in "ROWS FROM ()"
> > >
> >
> > The TableExpr implementation is based on SRF now. You and Alvaro propose
> > an independent implementation as a generic executor node. I am sceptical
> > that FunctionScan supports reading from a generic executor node.
>
> Why would it need to?
>

Simply - for consistency with other functions that can return rows.

Maybe I don't understand Alvaro's proposal well - I have a XMLTABLE
function - a TableExpr that looks like a SRF function and behaves similarly
(returns multiple rows), but it would be implemented significantly
differently and would have different limits - it could not be used in some
places... It is hard for me to see the consistency there.

Regards

Pavel


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost  wrote:
> > Yet, our default is to have them disabled and *really* hard to enable.
> 
> First of all, that could be fixed by further development.

I'm certainly all for doing so, but I don't agree that it necessarily is
required before we flip the default.  That said, if the way to get
checksums enabled by default is providing a relatively easy way to turn
them off, then that's something which I'll do what I can to help work
towards.  In other words, I'm not going to continue to argue, given the
various opinions of the group, that we should just flip it tomorrow.

I hope to discuss it further after we have the ability to turn it off
easily.

> Second, really hard to enable is a relative term.  I accept that
> enabling checksums is not a pleasant process.  Right now, you'd have
> to do a dump/restore, or use logical replication to replicate the data
> to a new cluster and then switch over.  On the other hand, if
> checksums are really a critical feature, how are people getting to the
> point where they've got a mission-critical production system and only
> then discovering that they want to enable checksums?  

I truly do wish everyone would come talk to me before building out a
database.  Perhaps that's been your experience, in which case, I envy
you, but I tend to get a reaction more along the lines of "wait, what do
you mean I had to pass some option to initdb to enable checksum?!?!".
The fact that we've got a WAL implementation and clearly understand
fsync requirements, why full page writes make sense, and that our WAL
has its own CRCs which isn't possible to disable, tends to lead people
to think we really know what we're doing and that we care a lot about
their data.

> > I agree that it's unfortunate that we haven't put more effort into
> > fixing that- I'm all for it, but it's disappointing to see that people
> > are not in favor of changing the default as I believe it would both help
> > our users and encourage more development of the feature.
> 
> I think it would help some users and hurt others.  I do agree that it
> would encourage more development of the feature -- almost of
> necessity.  In particular, I bet it would spur development of an
> efficient way of turning checksums off -- but I'd rather see us
> approach it from the other direction: let's develop an efficient way
> of turning the feature on and off FIRST.  Deciding that the feature
> has to be on for everyone because turning it on later is too hard for
> the people who later decide they want it is letting the tail wag the
> dog.

As I have said, I don't believe it has to be on for everyone.

> Also, I think that one of the big problems with the way checksums work
> is that you don't find problems with your archived data until it's too
> late.  Suppose that in February bits get flipped in a block.  You
> don't access the data until July[1].  Well, it's nice to have the
> system tell you that the data is corrupted, but what are you going to
> do about it?  By that point, all of your backups are probably
> corrupted.  So it's basically:

If your backup system is checking the checksums when backing up PG,
which I think every backup system *should* be doing, then guess what?
You've got a backup which you can go back to immediately, possibly with
the ability to restore all of the data from WAL.  That won't always be
the case, naturally, but it's a much better position than simply having
a system which continues to degrade until you've actually reached the
"you're screwed" level because PG will no longer read a page or perhaps
can't even start up, *and* you no longer have any backups.

As it is, there are backup solutions which *do* check the checksum when
backing up PG.  This is no longer, thankfully, some hypothetical thing,
but something which really exists and will hopefully keep users from
losing data.
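
As a rough sketch of what such a backup-side check can look like (assuming
the backend's pg_checksum_page() helper and page header layout; a real tool
also has to deal with torn reads, retries, and configurable block sizes):

    #include <string.h>
    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    /* Return true if the page looks valid: either never written (all zeroes)
     * or carrying a checksum that matches the recomputed one. */
    static bool
    page_checksum_ok(char *page, BlockNumber blkno)
    {
        static const char zeroes[BLCKSZ];   /* zero-initialized */
        PageHeader  phdr = (PageHeader) page;

        if (memcmp(page, zeroes, BLCKSZ) == 0)
            return true;
        return phdr->pd_checksum == pg_checksum_page(page, blkno);
    }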

> It's nice to know that (maybe?) but without a recovery strategy a
> whole lot of people who get that message are going to immediately
> start asking "How do I ignore the fact that I'm screwed and try to
> read the data anyway?".  

And we have options for that.

> And then you wonder what the point of having
> the feature turned on is, especially if it's costly.  It's almost an
> attractive nuisance at that point - nobody wants to be the user that
> turns off checksums because they sound good on paper, but when you
> actually have a problem an awful lot of people are NOT going to want
> to try to restore from backup and maybe lose recent transactions.
> They're going to want to ignore the checksum failures.  That's kind of
> awful.

Presently, last I checked at least, the database system doesn't fall
over and die if a single page's checksum fails.  I agree entirely that
we want the system to fail gracefully (unless the user instructs us
otherwise, perhaps because they have a redundant system that they can
flip 

Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-25 Thread Tom Lane
Ashutosh Bapat  writes:
> On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier
>  wrote:
>> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane  wrote:
>>> Here's an updated patch with doc changes.  Aside from that one,
>>> I tried to spell "pseudo-type" consistently, and I changed one
>>> place where we were calling something a pseudo-type that isn't.

> I think, those changes, even though small, deserve their own commit.
> The changes themselves look good.

Pushed, thanks for the reviews!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Nikhil Sontakke
> We are talking about the recovery/promote code path. Specifically this
> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions().
>
> We write the files to disk and they get immediately read up in the
> following code. We could instead skip writing the files to disk and read
> KnownPreparedList in the code path that follows, as well as elsewhere.

Thinking more on this.

The only optimization that's really remaining is handling of prepared
transactions that have not been committed or will linger around for
long. The short lived 2PC transactions have been optimized already via
this patch.

The question remains whether saving off a few fsyncs/reads for these
long-lived prepared transactions is worth the additional code churn.
Even if we add code to go through the KnownPreparedList, we still will
have to go through the other on-disk 2PC transactions anyways. So,
maybe not.

Regards,
Nikhils




>
> Regards,
> Nikhils
>
>
>>> The difference between those two is likely noise.
>>>
>>> By the way, in those measurements, the OS cache is still filled with
>>> the past WAL segments, which is a rather best case, no? What happens
>>> if you do the same kind of tests on a box where memory is busy doing
>>> something else and replayed WAL segments get evicted from the OS cache
>>> more aggressively once the startup process switches to a new segment?
>>> This could be tested for example on a VM with few memory (say 386MB or
>>> less) so as the startup process needs to access again the past WAL
>>> segments to recover the 2PC information it needs to get them back
>>> directly from disk... One trick that you could use here would be to
>>> tweak the startup process so as it drops the OS cache once a segment
>>> is finished replaying, and see the effects of an aggressive OS cache
>>> eviction. This patch is showing really nice improvements with the OS
>>> cache backing up the data, still it would make sense to test things
>>> with a worse test case and see if things could be done better. The
>>> startup process now only reads records sequentially, not randomly
>>> which is a concept that this patch introduces.
>>>
>>> Anyway, perhaps this does not matter much, the non-recovery code path
>>> does the same thing as this patch, and the improvement is too much to
>>> be ignored. So for consistency's sake we could go with the approach
>>> proposed which has the advantage to not put any restriction on the
>>> size of the 2PC file contrary to what an implementation saving the
>>> contents of the 2PC files into memory would need to do.
>>
>> Maybe I’m missing something, but I don’t see how the OS cache can affect 
>> something here.
>>
>> Total WAL size was 0x44 * 16 = 1088 MB, recovery time is about 20s. 
>> Sequential reading 1GB of data
>> is order of magnitude faster even on the old hdd, not speaking of ssd. Also 
>> you can take a look on flame graphs
>> attached to previous message — majority of time during recovery spent in 
>> pg_qsort while replaying
>> PageRepairFragmentation, while whole xact_redo_commit() takes about 1% of 
>> time. That amount can
>> grow in case of uncached disk read but taking into account total recovery 
>> time this should not affect much.
>>
>> If you are talking about uncached access only during checkpoint than here we 
>> are restricted with
>> max_prepared_transaction, so at max we will read about hundred of small 
>> files (usually fitting into one filesystem page) which will also
>> be barely noticeable comparing to recovery time between checkpoints. Also 
>> wal segments cache eviction during
>> replay doesn’t seems to me as standard scenario.
>>
>> Anyway I took the machine with hdd to slow down read speed and ran the tests 
>> again. During one of the runs I
>> launched in parallel a bash loop that was dropping the os cache each second 
>> (while wal fragment replay also takes about one second).
>>
>> 1.5M transactions
>>  start segment: 0x06
>>  last segment: 0x47
>>
>> patched, with constant cache_drop:
>>   total recovery time: 86s
>>
>> patched, without constant cache_drop:
>>total recovery time: 68s
>>
>> (while the difference is significant, I bet that happens mostly because 
>> database file segments have to be re-read after the cache drop)
>>
>> master, without constant cache_drop:
>>time to recover 35 segments: 2h 25m (after that I tired of waiting)
>>expected total recovery time: 4.5 hours
>>
>> --
>> Stas Kelvich
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>
>
>
>
> --
>  Nikhil Sontakke   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-01-25 Thread Etsuro Fujita

On 2016/11/30 17:29, Etsuro Fujita wrote:

On 2016/11/23 20:28, Rushabh Lathia wrote:


I wrote:

How about inserting that before the
out param **retrieved_attrs, like this?

static void
deparseExplicitTargetList(List *tlist,
  bool is_returning,
  List **retrieved_attrs,
  deparse_expr_cxt *context);



Yes, adding it before retrieved_attrs would be good.



OK, will do.


Done.

You wrote:

5) make_explicit_returning_list() pull the var list from the
returningList and
build the targetentry for the returning list and at the end
rewrites the
fdw_scan_tlist.

AFAIK, in case of DML - which is going to pushdown to the remote
server
ideally fdw_scan_tlist should be same as returning list, as
final output
for the query is query will be RETUNING list only. isn't that
true?


I wrote:

That would be true.  But the fdw_scan_tlist passed from the core
would contain junk columns inserted by the rewriter and planner
work, such as CTID for the target table and whole-row Vars for other
tables specified in the FROM clause of an UPDATE or the USING clause
of a DELETE.  So, I created the patch so that the pushed-down
UPDATE/DELETE retrieves only the data needed for the RETURNING
calculation from the remote server, not the whole fdw_scan_tlist
data.



Yes, I noticed that fdw_scan_tlist has CTID for the target table and
whole-row Vars for other tables specified in the FROM clause of the DML.
But I was thinking: isn't it possible to create a new fdw_scan_tlist once
we have found that the DML is directly pushable to the foreign server?
I tried quickly doing that - but later it was throwing "variable not
found in subplan target list"
from set_foreignscan_references().


We could probably avoid that error by replacing the targetlist of the 
subplan with fdw_scan_tlist, but that wouldn't be enough ...


You wrote:

If yes, then why can't we directly replace the fdw_scan_tlist
with the
returning
list, rather then make_explicit_returning_list()?


I wrote:

I think we could do that if we modified the core (maybe the executor
part).  I'm not sure that's a good idea, though.



Yes, modifying core is not a good idea. But I just want to understand why
you think that this needs modifying core?



Sorry, I don't remember that.  Will check.


The reason why I think so is that the ModifyTable node on top of the 
ForeignScan node requires that the targetlist of the ForeignScan has (1) 
junk whole-row Vars for secondary relations in UPDATE/DELETE and (2) all 
attributes of the target relation to produce the new tuple for UPDATE. 
(So, it wouldn't be enough to just replace the ForeignScan's targetlist 
with fdw_scan_tlist!)  For #1, see this (and the following code) in 
ExecInitModifyTable:


/*
 * If we have any secondary relations in an UPDATE or DELETE, they need to
 * be treated like non-locked relations in SELECT FOR UPDATE, ie, the
 * EvalPlanQual mechanism needs to be told about them.  Locate the
 * relevant ExecRowMarks.
 */

And for #2, see this (and the following code, especially where calling 
ExecCheckPlanOutput) in the same function:


 * This section of code is also a convenient place to verify that the
 * output of an INSERT or UPDATE matches the target table(s).

What you proposed would be a good idea because the FDW could calculate 
the user-query RETURNING list more efficiently in some cases, but I'd 
like to leave that for future work.


Attached is the new version of the patch.  I also addressed other 
comments from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c, 
added/revised comments, and added regression tests for the case where a 
pushed down UPDATE/DELETE on a join has RETURNING.


My apologies for having been late to work on this.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 130,142  static void deparseTargetList(StringInfo buf,
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
  	 List *returningList,
  	 List **retrieved_attrs);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
   PlannerInfo *root, bool qualify_col);
  static void deparseRelation(StringInfo buf, Relation rel);
--- 130,151 
    Bitmapset *attrs_used,
    bool qualify_col,
    List **retrieved_attrs);
! static void deparseExplicitTargetList(List *tlist,
! 		  bool is_returning,
! 		  List **retrieved_attrs,
  		  deparse_expr_cxt *context);
  

Re: [HACKERS] pgbench more operators & functions

2017-01-25 Thread Fabien COELHO


Bonjour Michaël, Hello Robert,


Let's mark this Returned with Feedback and move on.  We've only got a
week left in the CommitFest anyhow and there are lots of other things
that still need work (and which actually have been revised to match
previous feedback).


Done as such, let's move on.


Hmmm.

I think that there is a misunderstanding, most of which is my fault.

I have really tried to do everything that was required from committers, 
including revising the patch to match all previous feedback.


Version 6 sent on Oct 4 did include all fixes required at the time (no if, 
no unusual and operators, TAP tests)... However I forgot to remove some 
documentation about the removed stuff, which made Robert think that I had 
not done it. I apologise for this mistake and the subsequent 
misunderstanding:-(


The current v8 sent on Jan 25 should only implement existing server-side 
stuff, including with the same precedence as pointed out by Tom.


So for the implementation side I really think that I have done exactly all 
that was required of me by committers, although sometimes with bugs or 
errors, my apologies again...



As for the motivation, which is another argument, I cannot do more than 
point to actual published official benchmark specifications that do 
require these functions. I'm not inventing anything or providing some 
useless catalog of math functions.


If pgbench is about being seated on a bench and running postgres on your 
laptop to get some heat, my mistake... I thought it was about 
benchmarking, which does imply a few extra capabilities.


If the overall feedback is to be understood as "the postgres community does 
not think that pgbench should be able to be used to implement benchmarks 
such as TPC-B", then obviously I will stop efforts to improve it for that 
purpose.



To conclude:

IMHO the relevant current status of the patch should be "Needs review" and 
possibly "Move to next CF".


If the feedback is "we do not want pgbench to implement benchmarks such as 
TPC-B", then indeed the proposed features are not needed and the status 
should be "Rejected".


In any case, "Returned with feedback" does not really apply.

A+

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime

2017-01-25 Thread Tom Lane
Ashutosh Bapat  writes:
> On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalke
>  wrote:
>> Attached patch with test-case modification.

> I verified that this patch indeed bring the time down to 2 to 3
> seconds from 10 seconds.

Thanks for working on this, guys.

> The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6
> of aggregate. We wouldn't know which of those actually worked. I
> modified the testcase to use t2.c2 % 6 = 0 instead and keep the filter
> condition intact. This increases the execution time by .2s, which may
be ok. Let me know what you think of the attached patch.

Agreed, that seems like a good compromise.  Pushed that way.

> Also, please add this to the commitfest, so that it isn't forgotten.

No need.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-25 Thread Fabien COELHO


Hello Tom,


I concur that this is expanding pgbench's expression language well beyond
what anybody has shown a need for.


As for the motivation, I'm assuming that pgbench should provide features 
necessary to implement benchmarks, so I'm adding operators that appear in 
standard benchmark specifications.



From TPC-B 2.0.0 section 5.3.5:


 | The Account_ID is generated as follows:
 | • A random number X is generated within [0,1]
 | • If X<0.85 or branches = 1, a random Account_ID is selected over all
 |accounts.
 | • If X>=0.85 and branches > 1, a random Account_ID is selected over all
 |   non- accounts.

The above uses a condition (If), logic (or, and), comparisons (=, <, >=).

From TPC-C 5.11 section 2.1.6, a bitwise-or operator is used to skew a 
distribution:

 | NURand (A, x, y) = (((random (0, A) | random (x, y)) + C) % (y - x + 1)) + x

And from section 5.2.5.4 of same, some time is computed based on a 
logarithm:


 | Tt = -log(r) * µ
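
For concreteness, a small C sketch of the two computations quoted above;
random_range() is a hypothetical uniform helper standing in for the TPC-C
mandated generator, and C is the per-run constant from the spec:

    #include <math.h>
    #include <stdlib.h>

    /* hypothetical uniform random integer in [lo, hi] (placeholder RNG) */
    static int
    random_range(int lo, int hi)
    {
        return lo + rand() % (hi - lo + 1);
    }

    /* NURand(A, x, y) from TPC-C 2.1.6: the bitwise OR skews the distribution */
    static int
    nurand(int A, int x, int y, int C)
    {
        return (((random_range(0, A) | random_range(x, y)) + C) % (y - x + 1)) + x;
    }

    /* Think time from TPC-C 5.2.5.4: Tt = -log(r) * mu, with r uniform in (0, 1] */
    static double
    think_time(double mu)
    {
        double r = (rand() + 1.0) / ((double) RAND_MAX + 1.0);
        return -log(r) * mu;
    }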

I'm also concerned that there's an opportunity cost here, in that the 
patch establishes a precedence ordering for its new operators, which 
we'd have a hard time changing later. That's significant because the 
proposed precedence is different from what you'd get for similarly-named 
operators on the backend side. I realize that it's trying to follow the 
C standard instead,


Oops. I just looked at the precedence from a C parser, without realizing 
that precedence there was different from postgres SQL implementation:-( 
This is a bug on my part.



I'd be okay with the parts of this that duplicate existing backend syntax
and semantics, but I don't especially want to invent further than that.


Okay. In the two latest versions sent, discrepancies from that were bugs, 
I was trying to conform.


Version 8 attached hopefully fixes the precedence issue raised above:

 - use precedence taken from "backend/gram.y" instead of C. I'm not sure
   that it is wise that pg has C-like operators with a different
   precedence, but this is probably much too late...

And fixes the documentation:

 - remove the documentation of the no-longer-existing "if" function, which
   made Robert assume that I had not taken the hint to remove it. I had!

 - reorder operator documentation by their pg SQL precedence.

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1eee8dc..73101e1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -828,11 +828,11 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
-  function calls, and
-  parentheses.
+  operators
+  with their usual precedence and associativity,
+  function calls,
+  SQL CASE generic conditional expressions
+  and parentheses.
  
 
  
@@ -917,6 +917,165 @@ pgbench  options  dbname
   
  
 
+ 
+  Built-In Operators
+
+  
+   The arithmetic, bitwise, comparison and logical operators listed in
+are built into pgbench
+   and may be used in expressions appearing in
+   \set.
+  
+
+  
+   pgbench Operators by increasing precedence
+   
+
+ 
+  Operator
+  Description
+  Example
+  Result
+ 
+
+
+ 
+  OR
+  logical or
+  5 or 0
+  1
+ 
+ 
+  AND
+  logical and
+  3 and 0
+  0
+ 
+ 
+  NOT
+  logical not
+  not 0
+  1
+ 
+ 
+  =
+  is equal
+  5 = 4
+  0
+ 
+ 
+  
+  is not equal
+  5  4
+  1
+ 
+ 
+  !=
+  is not equal
+  5 != 5
+  0
+ 
+ 
+  
+  lower than
+  5  4
+  0
+ 
+ 
+  =
+  lower or equal
+  5 = 4
+  0
+ 
+ 
+  
+  greater than
+  5  4
+  1
+ 
+ 
+  =
+  greater or equal
+  5 = 4
+  1
+ 
+ 
+  |
+  integer bitwise OR
+  1 | 2
+  3
+ 
+ 
+  #
+  integer bitwise XOR
+  1 # 3
+  2
+ 
+ 
+  
+  integer bitwise AND
+  1  3
+  1
+ 
+ 
+  ~
+  integer bitwise NOT
+  ~ 1
+  -2
+ 
+ 
+  
+  bitwise shift left
+  1  2
+  4
+ 
+ 
+  
+  bitwise shift right
+  8  2
+  2
+ 
+ 
+  +
+  addition
+  5 + 4
+  9
+ 
+ 
+  -
  subtraction
+  3 - 2.0
+  1.0
+ 
+ 
+  *
+  multiplication
+  5 * 4
+  20
+ 
+ 
+  /
+  division (integer truncates the results)
+  5 / 3
+  1
+ 
+ 
+  %
+  modulo
+  3 % 2
+  1
+ 
+ 
+  -
+  opposite
+  - 2.0
+  -2.0
+ 
+
+   
+  
+ 
+
  
   Built-In Functions
 
@@ -963,6 +1122,13 @@ pgbench  options  

Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server

2017-01-25 Thread Shruti Rawal
Hi Aaron,

Thank you so much for the information.

Regards,
Shruti Rawal

-Original Message-
From: Aaron W. Swenson [mailto:titanof...@gentoo.org]
Sent: Tuesday, January 24, 2017 8:22 PM
To: Shruti Rawal
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server

On 2017-01-24 09:20, Shruti Rawal wrote:
> Hi,
>
> We are doing an assessment for migrating our Perl applications to Windows
> 2016 server.
> I am trying to install PostgreSQL version 8.2 on my Windows Server 2016. But
> it is giving me the following error:
> Malformed permissions property: 'langid'
>
> We could not find any relevant information on the PostgreSQL site about
> whether the stated version 8.2 will work on Windows Server 2016.
> I would like to know whether version 8.2 is supported on Windows Server 2016,
> and if not, which version is supported?

The information you’re looking for is accessible on the home page.

You’re having a tough time finding information on 8.2 because it is way past 
EOL. You will be happier evaluating 9.6.1. 
(https://www.postgresql.org/support/versioning/)

Further, -hackers is not intended as a support mailing list. You should write 
to -general instead.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-01-25 Thread Peter Eisentraut
Here is an updated patch that allows changing the sequence type.  This
was clearly a concern for reviewers, and the presented use cases seemed
convincing.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5aff9dbe7a94417aa0faf56dab37187fcbec23f5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 25 Jan 2017 09:35:58 -0500
Subject: [PATCH v4] Add CREATE SEQUENCE AS  clause

This stores a data type, required to be an integer type, with the
sequence.  The sequence's min and max values default to the range
supported by the type, and they cannot be set to values exceeding that
range.  The internal implementation of the sequence is not affected.

Change the serial types to create sequences of the appropriate type.
This makes sure that the min and max values of the sequence for a serial
column match the range of values supported by the table column.  So the
sequence can no longer overflow the table column.

This also makes monitoring for sequence exhaustion/wraparound easier,
which currently requires various contortions to cross-reference the
sequences with the table columns they are used with.

This commit also effectively reverts the pg_sequence column reordering
in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid
column allows us to fill the hole in the struct and create a more
natural overall column ordering.
---
 doc/src/sgml/catalogs.sgml  |  14 +++-
 doc/src/sgml/information_schema.sgml|   4 +-
 doc/src/sgml/ref/alter_sequence.sgml|  30 ++--
 doc/src/sgml/ref/create_sequence.sgml   |  36 ++
 src/backend/catalog/information_schema.sql  |   4 +-
 src/backend/commands/sequence.c |  89 +---
 src/backend/parser/gram.y   |   6 +-
 src/backend/parser/parse_utilcmd.c  |   2 +-
 src/bin/pg_dump/pg_dump.c   | 103 +++-
 src/bin/pg_dump/t/002_pg_dump.pl|   2 +
 src/include/catalog/catversion.h|   2 +-
 src/include/catalog/pg_proc.h   |   2 +-
 src/include/catalog/pg_sequence.h   |   8 ++-
 src/test/modules/test_pg_dump/t/001_base.pl |   1 +
 src/test/regress/expected/sequence.out  |  52 ++
 src/test/regress/expected/sequence_1.out|  52 ++
 src/test/regress/sql/sequence.sql   |  25 +--
 17 files changed, 312 insertions(+), 120 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 524180e011..162975746d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5774,10 +5774,11 @@ pg_sequence Columns
  
 
  
-  seqcycle
-  bool
+  seqtypid
+  oid
+  pg_type.oid
   
-  Whether the sequence cycles
+  Data type of the sequence
  
 
  
@@ -5814,6 +5815,13 @@ pg_sequence Columns
   
   Cache size of the sequence
  
+
+ 
+  seqcycle
+  bool
+  
+  Whether the sequence cycles
+ 
 

   
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index c43e325d06..a3a19ce8ce 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -4653,9 +4653,7 @@ sequences Columns
   data_type
   character_data
   
-   The data type of the sequence.  In
-   PostgreSQL, this is currently always
-   bigint.
+   The data type of the sequence.
   
  
 
diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml
index 3b52e875e3..252a668189 100644
--- a/doc/src/sgml/ref/alter_sequence.sgml
+++ b/doc/src/sgml/ref/alter_sequence.sgml
@@ -23,7 +23,9 @@
 
  
 
-ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ]
+ALTER SEQUENCE [ IF EXISTS ] name
+[ AS data_type ]
+[ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
 [ START [ WITH ] start ]
 [ RESTART [ [ WITH ] restart ] ]
@@ -81,6 +83,26 @@ Parameters
  
 
  
+  data_type
+  
+   
+The optional
+clause AS data_type
+changes the data type of the sequence.  Valid types are
+are smallint, integer,
+and bigint.
+   
+
+   
+Note that changing the data type does not automatically change the
+minimum and maximum values.  You can use the clauses NO
+MINVALUE and NO MAXVALUE to adjust the
+minimum and maximum values to the range of the new data type.
+   
+  
+ 
+
+ 
   increment
   

@@ -102,7 +124,7 @@ Parameters
 class="parameter">minvalue determines
 the minimum value a sequence can generate. If NO
 MINVALUE is specified, the defaults of 1 and
--263 for ascending and descending sequences,
+the minimum value of the data 

Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list.  I wonder if
> >> this would end up better if we only tried to support it in RT.  I asked
> >> Pavel to implement it like that a few weeks ago, but ...
> 
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> 
> +1 --- we're out of the business of having simple expressions that
> return rowsets.

Well, that's it.  I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway.  I think you should just go with the flow and
implement this by creating nodeTableexprscan.c.  It's not even
difficult.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> Attached is a patch that implements copy_srf().
> 
> The function signature reflects cstate more than it reflects the COPY
> options (filename+is_program instead of  FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase.  I'll see what I can do today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-25 Thread Emre Hasegeli
I am responding both of your emails together.

> Perhaps I don't understand it. Many opclasses are defined for
> btree. But since (ordinary) btree handles only one-dimensional,
> totally-orderable sets, geometric types, even point, don't fit
> it. Even if we relaxed it by removing EPSILON, multidimensional
> data still doesn't fit.

Yes, multidimensional data doesn't fit into btree.  Let's say we would
have to introduce operators called *<, *<=, *>, *>= to support btree
opclass for point.  I agree those operators would be useless, but
btree opclass is used for other purposes too.  "ORDER BY point",
merge-joins, btree index support for ~= would be useful.
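
For what it's worth, a btree opclass only needs some total order.  A minimal
sketch of one possible support comparator (ordering by x then y, with no
epsilon; the struct here just mirrors the built-in point layout, and the
hypothetical *< operator family mentioned above would sit on top of it):

    typedef struct
    {
        double      x;
        double      y;
    } PointSketch;                  /* stand-in for the built-in Point */

    /* Total-order comparison: x first, then y, no EPSILON involved. */
    static int
    point_cmp_total(const PointSketch *a, const PointSketch *b)
    {
        if (a->x < b->x)
            return -1;
        if (a->x > b->x)
            return 1;
        if (a->y < b->y)
            return -1;
        if (a->y > b->y)
            return 1;
        return 0;
    }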

Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
inconvenience about the geometric types.  There are many complaints
about this on the mailing list.  Many frameworks and ORMs are not
prepared to deal with getting an error when they use certain types on
those clauses.

>> - Only some operators are using the epsilon.
>>
>> I included the list of the ones which doesn't use the epsilon on
>> initial post of this thread.  This makes the operators not compatible
>> with each other.  For example, you cannot say "if box_a @> box_b and
>> box_b @> point then box_a @> box_b".
>
> (It must be "then box_a @> point".)

Yes, I am sorry.

>  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.

Yes, that would be another way.  In my view, this would make things
worse, because I believe epsilon approach is completely broken.  The
operators which are not using the epsilon are the only ones people can
sanely use to develop applications.

>> > - The application of epsilon is implementation dependent.
>> >
>> > Epsilon works between two numbers.  There is not always a single way
>> > of implementing geometric operators.  This is surprising to the users.
>> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
>> > epsilon".
>>
>> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
>> ..". I'm not sure off-hand. But I don't see the relationship is
>> significant.

What I meant was that the behaviour is unclear even to people who know
about the epsilon.  If two boxes overlap each other, someone who knows
about EPSILON might expect the distance between them to be less than
EPSILON, but this wouldn't be true.

>> - Some operators are violating commutative property.
>>
>> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.

It can easily be assured by treating both sides of the operator the
same.  It is actually assured in my patch.

>> - Some operators are violating transitive property.
>>
>> For example, you cannot say "if point_a ~= point_b and point_b ~=
>> point_c then point_a ~= point_c".
>
> It holds only in an ideal (or mathematical) world, or for integer-like
> types (which strictly implement the mathematical world on a computer).
> Without the EPSILON, two points hardly ever match, due to floating point
> error, as I mentioned. I don't have evidence though (sorry); mere
> equality among three floating point numbers is not assured.

Of course, it is assured.  Floating point numbers are deterministic.
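
To make the earlier transitivity point concrete (exact float equality is
transitive, but the epsilon-based comparison is not), a minimal sketch; the
EPSILON and FPeq definitions here follow the ones in
src/include/utils/geo_decls.h, restated as an assumption of the example:

    #include <stdio.h>
    #include <math.h>

    #define EPSILON     1.0E-06
    #define FPeq(A, B)  (fabs((A) - (B)) <= EPSILON)

    int
    main(void)
    {
        double  a = 0.0;
        double  b = 0.8e-06;        /* within EPSILON of a */
        double  c = 1.6e-06;        /* within EPSILON of b, but not of a */

        printf("a ~ b: %d, b ~ c: %d, a ~ c: %d\n",
               FPeq(a, b), FPeq(b, c), FPeq(a, c));     /* prints 1, 1, 0 */
        return 0;
    }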

>> - The operators with epsilon are not compatible with float operators.
>>
>> This is also surprising for the users.  For example, you cannot say
>> "if point_a << point_b then point_a[0] < point_b[0]".
>
> It is because "is strictly left of" just doesn't mean their
> x-coordinates are in such a relationship. The difference might be
> surprising but is a specification (truly not a bug :).

Then what does that mean?  Every operator with EPSILON is currently
surprising in a different way.  People use databases to build
applications.  They often need to implement the same operations on the
application side.  It is very hard to be bug-compatible with our
geometric types.

>> = Missing Bits on the Operator Classes

> This final section seems to mention the application, but sorry, it
> still doesn't describe for me what application this patch is
> aiming to enable. But I can understand that you want to handle
> floating point numbers like integers. The epsilon is surely quite
> annoying for that purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement

PostGIS can be too much work for an application in a different
business domain that has only a small requirement for GIS.  The built-in
geometric types are incomplete to support real world geography.
Getting rid of the epsilon would make this worse.  On the other hand, it
would allow people to deal with the difficulties on their own.

- Applications with geometric objects

I believe people could make use the geometric 

Re: [HACKERS] multivariate statistics (v19)

2017-01-25 Thread Alvaro Herrera
Michael Paquier wrote:
> On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra
>  wrote:

> > Attached is v22 of the patch series, rebased to current master and fixing
> > the reported bug. I haven't made any other changes - the issues reported by
> > Petr are mostly minor, so I've decided to wait a bit more for (hopefully)
> > other reviews.
> 
> And nothing has happened since. Are there people willing to review
> this patch and help it proceed?

I am going to grab this patch as committer.

> As this patch is quite large, I am not sure if it is fit to join the
> last CF. Thoughts?

All patches, regardless of size, are welcome to join any commitfest.
The last commitfest is not different in that regard.  The rule I
remember is that patches may not arrive *for the first time* in the last
commitfest.  This patch has already seen a lot of work in previous
commitfests, so it's fine.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-01-25 Thread Ishii Ayumi
Hi,

I applied the 4 patches of the patchset and ran "make", but it failed.
Is this a bug or my mistake?
I'm sorry if I'm wrong.

[$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
[$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
[$(TOP)]$ patch -p1 <
../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
[$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
[$(TOP)]$ ./configure
[Unicode]$ make
'/usr/bin/perl' UCS_to_most.pl
Type of arg 1 to keys must be hash (not hash element) at convutils.pm
line 443, near "})
"
Type of arg 1 to values must be hash (not hash element) at
convutils.pm line 596, near "})
"
Type of arg 1 to each must be hash (not private variable) at
convutils.pm line 755, near "$map)
"
Compilation failed in require at UCS_to_most.pl line 19.
make: *** [iso8859_2_to_utf8.map] Error 255

Regards,
-- 
Ayumi Ishii


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-25 Thread Robert Haas
On Mon, Jan 23, 2017 at 6:37 PM, Jim Nasby  wrote:
> I'm not sure the default GUC setting of 0 makes sense. If you've loaded the 
> module, presumably you want it to be running. I think it'd be nice if the GUC 
> had a -1 setting that meant to use checkpoint_timeout.

Actually, I think we need to use -1 to mean "don't run the worker at
all".  0 means "run the worker, but don't do timed dumps".  >0 means
"run the worker, and dump at that interval".

I have to admit that when I was first thinking about this feature, my
initial thought was "hey,  let's dump once per checkpoint_timeout".
But I think that Mithun's approach is better.  There's no intrinsic
connection between this and checkpointing, and letting the user pick
the interval is a lot more flexible.  We could still have a magic
value that means "same as checkpoint_timeout" but it's not obvious to
me that there's any value in that; the user might as well just pick
the time interval that they want.

Actually, for busy systems, the interval you want is probably shorter than
checkpoint_timeout.  Dumping the list of buffers isn't that expensive,
and if you are doing checkpoints every half hour or so, that's probably
longer than what you want for this.  So I suggest that we just have the
documentation recommend a suitable value (e.g. 5 minutes) and not worry
about checkpoint_timeout.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-25 Thread Haribabu Kommi
On Wed, Jan 25, 2017 at 6:43 PM, Vitaly Burovoy 
wrote:

> On 1/23/17, Haribabu Kommi  wrote:
> > The patch is split into two parts.
> > 1. Macaddr8 datatype support
> > 2. Contrib module support.
>
> Hello,
>
> I'm sorry for the delay.
> The patch is almost done, but I have two requests since the last review.
>

Thanks for the review.


> 1.
> src/backend/utils/adt/mac8.c:
> +   int a,
> +   b,
> +   c,
> +   d = 0,
> +   e = 0,
> ...
>
> There is no reason to set them as 0. For EUI-48 they will be
> reassigned in the "if (count != 8)" block, for EUI-64 -- in one of
> sscanf.
> They could be set to "d = 0xFF, e = 0xFE," and avoid the "if" block
> mentioned above, but it makes the code be much less readable.
>
> Oh. I see. In the current version it must be assigned because for
> EUI-48 memory can have values <0 or >255 in uninitialized variables.
> It is better to avoid initialization by merging two parts of the code:
> +   if (count != 6)
> +   {
> +   /* May be a 8-byte MAC address */
> ...
> +   if (count != 8)
> +   {
> +   d = 0xFF;
> +   e = 0xFE;
> +   }
>
> to a single one:
> +   if (count == 6)
> +   {
> +   d = 0xFF;
> +   e = 0xFE;
> +   }
> +   else
> +   {
> +   /* May be a 8-byte MAC address */
> ...
>

Changed accordingly.


> 2.
> src/backend/utils/adt/network.c:
> +   res = (mac->a << 24) | (mac->b << 16) |
> (mac->c << 8) | (mac->d);
> +   res = (double)((uint64)res << 32);
> +   res += (mac->e << 24) | (mac->f << 16) |
> (mac->g << 8) | (mac->h);
>
> Khm... I trust that modern compilers can do a lot of optimizations but
> for me it looks terrible because of needless conversions.
> The reason why earlier versions did have two lines "res *= 256 * 256"
> was an integer overflow for four multipliers, but it can be solved by
> defining the first multiplier as a double:
> +   res = (mac->a << 24) | (mac->b << 16) |
> (mac->c << 8) | (mac->d);
> +   res *= (double)256 * 256 * 256 * 256;
> +   res += (mac->e << 24) | (mac->f << 16) |
> (mac->g << 8) | (mac->h);
>
> In this case the left-hand side argument for the "*=" operator is
> computed at the compile time as a single constant.
> The second line can be written as "res *= 256. * 256 * 256 * 256;"
> (pay attention to a dot in the first multiplier), but it is not
> readable at all (and produces the same code).


Corrected as suggested.

Updated patch attached. There is no change in the contrib patch.

Regards,
Hari Babu
Fujitsu Australia


mac_eui64_support_8.patch
Description: Binary data


contrib_macaddr8_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] simplify sequence test

2017-01-25 Thread Petr Jelinek
On 25/01/17 03:48, Peter Eisentraut wrote:
> We maintain a separate test output file sequence_1.out because the
> log_cnt value can vary if there is a checkpoint happening at the right
> time.  So we have to maintain two files because of a one character
> difference.  I propose the attached patch to restructure the test a bit
> to avoid this, without loss of test coverage.
> 

+1, looks good.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-25 Thread Ashutosh Bapat
On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier
 wrote:
> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane  wrote:
>> I wrote:
>>> Michael Paquier  writes:
 The table of Pseudo-Types needs to be updated as well with unknown in
 datatype.sgml.
>>
>>> Check.
>>
>> Here's an updated patch with doc changes.  Aside from that one,
>> I tried to spell "pseudo-type" consistently, and I changed one
>> place where we were calling something a pseudo-type that isn't.

I think those changes, even though small, deserve their own commit.
The changes themselves look good.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-25 Thread Amit Kapila
On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby  wrote:
> I took a look at this again, and it doesn't appear to be working for me. The 
> library is being loaded during startup, but I don't see any further activity 
> in the log, and I don't see an autoprewarm file in $PGDATA.
>
> There needs to be some kind of documentation change as part of this patch.
>
> I'm not sure the default GUC setting of 0 makes sense. If you've loaded the 
> module, presumably you want it to be running. I think it'd be nice if the GUC 
> had a -1 setting that meant to use checkpoint_timeout.
>
> Having the GUC be restart-only is also pretty onerous. I don't think it'd be 
> hard to make the worker respond to a reload... there's code in the autovacuum 
> launcher you could use as an example.
>

+1.  I don't think there should be any problem in making it PGC_SIGHUP.

> I'm also wondering if this really needs to be a permanently running 
> process... perhaps the worker could simply be started as necessary?

Do you want to invoke the worker after every buff_dump_interval?  I think
that would be bad in terms of starting a new process each time, and
something would have to monitor when to start such a process.  I think it
is better to keep it as a permanently running background process if it is
loaded by the user.

> Though maybe that means it wouldn't run at shutdown.

Yeah, that will be another drawback.

A few comments found while glancing through the patch.

1.
+TO DO:
+--
+Add functionality to dump based on timer at regular interval.

I think you need to remove above TO DO.

2.
+ /* Load the page only if there exist a free buffer. We do not want to
+ * replace an existing buffer. */

This is not a PG style multiline comment.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-25 Thread Rahila Syed
Hello,

The patch applies fine on the latest master branch and works correctly
when tested with the various variables listed in the PsqlSettings struct.
I will post further comments on the patch soon.

Thank you,
Rahila Syed

On Wed, Jan 25, 2017 at 1:33 AM, Tom Lane  wrote:

> "Daniel Verite"  writes:
> > Here's an update with these changes:
>
> I scanned this patch very quickly and think it addresses my previous
> stylistic objections.  Rahila is the reviewer of record though, so
> I'll wait for his comments before going further.
>
> regards, tom lane
>


Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime

2017-01-25 Thread Ashutosh Bapat
On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalke
 wrote:
>
> On Wed, Nov 2, 2016 at 10:09 PM, Tom Lane  wrote:
>>
>> In 9.6, "make installcheck" in contrib/postgres_fdw takes a shade
>> under 3 seconds on my machine.  In HEAD, it's taking 10 seconds.
>> I am not happy, especially not since there's no parallelization
>> of the contrib regression tests.  That's a direct consumption of
>> my time and all other developers' time too.  This evidently came
>> in with commit 7012b132d (Push down aggregates to remote servers),
>> and while that's a laudable feature, I cannot help thinking that
>> it does not deserve this much of an imposition on every make check
>> that's ever done for the rest of eternity.
>
>
> Thanks Tom for reporting this substantial bloat in postgres_fdw regression
> test runtime. On my machine "make installcheck" in contrib/postgres_fdw
> takes 6.2 seconds on master (HEAD:
> 770671062f130a830aa89100c9aa2d26f8d4bf32).
> However if I remove all tests added for aggregate push down, it drops down
> to 2.2 seconds. Oops 4 seconds more.
>
> I have timed each of my tests added as part of aggregate push down patch and
> observed that one of the tests (the LATERAL one) is taking around 3.5 seconds.
> This is caused by the parameterization. I have added a filter so
> that we will have a smaller number of rows for parameterization. Doing this,
> lateral query in question now runs in 100ms. Also updated few more queries
> which were taking more than 100ms to have runtime around 30ms or so. So
> effectively, with changes "make installcheck" now takes around 2.5 seconds.
>
> Attached patch with test-case modification.
>

I verified that this patch indeed bring the time down to 2 to 3
seconds from 10 seconds.

The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6
of the aggregate, so we wouldn't know which of those actually worked.  I
modified the testcase to use t2.c2 % 6 = 0 instead and kept the filter
condition intact.  This increases the execution time by 0.2s, which may
be OK.  Let me know what you think of the attached patch.

Also, please add this to the commitfest, so that it isn't forgotten.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0045f3f..3a09280 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2685,9 +2685,9 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter 
(where c1%3 < 2), c2 f
 
 -- Outer query is aggregation query
 explain (verbose, costs off)
-select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from 
ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1;
-  QUERY PLAN   


+select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from 
ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
+  QUERY PLAN   
   
+--
  Unique
Output: ((SubPlan 1))
->  Sort
@@ -2696,14 +2696,14 @@ select distinct (select count(*) filter (where t2.c2 = 
6 and t2.c1 < 10) from ft
  ->  Foreign Scan
Output: (SubPlan 1)
Relations: Aggregate on (public.ft2 t2)
-   Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" 
< 10))) FROM "S 1"."T 1"
+   Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" 
< 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
SubPlan 1
  ->  Foreign Scan on public.ft1 t1
Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 
< 10
Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" 
= 6))
 (13 rows)
 
-select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from 
ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1;
+select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from 
ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
 ---
  1
@@ -2711,7 +2711,7 @@ select distinct (select count(*) filter (where t2.c2 = 6 
and t2.c1 < 10) from ft
 
 -- Inner query is aggregation query
 explain (verbose, costs off)
-select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) 
from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1;
+select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) 
from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
   

Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote:
> On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi  
> wrote:
> > On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker 
> > wrote:
> >>
> >> Attached is a patch that implements copy_srf().
> >
> > Moved to next CF with "needs review" status.
> 
> This patch is still waiting for review. David, are you planning to
> look at it by the end of the CF?

I'll be doing this today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Robert Haas
Commit 1574783b4ced0356fbc626af1a1a469faa6b41e1 gratifyingly removed
hard-coded superuser checks from assorted functions, which makes it
possible to GRANT EXECUTE ON FUNCTION pg_catalog.whatever() to
unprivileged users if the DBA so desires.  However, the functions in
genfile.c still have hard-coded checks: pg_read_file(),
pg_read_binary_file(), pg_stat_file(), and pg_ls_dir().  I think those
functions ought to get the same treatment that the commit mentioned
above gave to a bunch of others.  Obviously, there's some risk of DBAs
doing stupid things there, but stupidity is hard to prevent in a
general way and nanny-ism is annoying.  The use case I have in mind is
a monitoring tool that needs access to one more of those functions --
in keeping with the principle of least privilege, it's much better to
give the monitoring user only the privileges which it actually needs
than to make it a superuser.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-25 Thread Antonin Houska
David Rowley  wrote:
> On 19 January 2017 at 11:06, David Rowley  
> wrote:
> > Old patch no longer applies, so I've attached a rebased patch. This
> > also re-adds a comment line which I mistakenly removed.
> 
> (meanwhile Andres commits 69f4b9c)
> 
> I should've waited a bit longer.
> 
> Here's another that fixes the new conflicts.

I suspect that "inner" and "outer" relation / tuple are sometimes confused in
comments:


* analyzejoins.c:70

 "searches for subsequent matching outer tuples."


* analyzejoins.c:972

/*
 * innerrel_is_unique
 *Check for proofs which prove that 'innerrel' can, at most, match a
 *single tuple in 'outerrel' based on the join condition in
 *'restrictlist'.
 */


* relation.h:1831

boolinner_unique;   /* inner side of join matches no more 
than one
 * outer side 
tuple */


-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote:
> On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> > Attached is a patch that implements copy_srf().
> > 
> > The function signature reflects cstate more than it reflects the COPY
> > options (filename+is_program instead of  FILENAME or PROGRAM, etc)
> 
> The patch as it stands needs a rebase.  I'll see what I can do today.

Please find attached a rebase, which fixes an OID collision that crept in.

- The patch builds against master and passes "make check".
- The patch does not contain user-visible documentation or examples.
- The patch does not contain tests.

I got the following when I did a brief test.

SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 4dfedf8..ae07cfb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text DEFAULT null,
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT 'text',
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT E'\\N',
+   IN header boolean DEFAULT false,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..8e1bd39 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
int maxread)
 errmsg("could not read from 
COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
 * We cannot read more than minread bytes (which in 
practice is 1)
 * because old protocol doesn't have any clear way of 
separating
@@ -4740,3 +4740,377 @@ CreateCopyDestReceiver(void)
 
return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+   ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+   TupleDesc   tupdesc;
+   Tuplestorestate *tupstore = NULL;
+   MemoryContext   per_query_ctx;
+   MemoryContext   oldcontext;
+   FmgrInfo*in_functions;
+   Oid *typioparams;
+   Oid in_func_oid;
+
+   CopyStateData   copy_state;
+   int col;
+
+   Datum   *values;
+   bool*nulls;
+
+   /* check to see if caller supports us returning a tuplestore */
+   if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("set-valued function called in context 
that cannot accept a set")));
+   }
+
+   if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc 
== NULL)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("materialize mode required, but it is 
not allowed in this context")));
+   }
+
+   tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+   values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+   nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+   in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+   typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+   for (col = 0; col < tupdesc->natts; col++)
+   {
+   getTypeInputInfo(tupdesc->attrs[col]->atttypid, &in_func_oid, &typioparams[col]);
+   

Re: [HACKERS] PoC: Grouped base relation

2017-01-25 Thread Antonin Houska
David Rowley  wrote:

> On 20 January 2017 at 00:22, Antonin Houska  wrote:
> > Sorry, it was my thinko - I somehow confused David's CROSS JOIN example with
> > this one. If one side of the join clause is unique and the other becomes
> > unique due to aggregation (and if parallel processing is not engaged) then
> > neither combinefn nor multiplyfn should be necessary before the finalfn.
> 
> Yes, if the join can be detected not to duplicate the groups then a
> normal aggregate node can be pushed below the join. No need for
> Partial Aggregate, or Finalize Aggregate nodes.
> 
> I've a pending patch in the commitfest named "Unique Joins", which
> aims teach the planner about the unique properties of joins. So you
> should just have both stages of aggregation occur for now, and that
> can be improved on once the planner is a bit smart and knows about
> unique joins.

Thanks for a hint. I haven't paid attention to the "Unique Joins" patch until
today. Yes, that's definitely useful.

Given the progress of your patch, I am not worried about making the next version
of my patch depend on it.  Implementing a temporary solution for the aggregation
push-down seems to me like wasted effort.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-25 Thread Alvaro Herrera
Reading 0001_track_root_lp_v9.patch again:

> +/*
> + * We use the same HEAP_LATEST_TUPLE flag to check if the tuple's t_ctid 
> field
> + * contains the root line pointer. We can't use the same
> + * HeapTupleHeaderIsHeapLatest macro because that also checks for 
> TID-equality
> + * to decide whether a tuple is at the of the chain
> + */
> +#define HeapTupleHeaderHasRootOffset(tup) \
> +( \
> + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0 \
> +)
>
> +#define HeapTupleHeaderGetRootOffset(tup) \
> +( \
> + AssertMacro(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0), \
> + ItemPointerGetOffsetNumber(&(tup)->t_ctid) \
> +)

Interesting stuff; it took me a bit to see why these macros are this
way.  I propose the following wording which I think is clearer:

  Return whether the tuple has a cached root offset.  We don't use
  HeapTupleHeaderIsHeapLatest because that one also considers the slow
  case of scanning the whole block.

Please flag the macros that have multiple evaluation hazards -- there
are a few of them.  

> +/*
> + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for
> + * clusters which are upgraded from pre-10.0 release, we still check if c_tid
> + * is pointing to itself and declare such tuple as the latest tuple in the
> + * chain
> + */
> +#define HeapTupleHeaderIsHeapLatest(tup, tid) \
> +( \
> +  (((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \
> +  ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == 
> ItemPointerGetBlockNumber(tid)) && \
> +   (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == 
> ItemPointerGetOffsetNumber(tid))) \
> +)

I suggest rewording this comment as:
  Starting from PostgreSQL 10, the latest tuple in an update chain has
  HEAP_LATEST_TUPLE set; but tuples upgraded from earlier versions do
  not.  For those, we determine whether a tuple is latest by testing
  that its t_ctid points to itself.
(as discussed, there is no "10.0 release"; it's called the "10 release"
only, no ".0".  Feel free to use "v10" or "pg10").

> +/*
> + * Get TID of next tuple in the update chain. Caller should have checked that
> + * we are not already at the end of the chain because in that case t_ctid may
> + * actually store the root line pointer of the HOT chain whose member this
> + * tuple is.
> + */
> +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> +do { \
> + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> +} while (0)

Actually, I think this macro could just return the TID so that it can be
used as struct assignment, just like ItemPointerCopy does internally --
callers can do
ctid = HeapTupleHeaderGetNextTid(tup);

or more precisely, this pattern
> + if (!HeapTupleHeaderIsHeapLatest(tp.t_data, &tp.t_self))
> + HeapTupleHeaderGetNextTid(tp.t_data, &hufd->ctid);
> + else
> + ItemPointerCopy(&tp.t_self, &hufd->ctid);

becomes
hufd->ctid = HeapTupleHeaderIsHeapLatest(foo) ?
HeapTupleHeaderGetNextTid(foo) : >t_self;
or something like that.  I further wonder if it'd make sense to hide
this into yet another macro.
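
A minimal sketch of that idea, as an inline function rather than a macro
(illustrative only, not the patch's code; HEAP_LATEST_TUPLE as defined by
the patch):

static inline ItemPointerData
HeapTupleHeaderGetNextTid(HeapTupleHeader tup)
{
	Assert(!(tup->t_infomask2 & HEAP_LATEST_TUPLE));
	/* returned by value, so callers can use plain assignment */
	return tup->t_ctid;
}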


The API of RelationPutHeapTuple appears a bit contorted, where
root_offnum is both input and output.  I think it's cleaner to have the
argument be the input, and have the output offset be the return value --
please check whether that simplifies things; for example I think this:

> + root_offnum = InvalidOffsetNumber;
> + RelationPutHeapTuple(relation, buffer, heaptup, false,
> + &root_offnum);

becomes

root_offnum = RelationPutHeapTuple(relation, buffer, heaptup, false,
InvalidOffsetNumber);
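
That is, the prototype would look roughly like this (a sketch of the
suggested shape, not the patch's code):

extern OffsetNumber RelationPutHeapTuple(Relation relation, Buffer buffer,
										 HeapTuple tuple, bool token,
										 OffsetNumber root_offnum);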


Please remove the words "must have" in this comment:

> + /*
> +  * Also mark both copies as latest and set the root offset information. 
> If
> +  * we're doing a HOT/WARM update, then we just copy the information from
> +  * old tuple, if available or computed above. For regular updates,
> +  * RelationPutHeapTuple must have returned us the actual offset number
> +  * where the new version was inserted and we store the same value since 
> the
> +  * update resulted in a new HOT-chain
> +  */

Many comments lack finishing periods in complete sentences, which looks
odd.  Please fix.


I have not looked at the other patch yet.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Greg Stark
I tend to agree. But in the past when this came up people pointed out
you could equally do things this way and still grant all the access
you wanted using SECURITY DEFINER. Arguably that's a better approach
because then instead of auditing the entire monitor script you only
need to audit this one wrapper function, pg_ls_monitor_dir() which
just calls pg_ls_dir() on this one directory.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 11:30 AM, Greg Stark  wrote:
> I tend to agree. But in the past when this came up people pointed out
> you could equally do things this way and still grant all the access
> you wanted using SECURITY DEFINER. Arguably that's a better approach
> because then instead of auditing the entire monitor script you only
> need to audit this one wrapper function, pg_ls_monitor_dir() which
> just calls pg_ls_dir() on this one directory.

I agree that can be done, but it's nicer if you can use the same SQL
all the time.  With that solution, you need one SQL query to run when
you've got superuser privileges and a different SQL query to run when
you are running without superuser privileges but somebody's run the
create-security-definer-wrappers-for-me script.  That's a deployment
nuisance if you want to support both configurations.

Also, the same argument could be made about removing the built-in
superuser check from ANY function, and we've already rejected that
argument for a bunch of other functions.  If we say that argument is
valid for some functions but not others, then we've got to decide for
which ones it's valid and for which ones it isn't, and consensus will
not be forthcoming.  I take the position that hard-coded superuser
checks stink in general, and I'm grateful to Stephen for his work
making dump/restore work properly on system catalog permissions so
that we can support better alternatives.  I'm not asking for anything
more than that we apply that same policy here as we have in other
cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-25 Thread Masahiko Sawada
On Tue, Jan 24, 2017 at 1:49 AM, Claudio Freire  wrote:
> On Fri, Jan 20, 2017 at 6:24 AM, Masahiko Sawada  
> wrote:
>> On Thu, Jan 19, 2017 at 8:31 PM, Claudio Freire  
>> wrote:
>>> On Thu, Jan 19, 2017 at 6:33 AM, Anastasia Lubennikova
>>>  wrote:
 28.12.2016 23:43, Claudio Freire:

 Attached v4 patches with the requested fixes.


 Sorry for being late, but the tests took a lot of time.
>>>
>>> I know. Takes me several days to run my test scripts once.
>>>
 create table t1 as select i, md5(random()::text) from
 generate_series(0,4) as i;
 create index md5_idx ON  t1(md5);
 update t1 set md5 = md5((random() * (100 + 500))::text);
 vacuum;

 Patched vacuum used 2.9Gb of memory and vacuumed the index in one pass,
 while for old version it took three passes (1GB+1GB+0.9GB).
 Vacuum duration results:

 vanilla:
 LOG: duration: 4359006.327 ms  statement: vacuum verbose t1;
 patched:
 LOG: duration: 3076827.378 ms  statement: vacuum verbose t1;

 We can see 30% vacuum speedup. I should note that this case can be
 considered
 as favorable to vanilla vacuum: the table is not that big, it has just one
 index
 and disk used is a fast fusionIO. We can expect even more gain on slower
 disks.

 Thank you again for the patch. Hope to see it in 10.0.
>>>
>>> Cool. Thanks for the review and the tests.
>>>
>>
>> I encountered a bug with following scenario.
>> 1. Create table and disable autovacuum on that table.
>> 2. Make about 20 dead tuples on the table.
>> 3. SET maintenance_work_mem TO 1024
>> 4. VACUUM
>>
>> @@ -729,7 +759,7 @@ lazy_scan_heap(Relation onerel, int options,
>> LVRelStats *vacrelstats,
>>  * not to reset latestRemovedXid since we want
>> that value to be
>>  * valid.
>>  */
>> -   vacrelstats->num_dead_tuples = 0;
>> +   lazy_clear_dead_tuples(vacrelstats);
>> vacrelstats->num_index_scans++;
>>
>> /* Report that we are once again scanning the heap */
>>
>> I think that we should do vacrelstats->dead_tuples.num_entries = 0 as
>> well in lazy_clear_dead_tuples(). Once the amount of dead tuples
>> reached to maintenance_work_mem, lazy_scan_heap can never finish.
>
> That's right.
>
> I added a test for it in the attached patch set, which uncovered
> another bug in lazy_clear_dead_tuples, and took the opportunity to
> rebase.
>
> On Mon, Jan 23, 2017 at 1:06 PM, Alvaro Herrera
>  wrote:
>> I pushed this patch after rewriting it rather completely.  I added
>> tracing notices to inspect the blocks it was prefetching and observed
>> that the original coding was failing to prefetch the final streak of
>> blocks in the table, which is an important oversight considering that it
>> may very well be that those are the only blocks to read at all.
>>
>> I timed vacuuming a 4000-block table in my laptop (single SSD disk;
>> dropped FS caches after deleting all rows in table, so that vacuum has
>> to read all blocks from disk); it changes from 387ms without patch to
>> 155ms with patch.  I didn't measure how much it takes to run the other
>> steps in the vacuum, but it's clear that the speedup for the truncation
>> phase is considerable.
>>
>> ¡Thanks, Claudio!
>
> Cool.
>
> Though it wasn't the first time this idea has been floating around, I
> can't take all the credit.
>
>
> On Fri, Jan 20, 2017 at 6:25 PM, Alvaro Herrera
>  wrote:
>> FWIW, I think this patch is completely separate from the maint_work_mem
>> patch and should have had its own thread and its own commitfest entry.
>> I intend to get a look at the other patch next week, after pushing this
>> one.
>
> That's because it did have it, and was left in limbo due to lack of
> testing on SSDs. I just had to adopt it here because otherwise tests
> took way too long.

Thank you for updating the patch!

+   /*
+* Quickly rule out by lower bound (should happen a lot) Upper bound was
+* already checked by segment search
+*/
+   if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0)
+   return false;

I think that if the above result is 0, we can return true as itemptr
matched lower bound item pointer in rseg->dead_tuples.
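
That is, the check could be split roughly like this (sketch only; itemptr
and rseg as in the patch):

    int     cmp = vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples);

    if (cmp < 0)
        return false;   /* below the segment's lower bound */
    if (cmp == 0)
        return true;    /* itemptr is exactly the lower-bound entry */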

 +typedef struct DeadTuplesSegment
+{
+   int             num_dead_tuples;    /* # of entries in the segment */
+   int             max_dead_tuples;    /* # of entries allocated in the segment */
+   ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
+                                        * until the segment is fully
+                                        * populated) */
+   unsigned short  padding;
+   ItemPointer     dead_tuples;

Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Alvaro Herrera
David Fetter wrote:

> @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
> int maxread)
>errmsg("could not read from 
> COPY file: %m")));
>   break;
>   case COPY_OLD_FE:
> -
>   /*
>* We cannot read more than minread bytes (which in 
> practice is 1)
>* because old protocol doesn't have any clear way of 
> separating

This change is pointless as it'd be undone by pgindent.

> + /*
> +  * Function signature is:
> +  * copy_srf( filename text default null,
> +  *   is_program boolean default false,
> +  *   format text default 'text',
> +  *   delimiter text default E'\t' in text mode, ',' in csv mode,
> +  *   null_string text default '\N',
> +  *   header boolean default false,
> +  *   quote text default '"' in csv mode only,
> +  *   escape text default null, -- defaults to whatever quote is
> +  *   encoding text default null
> +  */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.
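
For example (sketch of the marker, keeping the existing comment text):

/*----------
 * Function signature is:
 * copy_srf( filename text default null,
 *   ...
 *----------
 */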

> + /* param 7: escape text default null, -- defaults to whatever quote is 
> */
> + if (PG_ARGISNULL(7))
> + {
> + copy_state.escape = copy_state.quote;
> + }
> + else
> + {
> + if (copy_state.csv_mode)
> + {
> + copy_state.escape = 
> TextDatumGetCString(PG_GETARG_TEXT_P(7));
> + if (strlen(copy_state.escape) != 1)
> + {
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +  errmsg("COPY escape must be a 
> single one-byte character")));
> + }
> + }
> + else
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +  errmsg("COPY escape available only in 
> CSV mode")));
> + }
> + }

I don't understand why we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.


> + tuple = heap_form_tuple(tupdesc,values,nulls);
> + //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> + tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.
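
That is, those lines can become simply:

	tuplestore_putvalues(tupstore, tupdesc, values, nulls);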


> + }
> +
> + /* close "file" */
> + if (copy_state.is_program)
> + {
> + int pclose_rc;
> +
> + pclose_rc = ClosePipeStream(copy_state.copy_file);
> + if (pclose_rc == -1)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close pipe to 
> external command: %m")));
> + else if (pclose_rc != 0)
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> +  errmsg("program \"%s\" failed",
> + copy_state.filename),
> +  errdetail_internal("%s", 
> wait_result_to_str(pclose_rc;
> + }
> + else
> + {
> + if (copy_state.filename != NULL && 
> FreeFile(copy_state.copy_file))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": 
> %m",
> + copy_state.filename)));
> + }

I wonder if these should be handled by an auxiliary function in copy.c.
Surely copy.c itself does pretty much the same thing ...


> +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 
> 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ 
> copy_srf _null_ _null_ _null_ ));
> +DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"?  If this is a PoC only, why are
you submitting it as a patch?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-25 Thread Peter Geoghegan
On Wed, Jan 25, 2017 at 2:49 PM, Tom Lane  wrote:
> I looked at the 0002 patch, and while the code is probably OK, I am
> dissatisfied with this API spec:
>
> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
> + * regardless of future manipulations of the tuplesort's state.  Memory is
> + * owned by the caller.  If copy is FALSE, the slot may just receive a 
> pointer
> + * to a tuple held within the tuplesort.  The latter is more efficient, but
> + * the slot contents may be corrupted if there is another call here before
> + * previous slot contents are used.
>
> What does "here" mean?  If that means specifically "another call of
> tuplesort_gettupleslot", say so.  If "here" refers to the whole module,
> it would be better to say something like "the slot contents may be
> invalidated by any subsequent manipulation of the tuplesort's state".
> In any case it'd be a good idea to delineate safe usage patterns, perhaps
> "copy=FALSE is recommended only when the next tuplesort manipulation will
> be another tuplesort_gettupleslot fetch into the same slot."

I agree with your analysis.

It means "another call to tuplesort_gettupleslot", but I believe that
it would be safer (more future-proof) to actually specify "the slot
contents may be invalidated by any subsequent manipulation of the
tuplesort's state" instead.

> There are several other uses of "call here", both in this patch and
> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory.
> Let's try to improve that.

Should I write a patch along those lines?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] safer node casting

2017-01-25 Thread Andres Freund
Hi Peter,


On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote:
> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc));

Are you planning to add this / update this patch? Because I really would
have liked this a number of times already...  I can update it according
to my suggestions (to avoid multiple eval scenarios) if helpful

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
>> As it is, there are backup solutions which *do* check the checksum when
>> backing up PG.  This is no longer, thankfully, some hypothetical thing,
>> but something which really exists and will hopefully keep users from
>> losing data.
>
> Wouldn't that have issues with torn pages?

Why? What do you foresee here? I would think such backup solutions are
careful enough to correctly ensure the durability of pages so that they
are not partially written.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] safer node casting

2017-01-25 Thread Tom Lane
Andres Freund  writes:
> On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote:
>> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc));

> Are you planning to add this / update this patch? Because I really would
> have liked this a number of times already...  I can update it according
> to my suggestions (to avoid multiple eval scenarios) if helpful

Yeah, I'd like that in sooner rather than later, too.  But we do need
it to be foolproof - no multiple evals.  The first draft had
inadequate-parenthesization hazards, too.
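
For the record, one way to avoid both hazards is to route the cast through a
tiny inline helper, along these lines (a sketch, not necessarily the final
form; assumes the usual Node/NodeTag/nodeTag/Assert definitions):

static inline Node *
castNodeImpl(NodeTag type, void *ptr)
{
	Assert(ptr == NULL || nodeTag(ptr) == type);
	return (Node *) ptr;
}

#define castNode(_type_, nodeptr)  ((_type_ *) castNodeImpl(T_##_type_, nodeptr))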

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Andres Freund
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote:
> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> >> As it is, there are backup solutions which *do* check the checksum when
> >> backing up PG.  This is no longer, thankfully, some hypothetical thing,
> >> but something which really exists and will hopefully keep users from
> >> losing data.
> >
> > Wouldn't that have issues with torn pages?
> 
>> Why? What do you foresee here? I would think such backup solutions are
>> careful enough to correctly ensure the durability of pages so that they
>> are not partially written.

That means you have to replay enough WAL to get into a consistent
state...

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund  wrote:
> On 2017-01-26 09:19:28 +0900, Michael Paquier wrote:
>> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
>> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
>> >> As it is, there are backup solutions which *do* check the checksum when
>> >> backing up PG.  This is no longer, thankfully, some hypothetical thing,
>> >> but something which really exists and will hopefully keep users from
>> >> losing data.
>> >
>> > Wouldn't that have issues with torn pages?
>>
>> Why? What do you foresee here? I would think such backup solutions are
>> careful enough to correctly ensure the durability of pages so that they
>> are not partially written.
>
> That means you have to replay enough WAL to get into a consistent
> state...

Ah, OK, I got the point.  Yes, that would be a problem when checking this
field on raw backups, except if the page size matches the kernel's page
size of 4k.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier
 wrote:
> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
>>> As it is, there are backup solutions which *do* check the checksum when
>>> backing up PG.  This is no longer, thankfully, some hypothetical thing,
>>> but something which really exists and will hopefully keep users from
>>> losing data.
>>
>> Wouldn't that have issues with torn pages?
>
> Why? What do you foresee here? I would think such backup solutions are
> careful enough to correctly ensure the durability of pages so that they
> are not partially written.

Well, you'd have to keep a read(fd, buf, 8192) performed by the backup
tool from overlapping with a write(fd, buf, 8192) performed by the
backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Andres Freund
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
> * Peter Geoghegan (p...@heroku.com) wrote:
> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> > > As it is, there are backup solutions which *do* check the checksum when
> > > backing up PG.  This is no longer, thankfully, some hypothetical thing,
> > > but something which really exists and will hopefully keep users from
> > > losing data.
> > 
> > Wouldn't that have issues with torn pages?
> 
> No, why would it?  The page has either been written out by PG to the OS,
> in which case the backup s/w will see the new page, or it hasn't been.

Uh. Writes aren't atomic on that granularity.  That means you very well
*can* see a torn page (in linux you can e.g. on 4KB os page boundaries
of a 8KB postgres page). Just read a page while it's being written out.

You simply can't reliably verify checksums without replaying WAL (or
creating a manual version of replay, as in checking the WAL for a FPW).
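
To spell out the manual approximation, a backup reader might do something
along these lines (a rough sketch under assumed conditions -- data_checksums
enabled, the page read into a private buffer -- and, as noted, still not
fully reliable, since a torn read can pair an old page header with new data):

/* sketch: assumes storage/bufpage.h, storage/checksum.h, access/xlogdefs.h */
static bool
page_looks_corrupt(char *page, BlockNumber blkno, XLogRecPtr backup_start_lsn)
{
	PageHeader	phdr = (PageHeader) page;

	if (PageIsNew(page))
		return false;		/* all-zero pages carry no checksum */

	if (pg_checksum_page(page, blkno) == phdr->pd_checksum)
		return false;		/* checksum matches */

	if (PageGetLSN(page) >= backup_start_lsn)
		return false;		/* rewritten after backup start; the WAL
							 * full-page image will fix it on replay */

	return true;			/* genuinely suspect */
}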


> This isn't like a case where only half the page made it to the disk
> because of a system failure though; everything is online and working
> properly during an online backup.

I don't think that really changes anything.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
> > * Peter Geoghegan (p...@heroku.com) wrote:
> > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> > > > As it is, there are backup solutions which *do* check the checksum when
> > > > backing up PG.  This is no longer, thankfully, some hypothetical thing,
> > > > but something which really exists and will hopefully keep users from
> > > > losing data.
> > > 
> > > Wouldn't that have issues with torn pages?
> > 
> > No, why would it?  The page has either been written out by PG to the OS,
> > in which case the backup s/w will see the new page, or it hasn't been.
> 
> Uh. Writes aren't atomic on that granularity.  That means you very well
> *can* see a torn page (in linux you can e.g. on 4KB os page boundaries
> of a 8KB postgres page). Just read a page while it's being written out.
> 
> You simply can't reliably verify checksums without replaying WAL (or
> creating a manual version of replay, as in checking the WAL for a FPW).

Looking through the WAL isn't any surprise and is something we've been
planning to do for other reasons anyway.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> That would be enough. It should also be rare enough that there would
> not be that many pages to track when looking at records from the
> backup start position to minimum recovery point. It could be also
> simpler, though more time-consuming, to just let a backup recover up
> to the minimum recovery point (recovery_target = 'immediate'), and
> then run the checksum sanity checks. There are other checks usually
> needed on a backup anyway like being sure that index pages are in good
> shape even with a correct checksum, etc.

Believe me, I'm all for *all* of that.

> But here I am really high-jacking the thread, so I'll stop..

If you have further thoughts, I'm all ears.  This is all relatively new,
and I don't expect to have all of the answers or solutions.

Obviously, having to bring up a full database is an extra step (one we
try to make easy to do), but, sadly, we don't have any way to ask PG to
verify all the checksums with released versions, so that's what we're
working with.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Pavel Stehule
2017-01-25 15:07 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > >> XMLTABLE is specified by the standard to return multiple rows ... but
> > >> then as far as my reading goes, it is only supposed to be supported in
> > >> the range table (FROM clause) not in the target list.  I wonder if
> > >> this would end up better if we only tried to support it in RT.  I
> asked
> > >> Pavel to implement it like that a few weeks ago, but ...
> >
> > > Right - it makes sense in the FROM list - but then it should be an
> > > executor node, instead of some expression thingy.
> >
> > +1 --- we're out of the business of having simple expressions that
> > return rowsets.
>
> Well, that's it.  I'm not committing this patch against two other
> committers' opinion, plus I was already on the fence about the
> implementation anyway.  I think you should just go with the flow and
> implement this by creating nodeTableexprscan.c.  It's not even
> difficult.
>

I am playing with this, and the patch is about 15kB longer - just from
implementing the basic scan functionality - and I haven't touched the planner yet.

I am not happy with this - I still have a feeling that I should try to
reimplement it as a reduced SRF.

Regards

Pavel

>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost  wrote:
> > Your email is 'pg_ls_dir & friends', which I took to imply at *least*
> > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to
> > think you may have meant everything in adminpack, which also includes
> > pg_file_write(), pg_file_rename() and pg_file_unlink().
> 
> Well, I was talking about genfile.c, which doesn't contain those
> functions, but for the record I'm in favor of extirpating every single
> hard-coded superuser check from the backend without exception.

Then it was correct for me to assume that's what you meant, which means my
reaction and response are on-point.

> Preventing people from calling functions by denying the ability to
> meaningfully GRANT EXECUTE on those functions doesn't actually stop
> them from delegating those functions to non-superusers.  It either (a)
> causes them to give superuser privileges to accounts that otherwise
> would have had lesser privileges or (b) forces them to use wrapper
> functions to grant access rather than doing it directly or (c) some
> other dirty hack.  If they pick (a), security is worse; if they pick
> (b) or (c), you haven't prevented them from doing what they wanted to
> do anyway.  You've just made it annoying and inconvenient.

The notion that security is 'worse' under (a) is flawed- it's no
different.  With regard to 'b', if their wrapper function is
sufficiently careful to ensure that the caller isn't able to do anything
which would increase the caller's level to that of superuser, then
security is improved.  If the wrapper simply turns around and calls the
underlying function, then it's no different from '(a)'.

I am suggesting that we shouldn't make it look like there are
distinctions when there is actually no difference.  That is a good thing
for our users because it keeps them informed about what they're actually
doing when they grant access.

> Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch
> of things that don't work unless you run as superuser.  I think we
> should be trying to provide ways of reducing those lists.  If I can't
> get agreement on method (a), I'm going to go look for ways of doing
> (b) or (c), which is more work and uglier but if I can't get consensus
> here then oh well.

I've commented on here and spoken numerous times about exactly that goal
of reducing the checks in check_postgres.pl which require superuser.
You're not actually doing that and nothing you've outlined in here so
far makes me believe you see how having pg_write_file() access is
equivalent to giving someone superuser, and that concerns me.

As someone who has contributed code and committed code back to
check_postgres.pl, I would be against making changes there which
install security definer functions to give the monitoring user
superuser-level access, and I believe Greg S-M would feel the same way
considering that he and I have discussed exactly that in the past.  I
don't mean to speak for him and perhaps his opinion has changed, but it
seems unlikely to me.

If the DBA wants to give the monitoring user superuser-level access to
run the superuser-requiring checks that check_postgres.pl has, they're
welcome to do so, but they'll be making an informed decision that they
have weighed the risk of their monitoring user being compromised against
the value of that additional monitoring, which is an entirely
appropriate and reasonable decision for them to make.

> I do not accept your authority to determine what monitoring tools need
> to or should do.  Monitoring tools that use pg_ls_dir are facts on the
> ground, and your disapprobation doesn't change that at all.  It just
> obstructs moving to a saner permissions framework.  

Allowing GRANT to be used to give access to pg_write_file and friends is
not a 'permissions framework'.  Further, I am not claiming authority
over what monitoring tools should need to do or not do, and a great many
people run their monitoring tools as superuser.  I am not trying to take
away their ability to do so.

The way to make progress here is not, however, to just decide that all
those superuser() checks we put in place were silly and should be
removed, it's to provide better ways to monitor PG which provide exactly
the monitoring information needed in a useful and sensible way.

I understand the allure of just removing a few lines of code to make
things "easier" or "faster" or "better", but I don't think it's a good
idea to remove these superuser checks, nor do I think it's a good idea
to remove our WAL CRCs even if it'd help some of my clients, nor do I
think it's a good idea to have checksums disabled by default, or to rip
out all of WAL as being "unnecessary" because we have ZFS and it should
handle all things data and we could just fsync all of the heap files on
commit and be done with it.

Admittedly, you're not arguing for half of what I just 

Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-25 16:52:38 -0500, Stephen Frost wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> > > Preventing people from calling functions by denying the ability to
> > > meaningfully GRANT EXECUTE on those functions doesn't actually stop
> > > them from delegating those functions to non-superusers.  It either (a)
> > > causes them to give superuser privileges to accounts that otherwise
> > > would have had lesser privileges or (b) forces them to use wrapper
> > > functions to grant access rather than doing it directly or (c) some
> > > other dirty hack.  If they pick (a), security is worse; if they pick
> > > (b) or (c), you haven't prevented them from doing what they wanted to
> > > do anyway.  You've just made it annoying and inconvenient.
> >
> > The notion that security is 'worse' under (a) is flawed- it's no
> > different.
> 
> Huh? Obviously that's nonsense, given the pg_ls_dir example.

Robert's made it clear that he'd like to have a blanket rule that we
don't have superuser checks in these code paths if they can be GRANT'd
at the database level, which goes beyond pg_ls_dir.

If the question was only about pg_ls_dir, then I still wouldn't be for
it, because, as the bits you didn't quote discussed, it encourages users
and 3rd party tool authors to base more things off of pg_ls_dir to
look into the way PG stores data on disk, and affords more access than
the monitoring user has any need for, none of which are good, imv.  It
also discourages people from implementing proper solutions when they can
'just use pg_ls_dir()', which I also don't agree with.

If you really want to do an ls, go talk to the OS.  OS-level ACLs can
provide that with more granularity than what would be available through
pg_ls_dir().  We aren't in the "give a user the ability to do an ls"
business, frankly.

> > With regard to 'b', if their wrapper function is
> > sufficiently careful to ensure that the caller isn't able to do anything
> > which would increase the caller's level to that of superuser, then
> > security is improved.
> 
> Given how complex "sufficiently careful" is for security definer UDFs,
> in comparison to estimating the security of granting to a function like
> pg_ls_dir (or pretty much any other that doesn't call out to SQL level
> stuff like operators, output functions, etc), I don't understand this.

If you're implying that security definer UDFs are hard to write and get
correct, then I agree with you there.  I was affording the benefit of
the doubt to that proposed approach.

> > If the wrapper simply turns around and calls the underlying function,
> > then it's no different from '(a)'.
> 
> Except for stuff like search path.

If you consider '(a)' to be the same as superuser, which I was
postulating, then this doesn't strike me as making terribly much
difference.

> > I am suggesting that we shouldn't make it look like there are
> > distinctions when there is actually no difference.  That is a good thing
> > for our users because it keeps them informed about what they're actually
> > doing when they grant access.
> 
> This position doesn't make a lick of sense to me.  There's simply no
> benefit at all in requiring to create wrapper functions, over allowing
> to grant to non-superuser. Both is possible, secdef is a lot harder to
> get right. And you already heavily started down the path of removing
> superuser() type checks - you're just arguing to make it more or less
> randomly less consistent.

I find this bizarre considering I went through a detailed effort to go
look at every superuser check in the system and discussed, on this list,
the reasoning behind each and every one of them.  I do generally
consider arbitrary access to syscalls via the database to be a privilege
which really only the superuser should have.

> > I've commented on here and spoken numerous times about exactly that goal
> > of reducing the checks in check_postgres.pl which require superuser.
> > You're not actually doing that and nothing you've outlined in here so
> > far makes me believe you see how having pg_write_file() access is
> equivalent to giving someone superuser, and that concerns me.
> 
> That's the user's responsibility, and Robert didn't really suggest
> granting pg_write_file() permissions, so this seems to be a straw man.

He was not arguing for only pg_ls_dir(), but for checks in all
"friends", which he later clarified to include pg_write_file().

Thanks!

Stephen




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-25 Thread Tom Lane
Peter Geoghegan  writes:
> I should have already specifically pointed out that the original
> discussion on what became 0002-* is here:
> postgr.es/m/7256.1476711...@sss.pgh.pa.us
> As I said already, the general idea seems uncontroversial.

I looked at the 0002 patch, and while the code is probably OK, I am
dissatisfied with this API spec:

+ * If copy is TRUE, the slot receives a copied tuple that will stay valid
+ * regardless of future manipulations of the tuplesort's state.  Memory is
+ * owned by the caller.  If copy is FALSE, the slot may just receive a pointer
+ * to a tuple held within the tuplesort.  The latter is more efficient, but
+ * the slot contents may be corrupted if there is another call here before
+ * previous slot contents are used.

What does "here" mean?  If that means specifically "another call of
tuplesort_gettupleslot", say so.  If "here" refers to the whole module,
it would be better to say something like "the slot contents may be
invalidated by any subsequent manipulation of the tuplesort's state".
In any case it'd be a good idea to delineate safe usage patterns, perhaps
"copy=FALSE is recommended only when the next tuplesort manipulation will
be another tuplesort_gettupleslot fetch into the same slot."

There are several other uses of "call here", both in this patch and
pre-existing in tuplesort.c, that I find equally vague and unsatisfactory.
Let's try to improve that.
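Folding those suggestions together, the spec could read roughly like this
(a wording sketch only, not proposed committed text):

/*
 * If copy is TRUE, the slot receives a copied tuple that will stay valid
 * regardless of future manipulations of the tuplesort's state.  Memory is
 * owned by the caller.  If copy is FALSE, the slot may just receive a
 * pointer to a tuple held within the tuplesort; the slot contents may be
 * invalidated by any subsequent manipulation of the tuplesort's state.
 * The latter is more efficient, but copy=FALSE is recommended only when
 * the next tuplesort manipulation will be another tuplesort_gettupleslot
 * fetch into the same slot.
 */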

regards, tom lane




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-25 Thread Tom Lane
[ in the service of closing out this thread... ]

Peter Geoghegan  writes:
> Finally, 0003-* is a Valgrind suppression borrowed from my parallel
> CREATE INDEX patch. It's self-explanatory.

Um, I didn't find it all that self-explanatory.  Why wouldn't we want
to avoid writing undefined data?  I think the comment at least needs
to explain exactly what part of the written data might be uninitialized.
And I'd put the comment into valgrind.supp, too, not in the commit msg.

Also, the suppression seems far too broad.  It would for instance
block any complaint about a write() invoked via an elog call from
any function invoked from any LogicalTape* function, no matter
how far removed.

regards, tom lane




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost  wrote:
> I hope to discuss it further after we have the ability to turn it off
> easily.

I think we should have the ability to flip it in BOTH directions easily.

>> Second, really hard to enable is a relative term.  I accept that
>> enabling checksums is not a pleasant process.  Right now, you'd have
>> to do a dump/restore, or use logical replication to replicate the data
>> to a new cluster and then switch over.  On the other hand, if
>> checksums are really a critical feature, how are people getting to the
>> point where they've got a mission-critical production system and only
>> then discovering that they want to enable checksums?
>
> I truly do wish everyone would come talk to me before building out a
> database.  Perhaps that's been your experience, in which case, I envy
> you, but I tend to get a reaction more along the lines of "wait, what do
> you mean I had to pass some option to initdb to enable checksum?!?!".
> The fact that we've got a WAL implementation and clearly understand
> fsync requirements, why full page writes make sense, and that our WAL
> has its own CRCs which isn't possible to disable, tends to lead people
> to think we really know what we're doing and that we care a lot about
> their data.

It sounds to me like you are misleading users about the positives and
negatives of checksums, which then causes them to be shocked that they
are not the default.

> As I have said, I don't believe it has to be on for everyone.

For the second time, I didn't say that.  But the default has a
powerful influence on behavior.  If it didn't, you wouldn't be trying
to get it changed.

> [ unsolicited bragging about an unspecified backup tool, presumably 
> pgbackrest ]

Great.

> Presently, last I checked at least, the database system doesn't fall
> over and die if a single page's checksum fails.

This is another thing that I never said.

> [ more unsolicited bragging about an unspecified backup tool, presumably still 
> pgbackrest ]

Swell.

>> I'm not trying to downplay the usefulness of checksums *in a certain
>> context*.  It's a good feature, and I'm glad we have it.  But I think
>> you're somewhat inflating the utility of it while discounting the very
>> real costs.
>
> The costs for checksums don't bother me any more than the costs for WAL
> or WAL CRCs or full page writes.

Obviously.  But I think they should.  Frankly, I think the costs for
full page writes should bother the heck out of all of us, but the
solution isn't to shut them off any more than it is to enable
checksums despite the cost.  It's to find a way to reduce the costs.

> They may not be required on every
> system, but they're certainly required on more than 'zero' entirely
> reasonable systems which people deploy in their production environments.

Nobody said otherwise.

> I'd rather walk into an engagement where the user is saying "yeah, we
> enabled checksums and it caught this corruption issue" than having to
> break the bad news, which I've had to do over and over, that their
> existing system hasn't got checksums enabled.  This isn't hypothetical,
> it's what I run into regularly with entirely reasonable and skilled
> engineers who have been deploying PG.

Maybe you should just stop telling them and use the time thus freed up
to work on improving the checksum feature.

I'm skeptical of this whole discussion because you seem to be filled
with unalloyed confidence that checksums have little performance
impact and will do wonderful things to prevent data loss, whereas I
think they have significant performance impact and will only very
slightly help to prevent data loss.  I admit that the idea of having
pgbackrest verify checksums while backing up seems like it could
greatly improve the chances of checksums being useful, but I'm not
going to endorse changing PostgreSQL's default for pgbackrest's
benefit.  It's got to be to the benefit of PostgreSQL users broadly,
not just the subset of those people who use one particular backup
tool.  Also, the massive hit that will probably occur on
high-concurrency OLTP workloads larger than shared_buffers is going to
be hard to justify for any amount of backup security.  I think that
problem's got to be solved or at least mitigated before we think about
changing this.  I realize that not everyone would set the bar that
high, but I see far too many customers with exactly that workload to
dismiss it lightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> > As it is, there are backup solutions which *do* check the checksum when
> > backing up PG.  This is no longer, thankfully, some hypothetical thing,
> > but something which really exists and will hopefully keep users from
> > losing data.
> 
> Wouldn't that have issues with torn pages?

No, why would it?  The page has either been written out by PG to the OS,
in which case the backup s/w will see the new page, or it hasn't been.
Our testing has not turned up any issues as yet.  That said, it's
relatively new and I wouldn't be surprised if we need to do some
adjustments in that area, which might be system-dependent even.  We
could certainly check the WAL for the page that had a checksum error (we
currently simply report them, though don't throw away a prior backup if
we detect one).

This isn't like a case where only half the page made it to the disk
because of a system failure though; everything is online and working
properly during an online backup.
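For illustration only, the per-page verification step being described
could look roughly like the sketch below.  It assumes data checksums are
enabled in the cluster and uses pg_checksum_page() from
storage/checksum.h; backup_page_checksum_ok() is a hypothetical helper
name, and a mismatch would be cross-checked against WAL (per the above)
rather than reported immediately.

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"

/* Sketch: verify one page that the backup tool has just read. */
static bool
backup_page_checksum_ok(char *page, BlockNumber blkno)
{
	PageHeader	phdr = (PageHeader) page;

	if (PageIsNew(page))
		return true;			/* all-zero pages carry no checksum */

	return phdr->pd_checksum == pg_checksum_page(page, blkno);
}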

Thanks!

Stephen




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> >> As it is, there are backup solutions which *do* check the checksum when
> >> backing up PG.  This is no longer, thankfully, some hypothetical thing,
> >> but something which really exists and will hopefully keep users from
> >> losing data.
> >
> > Wouldn't that have issues with torn pages?
> 
> Why? What do you foresee here? I would think such backup solutions are
> careful enough to ensure correctly the durability of pages so as they
> are not partially written.

I believe his concern was that the backup sw might see a
partially-updated page when it reads the file while PG is writing it.
In other words, would the kernel return some intermediate state of data
while an fwrite() is in progress?

Thanks!

Stephen




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier
>  wrote:
> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> >>> As it is, there are backup solutions which *do* check the checksum when
> >>> backing up PG.  This is no longer, thankfully, some hypothetical thing,
> >>> but something which really exists and will hopefully keep users from
> >>> losing data.
> >>
> >> Wouldn't that have issues with torn pages?
> >
> > Why? What do you foresee here? I would think such backup solutions are
> > careful enough to ensure correctly the durability of pages so as they
> > are not partially written.
> 
> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup
> tool from overlapping with a write(fd, buf, 8192) performed by the
> backend.

As Michael mentioned, that'd depend on if things are atomic from a
user's perspective at certain sizes (perhaps 4k, which wouldn't be too
surprising, but may also be system-dependent), in which case verifying
that the page is in the WAL would be sufficient.

Thanks!

Stephen




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier
>>  wrote:
>> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan  wrote:
>> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
>> >>> As it is, there are backup solutions which *do* check the checksum when
>> >>> backing up PG.  This is no longer, thankfully, some hypothetical thing,
>> >>> but something which really exists and will hopefully keep users from
>> >>> losing data.
>> >>
>> >> Wouldn't that have issues with torn pages?
>> >
>> > Why? What do you foresee here? I would think such backup solutions are
>> > careful enough to ensure correctly the durability of pages so as they
>> > are not partially written.
>>
>> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup
>> tool from overlapping with a write(fd, buf, 8192) performed by the
>> backend.
>
> As Michael mentioned, that'd depend on if things are atomic from a
> user's perspective at certain sizes (perhaps 4k, which wouldn't be too
> surprising, but may also be system-dependent), in which case verifying
> that the page is in the WAL would be sufficient.

That would be enough. It should also be rare enough that there would
not be that many pages to track when looking at records from the
backup start position to minimum recovery point. It could be also
simpler, though more time-consuming, to just let a backup recover up
to the minimum recovery point (recovery_target = 'immediate'), and
then run the checksum sanity checks. There are other checks usually
needed on a backup anyway like being sure that index pages are in good
shape even with a correct checksum, etc.

But here I am really hijacking the thread, so I'll stop...
-- 
Michael




Re: [HACKERS] pdf versus single-html

2017-01-25 Thread Peter Eisentraut
On 1/21/17 6:29 AM, Erik Rijkers wrote:
> It might even be good to include such a single-file html in the Makefile 
> as an option.
> 
> I'll give it a try but has anyone done this work already, perhaps?

Already exists:

doc/src/sgml$ make postgres.html

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
> 
> Feel free to mark it returned as feedback. The concept didn't
> generate as much enthusiasm as I had hoped, so I think the right
> thing to do now is scale it back to a patch that makes
> CopyFromRawFields() externally visible so that extensions can use
> them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement.  That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Logical Replication WIP

2017-01-25 Thread Peter Eisentraut
On 1/22/17 8:11 PM, Petr Jelinek wrote:
> 0001 - Changes the libpqrcv_connect to use async libpq api so that it
> won't get stuck forever in case of connect is stuck. This is preexisting
> bug that also affects walreceiver but it's less visible there as there
> is no SQL interface to initiate connection there.

Probably a mistake here:

+   case PGRES_POLLING_READING:
+   extra_flag = WL_SOCKET_READABLE;
+   /* pass through */
+   case PGRES_POLLING_WRITING:
+   extra_flag = WL_SOCKET_WRITEABLE;

extra_flag gets overwritten in the reading case.

Please elaborate in the commit message what this change is for.
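Presumably the intent was for each case to set its own flag, along these
lines (just a sketch of the shape of the fix, reusing the identifiers from
the quoted hunk):

	case PGRES_POLLING_READING:
		extra_flag = WL_SOCKET_READABLE;
		break;
	case PGRES_POLLING_WRITING:
		extra_flag = WL_SOCKET_WRITEABLE;
		break;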

> 0002 - Close replication connection when CREATE SUBSCRIPTION gets
> canceled (otherwise walsender on the other side may stay in idle in
> transaction state).

committed

> 0003 - Fixes buffer initialization in walsender that I found when
> testing the above two. This one should be back-patched to 9.4 since it's
> broken since then.

Can you explain more in which code path this problem occurs?

I think we should get rid of the global variables and give each function
its own buffer that it initializes the first time through.  Otherwise
we'll keep having to worry about this.

> 0004 - Fixes the foreign key issue reported by Thom Brown and also adds
> tests for FK and trigger handling.

I think the trigger handling should go into execReplication.c.

> 0005 - Adds support for renaming publications and subscriptions.

Could those not be handled in the generic rename support in
ExecRenameStmt()?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] logical-replication.sgml improvements

2017-01-25 Thread Peter Eisentraut
On 1/20/17 11:00 AM, Erik Rijkers wrote:
> logical-replication.sgml.diff changes

Committed, thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby  wrote:
> I'm not completely grokking your second paragraph, but I would think that an
> average user would love to get a heads-up that their hardware is failing.

Sure.  If the database runs fast enough with checksums enabled,
there's basically no reason to have them turned off.  The issue is
when it doesn't.

Also, it's not as if there are no other ways of checking whether your
disks are failing.  SMART, for example, is supposed to tell you about
incipient hardware failures before PostgreSQL ever sees a bit flip.
Surely an average user would love to get a heads-up that their
hardware is failing even when that hardware is not being used to power
PostgreSQL, yet many people don't bother to configure SMART (or
similar proprietary systems provided by individual vendors).

Trying to force those people to use checksums is just masterminding;
they've made their own decision that it's not worth bothering with.
When something goes wrong, WE still care about distinguishing hardware
failure from PostgreSQL failure.   Our pride is on the line.  But the
customer often doesn't.  The DBA isn't the same person as the
operating system guy, and the operating system guy isn't going to
listen to the DBA even if the DBA complains of checksum failures.  Or
the customer has 100 things on the same piece of hardware and
PostgreSQL is the only one that failed; or alternatively they all
failed around the same time; either way the culprit is obvious.  Or
the remedy is to restore from backup[1] whether the problem is
hardware or software and regardless of whose software is to blame.  Or
their storage cost a million dollars and is a year old and they simply
won't believe that it's failing.  Or their storage cost a hundred
dollars and is 8 years old and they're looking for an excuse to
replace it whether it's responsible for the problem du jour or not.

I think it's great that we have a checksum feature and I think it's
great for people who want to use it and are willing to pay the cost of
it to turn it on.  I don't accept the argument that all of our users,
or even most of them, fall into that category.  I also think it's
disappointing that there's such a vigorous argument for changing the
default when so little follow-on development has gone into this
feature.  If we had put any real effort into making this easier to
turn on and off, for example, the default value would be less
important, because people could change it more easily.  But nobody's
making that effort.  I suggest that the people who think this a
super-high-value feature should be willing to put some real work into
improving it instead of trying to force it down everybody's throat
as-is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Alternatively, sometimes the remedy is to wish they had a usable
backup while frantically running pg_resetxlog.




Re: [HACKERS] Logical Replication WIP

2017-01-25 Thread Peter Eisentraut
On 1/23/17 11:19 AM, Fujii Masao wrote:
> The copyright in each file that the commit of logical rep added needs to
> be updated.

I have fixed that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-01-25 Thread Julian Markwort

Hello psql-hackers!

TL:DR;
We extended the functionality of pg_stat_statements so it can track 
worst and best case execution plans.


Based on a suggestion of my colleague Arne Scheffer, Marius Timmer and I 
extended pg_stat_statements so it can also record execution plans whenever 
the previously recorded execution time is exceeded (or undercut) by a 
definable factor.
We were largely inspired by the pg_stat_plans extension by Peter 
Geoghegan and Simon Riggs - we don't claim any originality on this part 
- which is unfortunately not available on newer postgresql versions. 
There are a few differences which will become apparent in the following 
lines.


By default, the modified pg_stat_statements extension will now track 
good plans and bad plans for each entry in pg_stat_statements.
The plans are not normalized or hashed (as opposed to pg_stat_plans); 
they represent discrete statements.
A good plan is saved whenever this sort of query is executed for the 
first time, or when the execution time falls below 0.9 times the 
previously recorded good plan's time.
Analogously, a bad plan is saved when the execution time exceeds the 
previously recorded bad plan's time by a factor greater than 1.1.
There are GUCs available so these parameters can be tuned to your 
liking. Tracking can be disabled for both plans individually.
A plan_format can be defined to enable better readability or 
processability through other tools.


You can reset your good and bad plans by using a
select on pg_stat_statements_good_plan_reset([queryid]);
resetting bad plans uses pg_stat_statements_bad_plan_reset, obviously.
In case of a reset, the execution time, timestamp and plan itself are 
simply set to 0 and NULL, respectively.


The pg_stat_statements view now provides six extra columns:
good_plan, good_plan_time, good_plan_timestamp, bad_plan, bad_plan_time 
and bad_plan_timestamp.


Plans are only displayed if the showtext argument is true and the user 
is the superuser or the user who has been associated with that entry.


Furthermore, we implemented a GUC that allows you to control the maximum 
refresh frequency to avoid performance impacts on restarts or resets.
A plan is only updated when tracking is enabled and more time than 
"plan_min_interval" has passed (default: 5 seconds) and the previously 
mentioned conditions for the execution time have been met.


The major selling point of this feature?
Being able to find plans that need optimization (e.g. by creating 
indexes). As pg_stat_statements tracks normalized queries, there might 
be certain values or even times of day that result in very bad plans, 
while others result in perfectly fine plans.
Of course, the GUC log_min_duration_statement can also detect long 
runners, but the advantage of pg_stat_statements is that we count the 
total calls of normalized queries, which enables us to find plans that 
don't count as long runners but whose aggregated time might still 
reveal shortcomings.


We've found this sort of tool really useful when dealing with queries 
produced by ORM libraries, where optimization is not intuitive.


Various tests using pg_bench suggest that this extension does not worsen 
the performance of the database.


We're really looking forward to your opinions and feedback on this 
feature patch

Julian, Marius and Arne
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 298951a..2a22eb5 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,7 +4,7 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o $(WIN32RES)
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \
+DATA = pg_stat_statements--1.5.sql pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \
 	pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
 	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6edc3d9..a3cfe6d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -61,7 +61,9 @@
 #include 
 #include 
 
+#include "utils/timestamp.h"
 #include "access/hash.h"
+#include "commands/explain.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
@@ -118,7 +120,8 @@ typedef enum pgssVersion
 	PGSS_V1_0 = 0,
 	PGSS_V1_1,
 	PGSS_V1_2,
-	PGSS_V1_3
+	PGSS_V1_3,
+	PGSS_V1_5
 } pgssVersion;
 
 /*
@@ -159,6 +162,14 @@ typedef struct Counters
 	double		usage;			/* usage factor */
 } Counters;
 
+typedef struct pgssPlan
+{
+	Size offset;
+	int len;
+	double		time;	/* execution time in msec when the latest plan was updated */
+	TimestampTz timestamp;
+} pgssPlan;
+
 /*
  * Statistics per statement
  *
@@ -172,6 +183,8 @@ typedef struct pgssEntry
 	Counters	counters;		

Re: [HACKERS] Checksums by default?

2017-01-25 Thread Robert Haas
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund  wrote:
> On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote:
>> Is it time to enable checksums by default, and give initdb a switch to turn
>> it off instead?
>
> -1 - the WAL overhead is quite massive, and in contrast to the other
> GUCs recently changed you can't just switch this around.

I agree.  I bet that if somebody does the test suggested by Amit
downthread, it'll turn out that the performance is just awful.  And
those cases are common.  I think the people saying "well, the overhead
is worth it" must be people whose systems (or whose customer's
systems) aren't processing continuous heavy OLTP workloads.  If you've
got a data warehousing workload, checksums are probably pretty cheap.
If you've got a low-velocity OLTP workload, or an OLTP workload that
fits in shared_buffers, it's probably bearable.  But if you've got 8GB
of shared_buffers and 100GB of data, and you've got 100 or so backends
continuously doing random updates, I think checksums are going to nail
you to the wall.  And EnterpriseDB, at least, has lots of customers
that do exactly that sort of thing.

Having said that, I've certainly run into situations where I speculated
that a customer had a hardware problem and they speculated that we had
given them buggy database software.  In a pretty significant number of
cases, the customer turned out to be right; for example, some of those
people were suffering from multixact bugs that resulted in
unexplainable corruption.  Now, would it have been useful to know that
checksums were passing (suggesting a PostgreSQL problem) rather than
failing (suggesting an OS problem)?  Yes, that would have been great.
I could have given those customers better support.  On the other hand,
I think I've run into MORE cases where the customer was desperately
seeking options to improve write performance, which remains a pretty
significant problem for PostgreSQL.  I can't see taking a significant
hit in that area for my convenience in understanding what's going on
in data corruption situations.  The write performance penalty is paid
by everybody all the time, whereas data corruption is a rare event
even among support cases.

And even when you do have corruption, whether or not the data
corruption is accompanied by a checksum failure is only ONE extra bit
of useful data.  A failure doesn't guarantee a hardware problem; it
could be caused by a faulty backup procedure, like forgetting to run
pg_start_backup().  The lack of a failure doesn't guarantee a software
problem; it could be caused by a faulty backup procedure, like using
an OS-level snapshot facility that isn't exactly simultaneous across
tablespaces.  What you really need to do when a customer has
corruption is figure out why they have corruption, and the leading
cause by far is neither the hardware nor the software but some kind of
user error.  Checksums are at best a very modest assist in figuring
out whether an error has been made and if so of what type.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning vs. information_schema

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut
 wrote:
> On 1/18/17 2:32 PM, Robert Haas wrote:
>> Unless we can find something official, I suppose we should just
>> display BASE TABLE in that case as we do in other cases.  I wonder if
>> the schema needs some broader revision; for example, are there
>> information_schema elements intended to show information about
>> partitions?
>
> Is it intentional that we show the partitions by default in \d,
> pg_tables, information_schema.tables?  Or should we treat those as
> somewhat-hidden details?

I'm not really sure what the right thing to do is there.  I was hoping
you had an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera 
wrote:

> David Fetter wrote:
>
> > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int
> minread, int maxread)
> >errmsg("could not read
> from COPY file: %m")));
> >   break;
> >   case COPY_OLD_FE:
> > -
> >   /*
> >* We cannot read more than minread bytes (which
> in practice is 1)
> >* because old protocol doesn't have any clear way
> of separating
>
> This change is pointless as it'd be undone by pgindent.
>
> > + /*
> > +  * Function signature is:
> > +  * copy_srf( filename text default null,
> > +  *   is_program boolean default false,
> > +  *   format text default 'text',
> > +  *   delimiter text default E'\t' in text mode, ',' in csv
> mode,
> > +  *   null_string text default '\N',
> > +  *   header boolean default false,
> > +  *   quote text default '"' in csv mode only,
> > +  *   escape text default null, -- defaults to whatever
> quote is
> > +  *   encoding text default null
> > +  */
>
> This comment would be mangled by pgindent -- please add an /*--- marker
> to prevent it.
>
> > + /* param 7: escape text default null, -- defaults to whatever
> quote is */
> > + if (PG_ARGISNULL(7))
> > + {
> > + copy_state.escape = copy_state.quote;
> > + }
> > + else
> > + {
> > + if (copy_state.csv_mode)
> > + {
> > + copy_state.escape = TextDatumGetCString(PG_GETARG_
> TEXT_P(7));
> > + if (strlen(copy_state.escape) != 1)
> > + {
> > + ereport(ERROR,
> > +
>  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("COPY escape must
> be a single one-byte character")));
> > + }
> > + }
> > + else
> > + {
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_
> SUPPORTED),
> > +  errmsg("COPY escape available
> only in CSV mode")));
> > + }
> > + }
>
> I don't understand why we have all these checks.  Can't we just pass
> the values to COPY and let it apply the checks?  That way, when COPY is
> updated to support multibyte escape chars (for example) we don't need to
> touch this code.  Together with removing the unneeded braces that would
> make these stanzas about six lines long instead of fifteen.
>
>
> > + tuple = heap_form_tuple(tupdesc,values,nulls);
> > + //tuple = BuildTupleFromCStrings(attinmeta,
> field_strings);
> > + tuplestore_puttuple(tupstore, tuple);
>
> No need to form a tuple; use tuplestore_putvalues here.
>
>
> > + }
> > +
> > + /* close "file" */
> > + if (copy_state.is_program)
> > + {
> > + int pclose_rc;
> > +
> > + pclose_rc = ClosePipeStream(copy_state.copy_file);
> > + if (pclose_rc == -1)
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close pipe to
> external command: %m")));
> > + else if (pclose_rc != 0)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_EXTERNAL_
> ROUTINE_EXCEPTION),
> > +  errmsg("program \"%s\" failed",
> > +
>  copy_state.filename),
> > +  errdetail_internal("%s",
> wait_result_to_str(pclose_rc;
> > + }
> > + else
> > + {
> > + if (copy_state.filename != NULL &&
> FreeFile(copy_state.copy_file))
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close file
> \"%s\": %m",
> > +
>  copy_state.filename)));
> > + }
>
> I wonder if these should be an auxiliary function in copy.c to do this.
> Surely copy.c itself does pretty much the same thing ...
>
>
> > +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v
> u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_
> copy_srf _null_ _null_ _null_ ));
> > +DESCR("set-returning COPY proof of concept");
>
> Why is this marked "proof of concept"?  If this is a PoC only, why are
> you submitting it as a patch?
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Feel free to mark it returned as feedback. The concept didn't generate as
much enthusiasm as I had hoped, so I think the 

Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans

2017-01-25 Thread Simon Riggs
On 25 January 2017 at 17:34, Julian Markwort
 wrote:

> Analogously, a bad plan is saved when the execution time exceeds the
> previously recorded bad plan's time by a factor greater than 1.1.

...and the plan differs?

Probably best to use some stat math to calculate deviation, rather than a fixed percentage.

Sounds good.
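One possible reading of "stat math", purely as a sketch and not anything
in the posted patch (PlanStats and both helper functions are hypothetical
names): keep a running mean and variance per entry with Welford's
algorithm and flag an execution as a bad-plan candidate when it lies,
say, more than two standard deviations above the mean.

#include <math.h>

typedef struct PlanStats
{
	double		n;				/* number of executions observed */
	double		mean;			/* running mean of execution time */
	double		m2;				/* sum of squared deviations from the mean */
} PlanStats;

static void
plan_stats_update(PlanStats *s, double exec_time)
{
	double		delta,
				delta2;

	s->n += 1.0;
	delta = exec_time - s->mean;
	s->mean += delta / s->n;
	delta2 = exec_time - s->mean;
	s->m2 += delta * delta2;
}

static bool
plan_time_is_outlier(const PlanStats *s, double exec_time)
{
	double		stddev;

	if (s->n < 2.0)
		return false;
	stddev = sqrt(s->m2 / (s->n - 1.0));
	return exec_time > s->mean + 2.0 * stddev;
}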

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Declarative partitioning vs. information_schema

2017-01-25 Thread Peter Eisentraut
On 1/18/17 2:32 PM, Robert Haas wrote:
> Unless we can find something official, I suppose we should just
> display BASE TABLE in that case as we do in other cases.  I wonder if
> the schema needs some broader revision; for example, are there
> information_schema elements intended to show information about
> partitions?

Is it intentional that we show the partitions by default in \d,
pg_tables, information_schema.tables?  Or should we treat those as
somewhat-hidden details?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter  wrote:

> On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
> >
> > Feel free to mark it returned as feedback. The concept didn't
> > generate as much enthusiasm as I had hoped, so I think the right
> > thing to do now is scale it back to a patch that makes
> > CopyFromRawFields() externally visible so that extensions can use
> > them.
>
> You're getting enthusiasm in the form of these reviews and suggestions
> for improvement.  That it doesn't always happen immediately is a
> byproduct of the scarcity of developer time and the sheer volume of
> things to which people need to pay attention.


True about that. I was referring to "ooh! I need that!"-type interest. I'll
proceed, keeping in mind that there's a fallback position of just making
some of the guts of COPY available to be called by extensions, as was done
for file_fdw.


Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Pavel Stehule
2017-01-25 22:40 GMT+01:00 Andres Freund :

> Hi,
>
> > > > I'll try to explain my motivation. Please, check it and correct me
> if I
> > > am
> > > > wrong. I don't keep on my implementation - just try to implement
> XMLTABLE
> > > > be consistent with another behave and be used all time without any
> > > > surprise.
> > > >
> > > > 1. Any function that produces a content can be used in target list.
> We
> > > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > > exception?
> > >
> > > targetlist SRFs were a big mistake. They cause a fair number of
> problems
> > > code-wise. They permeated for a long while into bits of both planner
> and
> > > executor, where they really shouldn't belong. Even after the recent
> > > changes there's a fair amount of ugliness associated with them.  We
> > > can't remove tSRFs for backward compatibility reasons, but that's not
> > > true for XMLTABLE
> > >
> > >
> > >
> > ok
> >
> > I afraid when I cannot to reuse a SRF infrastructure, I have to
> reimplement
> > it partially :( - mainly for usage in "ROWS FROM ()"
>

The TableExpr implementation is based on SRFs now. You and Alvaro propose an
independent implementation as a generic executor node. I am skeptical that
FunctionScan supports reading from a generic executor node.

Regards

Pavel


> Huh?
>
> Greetings,
>
> Andres Freund
>


Re: [HACKERS] patch: function xmltable

2017-01-25 Thread Andres Freund
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> 2017-01-25 22:40 GMT+01:00 Andres Freund :
> > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > reimplement
> > > it partially :( - mainly for usage in "ROWS FROM ()"
> >
> 
> The TableExpr implementation is based on SRFs now. You and Alvaro propose an
> independent implementation as a generic executor node. I am skeptical that
> FunctionScan supports reading from a generic executor node.

Why would it need to?




Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Andres Freund
Hi,

On 2017-01-25 16:52:38 -0500, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
> > Preventing people from calling functions by denying the ability to
> > meaningfully GRANT EXECUTE on those functions doesn't actually stop
> > them from delegating those functions to non-superusers.  It either (a)
> > causes them to give superuser privileges to accounts that otherwise
> > would have had lesser privileges or (b) forces them to use wrapper
> > functions to grant access rather than doing it directly or (c) some
> > other dirty hack.  If they pick (a), security is worse; if they pick
> > (b) or (c), you haven't prevented them from doing what they wanted to
> > do anyway.  You've just made it annoying and inconvenient.
>
> The notion that security is 'worse' under (a) is flawed- it's no
> different.

Huh? Obviously that's nonsense, given the pg_ls_dir example.


> With regard to 'b', if their wrapper function is
> sufficiently careful to ensure that the caller isn't able to do anything
> which would increase the caller's level to that of superuser, then
> security is improved.

Given how complex "sufficiently careful" is for security definer UDFs,
in comparison to estimating the security of granting to a function like
pg_ls_dir (or pretty much any other that doesn't call out to SQL level
stuff like operators, output functions, etc), I don't understand this.


> If the wrapper simply turns around and calls the underlying function,
> then it's no different from '(a)'.

Except for stuff like search path.


> I am suggesting that we shouldn't make it look like there are
> distinctions when there is actually no difference.  That is a good thing
> for our users because it keeps them informed about what they're actually
> doing when they grant access.

This position doesn't make a lick of sense to me.  There's simply no
benefit at all in requiring to create wrapper functions, over allowing
to grant to non-superuser. Both is possible, secdef is a lot harder to
get right. And you already heavily started down the path of removing
superuser() type checks - you're just arguing to make it more or less
randomly less consistent.


> I've commented on here and spoken numerous times about exactly that goal
> of reducing the checks in check_postgres.pl which require superuser.
> You're not actually doing that and nothing you've outlined in here so
> far makes me believe you see how having pg_write_file() access is
> equivalent to giving someone superuser, and that concerns me.

That's the user's responsibility, and Robert didn't really suggest
granting pg_write_file() permissions, so this seems to be a straw man.



Andres




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Peter Geoghegan
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  wrote:
> As it is, there are backup solutions which *do* check the checksum when
> backing up PG.  This is no longer, thankfully, some hypothetical thing,
> but something which really exists and will hopefully keep users from
> losing data.

Wouldn't that have issues with torn pages?


-- 
Peter Geoghegan




Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

2017-01-25 Thread Michael Paquier
(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao  wrote:
> An unlogged table has an initialization fork. The initialization fork does
> not have a BM_PERMANENT flag when a buffer is obtained for it.
> At a checkpoint (not a shutdown or end-of-recovery checkpoint), it will not
> be written to disk. After crash recovery, the page of the initialization
> fork will not be correct, which then makes the main fork incorrect too.

For init forks the flush absolutely needs to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

> Here is an example for GIN index.
>
> create unlogged table gin_test_tbl(i int4[]);
> create index gin_test_idx on gin_test_tbl using gin (i);
> checkpoint;
>
> kill all the postgres process, and restart again.
>
> vacuum gin_test_tbl;  -- crash.
>
> It seems the same problem exists in BRIN, GIN, GiST and HASH indexes, which
> use a buffer to initialize the meta page in the ambuildempty function.

Yeah, other index AMs deal directly with the sync of the page, that's
why there is no issue for them.

So the attached patch fixes the problem by changing BufferAlloc() in
such a way that initialization-fork buffers are treated as permanent and
therefore written to disk, which is what you are suggesting. As a simple
fix for back-branches that's enough, though on HEAD I think that we
should really rework the empty() routines so that the write goes through
shared buffers first; that seems more solid than relying on the smgr
routines to do this work. Robert, what do you think?
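For reference, the shape of the change being described is roughly the
following (a sketch against BufferAlloc(), with buf_state, relpersistence
and forkNum assumed from bufmgr.c; the exact patch may differ):

	/* Mark init-fork buffers permanent so checkpoints flush them. */
	if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
		buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
	else
		buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;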
-- 
Michael


unlogged-flush-fix.patch
Description: Binary data



Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Andres Freund
On 2017-01-25 18:04:09 -0500, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote:
> > > * Robert Haas (robertmh...@gmail.com) wrote:
> > > > Preventing people from calling functions by denying the ability to
> > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop
> > > > them from delegating those functions to non-superusers.  It either (a)
> > > > causes them to give superuser privileges to accounts that otherwise
> > > > would have had lesser privileges or (b) forces them to use wrapper
> > > > functions to grant access rather than doing it directly or (c) some
> > > > other dirty hack.  If they pick (a), security is worse; if they pick
> > > > (b) or (c), you haven't prevented them from doing what they wanted to
> > > > do anyway.  You've just made it annoying and inconvenient.
> > >
> > > The notion that security is 'worse' under (a) is flawed- it's no
> > > different.
> > 
> > Huh? Obviously that's nonsense, given the pg_ls_dir example.
> 
> Robert's made it clear that he'd like to have a blanket rule that we
> don't have superuser checks in these code paths if they can be GRANT'd
> at the database level, which goes beyond pg_ls_dir.

That seems right to me.  I don't see much benefit for the superuser()
style checks, with a few exceptions.  Granting by default is obviously
an entirely different question.


> If the question was only about pg_ls_dir, then I still wouldn't be for
> it, because, as the bits you didn't quote discussed, it encourages users
> and 3rd party tool authors to base more things off of pg_ls_dir to
> look into the way PG stores data on disk, and affords more access than
> the monitoring user has any need for, none of which are good, imv.  It
> also discourages people from implementing proper solutions which you can
> 'just use pg_ls_dir()', which I also don't agree with.

In other words, you're trying to force people to do stuff your preferred
way, instead of allowing them to get things done in a reasonable manner.


> If you really want to do an ls, go talk to the OS.  ACLs are possible to
> provide that with more granularity than what would be available through
> pg_ls_dir().  We aren't in the "give a user the ability to do an ls"
> business, frankly.

Wut.


> > > I am suggesting that we shouldn't make it look like there are
> > > distinctions when there is actually no difference.  That is a good thing
> > > for our users because it keeps them informed about what they're actually
> > > doing when they grant access.
> > 
> > This position doesn't make a lick of sense to me.  There's simply no
> > benefit at all in requiring to create wrapper functions, over allowing
> > to grant to non-superuser. Both is possible, secdef is a lot harder to
> > get right. And you already heavily started down the path of removing
> > superuser() type checks - you're just arguing to make it more or less
> > randomly less consistent.
> 
> I find this bizarre considering I went through a detailed effort to go
> look at every superuser check in the system and discussed, on this list,
> the reasoning behind each and every one of them.  I do generally
> consider arbitrary access to syscalls via the database to be a privilege
> which really only the superuser should have.

Just because you argued doesn't mean I agree.


Greetings,

Andres Freund




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-25 Thread Tom Lane
Peter Geoghegan  writes:
> It means "another call to tuplesort_gettupleslot", but I believe that
> it would be safer (more future-proof) to actually specify "the slot
> contents may be invalidated by any subsequent manipulation of the
> tuplesort's state" instead.

WFM.

>> There are several other uses of "call here", both in this patch and
>> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory.
>> Let's try to improve that.

> Should I write a patch along those lines?

Please.  You might want to hit the existing ones with a separate patch,
but it doesn't much matter; I'd be just as happy with a patch that did
both things.

regards, tom lane




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-25 Thread Peter Geoghegan
On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane  wrote:
> Please.  You might want to hit the existing ones with a separate patch,
> but it doesn't much matter; I'd be just as happy with a patch that did
> both things.

Got it.


-- 
Peter Geoghegan




Re: [HACKERS] Radix tree for character conversion

2017-01-25 Thread Michael Paquier
On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI
 wrote:
> [...patch...]

Nobody has shown up yet to review this patch, so I am giving it a shot.

The patch file sizes are scary at first sight, but after having a look:
 36 files changed, 1411 insertions(+), 54398 deletions(-)
Yes that's a surprise, something like git diff --irreversible-delete
would have helped as most of the diffs are just caused by 3 files
being deleted in patch 0004, making 50k lines going to the abyss of
deletion.

> Hello, I found a bug in my portion while rebasing.

Right, that's 0001. Nice catch.

> The attached files are the following. This patchset is not
> complete missing changes of map files. The change is tremendously
> large but generatable.
>
> 0001-Add-missing-semicolon.patch
>
>   UCS_to_EUC_JP.pl has a line missing a terminating semicolon. This
>   does no harm but is surely a syntax error. This patch fixes it.
>   This should perhaps be a separate patch.

This requires a back-patch. This makes me wonder how long this script
has actually not run...

> 0002-Correct-reference-resolution-syntax.patch
>
>   convutils.pm has lines with different syntax of reference
>   resolution. This unifies the syntax.

Yes, that looks right to me. I am not the best perl guru on this list, but
looking around, $$var{foo} is bad, ${$var}{foo} is better, and
$var->{foo} is even better. This also generates no diffs when running
make in src/backend/utils/mb/Unicode/. So no objections to that.

> 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
>
>   Before adding radix tree stuff, applied pgperltidy and inserted
>   format-skipping pragma for the parts where perltidy seems to do
>   too much.

Which version of perltidy did you use? Looking at the archives, the
perl code is cleaned up with a specific version, v20090616. See
https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com
for example on the matter. As perltidy changes over time, this may be
a sensitive change if done this way.

> 0004-Use-radix-tree-for-character-conversion.patch
>
>   Radix tree body.

Well, here a lot of diffs could have been saved.

> The unattached fifth patch is generated by the following steps.
>
> [$(TOP)]$ ./configure
> [Unicode]$ make
> [Unicode]$ make distclean
> [Unicode]$ git add .
> [Unicode]$ commit
> === COMMIT MESSAGE
> Replace map files with radix tree files.
>
> These encodings no longer uses the former map files and uses new radix
> tree files.
> ===

OK, I can see that working, with 200k of maps generated.. So going
through the important bits of this jungle..

+/*
+ * radix tree conversion function - this should be identical to the function in
+ * ../conv.c with the same name
+ */
+static inline uint32
+pg_mb_radix_conv(const pg_mb_radix_tree *rt,
+int l,
+unsigned char b1,
+unsigned char b2,
+unsigned char b3,
+unsigned char b4)
This is not nice. Having a duplication like that is a recipe to forget
about it as this patch introduces a dependency with conv.c and the
radix tree generation.

Having a .gitignore in Unicode/ would be nice, particularly to avoid
committing map_checker.

A README documenting things may be welcome, or at least comments at
the top of map_checker.c. Why is map_checker essential? What does it
do? There is no way to understand that easily, except that it includes
a "radix tree conversion function", and that it performs sanity checks
on the radix trees to be sure that they are in good shape. But this is
something that one would guess only after looking at your patch
and the code (at least I will sleep less stupid tonight after reading
this stuff).

--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
 # Drop these SJIS codes from the source for UTF8=>SJIS conversion
 #<<< do not let perltidy touch this
-my @reject_sjis =(
+my @reject_sjis = (
0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782,
-   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
+   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
0x879a..0x879c
-);
+   );
This is not generated, it would be nice to drop the noise from the patch.

Here is another one:
-   $i->{code} = $jis | (
-   $jis < 0x100
-   ? 0x8e00
-   : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
-
+#<<< do not let perltidy touch this
+   $i->{code} = $jis | ($jis < 0x100 ? 0x8e00:
+($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
+#>>>

if (l == 2)
{
-   iutf = *utf++ << 8;
-   iutf |= *utf++;
+   b3 = *utf++;
+   b4 = *utf++;
}
Ah, OK. This conversion is important so that it performs a minimum of
bitwise operations. Yes, let's keep that. It's pretty cool to get a
faster operation.
-- 
Michael



Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakke
 wrote:
>>I look at this patch from you and that's present for me:
>>https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>>  (errmsg("unexpected timeline ID %u (should be %u)
>> in checkpoint record",
>>  checkPoint.ThisTimeLineID, ThisTimeLineID)));
>>
>> +KnownPreparedRecreateFiles(checkPoint.redo);
>>  RecoveryRestartPoint();
>>  }
>
> Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
> function by this name. And now I see, the name is CheckPointTwoPhase()
> :-)

My mistake then :D

>> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
>> files are not flushed to disk with this patch. This is a problem as a
>> new restart point is created... Having the flush in CheckpointTwoPhase
>> really makes the most sense.
>
> Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
> promote" code path.

CreateRestartPoint() calls it via CheckPointGuts() while in recovery.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-01-25 Thread Stas Kelvich
>> 
>> Yes, that’s also possible but seems to be less flexible restricting us to 
>> some
>> specific GID format.
>> 
>> Anyway, I can measure WAL space overhead introduced by the GID’s inside 
>> commit records
>> to know exactly what will be the cost of such approach.
> 
> Stas,
> 
> Have you had a chance to look at this further?

Generally I’m okay with Simon’s approach and will send an updated patch.
Anyway, I want to perform some tests to estimate how much disk space is
actually wasted by the extra WAL records.

> I think the approach of storing just the xid and fetching the GID
> during logical decoding of the PREPARE TRANSACTION is probably the
> best way forward, per my prior mail.

I don’t think that’s possible this way. If we do not put the GID in the
commit record, then by the time logical decoding happens the transaction
will already be committed/aborted and there will be no easy way to get
that GID. I thought about several possibilities:

* Tracking an xid/gid map in memory also doesn’t help much: if the server
reboots between prepare and commit we’ll lose that mapping.
* We can provide some hooks on prepared tx recovery during startup, but that
approach also fails if the reboot happens between commit and the decoding of
that commit.
* Logical messages are WAL-logged, but they don’t have any redo function, so
they don’t help much.

So to support user-accessible 2PC over replication based on 2PC decoding we
would have to invent something nastier, like writing them into a table.

> That should eliminate Simon's
> objection re the cost of tracking GIDs and still let us have access to
> them when we want them, which is the best of both worlds really.

Having 2PC decoding in core is a good thing anyway even without GID tracking =)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Nikhil Sontakke
>I look at this patch from you and that's present for me:
>https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>  (errmsg("unexpected timeline ID %u (should be %u)
> in checkpoint record",
>  checkPoint.ThisTimeLineID, ThisTimeLineID)));
>
> +KnownPreparedRecreateFiles(checkPoint.redo);
>  RecoveryRestartPoint();
>  }

Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
function by this name. And now I see, the name is CheckPointTwoPhase()
:-)

> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
> files are not flushed to disk with this patch. This is a problem as a
> new restart point is created... Having the flush in CheckpointTwoPhase
> really makes the most sense.

Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
promote" code path.

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Radix tree for character conversion

2017-01-25 Thread Michael Paquier
On Wed, Jan 25, 2017 at 7:18 PM, Ishii Ayumi  wrote:
> I applied the 4 patches and ran "make", but it failed.
> Is this a bug or my mistake?
> I'm sorry if I'm wrong.
>
> [$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
> [$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
> [$(TOP)]$ patch -p1 <
> ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
> [$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
> [$(TOP)]$ ./configure
> [Unicode]$ make
> '/usr/bin/perl' UCS_to_most.pl
> Type of arg 1 to keys must be hash (not hash element) at convutils.pm
> line 443, near "})
> "
> Type of arg 1 to values must be hash (not hash element) at
> convutils.pm line 596, near "})
> "
> Type of arg 1 to each must be hash (not private variable) at
> convutils.pm line 755, near "$map)
> "
> Compilation failed in require at UCS_to_most.pl line 19.
> make: *** [iso8859_2_to_utf8.map] Error 255

Hm, I am not sure what you are missing. I was able to get things to build.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-25 18:04:09 -0500, Stephen Frost wrote:
> > Robert's made it clear that he'd like to have a blanket rule that we
> > don't have superuser checks in these code paths if they can be GRANT'd
> > at the database level, which goes beyond pg_ls_dir.
> 
> That seems right to me.  I don't see much benefit for the superuser()
> style checks, with a few exceptions.  Granting by default is obviously
> an entirely different question.

Well, for my part at least, I disagree.  Superuser is a very different
animal, imv, than privileges which can be GRANT'd, and I feel that's an
altogether good thing.

> In other words, you're trying to force people to do stuff your preferred
> way, instead of allowing them to get things done in a reasonable manner.

Apparently we disagree about what is a 'reasonable manner'.

Thanks!

Stephen




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 2:32 AM, Tom Lane  wrote:
> The way I'd be inclined to make the individual reporting changes is like
>
>  if (!EnableSSL)
> +{
> -   ereport(LOG,
> +   ereport(elevel,
>  (errcode(ERRCODE_CONFIG_FILE_ERROR),
> errmsg("hostssl record cannot match because SSL is disabled"),
>   errhint("Set ssl = on in postgresql.conf."),
>   errcontext("line %d of configuration file \"%s\"",
>  line_num, HbaFileName)));
> +*err_msg = pstrdup("hostssl record cannot match because SSL is disabled");
> +}
>
> which is considerably less invasive and hence easier to review, and
> supports reporting different text in the view than appears in the log,
> should we need that.  It seems likely also that we could drop the pstrdup
> in the case of constant strings (we'd still need psprintf if we want to
> insert values into the view messages), which would make this way cheaper
> than what's in the patch now.

I don't really understand the argument about the readability of the
patch, as what Haribabu has proposed simply avoids duplicating the
strings, and the diffs of the patch are really clear. For the sake of
not translating the strings sent back to the system view, though, I can
buy it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Greg Stark
On 26 January 2017 at 01:58, Thomas Munro  wrote:
>
> I don't know how comparable it is to our checksum technology, but
> MySQL seems to have some kind of checksums on table data, and you can
> find public emails, blogs etc lamenting corrupted databases by
> searching Google for the string "InnoDB: uncompressed page, stored
> checksum in field1" (that's the start of a longer error message that
> includes actual and expected checksums).

I'm not sure what exactly that teaches us, however. I see these were
often associated with software bugs (apparently MySQL long assumed
that a checksum of 0 never happened, for example).

Every non-software case I stumbled across seemed to follow a power
failure. Apparently MySQL uses a "doublewrite buffer" to protect
against torn pages, but when I search for that I get tons of people
inquiring how to turn it off... So even without software bugs in the
checksum code, I don't know that the frequency of the error necessarily
teaches us anything about the frequency of hardware corruption either.

And more to the point it seems what people are asking for in all those
lamentations is how they can convince MySQL to continue and ignore the
corruption. A typical response was "We slightly modified innochecksum
and added option -f that means if the checksum of a page is wrong,
rewrite it in the InnoDB page header." Which begs the question...

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund  wrote:
> > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
> >> * Peter Geoghegan (p...@heroku.com) wrote:
> >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost  
> >> > wrote:
> >> > > As it is, there are backup solutions which *do* check the checksum when
> >> > > backing up PG.  This is no longer, thankfully, some hypothetical thing,
> >> > > but something which really exists and will hopefully keep users from
> >> > > losing data.
> >> >
> >> > Wouldn't that have issues with torn pages?
> >>
> >> No, why would it?  The page has either been written out by PG to the OS,
> >> in which case the backup s/w will see the new page, or it hasn't been.
> >
> > Uh. Writes aren't atomic on that granularity.  That means you very well
> > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries
> > of an 8KB postgres page). Just read a page while it's being written out.
> 
> Yeah.  This is also why backups force full page writes on even if
> they're turned off in general.

I've got a question in to David about this. I know we chatted about the
risk at one point, I just don't recall what we ended up doing (I can
imagine a few different possibilities: re-read the page, which isn't a
guarantee but reduces the chances a fair bit, or check the LSN, or
perhaps the plan was just to check whether it's in the WAL, as I
mentioned), or whether we ended up concluding it wasn't a risk for some,
perhaps incorrect, reason and we need to revisit it.
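
For illustration only, the LSN idea would be roughly along these lines
(a sketch of the option described above, not what pgbackrest or any
other tool actually does; the function and its arguments are made up):

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"

/*
 * Sketch: decide whether a page read during a base backup is worth
 * reporting as corrupt.  A page whose LSN is past the backup start
 * point will be overwritten from WAL during recovery anyway, so a
 * checksum mismatch there (quite possibly just a torn read) is ignored.
 */
static bool
page_checksum_suspicious(Page page, BlockNumber blkno,
                         XLogRecPtr backup_start_lsn)
{
    PageHeader  phdr = (PageHeader) page;

    if (PageIsNew(page))
        return false;       /* all-zero pages carry no checksum */

    if (phdr->pd_checksum == pg_checksum_page((char *) page, blkno))
        return false;       /* checksum verifies fine */

    if (PageGetLSN(page) >= backup_start_lsn)
        return false;       /* page will be replayed from WAL anyway */

    return true;            /* stable page with a bad checksum */
}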

Thanks!

Stephen




Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Nikhil Sontakke
>> The question remains whether saving off a few fsyncs/reads for these
>> long-lived prepared transactions is worth the additional code churn.
>> Even if we add code to go through the KnownPreparedList, we still will
>> have to go through the other on-disk 2PC transactions anyways. So,
>> maybe not.
>
> We should really try to do things right now, or we'll never come back
> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
> promotion by removing the need of the end-of-recovery checkpoint,
> while I agree that there should not be that many 2PC transactions at
> this point, if there are for a reason or another, the time it takes to
> complete promotion would be impacted. So let's refactor
> PrescanPreparedTransactions() so as it is able to handle 2PC data from
> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.
>

Not quite. If we modify PrescanPreparedTransactions(), we also need to
make RecoverPreparedTransactions() and
StandbyRecoverPreparedTransactions() handle 2PC data via
XlogReadTwoPhaseData().


> +   /*
> +* Move prepared transactions, if any, from KnownPreparedList to files.
> +* It is possible to skip this step and teach subsequent code about
> +* KnownPreparedList, but PrescanPreparedTransactions() happens once
> +* during end of recovery or on promote, so probably it isn't worth
> +* the additional code.
> +*/


This comment is misplaced; it does not make sense before this specific call.

> +   KnownPreparedRecreateFiles(checkPoint.redo);
> RecoveryRestartPoint();
> Looking again at this code, I think that this is incorrect. The
> checkpointer should be in charge of doing this work and not the
> startup process, so this should go into CheckpointTwoPhase() instead.

I don't see a function by the above name in the code?

Regards,
Nikhils
-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-01-25 Thread Michael Paquier
On Wed, Jan 25, 2017 at 11:53 PM, Peter Eisentraut
 wrote:
> Here is an updated patch that allows changing the sequence type.  This
> was clearly a concern for reviewers, and the presented use cases seemed
> convincing.

I have been torturing this patch and it looks rather solid to me. Here
are a couple of comments:

@@ -15984,6 +15992,9 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
  "CREATE SEQUENCE %s\n",
  fmtId(tbinfo->dobj.name));

+   if (strcmp(seqtype, "bigint") != 0)
+   appendPQExpBuffer(query, "AS %s\n", seqtype);
Wouldn't it be better to emit that unconditionally? There is no
guarantee that a dump taken with pg_dump version X will work on X - 1
anyway (not that we should make users' lives needlessly difficult,
which is your point), but that seems better to me than relying on the
sequence type being hardcoded in all the pre-10 dump queries for
sequences. That would also bring more consistency to the CREATE
SEQUENCE queries of test_pg_dump/t/001_base.pl.
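
For instance, the dumped definition would then always spell the type
out, along these lines (just a sketch with a made-up sequence; the
other options are the usual dump output):

CREATE SEQUENCE toto
    AS bigint
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;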

Could you increase the regression test coverage to cover some of the
new code paths? For example such cases are not tested:
=# create sequence toto as smallint;
CREATE SEQUENCE
=# alter sequence toto as smallint maxvalue 1000;
ERROR:  22023: RESTART value (2147483646) cannot be greater than MAXVALUE (1000)
LOCATION:  init_params, sequence.c:1537
=# select setval('toto', 1);
 setval
--------
      1
(1 row)
=# alter sequence toto as smallint;
ERROR:  22023: MAXVALUE (2147483647) is too large for sequence data
type smallint
LOCATION:  init_params, sequence.c:1407

+   if ((seqform->seqtypid == INT2OID && seqform->seqmin < PG_INT16_MIN)
+   || (seqform->seqtypid == INT4OID && seqform->seqmin < PG_INT32_MIN)
+   || (seqform->seqtypid == INT8OID && seqform->seqmin < PG_INT64_MIN))
+   {
+   charbufm[100];
+
+   snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
+
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("MINVALUE (%s) is too large for sequence data type %s",
+   bufm, format_type_be(seqform->seqtypid))));
+   }
"large" does not apply to values lower than the minimum, no? The int64
path is never going to be reached (same for the max value), it doesn't
hurt to code it I agree.

Testing serial columns, the changes are consistent with the previous releases.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Michael Paquier
On Thu, Jan 26, 2017 at 1:38 PM, Nikhil Sontakke
 wrote:
>> We should really try to do things right now, or we'll never come back
>> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
>> promotion by removing the need of the end-of-recovery checkpoint,
>> while I agree that there should not be that many 2PC transactions at
>> this point, if there are for a reason or another, the time it takes to
>> complete promotion would be impacted. So let's refactor
>> PrescanPreparedTransactions() so as it is able to handle 2PC data from
>> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.
>
> Not quite. If we modify PrescanPreparedTransactions(), we also need to
> make RecoverPreparedTransactions() and
> StandbyRecoverPreparedTransactions() handle 2PC data via
> XlogReadTwoPhaseData().

Ah, right for both, even for RecoverPreparedTransactions() that
happens at the end of recovery. Thanks for noticing. The patch
mentions that as well:
+ ** At the end of recovery we move all known prepared transactions to disk.
+ *  This allows RecoverPreparedTransactions() and
+ *  StandbyRecoverPreparedTransactions() to do their work.
I need some strong coffee..

>> +   KnownPreparedRecreateFiles(checkPoint.redo);
>> RecoveryRestartPoint();
>> Looking again at this code, I think that this is incorrect. The
>> checkpointer should be in charge of doing this work and not the
>> startup process, so this should go into CheckpointTwoPhase() instead.
>
> I don't see a function by the above name in the code?

I look at this patch from you and that's present for me:
https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
If I look as well at the last version of Stas it is here:
https://www.postgresql.org/message-id/becc988a-db74-48d5-b5d5-a54551a62...@postgrespro.ru

As this change:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
 (errmsg("unexpected timeline ID %u (should be %u)
in checkpoint record",
 checkPoint.ThisTimeLineID, ThisTimeLineID)));

+KnownPreparedRecreateFiles(checkPoint.redo);
 RecoveryRestartPoint();
 }
And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
files are not flushed to disk with this patch. This is a problem as a
new restart point is created... Having the flush in CheckpointTwoPhase
really makes the most sense.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Peter Geoghegan
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan  wrote:
> I understand that my experience with storage devices is unusually
> narrow compared to everyone else here. That's why I remain neutral on
> the high level question of whether or not we ought to enable checksums
> by default. I'll ask other hackers to answer what may seem like a very
> naive question, while bearing what I just said in mind. The question
> is: Have you ever actually seen a checksum failure in production? And,
> if so, how helpful was it?

I'm surprised that nobody has answered my question yet.

I'm not claiming that not actually seeing any corruption in the wild
due to a failing checksum invalidates any argument. I *do* think that
data points like this can be helpful, though.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-25 Thread Stephen Frost
Fabien,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> I think that there is a misunderstanding, most of which is my fault.

No worries, it happens. :)

> I have really tried to do everything that was required from
> committers, including revising the patch to match all previous
> feedback.

Thanks for continuing to try to work through everything.  I know it can
be a difficult process, but it's all towards a (hopefully) improved and
better PG.

> Version 6 sent on Oct 4 did include all fixes required at the time
> (no if, no unusual and operators, TAP tests)... However I forgot to
> remove some documentation about the removed stuff, which made Robert
> think that I had not done it. I apologise for this mistake and the
> subsequent misunderstanding:-(

Ok, that helps clarify things.  As does the rest of your email, for me,
anyway.

> If pgbench is about being seated on a bench and running postgres on
> your laptop to get some heat, my mistake... I thought it was about
> benchmarking, which does imply a few extra capabilities.

I believe we do want to improve pgbench and your changes are generally
welcome when it comes to adding useful capabilities.  Your explanation
was also helpful about the specific requirements.

> IMHO the relevant current status of the patch should be "Needs
> review" and possibly "Move to next CF".

For my 2c, at least, while I'm definitely interested in this, it's not
nearly high enough on my plate with everything else going on to get any
attention in the next few weeks, at least.

I do think that, perhaps, this patch may deserve a bit of a break, to
allow people to come back to it with a fresh perspective, so perhaps
moving it to the next commitfest would be a good idea, in a Needs Review
state.

Thanks!

Stephen




Re: [HACKERS] Checksums by default?

2017-01-25 Thread Thomas Munro
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost  wrote:
> Sadly, without having them enabled by default, there's not a huge corpus
> of example cases to draw from.
>
> There have been a few examples already posted about corruption failures
> with PG, but one can't say with certainty that they would have been
> caught sooner if checksums had been enabled.

I don't know how comparable it is to our checksum technology, but
MySQL seems to have some kind of checksums on table data, and you can
find public emails, blogs etc lamenting corrupted databases by
searching Google for the string "InnoDB: uncompressed page, stored
checksum in field1" (that's the start of a longer error message that
includes actual and expected checksums).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-01-25 Thread Michael Paquier
On Wed, Jan 25, 2017 at 11:55 PM, Nikhil Sontakke
 wrote:
>> We are talking about the recovery/promote code path. Specifically this
>> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions().
>>
>> We write the files to disk and they get immediately read up in the
>> following code. We could not write the files to disk and read
>> KnownPreparedList in the code path that follows as well as elsewhere.
>
> Thinking more on this.
>
> The only optimization that's really remaining is handling of prepared
> transactions that have not been committed or will linger around for
> long. The short lived 2PC transactions have been optimized already via
> this patch.
>
> The question remains whether saving off a few fsyncs/reads for these
> long-lived prepared transactions is worth the additional code churn.
> Even if we add code to go through the KnownPreparedList, we still will
> have to go through the other on-disk 2PC transactions anyways. So,
> maybe not.

We should really try to do things right now, or we'll never come back
to it. 9.3 (if my memory does not fail me?) has reduced the time to do
promotion by removing the need of the end-of-recovery checkpoint,
while I agree that there should not be that many 2PC transactions at
this point, if there are for a reason or another, the time it takes to
complete promotion would be impacted. So let's refactor
PrescanPreparedTransactions() so as it is able to handle 2PC data from
a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,15 @@ xlog_redo(XLogReaderState *record)
(errmsg("unexpected timeline ID %u (should be %u)
in checkpoint record",
checkPoint.ThisTimeLineID, ThisTimeLineID)));

+
+   /*
+* Move prepared transactions, if any, from KnownPreparedList to files.
+* It is possible to skip this step and teach subsequent code about
+* KnownPreparedList, but PrescanPreparedTransactions() happens once
+* during end of recovery or on promote, so probably it isn't worth
+* the additional code.
+*/
+   KnownPreparedRecreateFiles(checkPoint.redo);
RecoveryRestartPoint();
Looking again at this code, I think that this is incorrect. The
checkpointer should be in charge of doing this work and not the
startup process, so this should go into CheckpointTwoPhase() instead.
At the end, we should be able to just live without
KnownPreparedRecreateFiles() and just rip it off from the patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost  wrote:
> > I hope to discuss it further after we have the ability to turn it off
> > easily.
> 
> I think we should have the ability to flip it in BOTH directions easily.

Presumably you imply this to mean "before we enable it by default."  I'm
not sure that I can agree with that, but we haven't got it in either
direction yet, so it's not terribly interesting to discuss that
particular "what if."

> It sounds to me like you are misleading users about the positives and
> negatives of checksums, which then causes them to be shocked that they
> are not the default.

I don't try to claim that they are without downsides or performance
impacts, if that's the implication here.

> > [ more unsolicited bragging about an unspecified backup tool, presumably
> > still pgbackrest ]

It was explicitly to counter the claim that there aren't things out
there which are working to actively check the checksums.

> > I'd rather walk into an engagement where the user is saying "yeah, we
> > enabled checksums and it caught this corruption issue" than having to
> > break the bad news, which I've had to do over and over, that their
> > existing system hasn't got checksums enabled.  This isn't hypothetical,
> > it's what I run into regularly with entirely reasonable and skilled
> > engineers who have been deploying PG.
> 
> Maybe you should just stop telling them and use the time thus freed up
> to work on improving the checksum feature.

I'm working to improve the usefulness of our checksum feature in a way
which will produce practical and much more immediate results than
anything I could do today in PG.  That said, I do plan to also support
working on checksums as I'm able to.  At the moment, that's supporting
Magnus' thread about enabling them by default.  I'd be a bit surprised
if he was trying to force a change on PG because he thinks it's going to
improve things for pgbackrest, but if so, I'm not going to complain when
it seems like an entirely sensible and good change which will benefit
PG's users too.

Even better would be if we had an independent tool to check checksums
endorsed by the PG community, but that won't happen for a release cycle.
I'd also be extremely happy if the other backup tools out there grew
the ability to check checksums in PG pages; frankly, I hope that adding
it to pgbackrest will push them to do so.

> I'm skeptical of this whole discussion because you seem to be filled
> with unalloyed confidence that checksums have little performance
> impact and will do wonderful things to prevent data loss, whereas I
> think they have significant performance impact and will only very
> slightly help to prevent data loss.  

I admit that they'll have a significant performance impact in some
environments, but I think the vast majority of installations won't see
anything different, while some of them may be saved by it, including, as
likely as not, a number of actual corruption issues that have been
brought up on these lists in the past few days, simply because reports
were asked for.

> I admit that the idea of having
> pgbackrest verify checksums while backing up seems like it could
> greatly improve the chances of checksums being useful, but I'm not
> going to endorse changing PostgreSQL's default for pgbackrest's
> benefit.  

I'm glad to hear that you generally endorse the idea of having a backup
tool verify checksums.  I'd love it if all of them did and I'm not going
to apologize for, as far as I'm aware, being the first to even make an
effort in that direction.

> It's got to be to the benefit of PostgreSQL users broadly,
> not just the subset of those people who use one particular backup
> tool.  

Hopefully, other backup solutions will add similar capability, and
perhaps someone will also write an independent tool, and eventually
those will get out in released versions, and maybe PG will grow a tool
to check checksums too, but I can't make other tool authors implement
it, nor can I make other committers work on it and while I'm doing what
I can, as I'm sure you understand, we all have a lot of different hats.

> Also, the massive hit that will probably occur on
> high-concurrency OLTP workloads larger than shared_buffers is going to
> be hard to justify for any amount of backup security.  I think that
> problem's got to be solved or at least mitigated before we think about
> changing this.  I realize that not everyone would set the bar that
> high, but I see far too many customers with exactly that workload to
> dismiss it lightly.

I have a sneaking suspicion that the customers which you get directly
involved with tend to be at a different level than the majority of PG
users which exist out in the wild (I can't say that it's really any
different for me).  I don't think that's a bad thing, but I do think
users at all levels deserve consideration and not just those running
close to the limits of 

Re: [HACKERS] Checksums by default?

2017-01-25 Thread Stephen Frost
Peter,

* Peter Geoghegan (p...@heroku.com) wrote:
> On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan  wrote:
> > I understand that my experience with storage devices is unusually
> > narrow compared to everyone else here. That's why I remain neutral on
> > the high level question of whether or not we ought to enable checksums
> > by default. I'll ask other hackers to answer what may seem like a very
> > naive question, while bearing what I just said in mind. The question
> > is: Have you ever actually seen a checksum failure in production? And,
> > if so, how helpful was it?
> 
> I'm surprised that nobody has answered my question yet.
> 
> I'm not claiming that not actually seeing any corruption in the wild
> due to a failing checksum invalidates any argument. I *do* think that
> data points like this can be helpful, though.

Sadly, without having them enabled by default, there's not a huge corpus
of example cases to draw from.

There have been a few examples already posted about corruption failures
with PG, but one can't say with certainty that they would have been
caught sooner if checksums had been enabled.

Thanks!

Stephen




Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
>
>
>> I don't understand why do we have all these checks.  Can't we just pass
>> the values to COPY and let it apply the checks?  That way, when COPY is
>> updated to support multibyte escape chars (for example) we don't need to
>> touch this code.  Together with removing the unneeded braces that would
>> make these stanzas about six lines long instead of fifteen.
>>
>
> If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
> having a relation in pg_class to reference for attributes.
> In this case, there is no such relation. So I'd have to fake a relcache
> entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
> Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
> happy to go that route if you think it's a good idea.
>
>
>>
>>
>> > + tuple = heap_form_tuple(tupdesc,values,nulls);
>> > + //tuple = BuildTupleFromCStrings(attinmeta,
>> field_strings);
>> > + tuplestore_puttuple(tupstore, tuple);
>>
>> No need to form a tuple; use tuplestore_putvalues here.
>>
>
> Good to know!
>
>
>
>>
>> I wonder if these should be an auxiliary function in copy.c to do this.
>> Surely copy.c itself does pretty much the same thing ...
>>
>
> Yes. This got started as a patch to core because not all of the parts of
> COPY are externally callable, and they aren't broken down in a way that
> allows use in an SRF.
>
> I'll get to work on these suggestions.
>

I've put in some more work on this patch, mostly just taking Alvaro's
suggestions, which resulted in big code savings.

I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This
seemed the easiest way to leverage the existing tested code (and indeed, it
worked nearly out-of-the-box). The only drawback is that a minor change
will have to be made to the BeginCopyFrom() call in file_fdw.c, and any
other extensions that leverage COPY. We could make compatibility functions
that take the original signature and pass it along to the corresponding
function with rsTupDesc set to NULL.
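
Roughly like the following (only a sketch: the extended function name
and the exact argument lists here are abbreviated/made up for
illustration, they are not what the attached patch does):

/*
 * Hypothetical compatibility wrapper living in copy.c: existing callers
 * such as file_fdw keep the historical BeginCopyFrom() signature, and
 * the wrapper simply forwards with no result-set tuple descriptor.
 */
CopyState
BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
              bool is_program, List *attnamelist, List *options)
{
    return BeginCopyFromTupDesc(pstate, rel, NULL /* rsTupDesc */,
                                filename, is_program,
                                attnamelist, options);
}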

Some issues:
- I'm still not sure if the direction we want to go is a set returning
function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program
option did to file_fdw: there's very little we can reference that isn't
os/environment specific
- Inline (STDIN) prompts the user for input, but gives the error: server
sent data ("D" message) without prior row description ("T" message). I
looked for a place where the Relation was consulted for the row
description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there
is consensus that this is the way to go.


# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms
# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms
# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a4
>> b5
>> \.
server sent data ("D" message) without prior row description ("T" message)
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text,
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT null,
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT null,
+   IN header boolean DEFAULT null,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = 
"PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel, 
+ TupleDesc rsTupDesc,
  RawStmt *raw_query, 

Re: [HACKERS] logical decoding of two-phase transactions

2017-01-25 Thread Craig Ringer
On 5 January 2017 at 20:43, Stas Kelvich  wrote:
>
>> On 5 Jan 2017, at 13:49, Simon Riggs  wrote:
>>
>> Surely in this case the master server is acting as the Transaction
>> Manager, and it knows the mapping, so we are good?
>>
>> I guess if you are using >2 nodes then you need to use full 2PC on each node.
>>
>> Please explain precisely how you expect to use this, to check that GID
>> is required.
>>
>
> For example if we are using logical replication just for failover/HA and 
> allowing user
> to be transaction manager itself. Then suppose that user prepared tx on 
> server A and server A
> crashed. After that client may want to reconnect to server B and commit/abort 
> that tx.
> But user only have GID that was used during prepare.
>
>> But even then, if you adopt the naming convention that all in-progress
>> xacts will be called RepOriginId-EPOCH-XID, so they have a fully
>> unique GID on all of the child nodes then we don't need to add the
>> GID.
>
> Yes, that’s also possible but seems to be less flexible restricting us to some
> specific GID format.
>
> Anyway, I can measure WAL space overhead introduced by the GID’s inside 
> commit records
> to know exactly what will be the cost of such approach.

Stas,

Have you had a chance to look at this further?

I think the approach of storing just the xid and fetching the GID
during logical decoding of the PREPARE TRANSACTION is probably the
best way forward, per my prior mail. That should eliminate Simon's
objection re the cost of tracking GIDs and still let us have access to
them when we want them, which is the best of both worlds really.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check

2017-01-25 Thread Robert Haas
On Wed, Jan 25, 2017 at 8:22 PM, Stephen Frost  wrote:
> Apparently we disagree about what is a 'reasonable manner'.

Yes.  I think that a "reasonable manner" should mean "what the DBA
thinks is reasonable", whereas you apparently think it should mean
"what the DBA thinks is reasonable, but only if the core developers
and in particular Stephen Frost also think it's reasonable".

Your proposed policy is essentially that functions should have
built-in superuser checks if having access to that function is
sufficient to escalate your account to full superuser privileges.
But:

1. There's no consensus on any such policy.

2. If there were such a policy it would favor, not oppose, changing
pg_ls_dir(), because you can't escalate to superuser given access to
pg_ls_dir().  Yet you are also opposed to changing pg_ls_dir() for
reasons that boil down to a personal preference on your part for
people not using it to build monitoring scripts.

3. Such a policy can only be enforced to the extent that we can
accurately predict which functions can be used to escalate to
superuser, which is not necessarily obvious in every case.  Under your
proposed policy, if a given function turns out to be more dangerous
than we'd previously thought, we'd have to stick the superuser check
back in for the next release.  And if it turns out to be less
dangerous than we thought, we'd take the check out.  That would be
silly.

4. Such a policy is useless from a security perspective because you
can't actually prevent superusers from delegating access to those
functions.  You can force them to use wrapper functions but that
doesn't eo ipso improve security.  It might make security better or
worse depending on how well the functions are written, and it seems
extremely optimistic to suppose that everyone who writes a security
definer wrapper function will actually do anything more than expose
the underlying function as-is (and maybe forget to schema-qualify
something).

5. If you're worried about people granting access to functions that
allow escalation to the superuser account, what you really ought to do
is put some effort into documenting which functions have such hazards
and for what general reasons.  That would have a much better chance of
preventing people from delegating access to dangerous functions
inadvertently than the current method, which relies on people knowing
(without documentation) that you've attempted to leave hard-coded
superuser() checks in some functions but not others for reasons that
sometimes but not always include privilege escalation, correctly
distinguishing which such cases involve privilege escalation as
opposed to other arbitrary criteria, and deciding neither to create
secdef wrappers for anything that has a built-in check for reasons of
privilege isolation nor to give up on privilege isolation and hand out
superuser to people who need pg_ls_dir().  While such a clever chain
of deductive reasoning cannot be ruled out, it would be astonishing if
it happened very often.

I'd be willing to agree to write documentation along the lines
suggested in (5) as a condition of removing the remaining superuser
checks if you'd be willing to review it and suggest a place to put it.
But I have a feeling compromise may not be possible here.  To me, the
hand-wringing about the evils of pg_ls_dir() on this thread contrasts
rather starkly with the complete lack of discussion about whether the
patch removing superuser checks from pgstattuple was opening up any
security vulnerabilities.  And given that the aforesaid patch lets a
user who has EXECUTE privileges on pgstattuple run that function even
on relations for which they have no other privileges, such as say
pg_authid, it hardly seems self-evident that there are no leaks there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

