Re: [HACKERS] patch: function xmltable
2017-01-24 21:38 GMT+01:00 Alvaro Herrera:

> Pavel Stehule wrote:
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks. I just realized that this is doing more work than necessary --

?? I don't understand?

> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results. So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is. That code doesn't use the value-per-call interface
> anyway.

ok

> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly. I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory). I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.

ok.

> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>
> Please fix.
>
> --
> Álvaro Herrera   https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pgbench more operators & functions
Hello Tom,

> I concur that this is expanding pgbench's expression language well beyond
> what anybody has shown a need for.

As for the motivation, I'm assuming that pgbench should provide features necessary to implement benchmarks, so I'm adding operators that appear in standard benchmark specifications.

From TPC-B 2.0.0 section 5.3.5:

| The Account_ID is generated as follows:
| • A random number X is generated within [0,1]
| • If X < 0.85 or branches = 1, a random Account_ID is selected over all
|   accounts.
| • If X >= 0.85 and branches > 1, a random Account_ID is selected over all
|   non- accounts.

The above uses a condition (If), logic (or, and), and comparisons (=, <, >=).

From TPC-C 5.11 section 2.1.6, a bitwise-or operator is used to skew a distribution:

| NURand(A, x, y) = (((random(0, A) | random(x, y)) + C) % (y - x + 1)) + x

And from section 5.2.5.4 of the same, some time is computed based on a logarithm:

| Tt = -log(r) * µ

> I'm also concerned that there's an opportunity cost here, in that the
> patch establishes a precedence ordering for its new operators, which we'd
> have a hard time changing later. That's significant because the proposed
> precedence is different from what you'd get for similarly-named operators
> on the backend side. I realize that it's trying to follow the C standard
> instead,

Oops. I just looked at the precedence from a C parser, without realizing that precedence there was different from the postgres SQL implementation :-( This is a bug on my part.

> I'd be okay with the parts of this that duplicate existing backend syntax
> and semantics, but I don't especially want to invent further than that.

Okay. In the two latest versions sent, discrepancies from that were bugs; I was trying to conform.

Version 8 attached hopefully fixes the precedence issue raised above:
- use precedence taken from "backend/gram.y" instead of C. I'm not sure that it is wise that pg has C-like operators with a different precedence, but this is probably much too late...
And fixes the documentation:
- remove the documentation for the no-longer-existing "if" function, whose leftover presence made Robert assume that I had not taken the hint to remove it. I had!
- reorder the operator documentation by their pg SQL precedence.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1eee8dc..73101e1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -828,11 +828,11 @@ pgbench options dbname
    The expression may contain integer constants such as 5432,
    double constants such as 3.14159,
    references to variables :variablename,
-   unary operators (+, -) and binary operators
-   (+, -, *, /, %) with their usual precedence and associativity,
-   function calls, and
-   parentheses.
+   operators with their usual precedence and associativity,
+   function calls,
+   SQL CASE generic conditional expressions
+   and parentheses.
@@ -917,6 +917,165 @@ pgbench options dbname

Built-In Operators

The arithmetic, bitwise, comparison and logical operators listed below are built into pgbench and may be used in expressions appearing in \set.

pgbench operators by increasing precedence:

  Operator | Description                              | Example  | Result
  ---------+------------------------------------------+----------+-------
  OR       | logical or                               | 5 or 0   | 1
  AND      | logical and                              | 3 and 0  | 0
  NOT      | logical not                              | not 0    | 1
  =        | is equal                                 | 5 = 4    | 0
  <>       | is not equal                             | 5 <> 4   | 1
  !=       | is not equal                             | 5 != 5   | 0
  <        | lower than                               | 5 < 4    | 0
  <=       | lower or equal                           | 5 <= 4   | 0
  >        | greater than                             | 5 > 4    | 1
  >=       | greater or equal                         | 5 >= 4   | 1
  |        | integer bitwise OR                       | 1 | 2    | 3
  #        | integer bitwise XOR                      | 1 # 3    | 2
  &        | integer bitwise AND                      | 1 & 3    | 1
  ~        | integer bitwise NOT                      | ~ 1      | -2
  <<       | bitwise shift left                       | 1 << 2   | 4
  >>       | bitwise shift right                      | 8 >> 2   | 2
  +        | addition                                 | 5 + 4    | 9
  -        | subtraction                              | 3 - 2.0  | 1.0
  *        | multiplication                           | 5 * 4    | 20
  /        | division (integer truncates the results) | 5 / 3    | 1
  %        | modulo                                   | 3 % 2    | 1
  -        | opposite (unary minus)                   | - 2.0    | -2.0
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi Alvaro,

On 24.01.2017 at 19:36, Alvaro Herrera wrote:
> Tobias Oberstein wrote:
>> I am benchmarking IOPS, and while doing so, it becomes apparent that at
>> these scales it does matter _how_ IO is done. The most efficient way is
>> libaio. I get 9.7 million/sec IOPS with low CPU load. Using any
>> synchronous IO engine is slower and produces higher load. I do
>> understand that switching to libaio isn't going to fly for PG
>> (completely different approach).
>
> Maybe it is possible to write a new f_smgr implementation (parallel to
> md.c) that uses libaio. There is no "seek" in that interface, at least,
> though the interface does assume that the implementation is blocking.

FWIW, I now systematically compared the IO performance when normalized for the system load induced, over different IO methods. I use the FIO ioengine terminology:

sync = lseek/read/write
psync = pread/pwrite

Here: https://github.com/oberstet/scratchbox/raw/master/cruncher/engines-compared/normalized-iops.pdf

Conclusion:
- psync has 1.15x the normalized IOPS compared to sync
- libaio has up to 6.5x the normalized IOPS compared to sync

---

These measurements were done on 16 NVMe block devices. As mentioned, when Linux MD comes into the game, the difference between sync and psync is much higher: there is a lock contention in MD. The reason for that is: when MD comes into the game, even our massive CPU cannot hide the inefficiency of the double syscalls anymore. This MD issue is our bigger problem (compared to PG using sync/psync). I am going to post to the linux-raid list about that, as advised by the FIO developers.

---

That being said, regarding getting maximum performance out of NVMes with minimal system load, the real deal probably isn't libaio either, but kernel bypass (hinted to me by the FIO devs): http://www.spdk.io/

FIO has a plugin for SPDK, which I am going to explore to establish a final conclusive baseline for maximum IOPS normalized for load.
There are similar approaches in networking (BSD netmap, DPDK) to bypass the kernel altogether (zero copy to userland, no interrupts but polling etc). With hardware like this (NVMe, 100GbE etc), the kernel gets in the way .. Anyway, this is now probably OT as for PG;) Cheers, /Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi Andres,

>> Using pread instead of lseek+read halves the syscalls. I really don't
>> understand what you are fighting here ..
>
> Sure, there's some overhead. And as I said upthread, I'm much less
> against this change than Tom. What I'm saying is that your benchmarks
> haven't shown a benefit in a meaningful way, so I don't think I can agree
> with "Well, my point remains that I see little value in messing with
> long-established code if you can't demonstrate a benefit that's clearly
> above the noise level."

I have done lots of benchmarking over the last days on a massive box, and I can provide numbers that I think show that the impact can be significant.

> since you've not actually shown that the impact is above the noise level
> when measured with an actual postgres workload.

I can follow that. So real proof cannot be done with FIO, but only with an "actual PG workload". Synthetic PG workload or real-world production workload?

Also: regarding the perf profiles from production that show lseek as the #1 syscall. You said that wouldn't be proof either, because it only shows the number of syscalls, and though it is clear that millions of syscalls/sec do come with overhead, it still doesn't show "above noise" level relevance (because PG is such a CPU hog in itself anyways ;)

So how would I do a perf profile that would be acceptable as proof? Maybe I can expand our https://gist.github.com/oberstet/ca03d7ab49be4c8edb70ffa1a9fe160c profiling function.

Cheers,
/Tobias
Re: [HACKERS] pgbench more operators & functions
>> As it stands right now you haven't provided enough context for this
>> patch, and only the social difficulty of actually marking a patch
>> rejected has prevented its demise in its current form: while it has
>> interesting ideas, it adds a maintenance burden for -core without any
>> in-core usage. But if you make it the first patch in a 3-patch series
>> that implements the per-spec tpc-b, the discussion moves away from these
>> support functions and into the broader framework in which they are made
>> useful.
>
> I think Fabien already did post something of the sort, or at least
> discussion towards it,

Yep.

> and there was immediately objection as to whether his idea of TPC-B
> compliance was actually right. I remember complaining that he had a
> totally artificial idea of what "fetching a data value" requires.

Yep.

I think that the key misunderstanding is that you are honest and assume that other people are honest too. This is naïve: there is a long history of vendors creatively "cheating" to get better-than-deserved benchmark results. Benchmark specifications try to prevent such behaviors by laying out careful requirements and procedures.

In this instance, you "know" that when pg has returned the result of the query the data is actually on the client side, so you considered it fetched. That is fine for you, but from a benchmarking perspective with external auditors your belief is not good enough. For instance, the vendor could implement a new version of the protocol where the data are only transferred on demand, and the result just tells that the data is indeed somewhere on the server (e.g. on "SELECT abalance" it could just check that the key exists; no need to actually fetch the data from the table, so no need to read the table, the index is enough...). That would be pretty stupid for real application performance, but the benchmark could get better tps by doing so.
Without even intentionally cheating, this could be part of a useful "streaming mode" protocol option which would make sense for very large results but might also be activated for a small result. Another point is that decoding the message may be a little expensive, so that by not actually extracting the data into the client but just keeping it in the connection/OS one gets better performance.

Thus, the TPC-B 2.0.0 benchmark specification says:

"1.3.2 Each transaction shall return to the driver the Account_Balance resulting from successful commit of the transaction.
Comment: It is the intent of this clause that the account balance in the database be returned to the driver, i.e., that the application retrieve the account balance."

For me the correct interpretation of "the APPLICATION retrieve the account balance" is that the client application code, pgbench in this context, did indeed get the value from the vendor code, here "libpq" which is handling the connection.

Having the value discarded from libpq by calling PQclear instead of PQntuples/PQgetvalue/... skips a key part of the client code that no real application would skip. This looks strange and is not representative of real client code: as a potential auditor, because of this I would not check the corresponding item in the audit check list:

"11.3.1.2 Verify that transaction inputs and outputs satisfy Clause 1.3."

So the benchmark implementation would not be validated.

Another trivial reason to be able to actually retrieve data is that for benchmarking purposes it is very easy to want to test a scenario where you do different things based on the data received, which implies that the data can be manipulated somehow on the benchmarking client side, which is currently not possible.

--
Fabien.
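For illustration, client code satisfying that reading of clause 1.3.2 would look roughly like this in libpq: the application actually extracts Account_Balance from the result before clearing it, rather than calling PQclear alone. This is only a sketch; the connection string and aid value are placeholders, and it obviously needs a running server with the pgbench tables:

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    /* placeholder connection string */
    PGconn *conn = PQconnectdb("dbname=pgbench");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PGresult *res = PQexec(conn,
        "SELECT abalance FROM pgbench_accounts WHERE aid = 1");
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
    {
        /* the application-side retrieval the spec asks for: the value
         * crosses from libpq into application code before PQclear */
        long abalance = atol(PQgetvalue(res, 0, 0));
        printf("abalance = %ld\n", abalance);
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

The difference at issue is precisely the PQgetvalue call: drop it and only libpq ever touched the balance, which is what the auditing argument above objects to.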
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes:
> On 25.01.2017 23:58, Tom Lane wrote:
>> I think you need to take a second look at the code you're producing
>> and realize that it's not so clean either. This extract from
>> populate_record_field, for example, is pretty horrid:
>
> But what if we introduce some helper macros like this:
>
> #define JsValueIsNull(jsv) \
>     ((jsv)->is_json ? !(jsv)->val.json.str \
>         : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>
> #define JsValueIsString(jsv) \
>     ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>         : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)

Yeah, I was wondering about that too. I'm not sure that you can make a reasonable set of helper macros that will fix this, but if you want to try, go for it.

BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have to go back to the manual every darn time to convince myself whether that means (a?b:c)||d or a?b:(c||d). It's better not to rely on the reader (... or the author) having memorized C's precedence rules in quite that much detail. Extra parens help.

regards, tom lane
Re: [HACKERS] patch: function xmltable
2017-01-25 23:33 GMT+01:00 Andres Freund:
> On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
>> 2017-01-25 22:40 GMT+01:00 Andres Freund:
>>>> I'm afraid that if I cannot reuse the SRF infrastructure, I will have
>>>> to partially reimplement it :( - mainly for usage in "ROWS FROM ()"
>>
>> The TableExpr implementation is based on SRF now. You and Alvaro propose
>> an independent implementation, like a generic executor node. I am
>> sceptical that FunctionScan supports reading from a generic executor
>> node.
>
> Why would it need to?

Simply, for consistency with any other function that can return rows.

Maybe I don't understand Alvaro's proposal well. I have a XMLTABLE function, a TableExpr, that looks like a SRF function and behaves similarly (returns multiple rows), but it should be implemented significantly differently and should have different limits (should not be usable here and there...). It is hard for me to see the consistency there.

Regards
Pavel
Re: [HACKERS] Checksums by default?
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote:
>> Yet, our default is to have them disabled and *really* hard to enable.
>
> First of all, that could be fixed by further development.

I'm certainly all for doing so, but I don't agree that it necessarily is required before we flip the default. That said, if the way to get checksums enabled by default is providing a relatively easy way to turn them off, then that's something which I'll do what I can to help work towards. In other words, I'm not going to continue to argue, given the various opinions of the group, that we should just flip it tomorrow. I hope to discuss it further after we have the ability to turn it off easily.

> Second, really hard to enable is a relative term. I accept that
> enabling checksums is not a pleasant process. Right now, you'd have
> to do a dump/restore, or use logical replication to replicate the data
> to a new cluster and then switch over. On the other hand, if
> checksums are really a critical feature, how are people getting to the
> point where they've got a mission-critical production system and only
> then discovering that they want to enable checksums?

I truly do wish everyone would come talk to me before building out a database. Perhaps that's been your experience, in which case, I envy you, but I tend to get a reaction more along the lines of "wait, what do you mean I had to pass some option to initdb to enable checksums?!?!". The fact that we've got a WAL implementation and clearly understand fsync requirements, why full page writes make sense, and that our WAL has its own CRCs which aren't possible to disable, tends to lead people to think we really know what we're doing and that we care a lot about their data.
> > I agree that it's unfortunate that we haven't put more effort into > > fixing that- I'm all for it, but it's disappointing to see that people > > are not in favor of changing the default as I believe it would both help > > our users and encourage more development of the feature. > > I think it would help some users and hurt others. I do agree that it > would encourage more development of the feature -- almost of > necessity. In particular, I bet it would spur development of an > efficient way of turning checksums off -- but I'd rather see us > approach it from the other direction: let's develop an efficient way > of turning the feature on and off FIRST. Deciding that the feature > has to be on for everyone because turning it on later is too hard for > the people who later decide they want it is letting the tail wag the > dog. As I have said, I don't believe it has to be on for everyone. > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: If your backup system is checking the checksums when backing up PG, which I think every backup system *should* be doing, then guess what? You've got a backup which you can go back to immediately, possibly with the ability to restore all of the data from WAL. That won't always be the case, naturally, but it's a much better position than simply having a system which continues to degrade until you've actually reached the "you're screwed" level because PG will no longer read a page or perhaps can't even start up, *and* you no longer have any backups. As it is, there are backup solutions which *do* check the checksum when backing up PG. 
This is no longer, thankfully, some hypothetical thing, but something which really exists and will hopefully keep users from losing data. > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". And we have options for that. > And then you wonder what the point of having > the feature turned on is, especially if it's costly. It's almost an > attractive nuisance at that point - nobody wants to be the user that > turns off checksums because they sound good on paper, but when you > actually have a problem an awful lot of people are NOT going to want > to try to restore from backup and maybe lose recent transactions. > They're going to want to ignore the checksum failures. That's kind of > awful. Presently, last I checked at least, the database system doesn't fall over and die if a single page's checksum fails. I agree entirely that we want the system to fail gracefully (unless the user instructs us otherwise, perhaps because they have a redundant system that they can flip
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
Ashutosh Bapat writes:
> On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier wrote:
>> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane wrote:
>>> Here's an updated patch with doc changes. Aside from that one,
>>> I tried to spell "pseudo-type" consistently, and I changed one
>>> place where we were calling something a pseudo-type that isn't.
>
> I think those changes, even though small, deserve their own commit.
> The changes themselves look good.

Pushed, thanks for the reviews!

regards, tom lane
Re: [HACKERS] Speedup twophase transactions
> We are talking about the recovery/promote code path. Specifically this > call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). > > We write the files to disk and they get immediately read up in the > following code. We could not write the files to disk and read > KnownPreparedList in the code path that follows as well as elsewhere. Thinking more on this. The only optimization that's really remaining is handling of prepared transactions that have not been committed or will linger around for long. The short lived 2PC transactions have been optimized already via this patch. The question remains whether saving off a few fsyncs/reads for these long-lived prepared transactions is worth the additional code churn. Even if we add code to go through the KnownPreparedList, we still will have to go through the other on-disk 2PC transactions anyways. So, maybe not. Regards, Nikhils > > Regards, > Nikhils > > >>> The difference between those two is likely noise. >>> >>> By the way, in those measurements, the OS cache is still filled with >>> the past WAL segments, which is a rather best case, no? What happens >>> if you do the same kind of tests on a box where memory is busy doing >>> something else and replayed WAL segments get evicted from the OS cache >>> more aggressively once the startup process switches to a new segment? >>> This could be tested for example on a VM with few memory (say 386MB or >>> less) so as the startup process needs to access again the past WAL >>> segments to recover the 2PC information it needs to get them back >>> directly from disk... One trick that you could use here would be to >>> tweak the startup process so as it drops the OS cache once a segment >>> is finished replaying, and see the effects of an aggressive OS cache >>> eviction. This patch is showing really nice improvements with the OS >>> cache backing up the data, still it would make sense to test things >>> with a worse test case and see if things could be done better. 
The >>> startup process now only reads records sequentially, not randomly, >>> which is a concept that this patch introduces.
>>>
>>> Anyway, perhaps this does not matter much; the non-recovery code path
>>> does the same thing as this patch, and the improvement is too much to
>>> be ignored. So for consistency's sake we could go with the approach
>>> proposed, which has the advantage of not putting any restriction on the
>>> size of the 2PC file, contrary to what an implementation saving the
>>> contents of the 2PC files into memory would need to do.
>>
>> Maybe I'm missing something, but I don't see how the OS cache can affect
>> anything here.
>>
>> Total WAL size was 0x44 * 16 = 1088 MB; recovery time is about 20s.
>> Sequentially reading 1GB of data is an order of magnitude faster even on
>> an old hdd, not to speak of ssd. Also you can take a look at the flame
>> graphs attached to the previous message: the majority of time during
>> recovery is spent in pg_qsort while replaying PageRepairFragmentation,
>> while the whole of xact_redo_commit() takes about 1% of the time. That
>> amount can grow in case of uncached disk reads, but taking total
>> recovery time into account this should not affect much.
>>
>> If you are talking about uncached access only during checkpoint, then
>> here we are restricted by max_prepared_transactions, so at most we will
>> read about a hundred small files (usually fitting into one filesystem
>> page), which will also be barely noticeable compared to the recovery
>> time between checkpoints. Also, wal segment cache eviction during replay
>> doesn't seem to me a standard scenario.
>>
>> Anyway, I took the machine with hdd to slow down read speed and ran the
>> tests again. During one of the runs I launched in parallel a bash loop
>> that was dropping the os cache each second (while a wal fragment replay
>> also takes about one second).
>>
>> 1.5M transactions
>> start segment: 0x06
>> last segment: 0x47
>>
>> patched, with constant cache_drop:
>>   total recovery time: 86s
>>
>> patched, without constant cache_drop:
>>   total recovery time: 68s
>>
>> (while the difference is significant, I bet that happens mostly because
>> the database file segments must be re-read after the cache drop)
>>
>> master, without constant cache_drop:
>>   time to recover 35 segments: 2h 25m (after that I tired of waiting)
>>   expected total recovery time: 4.5 hours
>>
>> --
>> Stas Kelvich
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company

--
Nikhil Sontakke   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw
On 2016/11/30 17:29, Etsuro Fujita wrote:

On 2016/11/23 20:28, Rushabh Lathia wrote:

I wrote: How about inserting that before the out param **retrieved_attrs, like this? static void deparseExplicitTargetList(List *tlist, bool is_returning, List **retrieved_attrs, deparse_expr_cxt *context);

Yes, adding it before retrieved_attrs would be good. OK, will do.

Done.

You wrote: 5) make_explicit_returning_list() pulls the var list from the returningList, builds the targetentry for the returning list, and at the end rewrites the fdw_scan_tlist. AFAIK, in case of DML which is going to be pushed down to the remote server, ideally fdw_scan_tlist should be the same as the returning list, as the final output for the query will be the RETURNING list only. Isn't that true?

I wrote: That would be true. But the fdw_scan_tlist passed from the core would contain junk columns inserted by the rewriter and planner work, such as CTID for the target table and whole-row Vars for other tables specified in the FROM clause of an UPDATE or the USING clause of a DELETE. So, I created the patch so that the pushed-down UPDATE/DELETE retrieves only the data needed for the RETURNING calculation from the remote server, not the whole fdw_scan_tlist data.

Yes, I noticed that fdw_scan_tlist has CTID for the target table and whole-row Vars for other tables specified in the FROM clause of the DML. But I was thinking: isn't it possible to create a new fdw_scan_tlist once we have found that the DML is directly pushable to the foreign server? I tried quickly doing that, but later it was throwing "variable not found in subplan target list" from set_foreignscan_references().

We could probably avoid that error by replacing the targetlist of the subplan with fdw_scan_tlist, but that wouldn't be enough ...

You wrote: If yes, then why can't we directly replace the fdw_scan_tlist with the returning list, rather than make_explicit_returning_list()?

I wrote: I think we could do that if we modified the core (maybe the executor part). I'm not sure that's a good idea, though.

Yes, modifying core is not a good idea. But I just want to understand why you think that this needs to modify core?

Sorry, I don't remember that. Will check.

The reason why I think so is that the ModifyTable node on top of the ForeignScan node requires that the targetlist of the ForeignScan has (1) junk whole-row Vars for secondary relations in UPDATE/DELETE and (2) all attributes of the target relation to produce the new tuple for UPDATE. (So, it wouldn't be enough to just replace the ForeignScan's targetlist with fdw_scan_tlist!) For #1, see this (and the following code) in ExecInitModifyTable:

/*
 * If we have any secondary relations in an UPDATE or DELETE, they need to
 * be treated like non-locked relations in SELECT FOR UPDATE, ie, the
 * EvalPlanQual mechanism needs to be told about them. Locate the
 * relevant ExecRowMarks.
 */

And for #2, see this (and the following code, especially where calling ExecCheckPlanOutput) in the same function:

 * This section of code is also a convenient place to verify that the
 * output of an INSERT or UPDATE matches the target table(s).

What you proposed would be a good idea because the FDW could calculate the user-query RETURNING list more efficiently in some cases, but I'd like to leave that for future work.

Attached is the new version of the patch. I also addressed other comments from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c, added/revised comments, and added regression tests for the case where a pushed-down UPDATE/DELETE on a join has RETURNING. My apologies for having been late to work on this.

Best regards, Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c --- b/contrib/postgres_fdw/deparse.c *** *** 130,142 static void deparseTargetList(StringInfo buf, Bitmapset *attrs_used, bool qualify_col, List **retrieved_attrs); !
static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context); static void deparseReturningList(StringInfo buf, PlannerInfo *root, Index rtindex, Relation rel, bool trig_after_row, List *returningList, List **retrieved_attrs); static void deparseColumnRef(StringInfo buf, int varno, int varattno, PlannerInfo *root, bool qualify_col); static void deparseRelation(StringInfo buf, Relation rel); --- 130,151 Bitmapset *attrs_used, bool qualify_col, List **retrieved_attrs); ! static void deparseExplicitTargetList(List *tlist, ! bool is_returning, ! List **retrieved_attrs, deparse_expr_cxt *context);
Re: [HACKERS] pgbench more operators & functions
Bonjour Michaël, Hello Robert,

>> Let's mark this Returned with Feedback and move on. We've only got a
>> week left in the CommitFest anyhow and there are lots of other things
>> that still need work (and which actually have been revised to match
>> previous feedback).
>
> Done as such, let's move on.

Hmmm. I think that there is a misunderstanding, most of which is my fault. I have really tried to do everything that was required from committers, including revising the patch to match all previous feedback. Version 6 sent on Oct 4 did include all fixes required at the time (no if, no unusual and operators, TAP tests)... However I forgot to remove some documentation about the removed stuff, which made Robert think that I had not done it. I apologise for this mistake and the subsequent misunderstanding :-(

The current v8 sent on Jan 25 should only implement existing server-side stuff, including with the same precedence as pointed out by Tom. So for the implementation side I really think that I have done exactly all that was required of me by committers, although sometimes with bugs or errors, my apology, again...

As for the motivation, which is another argument, I cannot do more than point to actual published official benchmark specifications that do require these functions. I'm not inventing anything or providing some useless catalog of math functions. If pgbench is about being seated on a bench and running postgres on your laptop to get some heat, my mistake... I thought it was about benchmarking, which does imply a few extra capabilities.

If the overall feedback is to be understood as "the postgres community does not think that pgbench should be able to be used to implement benchmarks such as TPC-B", then obviously I will stop efforts to improve it for that purpose.

To conclude: IMHO the relevant current status of the patch should be "Needs review" and possibly "Move to next CF". If the feedback is "we do not want pgbench to implement benchmarks such as TPC-B", then indeed the proposed features are not needed and the status should be "Rejected". In any case, "Returned with feedback" does not really apply.

A+

--
Fabien.
Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime
Ashutosh Bapatwrites: > On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalke > wrote: >> Attached patch with test-case modification. > I verified that this patch indeed bring the time down to 2 to 3 > seconds from 10 seconds. Thanks for working on this, guys. > The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6 > of aggregate. We wouldn't know which of those actually worked. I > modified the testcase to use t2.c2 % 6 = 0 instead and keep the filter > condition intact. This increases the execution time by .2s, which may > be ok. Let me know what you thing of the attached patch. Agreed, that seems like a good compromise. Pushed that way. > Also, please add this to the commitfest, so that it isn't forgotten. No need. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench more operators & functions
And fixes the documentation: - remove a non existing anymore "if" function documentation which made Robert assume that I had not taken the hint to remove it. I had! - reorder operator documentation by their pg SQL precedence. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 1eee8dc..73101e1 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -828,11 +828,11 @@ pgbench options dbname The expression may contain integer constants such as 5432, double constants such as 3.14159, references to variables :variablename, - unary operators (+, -) and binary operators - (+, -, *, /, - %) with their usual precedence and associativity, - function calls, and - parentheses. + operators + with their usual precedence and associativity, + function calls, + SQL CASE generic conditional expressions + and parentheses. @@ -917,6 +917,165 @@ pgbench options dbname + + Built-In Operators + + + The arithmetic, bitwise, comparison and logical operators listed in +are built into pgbench + and may be used in expressions appearing in + \set. 
+ pgbench Operators by increasing precedence
+
+ Operator | Description         | Example | Result
+ ---------+---------------------+---------+-------
+ OR       | logical or          | 5 or 0  | 1
+ AND      | logical and         | 3 and 0 | 0
+ NOT      | logical not         | not 0   | 1
+ =        | is equal            | 5 = 4   | 0
+ <>       | is not equal        | 5 <> 4  | 1
+ !=       | is not equal        | 5 != 5  | 0
+ <        | lower than          | 5 < 4   | 0
+ <=       | lower or equal      | 5 <= 4  | 0
+ >        | greater than        | 5 > 4   | 1
+ >=       | greater or equal    | 5 >= 4  | 1
+ |        | integer bitwise OR  | 1 | 2   | 3
+ #        | integer bitwise XOR | 1 # 3   | 2
+ &        | integer bitwise AND | 1 & 3   | 1
+ ~        | integer bitwise NOT | ~ 1     | -2
+ <<       | bitwise shift left  | 1 << 2  | 4
+ >>       | bitwise shift right | 8 >> 2  | 2
+ +        | addition            | 5 + 4   | 9
+ -        | subtraction         | 3 - 2.0 | 1.0
+ *        | multiplication      | 5 * 4   | 20
+ /        | division (integer division truncates the result) | 5 / 3 | 1
+ %        | modulo              | 3 % 2   | 1
+ -        | opposite            | - 2.0   | -2.0
+
+ Built-In Functions @@ -963,6 +1122,13 @@ pgbench options
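As an aside, the TPC-C NURand skew quoted earlier in this thread is easy to sanity-check outside pgbench. Below is a minimal Python sketch; the function name and the constant `c=0` are illustrative (the spec selects C per run), not anything from the pgbench patch itself:

```python
import random

def nurand(a, x, y, c=0):
    # TPC-C clause 2.1.6: OR-ing two uniform draws skews the distribution
    # toward values with many bits set; the modulo folds the sum back into
    # the target range, and "+ x" shifts it to start at x.
    return (((random.randint(0, a) | random.randint(x, y)) + c) % (y - x + 1)) + x

random.seed(42)
samples = [nurand(255, 1, 100) for _ in range(10000)]
print(min(samples) >= 1 and max(samples) <= 100)  # results always stay in [x, y]
```

This is exactly the shape of expression the proposed `|` and `%` operators would let a pgbench script compute inline.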
Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server
Hi Aaron, Thank you so much for the information. Regards, Shruti Rawal -Original Message- From: Aaron W. Swenson [mailto:titanof...@gentoo.org] Sent: Tuesday, January 24, 2017 8:22 PM To: Shruti Rawal Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] PostgreSQL 8.2 installation error on Windows 2016 server On 2017-01-24 09:20, Shruti Rawal wrote: > Hi, > > We are doing an assessment for migrating our Perl applications to Windows > 2016 server. > I am trying to install PostgreSQL 8.2 on my Windows Server 2016, but it is giving me the following error: > Malformed permissions property: 'langid' > > We could not find any relevant information on the PostgreSQL site about whether the stated version 8.2 will work on Windows 2016 server. > I would like to know whether version 8.2 is supported on 2016 server, and if not, which version is supported? The information you’re looking for is accessible on the home page. You’re having a tough time finding information on 8.2 because it is way past EOL. You will be happier evaluating 9.6.1. (https://www.postgresql.org/support/versioning/) Further, -hackers is not intended as a support mailing list. You should write to -general instead. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequence data type
Here is an updated patch that allows changing the sequence type. This was clearly a concern for reviewers, and the presented use cases seemed convincing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 5aff9dbe7a94417aa0faf56dab37187fcbec23f5 Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Wed, 25 Jan 2017 09:35:58 -0500 Subject: [PATCH v4] Add CREATE SEQUENCE AS clause This stores a data type, required to be an integer type, with the sequence. The sequences min and max values default to the range supported by the type, and they cannot be set to values exceeding that range. The internal implementation of the sequence is not affected. Change the serial types to create sequences of the appropriate type. This makes sure that the min and max values of the sequence for a serial column match the range of values supported by the table column. So the sequence can no longer overflow the table column. This also makes monitoring for sequence exhaustion/wraparound easier, which currently requires various contortions to cross-reference the sequences with the table columns they are used with. This commit also effectively reverts the pg_sequence column reordering in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid column allows us to fill the hole in the struct and create a more natural overall column ordering. 
--- doc/src/sgml/catalogs.sgml | 14 +++- doc/src/sgml/information_schema.sgml| 4 +- doc/src/sgml/ref/alter_sequence.sgml| 30 ++-- doc/src/sgml/ref/create_sequence.sgml | 36 ++ src/backend/catalog/information_schema.sql | 4 +- src/backend/commands/sequence.c | 89 +--- src/backend/parser/gram.y | 6 +- src/backend/parser/parse_utilcmd.c | 2 +- src/bin/pg_dump/pg_dump.c | 103 +++- src/bin/pg_dump/t/002_pg_dump.pl| 2 + src/include/catalog/catversion.h| 2 +- src/include/catalog/pg_proc.h | 2 +- src/include/catalog/pg_sequence.h | 8 ++- src/test/modules/test_pg_dump/t/001_base.pl | 1 + src/test/regress/expected/sequence.out | 52 ++ src/test/regress/expected/sequence_1.out| 52 ++ src/test/regress/sql/sequence.sql | 25 +-- 17 files changed, 312 insertions(+), 120 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 524180e011..162975746d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -5774,10 +5774,11 @@ pg_sequence Columns - seqcycle - bool + seqtypid + oid + pg_type.oid - Whether the sequence cycles + Data type of the sequence @@ -5814,6 +5815,13 @@ pg_sequence Columns Cache size of the sequence + + + seqcycle + bool + + Whether the sequence cycles + diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml index c43e325d06..a3a19ce8ce 100644 --- a/doc/src/sgml/information_schema.sgml +++ b/doc/src/sgml/information_schema.sgml @@ -4653,9 +4653,7 @@ sequences Columns data_type character_data - The data type of the sequence. In - PostgreSQL, this is currently always - bigint. + The data type of the sequence. 
diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml index 3b52e875e3..252a668189 100644 --- a/doc/src/sgml/ref/alter_sequence.sgml +++ b/doc/src/sgml/ref/alter_sequence.sgml @@ -23,7 +23,9 @@ -ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ] +ALTER SEQUENCE [ IF EXISTS ] name +[ AS data_type ] +[ INCREMENT [ BY ] increment ] [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ] [ START [ WITH ] start ] [ RESTART [ [ WITH ] restart ] ] @@ -81,6 +83,26 @@ Parameters + data_type + + +The optional +clause AS data_type +changes the data type of the sequence. Valid types are +smallint, integer, +and bigint. + + + +Note that changing the data type does not automatically change the +minimum and maximum values. You can use the clauses NO +MINVALUE and NO MAXVALUE to adjust the +minimum and maximum values to the range of the new data type. + + + + + increment @@ -102,7 +124,7 @@ Parameters class="parameter">minvalue determines the minimum value a sequence can generate. If NO MINVALUE is specified, the defaults of 1 and --2^63 for ascending and descending sequences, +the minimum value of the data
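To make the commit message's point concrete, the default bounds a sequence gets under the new AS clause follow directly from the two's-complement ranges of the three SQL integer types. A small Python sketch (the helper and table names are illustrative, not the patch's internal API):

```python
# Two's-complement ranges of the three types CREATE SEQUENCE ... AS accepts.
TYPE_RANGES = {
    "smallint": (-(2**15), 2**15 - 1),
    "integer":  (-(2**31), 2**31 - 1),
    "bigint":   (-(2**63), 2**63 - 1),
}

def default_min_max(seqtype, ascending=True):
    """Default MINVALUE/MAXVALUE: an ascending sequence starts at 1 and may
    run up to the type's maximum; a descending one runs from -1 down to the
    type's minimum."""
    lo, hi = TYPE_RANGES[seqtype]
    return (1, hi) if ascending else (lo, -1)

print(default_min_max("smallint"))        # (1, 32767)
print(default_min_max("integer", False))  # (-2147483648, -1)
```

This is why a serial column backed by an AS-typed sequence can no longer overflow its table column: the sequence's bounds are clamped to the column type's range.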
Re: [HACKERS] patch: function xmltable
Tom Lane wrote: > Andres Freundwrites: > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > >> XMLTABLE is specified by the standard to return multiple rows ... but > >> then as far as my reading goes, it is only supposed to be supported in > >> the range table (FROM clause) not in the target list. I wonder if > >> this would end up better if we only tried to support it in RT. I asked > >> Pavel to implement it like that a few weeks ago, but ... > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > +1 --- we're out of the business of having simple expressions that > return rowsets. Well, that's it. I'm not committing this patch against two other committers' opinion, plus I was already on the fence about the implementation anyway. I think you should just go with the flow and implement this by creating nodeTableexprscan.c. It's not even difficult. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY as a set returning function
On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote: > Attached is a patch that implements copy_srf(). > > The function signature reflects cstate more than it reflects the COPY > options (filename+is_program instead of FILENAME or PROGRAM, etc) The patch as it stands needs a rebase. I'll see what I can do today. Best, David. -- David Fetterhttp://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
I am responding to both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for > btree. But since (ordinary) btree handles only one-dimentional, > totally-orderable sets, geometric types even point don't fit > it. Even if we relaxed it by removing EPSILON, multidimentional > data still doesn't fit.

Yes, multidimensional data doesn't fit into btree. Let's say we would have to introduce operators called *<, *<=, *>, *>= to support a btree opclass for point. I agree those operators would be useless, but a btree opclass is used for other purposes too. "ORDER BY point", merge-joins, and btree index support for ~= would be useful. Lack of ORDER BY, GROUP BY, and DISTINCT support is the major inconvenience about the geometric types. There are many complaints about this on the mailing list. Many frameworks and ORMs are not prepared to deal with getting an error when they use certain types on those clauses.

>> - Only some operators are using the epsilon. >> >> I included the list of the ones which doesn't use the epsilon on >> initial post of this thread. This makes the operators not compatible >> with each other. For example, you cannot say "if box_a @> box_b and >> box_b @> point then box_a @> box_b". > > (It must be "then box_a @> point".)

Yes, I am sorry.

> Both of the operators don't > seem to use EPSILON so the transitivity holds, but perhaps we > should change them to more appropriate shape, that is, to use > EPSILON. Even having the tolerance, we can define the operators > to keep the transitivity.

Yes, that would be another way. In my view, this would make things worse, because I believe the epsilon approach is completely broken. The operators which are not using the epsilon are the only ones people can sanely use to develop applications.
This is surprising to the users. >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= >> > epsilon". >> >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then >> ..". I'm not sure off-hand. But I don't see the relationship is >> significant. What I meant was the behaviour being unclear to even people who knows about the epsilon. If two boxes are overlapping with each other, one who knows about EPSILON can expect the distance between them to be less than EPSILON, but this wouldn't be true. >> - Some operators are violating commutative property. >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". > > Whether EPSILON is introduced or not, commutativity cannot be > assured for it from calculation error, I suppose. It can easily be assured by treating both sides of the operator the same. It is actually assured on my patch. >> - Some operators are violating transitive property. >> >> For example, you cannot say "if point_a ~= point_b and point_b ~= >> point_c then point_a ~= point_c". > > It holds only in ideal (or mathematical) world, or for similars > of integer (which are strictly implemented mathematica world on > computer). Without the EPSILON, two points hardly match by > floating point error, as I mentioned. I don't have an evidence > though (sorry), mere equality among three floating point numbers > is not assured. Of course, it is assured. Floating point numbers are deterministic. >> - The operators with epsilon are not compatible with float operators. >> >> This is also surprising for the users. For example, you cannot say >> "if point_a << point_b then point_a[0] < point_b[0]". > > It is because "is strictly left of" just doesn't mean their > x-coordinates are in such a relationship. The difference might be > surprising but is a specification (truely not a bug:). Then what does that mean? Every operator with EPSILON is currently surprising in different way. 
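The transitivity failure complained about above is easy to demonstrate. A quick sketch using the same tolerance PostgreSQL's geometric code uses (EPSILON = 1.0e-6 and the FPeq comparison from src/include/utils/geo_decls.h):

```python
EPSILON = 1.0e-6  # matches EPSILON in src/include/utils/geo_decls.h

def fpeq(a, b):
    # Same shape as the backend's FPeq macro: "equal" within the tolerance.
    return abs(a - b) <= EPSILON

# a ~= b and b ~= c, yet a !~= c: the tolerance-based equality chains
# two sub-epsilon gaps into a gap larger than epsilon.
a, b, c = 0.0, 0.9e-6, 1.8e-6
print(fpeq(a, b), fpeq(b, c), fpeq(a, c))  # True True False
```

Plain floating-point `==`, by contrast, is transitive, which is the core of the argument for dropping the epsilon.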
People use databases to build applications. They often need to implement the same operations on the application side. It is very hard to be bug-compatible with our geometric types.

>> = Missing Bits on the Operator Classes > This final section seems to mention the application but sorry, it > still don't describe for me what application that this patch > aiming to enable. But I can understand that you want to handle > floating point numbers like integers. The epsilon is surely quite > annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement: PostGIS can be too much work for an application in a different business domain that has only a small GIS requirement. The built-in geometric types are incomplete for supporting real-world geography. Getting rid of epsilon would make this worse; on the other hand, it would allow people to deal with the difficulties on their own.

- Applications with geometric objects: I believe people could make use of the geometric
Re: [HACKERS] multivariate statistics (v19)
Michael Paquier wrote: > On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra >wrote: > > Attached is v22 of the patch series, rebased to current master and fixing > > the reported bug. I haven't made any other changes - the issues reported by > > Petr are mostly minor, so I've decided to wait a bit more for (hopefully) > > other reviews. > > And nothing has happened since. Are there people willing to review > this patch and help it proceed? I am going to grab this patch as committer. > As this patch is quite large, I am not sure if it is fit to join the > last CF. Thoughts? All patches, regardless of size, are welcome to join any commitfest. The last commitfest is not different in that regard. The rule I remember is that patches may not arrive *for the first time* in the last commitfest. This patch has already seen a lot of work in previous commitfests, so it's fine. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Radix tree for character conversion
Hi, I applied the 4 patches of the patchset and ran "make", but it failed. Is this a bug or my mistake? I'm sorry if I'm wrong.

[$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
[$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
[$(TOP)]$ patch -p1 < ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
[$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
[$(TOP)]$ ./configure
[Unicode]$ make
'/usr/bin/perl' UCS_to_most.pl
Type of arg 1 to keys must be hash (not hash element) at convutils.pm line 443, near "}) "
Type of arg 1 to values must be hash (not hash element) at convutils.pm line 596, near "}) "
Type of arg 1 to each must be hash (not private variable) at convutils.pm line 755, near "$map) "
Compilation failed in require at UCS_to_most.pl line 19.
make: *** [iso8859_2_to_utf8.map] Error 255

Regards, -- Ayumi Ishii -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Mon, Jan 23, 2017 at 6:37 PM, Jim Nasby wrote: > I'm not sure the default GUC setting of 0 makes sense. If you've loaded the > module, presumably you want it to be running. I think it'd be nice if the GUC > had a -1 setting that meant to use checkpoint_timeout. Actually, I think we need to use -1 to mean "don't run the worker at all". 0 means "run the worker, but don't do timed dumps". >0 means "run the worker, and dump at that interval". I have to admit that when I was first thinking about this feature, my initial thought was "hey, let's dump once per checkpoint_timeout". But I think that Mithun's approach is better. There's no intrinsic connection between this and checkpointing, and letting the user pick the interval is a lot more flexible. We could still have a magic value that means "same as checkpoint_timeout", but it's not obvious to me that there's any value in that; the user might as well just pick the time interval that they want. Actually, for busy systems, the interval is probably shorter than checkpoint_timeout. Dumping the list of buffers isn't that expensive, and if you are doing checkpoints every half hour or so, that's probably longer than what you want for this. So I suggest that the documentation just recommend a suitable value (e.g. 5 minutes) and not worry about checkpoint_timeout. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
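The three-way GUC semantics proposed above can be summarized in a small sketch (Python; the function name and return strings are illustrative, not the patch's actual API):

```python
def autoprewarm_behavior(dump_interval_s):
    """Proposed semantics for the dump-interval GUC:
    -1 -> don't run the worker at all,
     0 -> run the worker, but don't do timed dumps,
    >0 -> run the worker, and dump at that interval (seconds)."""
    if dump_interval_s < 0:
        return "no worker"
    if dump_interval_s == 0:
        return "worker, no timed dumps"
    return "worker, dump every %ds" % dump_interval_s

print(autoprewarm_behavior(300))  # worker, dump every 300s
```

The suggested documentation default of five minutes corresponds to the 300-second case.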
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Wed, Jan 25, 2017 at 6:43 PM, Vitaly Burovoywrote: > On 1/23/17, Haribabu Kommi wrote: > > The patch is split into two parts. > > 1. Macaddr8 datatype support > > 2. Contrib module support. > > Hello, > > I'm sorry for the delay. > The patch is almost done, but I have two requests since the last review. > Thanks for the review. > 1. > src/backend/utils/adt/mac8.c: > + int a, > + b, > + c, > + d = 0, > + e = 0, > ... > > There is no reason to set them as 0. For EUI-48 they will be > reassigned in the "if (count != 8)" block, for EUI-64 -- in one of > sscanf. > They could be set to "d = 0xFF, e = 0xFE," and avoid the "if" block > mentioned above, but it makes the code be much less readable. > > Oh. I see. In the current version it must be assigned because for > EUI-48 memory can have values <0 or >255 in uninitialized variables. > It is better to avoid initialization by merging two parts of the code: > + if (count != 6) > + { > + /* May be a 8-byte MAC address */ > ... > + if (count != 8) > + { > + d = 0xFF; > + e = 0xFE; > + } > > to a single one: > + if (count == 6) > + { > + d = 0xFF; > + e = 0xFE; > + } > + else > + { > + /* May be a 8-byte MAC address */ > ... > Changed accordingly. > 2. > src/backend/utils/adt/network.c: > + res = (mac->a << 24) | (mac->b << 16) | > (mac->c << 8) | (mac->d); > + res = (double)((uint64)res << 32); > + res += (mac->e << 24) | (mac->f << 16) | > (mac->g << 8) | (mac->h); > > Khm... I trust that modern compilers can do a lot of optimizations but > for me it looks terrible because of needless conversions. 
> The reason why earlier versions did have two lines "res *= 256 * 256" > was an integer overflow for four multipliers, but it can be solved by > defining the first multiplier as a double: > + res = (mac->a << 24) | (mac->b << 16) | > (mac->c << 8) | (mac->d); > + res *= (double)256 * 256 * 256 * 256; > + res += (mac->e << 24) | (mac->f << 16) | > (mac->g << 8) | (mac->h); > > In this case the left-hand side argument for the "*=" operator is > computed at the compile time as a single constant. > The second line can be written as "res *= 256. * 256 * 256 * 256;" > (pay attention to a dot in the first multiplier), but it is not > readable at all (and produces the same code). Corrected as suggested. Updated patch attached. There is no change in the contrib patch. Regards, Hari Babu Fujitsu Australia mac_eui64_support_8.patch Description: Binary data contrib_macaddr8_1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
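The arithmetic under review — widening an 8-byte EUI-64 address to a double without overflowing 32-bit integer math — can be checked in isolation. A Python sketch mirroring the reviewed C (field names a..h as in the patch; Python's unbounded ints stand in for the C casts):

```python
def macaddr8_to_double(a, b, c, d, e, f, g, h):
    # Fold the high and low four bytes into two 32-bit halves, exactly as
    # the C code does with shifts and ORs.
    hi = (a << 24) | (b << 16) | (c << 8) | d
    lo = (e << 24) | (f << 16) | (g << 8) | h
    # Scale the high half by 2^32 (the role of the 256*256*256*256
    # multiplier, which must be computed in double to avoid int overflow)
    # and add the low half.
    return float(hi) * 2.0**32 + float(lo)

print(macaddr8_to_double(0, 0, 0, 1, 0, 0, 0, 0) == 2.0**32)  # True
```

Note that above 2^53 the double can no longer represent every address exactly, which is inherent to any float8 conversion of a 64-bit value.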
Re: [HACKERS] simplify sequence test
On 25/01/17 03:48, Peter Eisentraut wrote: > We maintain a separate test output file sequence_1.out because the > log_cnt value can vary if there is a checkpoint happening at the right > time. So we have to maintain two files because of a one character > difference. I propose the attached patch to restructure the test a bit > to avoid this, without loss of test coverage. > +1, looks good. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquierwrote: > On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane wrote: >> I wrote: >>> Michael Paquier writes: The table of Pseudo-Types needs to be updated as well with unknown in datatype.sgml. >> >>> Check. >> >> Here's an updated patch with doc changes. Aside from that one, >> I tried to spell "pseudo-type" consistently, and I changed one >> place where we were calling something a pseudo-type that isn't. I think, those changes, even though small, deserve their own commit. The changes themselves look good. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasbywrote: > I took a look at this again, and it doesn't appear to be working for me. The > library is being loaded during startup, but I don't see any further activity > in the log, and I don't see an autoprewarm file in $PGDATA. > > There needs to be some kind of documentation change as part of this patch. > > I'm not sure the default GUC setting of 0 makes sense. If you've loaded the > module, presumably you want it to be running. I think it'd be nice if the GUC > had a -1 setting that meant to use checkpoint_timeout. > > Having the GUC be restart-only is also pretty onerous. I don't think it'd be > hard to make the worker respond to a reload... there's code in the autovacuum > launcher you could use as an example. > +1. I don't think there should be any problem in making it PGC_SIGHUP. > I'm also wondering if this really needs to be a permanently running > process... perhaps the worker could simply be started as necessary? Do you want to invoke worker after every buff_dump_interval? I think that will be bad in terms of starting a new process and who will monitor when to start such a process. I think it is better to keep it as a permanently running background process if loaded by user. > Though maybe that means it wouldn't run at shutdown. Yeah, that will be another drawback. Few comments found while glancing the patch. 1. +TO DO: +-- +Add functionality to dump based on timer at regular interval. I think you need to remove above TO DO. 2. + /* Load the page only if there exist a free buffer. We do not want to + * replace an existing buffer. */ This is not a PG style multiline comment. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Hello, The patch applies fine on the latest master branch, and it works when tested with the various variables listed in the PsqlSettings struct. I will post further comments on the patch soon. Thank you, Rahila Syed On Wed, Jan 25, 2017 at 1:33 AM, Tom Lane wrote: > "Daniel Verite" writes: > > Here's an update with these changes: > > I scanned this patch very quickly and think it addresses my previous > stylistic objections. Rahila is the reviewer of record though, so > I'll wait for his comments before going further. > > regards, tom lane >
Re: [HACKERS] Substantial bloat in postgres_fdw regression test runtime
On Thu, Nov 3, 2016 at 1:58 PM, Jeevan Chalkewrote: > > On Wed, Nov 2, 2016 at 10:09 PM, Tom Lane wrote: >> >> In 9.6, "make installcheck" in contrib/postgres_fdw takes a shade >> under 3 seconds on my machine. In HEAD, it's taking 10 seconds. >> I am not happy, especially not since there's no parallelization >> of the contrib regression tests. That's a direct consumption of >> my time and all other developers' time too. This evidently came >> in with commit 7012b132d (Push down aggregates to remote servers), >> and while that's a laudable feature, I cannot help thinking that >> it does not deserve this much of an imposition on every make check >> that's ever done for the rest of eternity. > > > Thanks Tom for reporting this substantial bloat in postgres_fdw regression > test runtime. On my machine "make installcheck" in contrib/postgres_fdw > takes 6.2 seconds on master (HEAD: > 770671062f130a830aa89100c9aa2d26f8d4bf32). > However if I remove all tests added for aggregate push down, it drops down > to 2.2 seconds. Oops 4 seconds more. > > I have timed each of my tests added as part of aggregate push down patch and > observed that one of the test (LATERAL one) is taking around 3.5 seconds. > This is causing because of the parameterization. I have added a filter so > that we will have less number of rows for parameterization. Doing this, > lateral query in question now runs in 100ms. Also updated few more queries > which were taking more than 100ms to have runtime around 30ms or so. So > effectively, with changes "make installcheck" now takes around 2.5 seconds. > > Attached patch with test-case modification. > I verified that this patch indeed bring the time down to 2 to 3 seconds from 10 seconds. The additional condition t2.c2 = 6 seems to echo the filter t2.c2 = 6 of aggregate. We wouldn't know which of those actually worked. I modified the testcase to use t2.c2 % 6 = 0 instead and keep the filter condition intact. 
This increases the execution time by .2s, which may be ok. Let me know what you thing of the attached patch. Also, please add this to the commitfest, so that it isn't forgotten. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 0045f3f..3a09280 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -2685,9 +2685,9 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f -- Outer query is aggregation query explain (verbose, costs off) -select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; - QUERY PLAN +select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1; + QUERY PLAN +-- Unique Output: ((SubPlan 1)) -> Sort @@ -2696,14 +2696,14 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft -> Foreign Scan Output: (SubPlan 1) Relations: Aggregate on (public.ft2 t2) - Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" + Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0)) SubPlan 1 -> Foreign Scan on public.ft1 t1 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6)) (13 rows) -select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; +select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1; count --- 1 @@ -2711,7 +2711,7 @@ select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft -- Inner query is aggregation 
query explain (verbose, costs off) -select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 order by 1; +select distinct (select count(t1.c1) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote: > On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi> wrote: > > On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker > > wrote: > >> > >> Attached is a patch that implements copy_srf(). > > > > Moved to next CF with "needs review" status. > > This patch is still waiting for review. David, are you planning to > look at it by the end of the CF? I'll be doing this today. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Commit 1574783b4ced0356fbc626af1a1a469faa6b41e1 gratifyingly removed hard-coded superuser checks from assorted functions, which makes it possible to GRANT EXECUTE ON FUNCTION pg_catalog.whatever() to unprivileged users if the DBA so desires. However, the functions in genfile.c still have hard-coded checks: pg_read_file(), pg_read_binary_file(), pg_stat_file(), and pg_ls_dir(). I think those functions ought to get the same treatment that the commit mentioned above gave to a bunch of others. Obviously, there's some risk of DBAs doing stupid things there, but stupidity is hard to prevent in a general way and nanny-ism is annoying. The use case I have in mind is a monitoring tool that needs access to one or more of those functions -- in keeping with the principle of least privilege, it's much better to give the monitoring user only the privileges which it actually needs than to make it a superuser. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance improvement for joins where outer side is unique
David Rowley wrote: > On 19 January 2017 at 11:06, David Rowley > wrote: > > Old patch no longer applies, so I've attached a rebased patch. This > > also re-adds a comment line which I mistakenly removed. > > (meanwhile Andres commits 69f4b9c) > > I should've waited a bit longer. > > Here's another that fixes the new conflicts. I suspect that "inner" and "outer" relation / tuple are sometimes confused in comments: * analyzejoins.c:70 "searches for subsequent matching outer tuples." * analyzejoins.c:972 /* * innerrel_is_unique * Check for proofs which prove that 'innerrel' can, at most, match a * single tuple in 'outerrel' based on the join condition in * 'restrictlist'. */ * relation.h:1831 bool inner_unique; /* inner side of join matches no more than one * outer side tuple */ -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote: > On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote: > > Attached is a patch that implements copy_srf(). > > > > The function signature reflects cstate more than it reflects the COPY > > options (filename+is_program instead of FILENAME or PROGRAM, etc) > > The patch as it stands needs a rebase. I'll see what I can do today. Please find attached a rebase, which fixes an OID collision that crept in. - The patch builds against master and passes "make check". - The patch does not contain user-visible documentation or examples. - The patch does not contain tests. I got the following when I did a brief test. SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 4dfedf8..ae07cfb 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL STRICT IMMUTABLE PARALLEL SAFE AS 'jsonb_insert'; +CREATE OR REPLACE FUNCTION copy_srf( + IN filename text DEFAULT null, + IN is_program boolean DEFAULT false, + IN format text DEFAULT 'text', + IN delimiter text DEFAULT null, + IN null_string text DEFAULT E'\\N', + IN header boolean DEFAULT false, + IN quote text DEFAULT null, + IN escape text DEFAULT null, + IN encoding text DEFAULT null) +RETURNS SETOF RECORD +LANGUAGE INTERNAL +VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT +AS 'copy_srf'; + -- The default permissions for functions mean that anyone can execute them. -- A number of functions shouldn't be executable by just anyone, but rather -- than use explicit 'superuser()' checks in those functions, we use the GRANT diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index f9362be..8e1bd39 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -30,6 +30,7 @@ #include "commands/defrem.h" #include "commands/trigger.h" #include "executor/executor.h" +#include "funcapi.h" #include "libpq/libpq.h" #include "libpq/pqformat.h" #include "mb/pg_wchar.h" @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread) errmsg("could not read from COPY file: %m"))); break; case COPY_OLD_FE: - /* * We cannot read more than minread bytes (which in practice is 1) * because old protocol doesn't have any clear way of separating @@ -4740,3 +4740,377 @@ CreateCopyDestReceiver(void) return (DestReceiver *) self; } + +Datum +copy_srf(PG_FUNCTION_ARGS) +{ + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + TupleDesc tupdesc; + Tuplestorestate *tupstore = NULL; + MemoryContext 
per_query_ctx; + MemoryContext oldcontext; + FmgrInfo*in_functions; + Oid *typioparams; + Oid in_func_oid; + + CopyStateData copy_state; + int col; + + Datum *values; + bool*nulls; + + /* check to see if caller supports us returning a tuplestore */ + if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("set-valued function called in context that cannot accept a set"))); + } + + if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL) + { + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), +errmsg("materialize mode required, but it is not allowed in this context"))); + } + + tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc); + values = (Datum *) palloc(tupdesc->natts * sizeof(Datum)); + nulls = (bool *) palloc(tupdesc->natts * sizeof(bool)); + in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo)); + typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid)); + + for (col = 0; col < tupdesc->natts; col++) + { + getTypeInputInfo(tupdesc->attrs[col]->atttypid,_func_oid,[col]); +
Re: [HACKERS] PoC: Grouped base relation
David Rowley wrote: > On 20 January 2017 at 00:22, Antonin Houska wrote: > > Sorry, it was my thinko - I somehow confused David's CROSS JOIN example with > > this one. If one side of the join clause is unique and the other becomes > > unique due to aggregation (and if parallel processing is not engaged) then > > neither combinefn nor multiplyfn should be necessary before the finalfn. > > Yes, if the join can be detected not to duplicate the groups then a > normal aggregate node can be pushed below the join. No need for > Partial Aggregate, or Finalize Aggregate nodes. > > I've a pending patch in the commitfest named "Unique Joins", which > aims to teach the planner about the unique properties of joins. So you > should just have both stages of aggregation occur for now, and that > can be improved on once the planner is a bit smarter and knows about > unique joins. Thanks for the hint. I hadn't paid attention to the "Unique Joins" patch until today. Yes, that's definitely useful. Given the progress of your patch, I'm not worried about making the next version of my patch depend on it. Implementing a temporary solution for the aggregation push-down seems to me like wasted effort. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Reading 0001_track_root_lp_v9.patch again: > +/* > + * We use the same HEAP_LATEST_TUPLE flag to check if the tuple's t_ctid > field > + * contains the root line pointer. We can't use the same > + * HeapTupleHeaderIsHeapLatest macro because that also checks for > TID-equality > + * to decide whether a tuple is at the end of the chain > + */ > +#define HeapTupleHeaderHasRootOffset(tup) \ > +( \ > + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0 \ > +) > > +#define HeapTupleHeaderGetRootOffset(tup) \ > +( \ > + AssertMacro(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0), \ > + ItemPointerGetOffsetNumber(&(tup)->t_ctid) \ > +) Interesting stuff; it took me a bit to see why these macros are this way. I propose the following wording which I think is clearer: Return whether the tuple has a cached root offset. We don't use HeapTupleHeaderIsHeapLatest because that one also considers the slow case of scanning the whole block. Please flag the macros that have multiple evaluation hazards -- there are a few of them. > +/* > + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for > + * clusters which are upgraded from pre-10.0 release, we still check if t_ctid > + * is pointing to itself and declare such tuple as the latest tuple in the > + * chain > + */ > +#define HeapTupleHeaderIsHeapLatest(tup, tid) \ > +( \ > + (((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \ > + ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == > ItemPointerGetBlockNumber(tid)) && \ > + (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == > ItemPointerGetOffsetNumber(tid))) \ > +) I suggest rewording this comment as: Starting from PostgreSQL 10, the latest tuple in an update chain has HEAP_LATEST_TUPLE set; but tuples upgraded from earlier versions do not. For those, we determine whether a tuple is latest by testing that its t_ctid points to itself. (as discussed, there is no "10.0 release"; it's called the "10 release" only, no ".0". Feel free to use "v10" or "pg10"). 
> +/* > + * Get TID of next tuple in the update chain. Caller should have checked that > + * we are not already at the end of the chain because in that case t_ctid may > + * actually store the root line pointer of the HOT chain whose member this > + * tuple is. > + */ > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \ > +do { \ > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \ > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \ > +} while (0) Actually, I think this macro could just return the TID so that it can be used as struct assignment, just like ItemPointerCopy does internally -- callers can do ctid = HeapTupleHeaderGetNextTid(tup); or more precisely, this pattern > + if (!HeapTupleHeaderIsHeapLatest(tp.t_data, &tp.t_self)) > + HeapTupleHeaderGetNextTid(tp.t_data, &hufd->ctid); > + else > + ItemPointerCopy(&tp.t_self, &hufd->ctid); becomes hufd->ctid = HeapTupleHeaderIsHeapLatest(foo) ? HeapTupleHeaderGetNextTid(foo) : tp.t_self; or something like that. I further wonder if it'd make sense to hide this into yet another macro. The API of RelationPutHeapTuple appears a bit contorted, where root_offnum is both input and output. I think it's cleaner to have the argument be the input, and have the output offset be the return value -- please check whether that simplifies things; for example I think this: > + root_offnum = InvalidOffsetNumber; > + RelationPutHeapTuple(relation, buffer, heaptup, false, > + &root_offnum); becomes root_offnum = RelationPutHeapTuple(relation, buffer, heaptup, false, InvalidOffsetNumber); Please remove the words "must have" in this comment: > + /* > + * Also mark both copies as latest and set the root offset information. > If > + * we're doing a HOT/WARM update, then we just copy the information from > + * old tuple, if available or computed above. 
For regular updates, > + * RelationPutHeapTuple must have returned us the actual offset number > + * where the new version was inserted and we store the same value since > the > + * update resulted in a new HOT-chain > + */ Many comments lack finishing periods in complete sentences, which looks odd. Please fix. I have not looked at the other patch yet. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
I tend to agree. But in the past when this came up people pointed out you could equally do things this way and still grant all the access you wanted using SECURITY DEFINER. Arguably that's a better approach because then instead of auditing the entire monitor script you only need to audit this one wrapper function, pg_ls_monitor_dir() which just calls pg_ls_dir() on this one directory. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 11:30 AM, Greg Starkwrote: > I tend to agree. But in the past when this came up people pointed out > you could equally do things this way and still grant all the access > you wanted using SECURITY DEFINER. Arguably that's a better approach > because then instead of auditing the entire monitor script you only > need to audit this one wrapper function, pg_ls_monitor_dir() which > just calls pg_ls_dir() on this one directory. I agree that can be done, but it's nicer if you can use the same SQL all the time. With that solution, you need one SQL query to run when you've got superuser privileges and a different SQL query to run when you are running without superuser privileges but somebody's run the create-security-definer-wrappers-for-me script. That's a deployment nuisance if you want to support both configurations. Also, the same argument could be made about removing the built-in superuser check from ANY function, and we've already rejected that argument for a bunch of other functions. If we say that argument is valid for some functions but not others, then we've got to decide for which ones it's valid and for which ones it isn't, and consensus will not be forthcoming. I take the position that hard-coded superuser checks stink in general, and I'm grateful to Stephen for his work making dump/restore work properly on system catalog permissions so that we can support better alternatives. I'm not asking for anything more than that we apply that same policy here as we have in other cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Tue, Jan 24, 2017 at 1:49 AM, Claudio Freirewrote: > On Fri, Jan 20, 2017 at 6:24 AM, Masahiko Sawada > wrote: >> On Thu, Jan 19, 2017 at 8:31 PM, Claudio Freire >> wrote: >>> On Thu, Jan 19, 2017 at 6:33 AM, Anastasia Lubennikova >>> wrote: 28.12.2016 23:43, Claudio Freire: Attached v4 patches with the requested fixes. Sorry for being late, but the tests took a lot of time. >>> >>> I know. Takes me several days to run my test scripts once. >>> create table t1 as select i, md5(random()::text) from generate_series(0,4) as i; create index md5_idx ON t1(md5); update t1 set md5 = md5((random() * (100 + 500))::text); vacuum; Patched vacuum used 2.9Gb of memory and vacuumed the index in one pass, while for old version it took three passes (1GB+1GB+0.9GB). Vacuum duration results: vanilla: LOG: duration: 4359006.327 ms statement: vacuum verbose t1; patched: LOG: duration: 3076827.378 ms statement: vacuum verbose t1; We can see 30% vacuum speedup. I should note that this case can be considered as favorable to vanilla vacuum: the table is not that big, it has just one index and disk used is a fast fusionIO. We can expect even more gain on slower disks. Thank you again for the patch. Hope to see it in 10.0. >>> >>> Cool. Thanks for the review and the tests. >>> >> >> I encountered a bug with following scenario. >> 1. Create table and disable autovacuum on that table. >> 2. Make about 20 dead tuples on the table. >> 3. SET maintenance_work_mem TO 1024 >> 4. VACUUM >> >> @@ -729,7 +759,7 @@ lazy_scan_heap(Relation onerel, int options, >> LVRelStats *vacrelstats, >> * not to reset latestRemovedXid since we want >> that value to be >> * valid. >> */ >> - vacrelstats->num_dead_tuples = 0; >> + lazy_clear_dead_tuples(vacrelstats); >> vacrelstats->num_index_scans++; >> >> /* Report that we are once again scanning the heap */ >> >> I think that we should do vacrelstats->dead_tuples.num_entries = 0 as >> well in lazy_clear_dead_tuples(). 
Once the amount of dead tuples >> reaches maintenance_work_mem, lazy_scan_heap can never finish. > > That's right. > > I added a test for it in the attached patch set, which uncovered > another bug in lazy_clear_dead_tuples, and took the opportunity to > rebase. > > On Mon, Jan 23, 2017 at 1:06 PM, Alvaro Herrera > wrote: >> I pushed this patch after rewriting it rather completely. I added >> tracing notices to inspect the blocks it was prefetching and observed >> that the original coding was failing to prefetch the final streak of >> blocks in the table, which is an important oversight considering that it >> may very well be that those are the only blocks to read at all. >> >> I timed vacuuming a 4000-block table on my laptop (single SSD disk; >> dropped FS caches after deleting all rows in table, so that vacuum has >> to read all blocks from disk); it changes from 387ms without patch to >> 155ms with patch. I didn't measure how much it takes to run the other >> steps in the vacuum, but it's clear that the speedup for the truncation >> phase is considerable. >> >> ¡Thanks, Claudio! > > Cool. > > Though it wasn't the first time this idea has been floating around, I > can't take all the credit. > > > On Fri, Jan 20, 2017 at 6:25 PM, Alvaro Herrera > wrote: >> FWIW, I think this patch is completely separate from the maint_work_mem >> patch and should have had its own thread and its own commitfest entry. >> I intend to get a look at the other patch next week, after pushing this >> one. > > That's because it did have one, and was left in limbo due to lack of > testing on SSDs. I just had to adopt it here because otherwise tests > took way too long. Thank you for updating the patch! 
+ /* + * Quickly rule out by lower bound (should happen a lot). Upper bound was + * already checked by segment search. + */ + if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0) + return false; I think that if the above result is 0, we can return true, as itemptr matches the lower-bound item pointer in rseg->dead_tuples. +typedef struct DeadTuplesSegment +{ + int num_dead_tuples; /* # of entries in the segment */ + int max_dead_tuples; /* # of entries allocated in the segment */ + ItemPointerData last_dead_tuple; /* Copy of the last dead tuple (unset + * until the segment is fully + * populated) */ + unsigned short padding; + ItemPointer dead_tuples;
Re: [HACKERS] COPY as a set returning function
David Fetter wrote: > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, > int maxread) >errmsg("could not read from > COPY file: %m"))); > break; > case COPY_OLD_FE: > - > /* >* We cannot read more than minread bytes (which in > practice is 1) >* because old protocol doesn't have any clear way of > separating This change is pointless as it'd be undone by pgindent. > + /* > + * Function signature is: > + * copy_srf( filename text default null, > + * is_program boolean default false, > + * format text default 'text', > + * delimiter text default E'\t' in text mode, ',' in csv mode, > + * null_string text default '\N', > + * header boolean default false, > + * quote text default '"' in csv mode only, > + * escape text default null, -- defaults to whatever quote is > + * encoding text default null > + */ This comment would be mangled by pgindent -- please add an /*--- marker to prevent it. > + /* param 7: escape text default null, -- defaults to whatever quote is > */ > + if (PG_ARGISNULL(7)) > + { > + copy_state.escape = copy_state.quote; > + } > + else > + { > + if (copy_state.csv_mode) > + { > + copy_state.escape = > TextDatumGetCString(PG_GETARG_TEXT_P(7)); > + if (strlen(copy_state.escape) != 1) > + { > + ereport(ERROR, > + > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("COPY escape must be a > single one-byte character"))); > + } > + } > + else > + { > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("COPY escape available only in > CSV mode"))); > + } > + } I don't understand why do we have all these checks. Can't we just pass the values to COPY and let it apply the checks? That way, when COPY is updated to support multibyte escape chars (for example) we don't need to touch this code. Together with removing the unneeded braces that would make these stanzas about six lines long instead of fifteen. 
> + tuple = heap_form_tuple(tupdesc,values,nulls); > + //tuple = BuildTupleFromCStrings(attinmeta, field_strings); > + tuplestore_puttuple(tupstore, tuple); No need to form a tuple; use tuplestore_putvalues here. > + } > + > + /* close "file" */ > + if (copy_state.is_program) > + { > + int pclose_rc; > + > + pclose_rc = ClosePipeStream(copy_state.copy_file); > + if (pclose_rc == -1) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close pipe to > external command: %m"))); > + else if (pclose_rc != 0) > + ereport(ERROR, > + > (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), > + errmsg("program \"%s\" failed", > + copy_state.filename), > + errdetail_internal("%s", > wait_result_to_str(pclose_rc; > + } > + else > + { > + if (copy_state.filename != NULL && > FreeFile(copy_state.copy_file)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not close file \"%s\": > %m", > + copy_state.filename))); > + } I wonder if these should be an auxiliary function in copy.c to do this. Surely copy.c itself does pretty much the same thing ... > +DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 > 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ > copy_srf _null_ _null_ _null_ )); > +DESCR("set-returning COPY proof of concept"); Why is this marked "proof of concept"? If this is a PoC only, why are you submitting it as a patch? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 2:49 PM, Tom Lane wrote: > I looked at the 0002 patch, and while the code is probably OK, I am > dissatisfied with this API spec: > > + * If copy is TRUE, the slot receives a copied tuple that will stay valid > + * regardless of future manipulations of the tuplesort's state. Memory is > + * owned by the caller. If copy is FALSE, the slot may just receive a > pointer > + * to a tuple held within the tuplesort. The latter is more efficient, but > + * the slot contents may be corrupted if there is another call here before > + * previous slot contents are used. > > What does "here" mean? If that means specifically "another call of > tuplesort_gettupleslot", say so. If "here" refers to the whole module, > it would be better to say something like "the slot contents may be > invalidated by any subsequent manipulation of the tuplesort's state". > In any case it'd be a good idea to delineate safe usage patterns, perhaps > "copy=FALSE is recommended only when the next tuplesort manipulation will > be another tuplesort_gettupleslot fetch into the same slot." I agree with your analysis. It means "another call to tuplesort_gettupleslot", but I believe that it would be safer (more future-proof) to actually specify "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state" instead. > There are several other uses of "call here", both in this patch and > pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. > Let's try to improve that. Should I write a patch along those lines? -- Peter Geoghegan
Re: [HACKERS] safer node casting
Hi Peter, On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: > RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); Are you planning to add this / update this patch? Because I really would have liked this a number of times already... I can update it according to my suggestions (to avoid multiple eval scenarios) if helpful. Greetings, Andres Freund
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? Why? What do you foresee here? I would think such backup solutions are careful enough to correctly ensure the durability of pages so that they are not partially written. -- Michael
Re: [HACKERS] safer node casting
Andres Freund writes: > On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: >> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); > Are you planning to add this / update this patch? Because I really would > have liked this a number of times already... I can update it according > to my suggestions (to avoid multiple eval scenarios) if helpful. Yeah, I'd like that in sooner rather than later, too. But we do need it to be foolproof - no multiple evals. The first draft had inadequate-parenthesization hazards, too. regards, tom lane
Re: [HACKERS] Checksums by default?
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to correctly ensure the durability of pages so that > they are not partially written. That means you have to replay enough WAL to get into a consistent state... - Andres
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund wrote: > On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >> As it is, there are backup solutions which *do* check the checksum when >> >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >> but something which really exists and will hopefully keep users from >> >> losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> Why? What do you foresee here? I would think such backup solutions are >> careful enough to correctly ensure the durability of pages so that >> they are not partially written. > > That means you have to replay enough WAL to get into a consistent > state... Ah, OK, I get the point. Yes, checking this field on raw backups would be a problem, unless the page size matches the kernel's 4k one. -- Michael
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >>> As it is, there are backup solutions which *do* check the checksum when >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >>> but something which really exists and will hopefully keep users from >>> losing data. >> >> Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to correctly ensure the durability of pages so that > they are not partially written. Well, you'd have to keep a read(fd, buf, 8192) performed by the backup tool from overlapping with a write(fd, buf, 8192) performed by the backend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > * Peter Geoghegan (p...@heroku.com) wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > As it is, there are backup solutions which *do* check the checksum when > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > but something which really exists and will hopefully keep users from > > > losing data. > > > > Wouldn't that have issues with torn pages? > > No, why would it? The page has either been written out by PG to the OS, > in which case the backup s/w will see the new page, or it hasn't been. Uh. Writes aren't atomic on that granularity. That means you very well *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries of an 8KB postgres page). Just read a page while it's being written out. You simply can't reliably verify checksums without replaying WAL (or creating a manual version of replay, as in checking the WAL for a FPW). > This isn't like a case where only half the page made it to the disk > because of a system failure though; everything is online and working > properly during an online backup. I don't think that really changes anything. Greetings, Andres Freund
Re: [HACKERS] Checksums by default?
* Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > > * Peter Geoghegan (p...@heroku.com) wrote: > > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > > As it is, there are backup solutions which *do* check the checksum when > > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > > but something which really exists and will hopefully keep users from > > > > losing data. > > > > > > Wouldn't that have issues with torn pages? > > > > No, why would it? The page has either been written out by PG to the OS, > > in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. > > You simply can't reliably verify checksums without replaying WAL (or > creating a manual version of replay, as in checking the WAL for a FPW). Looking through the WAL isn't any surprise and is something we've been planning to do for other reasons anyway. Thanks! Stephen
Re: [HACKERS] Checksums by default?
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > That would be enough. It should also be rare enough that there would > not be that many pages to track when looking at records from the > backup start position to minimum recovery point. It could be also > simpler, though more time-consuming, to just let a backup recover up > to the minimum recovery point (recovery_target = 'immediate'), and > then run the checksum sanity checks. There are other checks usually > needed on a backup anyway like being sure that index pages are in good > shape even with a correct checksum, etc. Believe me, I'm all for *all* of that. > But here I am really hijacking the thread, so I'll stop.. If you have further thoughts, I'm all ears. This is all relatively new, and I don't expect to have all of the answers or solutions. Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Thanks! Stephen
Re: [HACKERS] patch: function xmltable
2017-01-25 15:07 GMT+01:00 Alvaro Herrera: > Tom Lane wrote: > > Andres Freund writes: > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > >> XMLTABLE is specified by the standard to return multiple rows ... but > > >> then as far as my reading goes, it is only supposed to be supported in > > >> the range table (FROM clause) not in the target list. I wonder if > > >> this would end up better if we only tried to support it in RT. I asked > > >> Pavel to implement it like that a few weeks ago, but ... > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > +1 --- we're out of the business of having simple expressions that > > return rowsets. > > Well, that's it. I'm not committing this patch against two other > committers' opinion, plus I was already on the fence about the > implementation anyway. I think you should just go with the flow and > implement this by creating nodeTableexprscan.c. It's not even > difficult. > I am playing with this, and the patch ends up about 15kB larger just from implementing the basic scan functionality - and I have not touched the planner yet. I am not happy with this; I am still tempted to try reimplementing it as a reduced SRF instead. Regards Pavel > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > > think you may have meant everything in adminpack, which also includes > > pg_file_write(), pg_file_rename() and pg_file_unlink(). > > Well, I was talking about genfile.c, which doesn't contain those > functions, but for the record I'm in favor of extirpating every single > hard-coded superuser check from the backend without exception. Then it was correct for me to assume that's what you meant, and means my reaction and response are on-point. > Preventing people from calling functions by denying the ability to > meaningfully GRANT EXECUTE on those functions doesn't actually stop > them from delegating those functions to non-superusers. It either (a) > causes them to give superuser privileges to accounts that otherwise > would have had lesser privileges or (b) forces them to use wrapper > functions to grant access rather than doing it directly or (c) some > other dirty hack. If they pick (a), security is worse; if they pick > (b) or (c), you haven't prevented them from doing what they wanted to > do anyway. You've just made it annoying and inconvenient. The notion that security is 'worse' under (a) is flawed - it's no different. With regard to 'b', if their wrapper function is sufficiently careful to ensure that the caller isn't able to do anything which would increase the caller's level to that of superuser, then security is improved. If the wrapper simply turns around and calls the underlying function, then it's no different from '(a)'. I am suggesting that we shouldn't make it look like there are distinctions when there is actually no difference. That is a good thing for our users because it keeps them informed about what they're actually doing when they grant access. 
> Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch > of things that don't work unless you run as superuser. I think we > should be trying to provide ways of reducing those lists. If I can't > get agreement on method (a), I'm going to go look for ways of doing > (b) or (c), which is more work and uglier but if I can't get consensus > here then oh well. I've commented on here and spoken numerous times about exactly that goal of reducing the checks in check_postgres.pl which require superuser. You're not actually doing that, and nothing you've outlined here so far makes me believe you see how having pg_write_file() access is equivalent to giving someone superuser, and that concerns me. As someone who has contributed code and committed code back to check_postgres.pl, I would be against making changes there which install security definer functions to give the monitoring user superuser-level access, and I believe Greg S-M would feel the same way considering that he and I have discussed exactly that in the past. I don't mean to speak for him and perhaps his opinion has changed, but it seems unlikely to me. If the DBA wants to give the monitoring user superuser-level access to run the superuser-requiring checks that check_postgres.pl has, they're welcome to do so, but they'll be making an informed decision that they have weighed the risk of their monitoring user being compromised against the value of that additional monitoring, which is an entirely appropriate and reasonable decision for them to make. > I do not accept your authority to determine what monitoring tools need > to or should do. Monitoring tools that use pg_ls_dir are facts on the > ground, and your disapprobation doesn't change that at all. It just > obstructs moving to a saner permissions framework. Allowing GRANT to be used to give access to pg_write_file and friends is not a 'permissions framework'. 
Further, I am not claiming authority over what monitoring tools should need to do or not do, and a great many people run their monitoring tools as superuser. I am not trying to take away their ability to do so. The way to make progress here is not, however, to just decide that all those superuser() checks we put in place were silly and should be removed, it's to provide better ways to monitor PG which provide exactly the monitoring information needed in a useful and sensible way. I understand the allure of just removing a few lines of code to make things "easier" or "faster" or "better", but I don't think it's a good idea to remove these superuser checks, nor do I think it's a good idea to remove our WAL CRCs even if it'd help some of my clients, nor do I think it's a good idea to have checksums disabled by default, or to rip out all of WAL as being "unnecessary" because we have ZFS and it should handle all things data and we could just fsync all of the heap files on commit and be done with it. Admittedly, you're not arguing for half of what I just
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > Preventing people from calling functions by denying the ability to > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > them from delegating those functions to non-superusers. It either (a) > > > causes them to give superuser privileges to accounts that otherwise > > > would have had lesser privileges or (b) forces them to use wrapper > > > functions to grant access rather than doing it directly or (c) some > > > other dirty hack. If they pick (a), security is worse; if they pick > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > do anyway. You've just made it annoying and inconvenient. > > > > The notion that security is 'worse' under (a) is flawed- it's no > > different. > > Huh? Obviously that's nonesense, given the pg_ls_dir example. Robert's made it clear that he'd like to have a blanket rule that we don't have superuser checks in these code paths if they can be GRANT'd at the database level, which goes beyond pg_ls_dir. If the question was only about pg_ls_dir, then I still wouldn't be for it, because, as the bits you didn't quote discussed, it encourages users and 3rd party tool authors to base more things off of pg_ls_dir to look into the way PG stores data on disk, and affords more access than the monitoring user has any need for, none of which are good, imv. It also discourages people from implementing proper solutions which you can 'just use pg_ls_dir()', which I also don't agree with. If you really want to do an ls, go talk to the OS. ACLs are possible to provide that with more granularity than what would be available through pg_ls_dir(). We aren't in the "give a user the ability to do an ls" business, frankly. 
> > With regard to 'b', if their wrapper function is > > sufficiently careful to ensure that the caller isn't able to do anything > > which would increase the caller's level to that of superuser, then > > security is improved. > > Given how complex "sufficiently careful" is for security definer UDFs, > in comparison to estimating the security of granting to a function like > pg_ls_dir (or pretty much any other that doesn't call out to SQL level > stuff like operators, output functions, etc), I don't understand this. If you're implying that security definer UDFs are hard to write and get correct, then I agree with you there. I was affording the benefit of the doubt to that proposed approach. > > If the wrapper simply turns around can calls the underlying function, > > then it's no different from '(a)'. > > Except for stuff like search path. If you consider '(a)' to be the same as superuser, which I was postulating, then this doesn't strike me as making terribly much difference. > > I am suggesting that we shouldn't make it look like there are > > distinctions when there is actually no difference. That is a good thing > > for our users because it keeps them informed about what they're actually > > doing when they grant access. > > This position doesn't make a lick of sense to me. There's simply no > benefit at all in requiring to create wrapper functions, over allowing > to grant to non-superuser. Both is possible, secdef is a lot harder to > get right. And you already heavily started down the path of removing > superuser() type checks - you're just arguing to make it more or less > randomly less consistent. I find this bizarre considering I went through a detailed effort to go look at every superuser check in the system and discussed, on this list, the reasoning behind each and every one of them. I do generally consider arbitrary access to syscalls via the database to be a privilege which really only the superuser should have. 
> > I've commented on here and spoken numerous times about exactly that goal > > of reducing the checks in check_postgres.pl which require superuser. > > You're not actually doing that and nothing you've outlined in here so > > far makes me believe you see how having pg_write_file() access is > > equivalent to giving someone superuser, and that concerns me. > > That's the user's responsibility, and Robert didn't really suggest > granting pg_write_file() permissions, so this seems to be a straw man. He was not arguing for only pg_ls_dir(), but for checks in all "friends", which he later clarified to include pg_write_file(). Thanks! Stephen
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > I should have already specifically pointed out that the original > discussion on what became 0002-* is here: > postgr.es/m/7256.1476711...@sss.pgh.pa.us > As I said already, the general idea seems uncontroversial. I looked at the 0002 patch, and while the code is probably OK, I am dissatisfied with this API spec: + * If copy is TRUE, the slot receives a copied tuple that will stay valid + * regardless of future manipulations of the tuplesort's state. Memory is + * owned by the caller. If copy is FALSE, the slot may just receive a pointer + * to a tuple held within the tuplesort. The latter is more efficient, but + * the slot contents may be corrupted if there is another call here before + * previous slot contents are used. What does "here" mean? If that means specifically "another call of tuplesort_gettupleslot", say so. If "here" refers to the whole module, it would be better to say something like "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state". In any case it'd be a good idea to delineate safe usage patterns, perhaps "copy=FALSE is recommended only when the next tuplesort manipulation will be another tuplesort_gettupleslot fetch into the same slot." There are several other uses of "call here", both in this patch and pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. Let's try to improve that. regards, tom lane
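The invalidation hazard Tom is describing is easy to demonstrate outside the sort code. Below is a hypothetical Python sketch (none of these names exist in PostgreSQL; it is only an analogy) of a fetch that, with copy=False, hands back a reference into a reused internal slot rather than a caller-owned copy:

```python
# Hypothetical illustration, not PostgreSQL code: with copy=False the
# caller receives a reference into storage that the next fetch reuses,
# so earlier results are silently clobbered.

class ToySort:
    """Yields sorted items through a single reused internal slot."""

    def __init__(self, items):
        self._pending = sorted(items)
        self._slot = []  # reused buffer, analogous to tuplesort-owned memory

    def gettuple(self, copy):
        value = self._pending.pop(0)
        self._slot.clear()
        self._slot.append(value)
        # copy=True: the caller owns the result.
        # copy=False: the caller sees the shared slot, valid only until
        # the next call overwrites it.
        return list(self._slot) if copy else self._slot

ts = ToySort([3, 1, 2])
unsafe = ts.gettuple(copy=False)   # -> [1], but only until the next fetch
safe = ts.gettuple(copy=True)      # -> [2], owned by the caller
# The second fetch reused the shared slot, corrupting `unsafe`:
assert unsafe == [2] and safe == [2]
```

This is exactly why the spec wording matters: the caller needs to know precisely which later operations invalidate a copy=False result.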
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
[ in the service of closing out this thread... ] Peter Geoghegan writes: > Finally, 0003-* is a Valgrind suppression borrowed from my parallel > CREATE INDEX patch. It's self-explanatory. Um, I didn't find it all that self-explanatory. Why wouldn't we want to avoid writing undefined data? I think the comment at least needs to explain exactly what part of the written data might be uninitialized. And I'd put the comment into valgrind.supp, too, not in the commit msg. Also, the suppression seems far too broad. It would for instance block any complaint about a write() invoked via an elog call from any function invoked from any LogicalTape* function, no matter how far removed. regards, tom lane
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frostwrote: > I hope to discuss it further after we have the ability to turn it off > easily. I think we should have the ability to flip it in BOTH directions easily. >> Second, really hard to enable is a relative term. I accept that >> enabling checksums is not a pleasant process. Right now, you'd have >> to do a dump/restore, or use logical replication to replicate the data >> to a new cluster and then switch over. On the other hand, if >> checksums are really a critical feature, how are people getting to the >> point where they've got a mission-critical production system and only >> then discovering that they want to enable checksums? > > I truely do wish everyone would come talk to me before building out a > database. Perhaps that's been your experience, in which case, I envy > you, but I tend to get a reaction more along the lines of "wait, what do > you mean I had to pass some option to initdb to enable checksum?!?!". > The fact that we've got a WAL implementation and clearly understand > fsync requirements, why full page writes make sense, and that our WAL > has its own CRCs which isn't possible to disable, tends to lead people > to think we really know what we're doing and that we care a lot about > their data. It sounds to me like you are misleading users about the positives and negatives of checksums, which then causes them to be shocked that they are not the default. > As I have said, I don't believe it has to be on for everyone. For the second time, I didn't say that. But the default has a powerful influence on behavior. If it didn't, you wouldn't be trying to get it changed. > [ unsolicited bragging about an unspecified backup tool, presumably > pgbackrest ] Great. > Presently, last I checked at least, the database system doesn't fall > over and die if a single page's checksum fails. This is another thing that I never said. 
> [ more unsolicited bragging an unspecified backup tool, presumably still > pgbackrest ] Swell. >> I'm not trying to downplay the usefulness of checksums *in a certain >> context*. It's a good feature, and I'm glad we have it. But I think >> you're somewhat inflating the utility of it while discounting the very >> real costs. > > The costs for checksums don't bother me any more than the costs for WAL > or WAL CRCs or full page writes. Obviously. But I think they should. Frankly, I think the costs for full page writes should bother the heck out of all of us, but the solution isn't to shut them off any more than it is to enable checksums despite the cost. It's to find a way to reduce the costs. > They may not be required on every > system, but they're certainly required on more than 'zero' entirely > reasonable systems which people deploy in their production environments. Nobody said otherwise. > I'd rather walk into an engagement where the user is saying "yeah, we > enabled checksums and it caught this corruption issue" than having to > break the bad news, which I've had to do over and over, that their > existing system hasn't got checksums enabled. This isn't hypothetical, > it's what I run into regularly with entirely reasonable and skilled > engineers who have been deploying PG. Maybe you should just stop telling them and use the time thus freed up to work on improving the checksum feature. I'm skeptical of this whole discussion because you seem to be filled with unalloyed confidence that checksums have little performance impact and will do wonderful things to prevent data loss, whereas I think they have significant performance impact and will only very slightly help to prevent data loss. I admit that the idea of having pgbackrest verify checksums while backing up seems like it could greatly improve the chances of checksums being useful, but I'm not going to endorse changing PostgreSQL's default for pgbackrest's benefit. 
It's got to be to the benefit of PostgreSQL users broadly, not just the subset of those people who use one particular backup tool. Also, the massive hit that will probably occur on high-concurrency OLTP workloads larger than shared_buffers is going to be hard to justify for any amount of backup security. I think that problem's got to be solved or at least mitigated before we think about changing this. I realize that not everyone would set the bar that high, but I see far too many customers with exactly that workload to dismiss it lightly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > As it is, there are backup solutions which *do* check the checksum when > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > but something which really exists and will hopefully keep users from > > losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. Our testing has not turned up any issues as yet. That said, it's relatively new and I wouldn't be surprised if we need to do some adjustments in that area, which might be system-dependent even. We could certainly check the WAL for the page that had a checksum error (we currently simply report them, though don't throw away a prior backup if we detect one). This isn't like a case where only half the page made it to the disk because of a system failure though; everything is online and working properly during an online backup. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. I believe his concern was that the backup sw might see a partially-updated page when it reads the file while PG is writing it. In other words, would the kernel return some intermediate state of data while an fwrite() is in progress. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier > wrote: > > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >>> As it is, there are backup solutions which *do* check the checksum when > >>> backing up PG. This is no longer, thankfully, some hypothetical thing, > >>> but something which really exists and will hopefully keep users from > >>> losing data. > >> > >> Wouldn't that have issues with torn pages? > > > > Why? What do you foresee here? I would think such backup solutions are > > careful enough to ensure correctly the durability of pages so as they > > are not partially written. > > Well, you'd have to keep a read(fd, buf, 8192) performed by the backup > tool from overlapping with a write(fd, buf, 8192) performed by the > backend. As Michael mentioned, that'd depend on if things are atomic from a user's perspective at certain sizes (perhaps 4k, which wouldn't be too surprising, but may also be system-dependent), in which case verifying that the page is in the WAL would be sufficient. Thanks! Stephen
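The torn-page scenario under discussion can be sketched in a few lines. This is a hedged Python illustration only: it assumes a 4KB OS write granularity, and it uses CRC-32 purely for demonstration (PostgreSQL's data checksum is a different, FNV-based algorithm). A read that lands between the two halves of an in-progress 8KB write fails verification even though nothing on disk is actually corrupt:

```python
import zlib

PAGE = 8192
HALF = 4096  # assumed OS-level write granularity, per the discussion

def make_page(fill):
    """Build a toy 8KB page: 4-byte checksum header over the body."""
    body = bytes([fill]) * (PAGE - 4)
    return zlib.crc32(body).to_bytes(4, "little") + body

def verify(page):
    """Recompute the body checksum and compare with the header."""
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])

old, new = make_page(0xAA), make_page(0xBB)
assert verify(old) and verify(new)

# A read that overlaps an in-progress write can see the new first 4KB
# stitched onto the old second 4KB: a torn page. Its checksum fails
# even though the backend will finish the write moments later.
torn = new[:HALF] + old[HALF:]
assert not verify(torn)
```

Hence the suggestion in the thread: a checksum failure seen by the backup tool is only meaningful once cross-checked against the WAL (a full-page write for that block explains the mismatch away).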
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier >> wrote: >> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >>> As it is, there are backup solutions which *do* check the checksum when >> >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >>> but something which really exists and will hopefully keep users from >> >>> losing data. >> >> >> >> Wouldn't that have issues with torn pages? >> > >> > Why? What do you foresee here? I would think such backup solutions are >> > careful enough to ensure correctly the durability of pages so as they >> > are not partially written. >> >> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup >> tool from overlapping with a write(fd, buf, 8192) performed by the >> backend. > > As Michael mentioned, that'd depend on if things are atomic from a > user's perspective at certain sizes (perhaps 4k, which wouldn't be too > surprising, but may also be system-dependent), in which case verifying > that the page is in the WAL would be sufficient. That would be enough. It should also be rare enough that there would not be that many pages to track when looking at records from the backup start position to minimum recovery point. It could be also simpler, though more time-consuming, to just let a backup recover up to the minimum recovery point (recovery_target = 'immediate'), and then run the checksum sanity checks. There are other checks usually needed on a backup anyway, like being sure that index pages are in good shape even with a correct checksum, etc. But here I am really hijacking the thread, so I'll stop.. -- Michael
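The WAL cross-check Michael sketches could look roughly like this (a hedged illustration; the function and its inputs are ours, not any existing tool's API): a checksum failure observed during an online backup only counts as suspected corruption if no full-page write (FPW) for that block appears in the WAL between the backup start position and the minimum recovery point, because an FPW means the page will be overwritten wholesale during recovery anyway.

```python
# Hypothetical sketch: filter checksum failures down to the ones that a
# pending full-page write in the WAL does not explain.

def genuine_failures(checksum_failed_blocks, fpw_blocks_in_wal):
    """Blocks whose bad checksum is not covered by an FPW in the WAL range."""
    return sorted(set(checksum_failed_blocks) - set(fpw_blocks_in_wal))

# Blocks 17 and 42 failed verification; the WAL carries an FPW only for
# block 42, so only block 17 should be reported as suspected corruption.
assert genuine_failures([17, 42], [42, 99]) == [17]
```

The thread's alternative is to skip this bookkeeping entirely: recover the backup to the minimum recovery point (recovery_target = 'immediate') and only then run the checksum checks.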
Re: [HACKERS] pdf versus single-html
On 1/21/17 6:29 AM, Erik Rijkers wrote: > It might even be good to include such a single-file html in the Makefile > as an option. > > I'll give it a try but has anyone done this work already, perhaps? Already exists: doc/src/sgml$ make postgres.html -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > Feel free to mark it returned as feedback. The concept didn't > generate as much enthusiasm as I had hoped, so I think the right > thing to do now is scale it back to a patch that makes > CopyFromRawFields() externally visible so that extensions can use > them. You're getting enthusiasm in the form of these reviews and suggestions for improvement. That it doesn't always happen immediately is a byproduct of the scarcity of developer time and the sheer volume of things to which people need to pay attention. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Logical Replication WIP
On 1/22/17 8:11 PM, Petr Jelinek wrote: > 0001 - Changes the libpqrcv_connect to use async libpq api so that it > won't get stuck forever in case of connect is stuck. This is preexisting > bug that also affects walreceiver but it's less visible there as there > is no SQL interface to initiate connection there. Probably a mistake here: + case PGRES_POLLING_READING: + extra_flag = WL_SOCKET_READABLE; + /* pass through */ + case PGRES_POLLING_WRITING: + extra_flag = WL_SOCKET_WRITEABLE; extra_flag gets overwritten in the reading case. Please elaborate in the commit message what this change is for. > 0002 - Close replication connection when CREATE SUBSCRIPTION gets > canceled (otherwise walsender on the other side may stay in idle in > transaction state). committed > 0003 - Fixes buffer initialization in walsender that I found when > testing the above two. This one should be back-patched to 9.4 since it's > broken since then. Can you explain more in which code path this problem occurs? I think we should get rid of the global variables and give each function its own buffer that it initializes the first time through. Otherwise we'll keep having to worry about this. > 0004 - Fixes the foreign key issue reported by Thom Brown and also adds > tests for FK and trigger handling. I think the trigger handling should go into execReplication.c. > 0005 - Adds support for renaming publications and subscriptions. Could those not be handled in the generic rename support in ExecRenameStmt()? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical-replication.sgml improvements
On 1/20/17 11:00 AM, Erik Rijkers wrote: > logical-replication.sgml.diff changes Committed, thanks! -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasbywrote: > I'm not completely grokking your second paragraph, but I would think that an > average user would love got get a heads-up that their hardware is failing. Sure. If the database runs fast enough with checksums enabled, there's basically no reason to have them turned off. The issue is when it doesn't. Also, it's not as if there are no other ways of checking whether your disks are failing. SMART, for example, is supposed to tell you about incipient hardware failures before PostgreSQL ever sees a bit flip. Surely an average user would love to get a heads-up that their hardware is failing even when that hardware is not being used to power PostgreSQL, yet many people don't bother to configure SMART (or similar proprietary systems provided by individual vendors). Trying to force those people to use checksums is just masterminding; they've made their own decision that it's not worth bothering with. When something goes wrong, WE still care about distinguishing hardware failure from PostgreSQL failure. Our pride is on the line. But the customer often doesn't. The DBA isn't the same person as the operating system guy, and the operating system guy isn't going to listen to the DBA even if the DBA complains of checksum failures. Or the customer has 100 things on the same piece of hardware and PostgreSQL is the only one that failed; or alternatively they all failed around the same time; either way the culprit is obvious. Or the remedy is to restore from backup[1] whether the problem is hardware or software and regardless of whose software is to blame. Or their storage cost a million dollars and is a year old and they simply won't believe that it's failing. Or their storage cost a hundred dollars and is 8 years old and they're looking for an excuse to replace it whether it's responsible for the problem du jour or not. 
I think it's great that we have a checksum feature and I think it's great for people who want to use it and are willing to pay the cost of it to turn it on. I don't accept the argument that all of our users, or even most of them, fall into that category. I also think it's disappointing that there's such a vigorous argument for changing the default when so little follow-on development has gone into this feature. If we had put any real effort into making this easier to turn on and off, for example, the default value would be less important, because people could change it more easily. But nobody's making that effort. I suggest that the people who think this is a super-high-value feature should be willing to put some real work into improving it instead of trying to force it down everybody's throat as-is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Alternatively, sometimes the remedy is to wish they had a usable backup while frantically running pg_resetxlog.
Re: [HACKERS] Logical Replication WIP
On 1/23/17 11:19 AM, Fujii Masao wrote: > The copyright in each file that the commit of logical rep added needs to > be updated. I have fixed that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] [FEATURE PATCH] pg_stat_statements with plans
Hello pgsql-hackers! TL;DR: We extended the functionality of pg_stat_statements so it can track worst and best case execution plans. Based on a suggestion of my colleague Arne Scheffer, Marius Timmer and I extended pg_stat_statements so it can also record execution plans, whenever the execution time is exceeded (or undershot) by a definable factor. We were largely inspired by the pg_stat_plans extension by Peter Geoghegan and Simon Riggs - we don't claim any originality on this part - which is unfortunately not available on newer PostgreSQL versions. There are a few differences which will become apparent in the following lines. By default, the modified pg_stat_statements extension will now track good plans and bad plans for each entry in pg_stat_statements. The plans are not normalized or hashed (as opposed to pg_stat_plans); they represent discrete statements. A good plan is saved whenever this sort of query has been used for the first time, or the time of the previously recorded good plan has been undershot by a factor smaller than 0.9. Analogously, a bad plan is saved when the time has been exceeded by a factor greater than 1.1. There are GUCs available so these parameters can be tuned to your liking. Tracking can be disabled for both plans individually. A plan_format can be defined to enable better readability or processability through other tools. You can reset your good and bad plans by using a select on pg_stat_statements_good_plan_reset([queryid]); resetting bad plans uses pg_stat_statements_bad_plan_reset, obviously. In case of a reset, the execution time, timestamp and plan itself are simply set to 0 and NULL, respectively. The pg_stat_statements view now provides six extra columns: good_plan, good_plan_time, good_plan_timestamp, bad_plan, bad_plan_time and bad_plan_timestamp. Plans are only displayed if the showtext argument is true and the user is the superuser or the user who has been associated with that entry.
Furthermore, we implemented a GUC that allows you to control the maximum refresh frequency to avoid performance impacts on restarts or resets. A plan is only updated when tracking is enabled, more time than "plan_min_interval" has passed (default: 5 seconds), and the previously mentioned conditions for the execution time have been met. The major selling point of this feature? Being able to find plans that need optimization (e.g. by creating indexes). As pg_stat_statements tracks normalized queries, there might be certain values or even times of day that result in very bad plans, while others result in perfectly fine plans. Of course, the GUC log_min_duration_statement can also detect long runners, but the advantage of pg_stat_statements is that we count the total calls of normalized queries, which enables us to find plans that don't count as long runners while their aggregated time might show shortcomings regarding their plans. We've found this sort of tool really useful when dealing with queries produced by ORM libraries, where optimization is not intuitive. Various tests using pgbench suggest that this extension does not worsen the performance of the database.
We're really looking forward to your opinions and feedback on this feature patch Julian, Marius and Arne diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 298951a..2a22eb5 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -4,7 +4,7 @@ MODULE_big = pg_stat_statements OBJS = pg_stat_statements.o $(WIN32RES) EXTENSION = pg_stat_statements -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \ +DATA = pg_stat_statements--1.5.sql pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \ pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \ pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements" diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 6edc3d9..a3cfe6d 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -61,7 +61,9 @@ #include #include +#include "utils/timestamp.h" #include "access/hash.h" +#include "commands/explain.h" #include "executor/instrument.h" #include "funcapi.h" #include "mb/pg_wchar.h" @@ -118,7 +120,8 @@ typedef enum pgssVersion PGSS_V1_0 = 0, PGSS_V1_1, PGSS_V1_2, - PGSS_V1_3 + PGSS_V1_3, + PGSS_V1_5 } pgssVersion; /* @@ -159,6 +162,14 @@ typedef struct Counters double usage; /* usage factor */ } Counters; +typedef struct pgssPlan +{ + Size offset; + int len; + double time; /* execution time in msec when the latest plan was updated */ + TimestampTz timestamp; +} pgssPlan; + /* * Statistics per statement * @@ -172,6 +183,8 @@ typedef struct pgssEntry Counters counters;
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund wrote: > On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote: >> Is it time to enable checksums by default, and give initdb a switch to turn >> it off instead? > > -1 - the WAL overhead is quite massive, and in contrast to the other > GUCs recently changed you can't just switch this around. I agree. I bet that if somebody does the test suggested by Amit downthread, it'll turn out that the performance is just awful. And those cases are common. I think the people saying "well, the overhead is worth it" must be people whose systems (or whose customer's systems) aren't processing continuous heavy OLTP workloads. If you've got a data warehousing workload, checksums are probably pretty cheap. If you've got a low-velocity OLTP workload, or an OLTP workload that fits in shared_buffers, it's probably bearable. But if you've got 8GB of shared_buffers and 100GB of data, and you've got 100 or so backends continuously doing random updates, I think checksums are going to nail you to the wall. And EnterpriseDB, at least, has lots of customers that do exactly that sort of thing. Having said that, I've certainly run into situations where I speculated that a customer had a hardware problem and they speculated that we had given them buggy database software. In a pretty significant number of cases, the customer turned out to be right; for example, some of those people were suffering from multixact bugs that resulted in unexplainable corruption. Now, would it have been useful to know that checksums were passing (suggesting a PostgreSQL problem) rather than failing (suggesting an OS problem)? Yes, that would have been great. I could have given those customers better support. On the other hand, I think I've run into MORE cases where the customer was desperately seeking options to improve write performance, which remains a pretty significant problem for PostgreSQL.
I can't see taking a significant hit in that area for my convenience in understanding what's going on in data corruption situations. The write performance penalty is paid by everybody all the time, whereas data corruption is a rare event even among support cases. And even when you do have corruption, whether or not the data corruption is accompanied by a checksum failure is only ONE extra bit of useful data. A failure doesn't guarantee a hardware problem; it could be caused by a faulty backup procedure, like forgetting to run pg_start_backup(). The lack of a failure doesn't guarantee a software problem; it could be caused by a faulty backup procedure, like using an OS-level snapshot facility that isn't exactly simultaneous across tablespaces. What you really need to do when a customer has corruption is figure out why they have corruption, and the leading cause by far is neither the hardware nor the software but some kind of user error. Checksums are at best a very modest assist in figuring out whether an error has been made and if so of what type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut wrote: > On 1/18/17 2:32 PM, Robert Haas wrote: >> Unless we can find something official, I suppose we should just >> display BASE TABLE in that case as we do in other cases. I wonder if >> the schema needs some broader revision; for example, are there >> information_schema elements intended to show information about >> partitions? > > Is it intentional that we show the partitions by default in \d, > pg_tables, information_schema.tables? Or should we treat those as > somewhat-hidden details? I'm not really sure what the right thing to do is there. I was hoping you had an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera wrote: > David Fetter wrote: > > > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int > minread, int maxread) > >errmsg("could not read > from COPY file: %m"))); > > break; > > case COPY_OLD_FE: > > - > > /* > >* We cannot read more than minread bytes (which > in practice is 1) > >* because old protocol doesn't have any clear way > of separating > > This change is pointless as it'd be undone by pgindent. > > > + /* > > + * Function signature is: > > + * copy_srf( filename text default null, > > + * is_program boolean default false, > > + * format text default 'text', > > + * delimiter text default E'\t' in text mode, ',' in csv > mode, > > + * null_string text default '\N', > > + * header boolean default false, > > + * quote text default '"' in csv mode only, > > + * escape text default null, -- defaults to whatever > quote is > > + * encoding text default null > > + */ > > This comment would be mangled by pgindent -- please add an /*--- marker > to prevent it. > > > + /* param 7: escape text default null, -- defaults to whatever > quote is */ > > + if (PG_ARGISNULL(7)) > > + { > > + copy_state.escape = copy_state.quote; > > + } > > + else > > + { > > + if (copy_state.csv_mode) > > + { > > + copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7)); > > + if (strlen(copy_state.escape) != 1) > > + { > > + ereport(ERROR, > > + > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > + errmsg("COPY escape must > be a single one-byte character"))); > > + } > > + } > > + else > > + { > > + ereport(ERROR, > > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > + errmsg("COPY escape available > only in CSV mode"))); > > + } > > + } > > I don't understand why we have all these checks. Can't we just pass > the values to COPY and let it apply the checks? That way, when COPY is > updated to support multibyte escape chars (for example) we don't need to > touch this code.
Together with removing the unneeded braces that would > make these stanzas about six lines long instead of fifteen. > > > > + tuple = heap_form_tuple(tupdesc,values,nulls); > > + //tuple = BuildTupleFromCStrings(attinmeta, > field_strings); > > + tuplestore_puttuple(tupstore, tuple); > > No need to form a tuple; use tuplestore_putvalues here. > > > > + } > > + > > + /* close "file" */ > > + if (copy_state.is_program) > > + { > > + int pclose_rc; > > + > > + pclose_rc = ClosePipeStream(copy_state.copy_file); > > + if (pclose_rc == -1) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > + errmsg("could not close pipe to > external command: %m"))); > > + else if (pclose_rc != 0) > > + ereport(ERROR, > > + (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), > > + errmsg("program \"%s\" failed", > > + > copy_state.filename), > > + errdetail_internal("%s", > wait_result_to_str(pclose_rc))); > > + } > > + else > > + { > > + if (copy_state.filename != NULL && > FreeFile(copy_state.copy_file)) > > + ereport(ERROR, > > + (errcode_for_file_access(), > > + errmsg("could not close file > \"%s\": %m", > > + > copy_state.filename))); > > + } > > I wonder if these should be an auxiliary function in copy.c to do this. > Surely copy.c itself does pretty much the same thing ... > > > > +DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v > u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ )); > > +DESCR("set-returning COPY proof of concept"); > > Why is this marked "proof of concept"? If this is a PoC only, why are > you submitting it as a patch? > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Feel free to mark it returned as feedback. The concept didn't generate as much enthusiasm as I had hoped, so I think the
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
On 25 January 2017 at 17:34, Julian Markwort wrote: > Analogous to this, a bad_plan is saved, when the time has been exceeded by a > factor greater than 1.1 . ...and the plan differs? Probably best to use some stat math to calculate deviation, rather than fixed %. Sounds good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning vs. information_schema
On 1/18/17 2:32 PM, Robert Haas wrote: > Unless we can find something official, I suppose we should just > display BASE TABLE in that case as we do in other cases. I wonder if > the schema needs some broader revision; for example, are there > information_schema elements intended to show information about > partitions? Is it intentional that we show the partitions by default in \d, pg_tables, information_schema.tables? Or should we treat those as somewhat-hidden details? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter wrote: > On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > > > Feel free to mark it returned as feedback. The concept didn't > > generate as much enthusiasm as I had hoped, so I think the right > > thing to do now is scale it back to a patch that makes > > CopyFromRawFields() externally visible so that extensions can use > > them. > > You're getting enthusiasm in the form of these reviews and suggestions > for improvement. That it doesn't always happen immediately is a > byproduct of the scarcity of developer time and the sheer volume of > things to which people need to pay attention. True about that. I was referring to "ooh! I need that!"-type interest. I'll proceed, keeping in mind that there's a fallback position of just making some of the guts of COPY available to be called by extensions, as was done for file_fdw.
Re: [HACKERS] patch: function xmltable
2017-01-25 22:40 GMT+01:00 Andres Freund: > Hi, > > > > > I'll try to explain my motivation. Please, check it and correct me > if I > > > am > > > > wrong. I don't keep on my implementation - just try to implement > XMLTABLE > > > > be consistent with another behave and be used all time without any > > > > surprise. > > > > > > > > 1. Any function that produces a content can be used in target list. > We > > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > > exception? > > > > > > targetlist SRFs were a big mistake. They cause a fair number of > problems > > > code-wise. They permeated for a long while into bits of both planner > and > > > executor, where they really shouldn't belong. Even after the recent > > > changes there's a fair amount of ugliness associated with them. We > > > can't remove tSRFs for backward compatibility reasons, but that's not > > > true for XMLTABLE > > > > > > > > > > > ok > > > > I'm afraid that when I cannot reuse the SRF infrastructure, I have to > reimplement > > it partially :( - mainly for usage in "ROWS FROM ()" > The TableExpr implementation is based on SRF now. You and Alvaro propose an independent implementation, like a generic executor node. I am sceptical that FunctionScan supports reading from a generic executor node. Regards Pavel > Huh? > > Greetings, > > Andres Freund >
Re: [HACKERS] patch: function xmltable
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > 2017-01-25 22:40 GMT+01:00 Andres Freund: > > > I'm afraid that when I cannot reuse the SRF infrastructure, I have to > > reimplement > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > an independent implementation, like a generic executor node. I am sceptical that > FunctionScan supports reading from a generic executor node. Why would it need to?
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Hi, On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > Preventing people from calling functions by denying the ability to > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > them from delegating those functions to non-superusers. It either (a) > > causes them to give superuser privileges to accounts that otherwise > > would have had lesser privileges or (b) forces them to use wrapper > > functions to grant access rather than doing it directly or (c) some > > other dirty hack. If they pick (a), security is worse; if they pick > > (b) or (c), you haven't prevented them from doing what they wanted to > > do anyway. You've just made it annoying and inconvenient. > > The notion that security is 'worse' under (a) is flawed- it's no > different. Huh? Obviously that's nonsense, given the pg_ls_dir example. > With regard to 'b', if their wrapper function is > sufficiently careful to ensure that the caller isn't able to do anything > which would increase the caller's level to that of superuser, then > security is improved. Given how complex "sufficiently careful" is for security definer UDFs, in comparison to estimating the security of granting to a function like pg_ls_dir (or pretty much any other that doesn't call out to SQL level stuff like operators, output functions, etc), I don't understand this. > If the wrapper simply turns around and calls the underlying function, > then it's no different from '(a)'. Except for stuff like search path. > I am suggesting that we shouldn't make it look like there are > distinctions when there is actually no difference. That is a good thing > for our users because it keeps them informed about what they're actually > doing when they grant access. This position doesn't make a lick of sense to me. There's simply no benefit at all in requiring users to create wrapper functions over allowing a grant to a non-superuser. Both are possible; secdef is a lot harder to get right.
And you already heavily started down the path of removing superuser() type checks - you're just arguing to make it more or less randomly less consistent. > I've commented on here and spoken numerous times about exactly that goal > of reducing the checks in check_postgres.pl which require superuser. > You're not actually doing that and nothing you've outlined in here so > far makes me believe you see how having pg_write_file() access is > equivalent to giving someone superuser, and that concerns me. That's the user's responsibility, and Robert didn't really suggest granting pg_write_file() permissions, so this seems to be a straw man. Andres
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > As it is, there are backup solutions which *do* check the checksum when > backing up PG. This is no longer, thankfully, some hypothetical thing, > but something which really exists and will hopefully keep users from > losing data. Wouldn't that have issues with torn pages? -- Peter Geoghegan
Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
(Adding Robert in CC.) On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao wrote: > An unlogged table has an initialization fork. The initialization fork does > not have a BM_PERMANENT flag when it gets a buffer. > During a checkpoint (not a shutdown or end-of-recovery checkpoint), it will not be written to disk, > so after crash recovery the page of the initialization fork will not be correct, > which then makes the main fork incorrect too. For init forks the flush absolutely needs to happen, so that's really not good. We ought to fix BufferAlloc() appropriately here. > Here is an example for GIN index. > > create unlogged table gin_test_tbl(i int4[]); > create index gin_test_idx on gin_test_tbl using gin (i); > checkpoint; > > kill all the postgres processes, and restart again. > > vacuum gin_test_tbl; -- crash. > > It seems the same problem exists in BRIN, GIN, GiST and HASH indexes, which use > a buffer for meta page initialization in the ambuildempty function. Yeah, other index AMs deal directly with the sync of the page, that's why there is no issue for them. So the patch attached fixes the problem by changing BufferAlloc() in such a way that initialization forks are permanently written to disk, which is what you are suggesting. As a simple fix for back-branches that's enough, though on HEAD I think that we should really rework the empty() routines so that the write goes through shared buffers first; that seems more solid than relying on the smgr routines to do this work. Robert, what do you think? -- Michael unlogged-flush-fix.patch Description: Binary data
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > Andres, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > Preventing people from calling functions by denying the ability to > > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > > them from delegating those functions to non-superusers. It either (a) > > > > causes them to give superuser privileges to accounts that otherwise > > > > would have had lesser privileges or (b) forces them to use wrapper > > > > functions to grant access rather than doing it directly or (c) some > > > > other dirty hack. If they pick (a), security is worse; if they pick > > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > > do anyway. You've just made it annoying and inconvenient. > > > > > > The notion that security is 'worse' under (a) is flawed- it's no > > > different. > > > > Huh? Obviously that's nonsense, given the pg_ls_dir example. > > Robert's made it clear that he'd like to have a blanket rule that we > don't have superuser checks in these code paths if they can be GRANT'd > at the database level, which goes beyond pg_ls_dir. That seems right to me. I don't see much benefit for the superuser() style checks, with a few exceptions. Granting by default is obviously an entirely different question. > If the question was only about pg_ls_dir, then I still wouldn't be for > it, because, as the bits you didn't quote discussed, it encourages users > and 3rd party tool authors to base more things off of pg_ls_dir to > look into the way PG stores data on disk, and affords more access than > the monitoring user has any need for, none of which are good, imv. It > also discourages people from implementing proper solutions when you can > 'just use pg_ls_dir()', which I also don't agree with.
In other words, you're trying to force people to do stuff your preferred way, instead of allowing them to get things done in a reasonable manner. > If you really want to do an ls, go talk to the OS. ACLs are possible to > provide that with more granularity than what would be available through > pg_ls_dir(). We aren't in the "give a user the ability to do an ls" > business, frankly. Wut. > > > I am suggesting that we shouldn't make it look like there are > > > distinctions when there is actually no difference. That is a good thing > > > for our users because it keeps them informed about what they're actually > > > doing when they grant access. > > > > This position doesn't make a lick of sense to me. There's simply no > > benefit at all in requiring to create wrapper functions, over allowing > > to grant to non-superuser. Both is possible, secdef is a lot harder to > > get right. And you already heavily started down the path of removing > > superuser() type checks - you're just arguing to make it more or less > > randomly less consistent. > > I find this bizarre considering I went through a detailed effort to go > look at every superuser check in the system and discussed, on this list, > the reasoning behind each and every one of them. I do generally > consider arbitrary access to syscalls via the database to be a privilege > which really only the superuser should have. Just because you argued doesn't mean I agree. Greetings, Andres Freund
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > It means "another call to tuplesort_gettupleslot", but I believe that > it would be safer (more future-proof) to actually specify "the slot > contents may be invalidated by any subsequent manipulation of the > tuplesort's state" instead. WFM. >> There are several other uses of "call here", both in this patch and >> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. >> Let's try to improve that. > Should I write a patch along those lines? Please. You might want to hit the existing ones with a separate patch, but it doesn't much matter; I'd be just as happy with a patch that did both things. regards, tom lane
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane wrote: > Please. You might want to hit the existing ones with a separate patch, > but it doesn't much matter; I'd be just as happy with a patch that did > both things. Got it. -- Peter Geoghegan
Re: [HACKERS] Radix tree for character conversion
On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI wrote: > [...patch...] Nobody has shown up yet to review this patch, so I am giving it a shot. The patch file sizes are scary at first sight, but after having a look: 36 files changed, 1411 insertions(+), 54398 deletions(-) Yes that's a surprise, something like git diff --irreversible-delete would have helped as most of the diffs are just caused by 3 files being deleted in patch 0004, sending 50k lines to the abyss of deletion. > Hello, I found a bug in my portion while rebasing. Right, that's 0001. Nice catch. > The attached files are the following. This patchset is not > complete, missing changes of map files. The change is tremendously > large but generatable. > > 0001-Add-missing-semicolon.patch > > UCS_to_EUC_JP.pl has a line missing a terminating semicolon. This > doesn't harm but is surely a syntax error. This patch fixes it. > This perhaps should be a separate patch. This requires a back-patch. This makes me wonder how long this script has actually not run... > 0002-Correct-reference-resolution-syntax.patch > > convutils.pm has lines with different syntax of reference > resolution. This unifies the syntax. Yes that looks right to me. I am not the best perl guru on this list, but looking around, $$var{foo} is bad, ${$var}{foo} is better, and $var->{foo} is even better. This also generates no diffs when running make in src/backend/utils/mb/Unicode/. So no objections to that. > 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > > Before adding radix tree stuff, applied pgperltidy and inserted > format-skipping pragma for the parts where perltidy seems to do > too much. Which version of perltidy did you use? Looking at the archives, the perl code is cleaned up with a specific version, v20090616. See https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com for example on the matter. As perltidy changes over time, this may be a sensitive change if done this way.
> 0004-Use-radix-tree-for-character-conversion.patch > > Radix tree body. Well, here a lot of diffs could have been saved. > The unattached fifth patch is generated by the following steps. > > [$(TOP)]$ ./configure > [Unicode]$ make > [Unicode]$ make distclean > [Unicode]$ git add . > [Unicode]$ commit > === COMMIT MESSAGE > Replace map files with radix tree files. > > These encodings no longer use the former map files and use new radix > tree files. > === OK, I can see that working, with 200k of maps generated. So going through the important bits of this jungle... +/* + * radix tree conversion function - this should be identical to the function in + * ../conv.c with the same name + */ +static inline uint32 +pg_mb_radix_conv(const pg_mb_radix_tree *rt, +int l, +unsigned char b1, +unsigned char b2, +unsigned char b3, +unsigned char b4) This is not nice. Having a duplication like that is a recipe to forget about it, as this patch introduces a dependency between conv.c and the radix tree generation. Having a .gitignore in Unicode/ would be nice, particularly to avoid committing map_checker. A README documenting things may be welcome, or at least comments at the top of map_checker.c. Why is map_checker essential? What does it do? There is no way to understand that easily, except that it includes a "radix tree conversion function", and that it performs sanity checks on the radix trees to be sure that they are in good shape. But this is something that one would guess only after looking at your patch and the code (at least I will sleep less stupid tonight after reading this stuff).
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl +++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl # Drop these SJIS codes from the source for UTF8=>SJIS conversion #<<< do not let perltidy touch this -my @reject_sjis =( +my @reject_sjis = ( 0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782, - 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, + 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, 0x879a..0x879c -); + ); This is not generated, it would be nice to drop the noise from the patch. Here is another one: - $i->{code} = $jis | ( - $jis < 0x100 - ? 0x8e00 - : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); - +#<<< do not let perltidy touch this + $i->{code} = $jis | ($jis < 0x100 ? 0x8e00: +($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); +#>>> if (l == 2) { - iutf = *utf++ << 8; - iutf |= *utf++; + b3 = *utf++; + b4 = *utf++; } Ah, OK. This conversion is important so that it performs a minimum of bitwise operations. Yes let's keep that. That's pretty cool to get a faster operation. -- Michael
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakkewrote: >>I look at this patch from you and that's present for me: >>https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq->Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com > >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) >> (errmsg("unexpected timeline ID %u (should be %u) >> in checkpoint record", >> checkPoint.ThisTimeLineID, ThisTimeLineID))); >> >> +KnownPreparedRecreateFiles(checkPoint.redo); >> RecoveryRestartPoint(); >> } > > Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a > function by this name. And now I see, the name is CheckPointTwoPhase() > :-) My mistake then :D >> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC >> files are not flushed to disk with this patch. This is a problem as a >> new restart point is created... Having the flush in CheckpointTwoPhase >> really makes the most sense. > > Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby > promote" code path. CreateRestartPoint() calls it via CheckPointGuts() while in recovery. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding of two-phase transactions
>> >> Yes, that’s also possible but seems to be less flexible restricting us to >> some >> specific GID format. >> >> Anyway, I can measure WAL space overhead introduced by the GID’s inside >> commit records >> to know exactly what will be the cost of such approach. > > Stas, > > Have you had a chance to look at this further? Generally i’m okay with Simon’s approach and will send send updated patch. Anyway want to perform some test to estimate how much disk space is actually wasted by extra WAL records. > I think the approach of storing just the xid and fetching the GID > during logical decoding of the PREPARE TRANSACTION is probably the > best way forward, per my prior mail. I don’t think that’s possible in this way. If we will not put GID in commit record, than by the time when logical decoding will happened transaction will be already committed/aborted and there will be no easy way to get that GID. I thought about several possibilities: * Tracking xid/gid map in memory also doesn’t help much — if server reboots between prepare and commit we’ll lose that mapping. * We can provide some hooks on prepared tx recovery during startup, but that approach also fails if reboot happened between commit and decoding of that commit. * Logical messages are WAL-logged, but they don’t have any redo function so don’t helps much. So to support user-accessible 2PC over replication based on 2PC decoding we should invent something more nasty like writing them into a table. > That should eliminate Simon's > objection re the cost of tracking GIDs and still let us have access to > them when we want them, which is the best of both worlds really. Having 2PC decoding in core is a good thing anyway even without GID tracking =) -- Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
>I look at this patch from you and that's present for me: >https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq->Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) > (errmsg("unexpected timeline ID %u (should be %u) > in checkpoint record", > checkPoint.ThisTimeLineID, ThisTimeLineID))); > > +KnownPreparedRecreateFiles(checkPoint.redo); > RecoveryRestartPoint(); > } Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a function by this name. And now I see, the name is CheckPointTwoPhase() :-) > And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC > files are not flushed to disk with this patch. This is a problem as a > new restart point is created... Having the flush in CheckpointTwoPhase > really makes the most sense. Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby promote" code path. Regards, Nikhils -- Nikhil Sontakke http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Radix tree for character conversion
On Wed, Jan 25, 2017 at 7:18 PM, Ishii Ayumi wrote: > I patched 4 patchset and run "make", but I got failed. > Is this a bug or my mistake ? > I'm sorry if I'm wrong. > > [$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch > [$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch > [$(TOP)]$ patch -p1 < > ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > [$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch > [$(TOP)]$ ./configure > [Unicode]$ make > '/usr/bin/perl' UCS_to_most.pl > Type of arg 1 to keys must be hash (not hash element) at convutils.pm > line 443, near "}) > " > Type of arg 1 to values must be hash (not hash element) at > convutils.pm line 596, near "}) > " > Type of arg 1 to each must be hash (not private variable) at > convutils.pm line 755, near "$map) > " > Compilation failed in require at UCS_to_most.pl line 19. > make: *** [iso8859_2_to_utf8.map] Error 255 Hm, I am not sure what you are missing. I was able to get things to build. -- Michael
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > > Robert's made it clear that he'd like to have a blanket rule that we > > don't have superuser checks in these code paths if they can be GRANT'd > > at the database level, which goes beyond pg_ls_dir. > > That seems right to me. I don't see much benefit for the superuser() > style checks, with a few exceptions. Granting by default is obviously > an entirely different question. Well, for my part at least, I disagree. Superuser is a very different animal, imv, than privileges which can be GRANT'd, and I feel that's an altogether good thing. > In other words, you're trying to force people to do stuff your preferred > way, instead of allowing them to get things done is a reasonable manner. Apparently we disagree about what is a 'reasonable manner'. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_hba_file_settings view patch
On Thu, Jan 26, 2017 at 2:32 AM, Tom Lanewrote: > The way I'd be inclined to make the individual reporting changes is like > > if (!EnableSSL) > +{ > - ereport(LOG, > + ereport(elevel, > (errcode(ERRCODE_CONFIG_FILE_ERROR), > errmsg("hostssl record cannot match because SSL is disabled"), > errhint("Set ssl = on in postgresql.conf."), > errcontext("line %d of configuration file \"%s\"", > line_num, HbaFileName))); > +*err_msg = pstrdup("hostssl record cannot match because SSL > is disabled"); > +} > > which is considerably less invasive and hence easier to review, and > supports reporting different text in the view than appears in the log, > should we need that. It seems likely also that we could drop the pstrdup > in the case of constant strings (we'd still need psprintf if we want to > insert values into the view messages), which would make this way cheaper > than what's in the patch now. I don't really understand the argument about readability of the patch as what Haribabu has proposed is simply to avoid a duplicate of the strings and the diffs of the patch are really clear. For the sake of not translating the strings sent back to the system view though I can buy it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munrowrote: > > I don't know how comparable it is to our checksum technology, but > MySQL seems to have some kind of checksums on table data, and you can > find public emails, blogs etc lamenting corrupted databases by > searching Google for the string "InnoDB: uncompressed page, stored > checksum in field1" (that's the start of a longer error message that > includes actual and expected checksums). I'm not sure what exactly that teaches us however. I see these were often associated with software bugs (Apparently MySQL long assumed that a checksum of 0 never happened for example). In every non software case I stumbled across seemed to be following a power failure. Apparently MySQL uses a "doublewrite buffer" to protect against torn pages but when I search for that I get tons of people inquiring how to turn it off... So even without software bugs in the checksum code I don't know that the frequency of the error necessarily teaches us anything about the frequency of hardware corruption either. And more to the point it seems what people are asking for in all those lamentations is how they can convince MySQL to continue and ignore the corruption. A typical response was "We slightly modified innochecksum and added option -f that means if the checksum of a page is wrong, rewrite it in the InnoDB page header." Which begs the question... -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:37 PM, Andres Freundwrote: > > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > >> * Peter Geoghegan (p...@heroku.com) wrote: > >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost > >> > wrote: > >> > > As it is, there are backup solutions which *do* check the checksum when > >> > > backing up PG. This is no longer, thankfully, some hypothetical thing, > >> > > but something which really exists and will hopefully keep users from > >> > > losing data. > >> > > >> > Wouldn't that have issues with torn pages? > >> > >> No, why would it? The page has either been written out by PG to the OS, > >> in which case the backup s/w will see the new page, or it hasn't been. > > > > Uh. Writes aren't atomic on that granularity. That means you very well > > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > > of a 8KB postgres page). Just read a page while it's being written out. > > Yeah. This is also why backups force full page writes on even if > they're turned off in general. I've got a question into David about this, I know we chatted about the risk at one point, I just don't recall what we ended up doing (I can imagine a few different possible things- re-read the page, which isn't a guarantee but reduces the chances a fair bit, or check the LSN, or perhaps the plan was to just check if it's in the WAL, as I mentioned) or if we ended up concluding it wasn't a risk for some, perhaps incorrect, reason and need to revisit it. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Speedup twophase transactions
>> The question remains whether saving off a few fsyncs/reads for these >> long-lived prepared transactions is worth the additional code churn. >> Even if we add code to go through the KnownPreparedList, we still will >> have to go through the other on-disk 2PC transactions anyways. So, >> maybe not. > > We should really try to do things right now, or we'll never come back > to it. 9.3 (if my memory does not fail me?) has reduced the time to do > promotion by removing the need of the end-of-recovery checkpoint, > while I agree that there should not be that many 2PC transactions at > this point, if there are for a reason or another, the time it takes to > complete promotion would be impacted. So let's refactor > PrescanPreparedTransactions() so as it is able to handle 2PC data from > a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. > Not quite. If we modify PrescanPreparedTransactions(), we also need to make RecoverPreparedTransactions() and StandbyRecoverPreparedTransactions() handle 2PC data via XlogReadTwoPhaseData(). > + /* > +* Move prepared transactions, if any, from KnownPreparedList to > files. > +* It is possible to skip this step and teach subsequent code about > +* KnownPreparedList, but PrescanPreparedTransactions() happens once > +* during end of recovery or on promote, so probably it isn't worth > +* the additional code. > +*/ This comment is misplaced. Does not make sense before this specific call. > + KnownPreparedRecreateFiles(checkPoint.redo); > RecoveryRestartPoint(); > Looking again at this code, I think that this is incorrect. The > checkpointer should be in charge of doing this work and not the > startup process, so this should go into CheckpointTwoPhase() instead. I don't see a function by the above name in the code? 
Regards, Nikhils -- Nikhil Sontakke http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] sequence data type
On Wed, Jan 25, 2017 at 11:53 PM, Peter Eisentrautwrote: > Here is an updated patch that allows changing the sequence type. This > was clearly a concern for reviewers, and the presented use cases seemed > convincing. I have been torturing this patch and it looks rather solid to me. Here are a couple of comments: @@ -15984,6 +15992,9 @@ dumpSequence(Archive *fout, TableInfo *tbinfo) "CREATE SEQUENCE %s\n", fmtId(tbinfo->dobj.name)); + if (strcmp(seqtype, "bigint") != 0) + appendPQExpBuffer(query, "AS %s\n", seqtype); Wouldn't it be better to assign that unconditionally? There is no reason that a dump taken from pg_dump version X will work on X - 1 (as there is no reason to not make the life of users uselessly difficult as that's your point), but that seems better to me than rely on the sequence type hardcoded in all the pre-10 dump queries for sequences. That would bring also more consistency to the CREATE SEQUENCE queries of test_pg_dump/t/001_base.pl. Could you increase the regression test coverage to cover some of the new code paths? 
For example such cases are not tested: =# create sequence toto as smallint; CREATE SEQUENCE =# alter sequence toto as smallint maxvalue 1000; ERROR: 22023: RESTART value (2147483646) cannot be greater than MAXVALUE (1000) LOCATION: init_params, sequence.c:1537 =# select setval('toto', 1); setval 1 (1 row) =# alter sequence toto as smallint; ERROR: 22023: MAXVALUE (2147483647) is too large for sequence data type smallint LOCATION: init_params, sequence.c:1407 + if ((seqform->seqtypid == INT2OID && seqform->seqmin < PG_INT16_MIN) + || (seqform->seqtypid == INT4OID && seqform->seqmin < PG_INT32_MIN) + || (seqform->seqtypid == INT8OID && seqform->seqmin < PG_INT64_MIN)) + { + charbufm[100]; + + snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin); + + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("MINVALUE (%s) is too large for sequence data type %s", + bufm, format_type_be(seqform->seqtypid; + } "large" does not apply to values lower than the minimum, no? The int64 path is never going to be reached (same for the max value), it doesn't hurt to code it I agree. Testing serial columns, the changes are consistent with the previous releases. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 1:38 PM, Nikhil Sontakkewrote: >> We should really try to do things right now, or we'll never come back >> to it. 9.3 (if my memory does not fail me?) has reduced the time to do >> promotion by removing the need of the end-of-recovery checkpoint, >> while I agree that there should not be that many 2PC transactions at >> this point, if there are for a reason or another, the time it takes to >> complete promotion would be impacted. So let's refactor >> PrescanPreparedTransactions() so as it is able to handle 2PC data from >> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. > > Not quite. If we modify PrescanPreparedTransactions(), we also need to > make RecoverPreparedTransactions() and > StandbyRecoverPreparedTransactions() handle 2PC data via > XlogReadTwoPhaseData(). Ah, right for both, even for RecoverPreparedTransactions() that happens at the end of recovery. Thanks for noticing. The patch mentions that as well: + ** At the end of recovery we move all known prepared transactions to disk. + * This allows RecoverPreparedTransactions() and + * StandbyRecoverPreparedTransactions() to do their work. I need some strong coffee.. >> + KnownPreparedRecreateFiles(checkPoint.redo); >> RecoveryRestartPoint(); >> Looking again at this code, I think that this is incorrect. The >> checkpointer should be in charge of doing this work and not the >> startup process, so this should go into CheckpointTwoPhase() instead. > > I don't see a function by the above name in the code? 
I look at this patch from you and that's present for me: https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com If I look as well at the last version of Stas it is here: https://www.postgresql.org/message-id/becc988a-db74-48d5-b5d5-a54551a62...@postgrespro.ru As this change: --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record) (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record", checkPoint.ThisTimeLineID, ThisTimeLineID))); +KnownPreparedRecreateFiles(checkPoint.redo); RecoveryRestartPoint(); } And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC files are not flushed to disk with this patch. This is a problem as a new restart point is created... Having the flush in CheckpointTwoPhase really makes the most sense. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > I understand that my experience with storage devices is unusually > narrow compared to everyone else here. That's why I remain neutral on > the high level question of whether or not we ought to enable checksums > by default. I'll ask other hackers to answer what may seem like a very > naive question, while bearing what I just said in mind. The question > is: Have you ever actually seen a checksum failure in production? And, > if so, how helpful was it? I'm surprised that nobody has answered my question yet. I'm not claiming that not actually seeing any corruption in the wild due to a failing checksum invalidates any argument. I *do* think that data points like this can be helpful, though. -- Peter Geoghegan
Re: [HACKERS] pgbench more operators & functions
Fabien, * Fabien COELHO (coe...@cri.ensmp.fr) wrote: > I think that there is a misunderstanding, most of which being my fault. No worries, it happens. :) > I have really tried to do everything that was required from > committers, including revising the patch to match all previous > feedback. Thanks for continuing to try to work through everything. I know it can be a difficult process, but it's all towards a (hopefully) improved and better PG. > Version 6 sent on Oct 4 did include all fixes required at the time > (no if, no unusual and operators, TAP tests)... However I forgot to > remove some documentation about the removed stuff, which made Robert > think that I had not done it. I apologise for this mistake and the > subsequent misunderstanding:-( Ok, that helps clarify things. As does the rest of your email, for me, anyway. > If pgbench is about being seated on a bench and running postgres on > your laptop to get some heat, my mistake... I thought it was about > benchmarking, which does imply a few extra capabities. I believe we do want to improve pgbench and your changes are generally welcome when it comes to adding useful capabilities. Your explanation was also helpful about the specific requirements. > IMHO the relevant current status of the patch should be "Needs > review" and possibly "Move to next CF". For my 2c, at least, while I'm definitely interested in this, it's not nearly high enough on my plate with everything else going on to get any attention in the next few weeks, at least. I do think that, perhaps, this patch may deserve a bit of a break, to allow people to come back to it with a fresh perspective, so perhaps moving it to the next commitfest would be a good idea, in a Needs Review state. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost wrote: > Sadly, without having them enabled by default, there's not a huge corpus > of example cases to draw from. > > There have been a few examples already posted about corruption failures > with PG, but one can't say with certainty that they would have been > caught sooner if checksums had been enabled. I don't know how comparable it is to our checksum technology, but MySQL seems to have some kind of checksums on table data, and you can find public emails, blogs etc lamenting corrupted databases by searching Google for the string "InnoDB: uncompressed page, stored checksum in field1" (that's the start of a longer error message that includes actual and expected checksums). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Speedup twophase transactions
On Wed, Jan 25, 2017 at 11:55 PM, Nikhil Sontakkewrote: >> We are talking about the recovery/promote code path. Specifically this >> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). >> >> We write the files to disk and they get immediately read up in the >> following code. We could not write the files to disk and read >> KnownPreparedList in the code path that follows as well as elsewhere. > > Thinking more on this. > > The only optimization that's really remaining is handling of prepared > transactions that have not been committed or will linger around for > long. The short lived 2PC transactions have been optimized already via > this patch. > > The question remains whether saving off a few fsyncs/reads for these > long-lived prepared transactions is worth the additional code churn. > Even if we add code to go through the KnownPreparedList, we still will > have to go through the other on-disk 2PC transactions anyways. So, > maybe not. We should really try to do things right now, or we'll never come back to it. 9.3 (if my memory does not fail me?) has reduced the time to do promotion by removing the need of the end-of-recovery checkpoint, while I agree that there should not be that many 2PC transactions at this point, if there are for a reason or another, the time it takes to complete promotion would be impacted. So let's refactor PrescanPreparedTransactions() so as it is able to handle 2PC data from a buffer extracted by XlogReadTwoPhaseData(), and we are good to go. --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9573,6 +9573,15 @@ xlog_redo(XLogReaderState *record) (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record", checkPoint.ThisTimeLineID, ThisTimeLineID))); + + /* +* Move prepared transactions, if any, from KnownPreparedList to files. 
+* It is possible to skip this step and teach subsequent code about +* KnownPreparedList, but PrescanPreparedTransactions() happens once +* during end of recovery or on promote, so probably it isn't worth +* the additional code. +*/ + KnownPreparedRecreateFiles(checkPoint.redo); RecoveryRestartPoint(); Looking again at this code, I think that this is incorrect. The checkpointer should be in charge of doing this work and not the startup process, so this should go into CheckpointTwoPhase() instead. At the end, we should be able to just live without KnownPreparedRecreateFiles() and just rip it off from the patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frostwrote: > > I hope to discuss it further after we have the ability to turn it off > > easily. > > I think we should have the ability to flip it in BOTH directions easily. Presumably you imply this to mean "before we enable it by default." I'm not sure that I can agree with that, but we haven't got it in either direction yet, so it's not terribly interesting to discuss that particular "what if." > It sounds to me like you are misleading users about the positives and > negatives of checksums, which then causes them to be shocked that they > are not the default. I don't try to claim that they are without downsides or performance impacts, if that's the implication here. > > [ more unsolicited bragging an unspecified backup tool, presumably still > > pgbackrest ] It was explicitly to counter the claim that there aren't things out there which are working to actively check the checksums. > > I'd rather walk into an engagement where the user is saying "yeah, we > > enabled checksums and it caught this corruption issue" than having to > > break the bad news, which I've had to do over and over, that their > > existing system hasn't got checksums enabled. This isn't hypothetical, > > it's what I run into regularly with entirely reasonable and skilled > > engineers who have been deploying PG. > > Maybe you should just stop telling them and use the time thus freed up > to work on improving the checksum feature. I'm working to improve the usefulness of our checksum feature in a way which will produce practical and much more immediate results than anything I could do today in PG. That said, I do plan to also support working on checksums as I'm able to. At the moment, that's supporting Magnus' thread about enabling them by default. 
I'd be a bit surprised if he was trying to force a change on PG because he thinks it's going to improve things for pgbackrest, but if so, I'm not going to complain when it seems like an entirely sensible and good change which will benefit PG's users too. Even better would be if we had an independent tool to check checksums endorsed by the PG community, but that won't happen for a release cycle. I'd also be extremely happy if the other backup tools out there grew the ability to check checksums in PG pages; frankly, I hope that adding it to pgbackrest will push them to do so. > I'm skeptical of this whole discussion because you seem to be filled > with unalloyed confidence that checksums have little performance > impact and will do wonderful things to prevent data loss, whereas I > think they have significant performance impact and will only very > slightly help to prevent data loss. I admit that they'll have a significant performance impact in some environments, but I think the vast majority of installations won't see anything different, while some of them may be saved by it, including, as likely as not, a number of actual corruption issues that have been brought up on these lists in the past few days, simply because reports were asked for. > I admit that the idea of having > pgbackrest verify checksums while backing up seems like it could > greatly improve the chances of checksums being useful, but I'm not > going to endorse changing PostgreSQL's default for pgbackrest's > benefit. I'm glad to hear that you generally endorse the idea of having a backup tool verify checksums. I'd love it if all of them did and I'm not going to apologize for, as far as I'm aware, being the first to even make an effort in that direction. > It's got to be to the benefit of PostgreSQL users broadly, > not just the subset of those people who use one particular backup > tool. 
Hopefully, other backup solutions will add similar capability, and perhaps someone will also write an independent tool, and eventually those will get out in released versions, and maybe PG will grow a tool to check checksums too, but I can't make other tool authors implement it, nor can I make other committers work on it and while I'm doing what I can, as I'm sure you understand, we all have a lot of different hats. > Also, the massive hit that will probably occur on > high-concurrency OLTP workloads larger than shared_buffers is going to > be had to justify for any amount of backup security. I think that > problem's got to be solved or at least mitigated before we think about > changing this. I realize that not everyone would set the bar that > high, but I see far too many customers with exactly that workload to > dismiss it lightly. I have a sneaking suspicion that the customers which you get directly involved with tend to be at a different level than the majority of PG users which exist out in the wild (I can't say that it's really any different for me). I don't think that's a bad thing, but I do think users at all levels deserve consideration and not just those running close to the limits of
Re: [HACKERS] Checksums by default?
Peter, * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > > I understand that my experience with storage devices is unusually > > narrow compared to everyone else here. That's why I remain neutral on > > the high level question of whether or not we ought to enable checksums > > by default. I'll ask other hackers to answer what may seem like a very > > naive question, while bearing what I just said in mind. The question > > is: Have you ever actually seen a checksum failure in production? And, > > if so, how helpful was it? > > I'm surprised that nobody has answered my question yet. > > I'm not claiming that not actually seeing any corruption in the wild > due to a failing checksum invalidates any argument. I *do* think that > data points like this can be helpful, though. Sadly, without having them enabled by default, there's not a huge corpus of example cases to draw from. There have been a few examples already posted about corruption failures with PG, but one can't say with certainty that they would have been caught sooner if checksums had been enabled. Thanks! Stephen
Re: [HACKERS] COPY as a set returning function
> > >> I don't understand why do we have all these checks. Can't we just pass >> the values to COPY and let it apply the checks? That way, when COPY is >> updated to support multibyte escape chars (for example) we don't need to >> touch this code. Together with removing the unneeded braces that would >> make these stanzas about six lines long instead of fifteen. >> > > If I understand you correctly, COPY (via BeginCopyFrom) itself relies on > having a relation in pg_class to reference for attributes. > In this case, there is no such relation. So I'd have to fake a relcache > entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the > Relation and pass that along to a new function BeginCopyFromReturnSet. I'm > happy to go that route if you think it's a good idea. > > >> >> >> > + tuple = heap_form_tuple(tupdesc,values,nulls); >> > + //tuple = BuildTupleFromCStrings(attinmeta, >> field_strings); >> > + tuplestore_puttuple(tupstore, tuple); >> >> No need to form a tuple; use tuplestore_putvalues here. >> > > Good to know! > > > >> >> I wonder if these should be an auxiliary function in copy.c to do this. >> Surely copy.c itself does pretty much the same thing ... >> > > Yes. This got started as a patch to core because not all of the parts of > COPY are externally callable, and aren't broken down in a way that allowed > for use in a SRF. > > I'll get to work on these suggestions. > I've put in some more work on this patch, mostly just taking Alvaro's suggestions, which resulted in big code savings. I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This seemed the easiest way to leverage the existing tested code (and indeed, it worked nearly out-of-the-box). The only drawback is that a minor change will have to be made to the BeginCopyFrom() call in file_fdw.c, and any other extensions that leverage COPY. 
We could make compatibility functions that take the original signature and pass it along to the corresponding function with rsTupDesc set to NULL.

Some issues:

- I'm still not sure if the direction we want to go is a set returning function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program option did to file_fdw: there's very little we can reference that isn't os/environment specific.
- Inline (STDIN) prompts the user for input, but gives the error: server sent data ("D" message) without prior row description ("T" message). I looked for a place where the Relation was consulted for the row description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms

# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms

# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a	4
>> b	5
>> \.
server sent data ("D" message) without prior row description ("T" message)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+       IN filename text,
+       IN is_program boolean DEFAULT false,
+       IN format text DEFAULT null,
+       IN delimiter text DEFAULT null,
+       IN null_string text DEFAULT null,
+       IN header boolean DEFAULT null,
+       IN quote text DEFAULT null,
+       IN escape text DEFAULT null,
+       IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+                          TupleDesc rsTupDesc,
                           RawStmt *raw_query,
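For readers following the review, Alvaro's "use tuplestore_putvalues" suggestion quoted earlier in this thread amounts to something like the sketch below. This is backend-internal C, not compilable on its own; the variable names (tupstore, tupdesc, values, nulls) mirror the quoted patch fragment and are illustrative only.

```c
/* Before: materialize a HeapTuple, then copy it into the tuplestore. */
tuple = heap_form_tuple(tupdesc, values, nulls);
tuplestore_puttuple(tupstore, tuple);

/*
 * After: let the tuplestore build the tuple from the Datum/null arrays
 * directly, skipping the intermediate HeapTuple allocation.  The values
 * and nulls arrays must match tupdesc, exactly as with heap_form_tuple.
 */
tuplestore_putvalues(tupstore, tupdesc, values, nulls);
```

Besides saving an allocation per row, this also drops the need for the commented-out BuildTupleFromCStrings() path in the quoted fragment.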
Re: [HACKERS] logical decoding of two-phase transactions
On 5 January 2017 at 20:43, Stas Kelvich wrote:
>
>> On 5 Jan 2017, at 13:49, Simon Riggs wrote:
>>
>> Surely in this case the master server is acting as the Transaction
>> Manager, and it knows the mapping, so we are good?
>>
>> I guess if you are using >2 nodes then you need to use full 2PC on each node.
>>
>> Please explain precisely how you expect to use this, to check that GID
>> is required.
>
> For example if we are using logical replication just for failover/HA and
> allowing user to be transaction manager itself. Then suppose that user
> prepared tx on server A and server A crashed. After that client may want
> to reconnect to server B and commit/abort that tx. But user only have GID
> that was used during prepare.
>
>> But even then, if you adopt the naming convention that all in-progress
>> xacts will be called RepOriginId-EPOCH-XID, so they have a fully
>> unique GID on all of the child nodes then we don't need to add the
>> GID.
>
> Yes, that’s also possible but seems to be less flexible restricting us to
> some specific GID format.
>
> Anyway, I can measure WAL space overhead introduced by the GID’s inside
> commit records to know exactly what will be the cost of such approach.

Stas,

Have you had a chance to look at this further?

I think the approach of storing just the xid and fetching the GID during logical decoding of the PREPARE TRANSACTION is probably the best way forward, per my prior mail. That should eliminate Simon's objection re the cost of tracking GIDs and still let us have access to them when we want them, which is the best of both worlds really.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 8:22 PM, Stephen Frost wrote:
> Apparently we disagree about what is a 'reasonable manner'.

Yes. I think that a "reasonable manner" should mean "what the DBA thinks is reasonable", whereas you apparently think it should mean "what the DBA thinks is reasonable, but only if the core developers and in particular Stephen Frost also think it's reasonable".

Your proposed policy is essentially that functions should have built-in superuser checks if having access to that function is sufficient to escalate your account to full superuser privileges. But:

1. There's no consensus on any such policy.

2. If there were such a policy it would favor, not oppose, changing pg_ls_dir(), because you can't escalate to superuser given access to pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for reasons that boil down to a personal preference on your part for people not using it to build monitoring scripts.

3. Such a policy can only be enforced to the extent that we can accurately predict which functions can be used to escalate to superuser, which is not necessarily obvious in every case. Under your proposed policy, if a given function turns out to be more dangerous than we'd previously thought, we'd have to stick the superuser check back in for the next release. And if it turns out to be less dangerous than we thought, we'd take the check out. That would be silly.

4. Such a policy is useless from a security perspective because you can't actually prevent superusers from delegating access to those functions. You can force them to use wrapper functions but that doesn't eo ipso improve security. It might make security better or worse depending on how well the functions are written, and it seems extremely optimistic to suppose that everyone who writes a security definer wrapper function will actually do anything more than expose the underlying function as-is (and maybe forget to schema-qualify something).

5.
If you're worried about people granting access to functions that allow escalation to the superuser account, what you really ought to do is put some effort into documenting which functions have such hazards and for what general reasons. That would have a much better chance of preventing people from delegating access to dangerous functions inadvertently than the current method, which relies on people knowing (without documentation) that you've attempted to leave hard-coded superuser() checks in some functions but not others for reasons that sometimes but not always include privilege escalation, correctly distinguishing which such cases involve privilege escalation as opposed to other arbitrary criteria, and deciding neither to create secdef wrappers for anything that has a built-in check for reasons of privilege isolation nor to give up on privilege isolation and hand out superuser to people who need pg_ls_dir(). While such a clever chain of deductive reasoning cannot be ruled out, it would be astonishing if it happened very often.

I'd be willing to agree to write documentation along the lines suggested in (5) as a condition of removing the remaining superuser checks if you'd be willing to review it and suggest a place to put it. But I have a feeling compromise may not be possible here.

To me, the hand-wringing about the evils of pg_ls_dir() on this thread contrasts rather starkly with the complete lack of discussion about whether the patch removing superuser checks from pgstattuple was opening up any security vulnerabilities. And given that the aforesaid patch lets a user who has EXECUTE privileges on pgstattuple run that function even on relations for which they have no other privileges, such as say pg_authid, it hardly seems self-evident that there are no leaks there.
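As a point of reference for the wrapper pattern debated above, a security definer wrapper delegating pg_ls_dir() typically looks something like the sketch below. The function name, target directory, and role name are invented for illustration; the SET search_path clause guards against the schema-qualification pitfall mentioned in point 4.

```sql
-- Hypothetical wrapper: expose one directory listing to a monitoring
-- role without granting superuser.  Runs with the privileges of the
-- (superuser) function owner, so the body must be kept tight.
CREATE FUNCTION monitor_ls_logdir() RETURNS SETOF text
LANGUAGE sql SECURITY DEFINER
SET search_path = pg_catalog
AS $$ SELECT pg_catalog.pg_ls_dir('pg_log') $$;

-- Lock it down to the intended role only.
REVOKE EXECUTE ON FUNCTION monitor_ls_logdir() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION monitor_ls_logdir() TO monitoring;
```

Note that a wrapper which instead passed a caller-supplied path straight through to pg_ls_dir() would re-expose the underlying function as-is, which is exactly the failure mode described above.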
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company