Re: [PyGreSQL] PyGreSQL #77: Bad performance of query_formatted()

2019-04-20 Thread PyGreSQL issue tracker
#77: Bad performance of query_formatted()
---+-
  Reporter:  justin|  Owner:  cito
  Type:  task  | Status:  closed
  Priority:  major |  Milestone:  5.1
 Component:  DB API 2  |Version:  5.0
Resolution:  fixed |   Keywords:  pg, performance
---+-
Changes (by cito):

 * status:  new => closed
 * resolution:   => fixed


--
Ticket URL: 
PyGreSQL 
PyGreSQL Tracker
___
PyGreSQL mailing list
PyGreSQL@Vex.Net
https://mail.vex.net/mailman/listinfo/pygresql


Re: [PyGreSQL] PyGreSQL #77: Bad performance of query_formatted()

2019-04-20 Thread Christoph Zwerschke

Am 20.04.2019 um 01:20 schrieb Justin Pryzby:

  Also, caching the types of the values works
  only if you have very few different values (like in your case only a
  zero), but for most real-world examples with different strings or floats
  this will not work.


Do you mean "..works only if you have very few different TYPES OF values" ?


Sorry, it was too late and somehow I thought you were using a mapping 
from values to types. That would be a very bad idea as the range of 
values is infinite and some may not even been hashable.


But of course you were using a mapping from Python types to PyGres 
simple types. I've changed the code to use such a mapping for the most 
frequent types, too. I'm not using a self-extending mapping like 
_SimpleTypes() though, because for complex types, it would not work - 
e.g. the values [1] and ['x'] have the same Python type "list" but 
should map to the different PyGres simple types "int[]" and "text[]".


-- Christoph
___
PyGreSQL mailing list
PyGreSQL@Vex.Net
https://mail.vex.net/mailman/listinfo/pygresql


Re: [PyGreSQL] PyGreSQL #77: Bad performance of query_formatted()

2019-04-19 Thread Justin Pryzby
Thanks for looking at this

On Fri, Apr 19, 2019 at 09:46:18PM -, PyGreSQL issue tracker wrote:
>  But I think the huge performance difference in your example is only an
>  artifact of the special kind of query you were using, with very simple
>  values (just integer zeroes) and many columns.

FWIW this is our workload at telsasoft.  We have very wide tables (historically
some up to the postgres limit of 1600 columns), nearly all of which are
small/big/int/float/double.

>  This is because strings need to be escaped when inserted inline, which
>  turns the tide against inline=True.

Good point :)

>  I also had a look at the code, checked with a profiler and made some
>  optimizations in r970, r971 and r972, but did not really find much more we
>  can do here, at least no low-hanging fruits. The optimizations in your
>  patch will not work in the general case, when you have mixed parameters
>  containing Literal values. Also, caching the types of the values works
>  only if you have very few different values (like in your case only a
>  zero), but for most real-world examples with different strings or floats
>  this will not work.

Do you mean "..works only if you have very few different TYPES OF values" ?

>  So I think the best we can do is document the issue as you suggested, and
>  refer to `query` and `query_prepared` for performance-critial queries and
>  batch runs which I have done in r973.

Thank you.  I think this is sufficient.

Justin
___
PyGreSQL mailing list
PyGreSQL@Vex.Net
https://mail.vex.net/mailman/listinfo/pygresql


Re: [PyGreSQL] PyGreSQL #77: Bad performance of query_formatted()

2019-04-18 Thread PyGreSQL issue tracker
#77: Bad performance of query_formatted()
---+-
  Reporter:  justin|  Owner:  cito
  Type:  task  | Status:  new
  Priority:  major |  Milestone:  5.1
 Component:  DB API 2  |Version:  5.0
Resolution:|   Keywords:  pg, performance
---+-

Comment (by cito):

 Also posted by Justin 2019-01-19:

 1) in 5.0, document that relative to query, query_formatted has an
 overhead "which can be significant for queries repeated many times", and
 document that the mitigation is to use inline=True; or, use prepared
 statements "available since 5.1".  Note that for simple queries like
 INSERT, the significant overhead is in pygres, but for complex queries
 like JOINs/large inheritence trees/etc, the more overhead is in planning.

 2) For 5.1.1 (and maybe 5.0), something to mitigate the cost of
 isinstance() in pg and pgdb.

 3) In 5.1 (but probably not 5.0?), consider changing `query_formatted`
 default to `inline=True`. In my test, this inserted 30% faster (!) even
 with no 2nd patch.

 {{{
 $ python2.7 ./testinsert.py
 diff 192.718273878
 vs
 $ python2.7 ./testinsert.py
 diff 309.562824965
 }}}

 That might be good to consider for other reasons: there's 1) pqExec vs 2)
 pqExecParams.  1) supports multiple commands; but 2) allows binary
 protocol (which pygres doesn't currently support).  Binary protocol (or
 anything using pqExecParams) will never support multiple commands.

 If there aren't params, `query_formatted` currently calls `query` and
 `pqExec`, to allow the possibility of including multiple commands.  I
 wonder whether (starting in v5.1) perhaps pygres shouldn't call
 pqExecParams() in the case that there are no params?  Otherwise it's odd
 that query_formatted would call pqExec sometimes only, and an odd
 conditional which complicates any future support for things like binary
 format.  I realize that binary format isn't going to happentime anytime
 soon, if ever, but 5.1 is maybe an opportunity to make that change.

 Maybe multiple commands should be documented, and it's odd to write that
 `query_formatted` supports multiple command "if there are no params, or if
 inline=True".  It'd be better to be able to say "multiple commands are not
 supported except when inline=True"; or, if that was the default, "multiple
 commands are not supported if inline=False".

 If pg always (or defaulted) used `inline=True`, we could always use
 multiple commands (even with pqExecParams), and it would be similar to
 pgdb.  But maybe that's moving in the wrong direction, too.. Or maybe that
 should wait until binary protocol is on the table

 I realize this message is addressing multiple things and maybe not very
 focused, but a few of these are kind of connected.

--
Ticket URL: 
PyGreSQL 
PyGreSQL Tracker
___
PyGreSQL mailing list
PyGreSQL@Vex.Net
https://mail.vex.net/mailman/listinfo/pygresql


Re: [PyGreSQL] PyGreSQL #77: Bad performance of query_formatted()

2019-04-18 Thread PyGreSQL issue tracker
#77: Bad performance of query_formatted()
---+-
  Reporter:  justin|  Owner:  cito
  Type:  task  | Status:  new
  Priority:  major |  Milestone:  5.1
 Component:  DB API 2  |Version:  5.0
Resolution:|   Keywords:  pg, performance
---+-
Changes (by cito):

 * Attachment "v1-0001-Optimize-query-formatting.patch" added.

 Optimized query_formatted patch

--
Ticket URL: 
PyGreSQL 
PyGreSQL Tracker
___
PyGreSQL mailing list
PyGreSQL@Vex.Net
https://mail.vex.net/mailman/listinfo/pygresql