Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

2017-05-19 Thread Kevin Grittner
On Fri, May 19, 2017 at 6:56 AM, Karl O. Pinc <k...@meme.com> wrote:

> I think if I was to make an argument for doing something it would
> be based on reliability -- how many users can you give their
> own database before somebody leaves an open transaction hanging?

Yeah, I guess it's worth having on the list, where it will compete
with other possible enhancements on a cost/benefit basis.  Thanks
for raising the issue!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

2017-05-18 Thread Kevin Grittner
On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <k...@meme.com> wrote:

> I forget all the details, but some time ago I found
> that I had to increase max_pred_locks_per_transaction.
> What I recall about the reason for this is that I'm
> using the serializable transaction isolation, and that
> I've a test database which occasionally has extremely
> long running transactions.  The PG serializable
> snapshot isolation implementation at the time (9.1?)
> was holding predicate locks across all databases
> during transactions.  This even though databases
> are independent of each other.  The long transaction
> times in the test database led to predicate lock
> exhaustion in production databases -- only a single
> transaction would be executing in the test database
> but many would occur in the production databases.
> (I don't know if there was potential for other bad effects
> due to the production transactions "hanging around" until the
> transaction in the test db finished.)
>
> My question is whether this has changed.  Does PG
> now pay attention to the database in its SSI implementation?

Well, it pays attention as far as the scope of each lock, but there
is only one variable to track how far back the oldest transaction ID
for a running serializable transaction goes, which is used in
cleanup of old locks.  I see your point, and it might be feasible to
change that to a list or map that tracks it by database; but I don't
even have a gut feel estimate for the scale of such work without
investigating it.  Just out of curiosity, what is the reason you
don't move the production and test databases to separate instances?
If nothing else, an extremely long-running transaction in one database
can lead to bloat in others.
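
In the meantime, the stopgap is the one you already found: give the
predicate lock table more headroom.  A rough sketch (the value is
illustrative only, and max_pred_locks_per_transaction takes effect
only after a server restart):

-- how many SIRead (predicate) locks are currently held, per database
SELECT d.datname, count(*)
  FROM pg_locks l
  LEFT JOIN pg_database d ON d.oid = l.database
 WHERE l.mode = 'SIReadLock'
 GROUP BY d.datname;

-- give the shared lock table more room; restart required
ALTER SYSTEM SET max_pred_locks_per_transaction = 256;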

> Thanks for the help and apologies if I'm not framing
> the question perfectly.  It's not often I think about
> this.

No sweat -- your concern/question is perfectly clear.  It's the
first time I've heard of someone with this particular issue, so at
this point I'm inclined to recommend the workaround of using a
separate cluster; but if we get other reports it might be worth
adding to the list of enhancements that SSI could use.

Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [GENERAL] update error with serializable

2017-01-20 Thread Kevin Grittner
On Fri, Jan 20, 2017 at 11:27 AM, Rob Sargent <robjsarg...@gmail.com> wrote:
> On 01/20/2017 10:05 AM, Kevin Grittner wrote:

>> https://www.postgresql.org/message-id/flat/d8joa0eh9yw@dalvik.ping.uio.no#d8joa0eh9yw@dalvik.ping.uio.no

> Configurable or dynamic?  Wouldn't something related to tuples per page (and
> maybe fillfactor) do the trick?

Please keep discussion such as that on the thread for the patch.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] update error with serializable

2017-01-20 Thread Kevin Grittner
On Fri, Jan 20, 2017 at 4:44 AM, Tom DalPozzo <t.dalpo...@gmail.com> wrote:

> I've two threads continuously updating rows in the same table.
> Each one does: BEGIN, UPDATE, UPDATE, ..., COMMIT
> There can't be two active transactions updating the same row
> (barring a bug of mine, but I don't think so).
> I'm using default_transaction_isolation = 'serializable'
> I get "could not serialize access due to read/write dependencies
> among transactions"
> I didn't expect to see it, hence there must be something about
> PostgreSQL that I haven't understood well and I'd like a
> clarification.

Most likely one or both transactions have updated 3 or more
tuples on a single page, causing the tuple locks for the
transaction on that page to be combined into a single page lock for
that transaction.  This is intended to prevent the memory required
for tracking predicate locks from growing too large.  This
threshold of 3 per page was entirely arbitrary and always seen as
something which could and should be improved someday.  That might
happen for version 10 (expected to be released next year), since a
patch has been submitted to make that configurable.

https://www.postgresql.org/message-id/flat/d8joa0eh9yw@dalvik.ping.uio.no#d8joa0eh9yw@dalvik.ping.uio.no

If you are able to build from source, you might want to test the
efficacy of the patch for your situation.
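
Independently of the patch, if you want to confirm that page-lock
promotion is what you are hitting, a rough sketch along these lines
(the table is made up, and the planner may choose differently on
your data) shows the SIRead locks collapsing from tuple to page
granularity:

CREATE TABLE lock_demo (id int PRIMARY KEY, val int);
INSERT INTO lock_demo SELECT g, 0 FROM generate_series(1, 100) g;

BEGIN ISOLATION LEVEL SERIALIZABLE;
SET LOCAL enable_seqscan = off;  -- force an index scan so tuple-level locks are taken
UPDATE lock_demo SET val = val + 1 WHERE id <= 4;  -- several tuples on one heap page

-- from this or another session, inspect the predicate locks held;
-- with enough tuples locked on one heap page they appear as a single
-- page-level entry instead of individual tuple entries
SELECT locktype, relation::regclass, page, tuple
  FROM pg_locks
 WHERE mode = 'SIReadLock';
COMMIT;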

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[GENERAL] Re: [ADMIN] postgresql : could not serialize access due to read/write dependencies among transactions

2017-01-18 Thread Kevin Grittner
On Tue, Jan 17, 2017 at 10:54 PM, Neslisah Demirci
<neslisah.demi...@markafoni.com> wrote:

> could not serialize access due to read/write dependencies among
> transactions

> I also added an index to my query and my query's execution plan
> doesn't use a seq scan.

These two issues are likely to be somewhat related -- if a
sequential scan is used, then any write to that table by another
connection causes a read-write dependency (a/k/a rw-conflict),
which can eventually contribute to a serialization failure.  If you
can cause narrower access through indexes, you may see a
significant drop in the frequency of these serialization failures.
You might want to post the query and its execution plan with all
the information suggested here (exact pg version, configuration
information, machine descriptions, etc.):

https://wiki.postgresql.org/wiki/SlowQueryQuestions

On the other hand, if you are going to use serializable
transactions (or even repeatable read transactions) you should
probably be using some framework that can retry the transaction
from the start on a serialization failure.

You might be interested in this set of examples of how serializable
transactions differ from repeatable read:

https://wiki.postgresql.org/wiki/SSI

And of course, if you haven't already read the fine manual on the
topic:

https://www.postgresql.org/docs/current/static/mvcc.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] raise notice question

2017-01-16 Thread Kevin Grittner
On Sat, Jan 14, 2017 at 11:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> ProPAAS DBA <d...@propaas.com> writes:
>> Is it possible to execute a raise notice without the "Notice" keyword
>> being part of the output

> This is a matter for how your client code presents the message data.
> So far as psql is concerned, the answer would be "no", but a custom
> application could deconstruct the notice message however it wanted.

It wouldn't be crazy to provide a way in psql to suppress the level only
for NOTICE.  It would become much more important to be able to do so
(and perhaps do so by default) if we get stored procedures which can
return a complex result stream like TDS does.  The series of literals
and result sets of different types is something which can be quite
useful to DBAs.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Re: Are new connection/security features in order, given connection pooling?

2017-01-16 Thread Kevin Grittner
On Fri, Jan 13, 2017 at 7:39 PM, Karl Czajkowski <kar...@isi.edu> wrote:

> The concern was raised about the risk of the subsequent queries being
> able to reverse the "set role" to subvert such authorization. Others
> in the thread provided the common advice of parametric queries to
> prevent query injection, but I think you need more than that.  I think
> one would like a stronger isolation between the connection
> setup/security config layer and the fancier (and possibly flawed)
> application layer.  We struggle with this because all these mechanisms
> are multiplexed as commands in one SQL connection, instead of having
> an in-band versus out-of-band protocol layering available to the web
> application, so the more risky data-dependent SQL can flow in-band
> while the basic request-handler lifecycle could configure the security
> context out-of-band.

Wasn't there some previous discussion about the ability to have a
form of SET ROLE that would specify that some token would be needed
to do a successful RESET ROLE?  (I don't remember whether people
were leaning toward providing the token on the SET ROLE statement,
or getting that as part of the return state.)  If you made the
token a suitably secure random value, and made an incorrect token
on the RESET ROLE a FATAL error, it seems like it could be secure
enough.  You still need to have the ability to authenticate the
client side connection in the connection pooler, and have the
connection pooler connect to the server with a login with rights to
do the appropriate SET ROLE (preferably without requiring superuser
rights).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Materialized view vs. view

2017-01-10 Thread Kevin Grittner
On Tue, Jan 10, 2017 at 2:01 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> Can we all agree that the "Materialized View" should be faster

I think we have.

> and stop this pointless bickering about naming convention,
> which I have already stated, is just an opinion and too late to change at
> this point?

Novel opinions about what words mean can lead to confusion.  Left
alone, what you said might have confused some readers about what
"materialized" means.  "Materialized view" has been a term of art,
part of database jargon, for over 30 years.

https://en.wikipedia.org/wiki/Jargon :

"A main driving force in the creation of technical jargon is
precision and efficiency of communication when a discussion must
easily range from general themes to specific, finely differentiated
details without circumlocution."

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [GENERAL] Materialized view vs. view

2017-01-10 Thread Kevin Grittner
On Tue, Jan 10, 2017 at 1:36 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> IMHO, I disagree. I feel a better name would be "materialized
> table".

The dictionary defines "materialize" as meaning "become actual
fact" or "appear in bodily form".  In the database environment, it
generally means that the data is actually stored, rather than being
something which can be generated.  For example, in query execution
the relation produced by an execution node may feed into a
Materialize node if the generated relation is expected to be
scanned multiple times by a higher-level node and scanning a stored
copy of the relation each time is expected to be faster than
regenerating the relation each time.  "Materialized table" would be
redundant; a table is always materialized.  A view is data
generated by running a query. In the simple case, the resulting
relation is not stored, but is regenerated on each reference.  The
"materialized view" feature lets you materialize it, like a table.

If you don't think materializing data means storing a copy of it
for re-use, I'm not sure what you think it means.

This is not to beat up on you, but to try to keep terminology
clear, to facilitate efficient communication.  There are some terms
we have been unable to avoid using with different meanings in
different contexts (e.g., "serialization"); that's unfortunate, but
hard to avoid.  I want to keep it to the minimum necessary by
avoiding creep of other terms to multiple definitions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [GENERAL] Materialized view vs. view

2017-01-10 Thread Kevin Grittner
On Tue, Jan 10, 2017 at 12:44 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> fyi, a view is nothing more than just that, a view.
> A materialized view, afaic, is a misleading name, it is actually
> a valid table and you can create indexes on them,

I disagree with the notion that defining a relation in terms of a
query (like a view) and materializing the results (like a table)
makes "materialized view" a misleading name.  I don't think I can
say it better than others already have, so I recommend reading the
first three paragraphs of the "Introduction" section of this paper:

http://homepages.inf.ed.ac.uk/wenfei/qsx/reading/gupta95maintenance.pdf
Ashish Gupta and Inderpal Singh Mumick.
Maintenance of Materialized Views: Problems, Techniques, and Applications.

> so theoretically you should be able to reduce response time on
> them.

As the above-referenced text suggests, a materialized view is
essentially a cache of the results of the specified query.  While,
in rare cases, this may be captured to provide the query results as
of some particular moment in time, the overwhelming reason for
creating a materialized view is to improve performance over a
non-materialized view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [GENERAL] Improve PostGIS performance with 62 million rows?

2017-01-09 Thread Kevin Grittner
On Mon, Jan 9, 2017 at 11:49 AM, Israel Brewster <isr...@ravnalaska.net> wrote:

> [load of new data]

>  Limit  (cost=354643835.82..354643835.83 rows=1 width=9) (actual
> time=225998.319..225998.320 rows=1 loops=1)

> [...] I ran the query again [...]

>  Limit  (cost=354643835.82..354643835.83 rows=1 width=9) (actual
> time=9636.165..9636.166 rows=1 loops=1)

> So from four minutes on the first run to around 9 1/2 seconds on the second.
> Presumably this difference is due to caching?

It is likely to be, at least in part.  Did you run VACUUM on the
data before the first run?  If not, hint bits may be another part
of it.  The first access to each page after the bulk load would
require some extra work for visibility checking and would cause a
page rewrite for the hint bits.
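
The usual way to avoid paying that cost during the first "real"
queries is to do it up front after the load; a minimal sketch (the
table name is just a placeholder):

-- sets hint bits and gathers planner statistics in one pass,
-- so later queries don't do that work page by page
VACUUM ANALYZE mytable;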

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Row value expression much faster than equivalent OR clauses

2017-01-04 Thread Kevin Grittner
On Wed, Dec 28, 2016 at 10:58 AM, Steven Grimm <sgr...@thesegovia.com> wrote:

> WHERE ((e.timeStamp > '2016-12-19T20:34:22.315Z')
> OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
> AND e.sequenceNumber > 0)
> OR (e.timeStamp = '2016-12-19T20:34:22.315Z'
> AND e.sequenceNumber = 0
> AND e.aggregateIdentifier >
> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84'))

> This uses the index on the three columns it's using for ordering of events,
> but (if I'm reading the explain output correctly) does a full scan of the
> index.

> I played around with it a little and one thing I tried was to restructure
> the WHERE clause using a row value expression that's semantically equivalent
> to the original.

> WHERE (e.timeStamp, e.sequenceNumber, e.aggregateIdentifier) >
> ('2016-11-19T20:34:22.315Z', 0, 
> 'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')

> This ends up being a LOT faster:

Yup.

> I wonder if the query planner could recognize that the two queries are
> equivalent and choose the second plan for the OR-clause version, or at least
> use the index more efficiently.

Theoretically it could, but that would add significant time to
planning for a large number of queries, with no benefit to those
who explicitly write the query in the faster (and more concise!)
fashion.

You could come a lot closer to the performance of the row value
expression technique by using the logical equivalent of your
original query that puts AND at the higher level and OR at the
lower level.  (Having OR at the top is generally inefficient.)

WHERE (e.timeStamp >= '2016-12-19T20:34:22.315Z'
  AND (e.timeStamp >  '2016-12-19T20:34:22.315Z'
    OR (e.sequenceNumber >= 0
      AND (e.sequenceNumber >  0
        OR (e.aggregateIdentifier >
            'dev:642e1953-2562-4768-80d9-0c3af9b0ff84')))))

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[GENERAL] Re: [GENERAL] Love Your Database project — Thoughts on effectively handling constraints?

2016-12-17 Thread Kevin Grittner
On Fri, Dec 16, 2016 at 3:54 PM, Guyren Howe <guy...@gmail.com> wrote:

> What I need to do is turn this into something similar to the equivalent
> Rails-side constraint failure, which is a nicely formatted error message on
> the model object.

Can you show what the text in such a message looks like?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] does postgres log the create/refresh of a materialized view anywhere?

2016-12-13 Thread Kevin Grittner
On Tue, Dec 13, 2016 at 7:37 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
> On Tue, Dec 13, 2016 at 7:36 PM, Kevin Grittner <kgri...@gmail.com> wrote:
>> On Tue, Dec 13, 2016 at 5:57 PM, Jonathan Vanasco <postg...@2xlp.com> wrote:
>>
>>> Is there a way to find out when a materialized view was
>>> created/refreshed?
>>
>>> I can log this manually in postgresql if needed, but was hoping
>>> there was some "timestamp" on the view in a system table.
>>
>> This is not currently tracked in the system catalogs.

> This goes back to a discussion of my request to add relcreated
> column to pg_class.
> https://www.postgresql.org/message-id/CANu8FiyiRPGZ+gB=1jayryx3hxcuqnflsfgdzfxsvlmj-jt...@mail.gmail.com
> Apparently the naysayers do not feel it is worthwhile.

Do you see relcreated as being something to set anew whenever the
data contents of a materialized view change due to the REFRESH
command?  I wouldn't have thought so, but I guess the problem with
that proposal is that everyone has a different idea of what the
semantics of the column would be.  Suggesting that field as the
solution here seems to reinforce that perception, anyway.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] does postgres log the create/refresh of a materialized view anywhere?

2016-12-13 Thread Kevin Grittner
On Tue, Dec 13, 2016 at 5:57 PM, Jonathan Vanasco <postg...@2xlp.com> wrote:

> Is there a way to find out when a materialized view was
> created/refreshed?

> I can log this manually in postgresql if needed, but was hoping
> there was some "timestamp" on the view in a system table.

This is not currently tracked in the system catalogs.
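
If you do end up logging it yourself, one possible sketch (all names
invented, and ON CONFLICT needs 9.5 or later) is to record the
timestamp in the same transaction that performs the refresh:

CREATE TABLE matview_refresh_log (
  matview      regclass PRIMARY KEY,
  refreshed_at timestamptz NOT NULL
);

BEGIN;
REFRESH MATERIALIZED VIEW my_matview;
INSERT INTO matview_refresh_log VALUES ('my_matview', now())
  ON CONFLICT (matview) DO UPDATE SET refreshed_at = EXCLUDED.refreshed_at;
COMMIT;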

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken

2016-12-07 Thread Kevin Grittner
On Wed, Dec 7, 2016 at 7:33 AM, Michael Sheaver <mshea...@me.com> wrote:
> I would like to echo the sentiment on collation and expand it to
> character sets in general. When issues with them come up, they do
> take an incredible amount of time and effort to resolve, and are
> one of my own biggest pain points when dealing with databases and
> datasets from other sources.

I would be much happier if most developers understood the
difference between a character set (e.g., Unicode) and a character
encoding scheme (e.g., UTF-8 or UTF-16) and how the two concepts
relate.  If we reached a point where most DBAs understood the point
of being able to set a client_encoding that is different from the
server_encoding, I think I would need to pop the cork on some
champagne.
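
The distinction shows up directly in a session; just a sketch, with
the encodings chosen purely as examples:

-- a property of the database, fixed when it was created
SHOW server_encoding;

-- a property of the session; the server converts between the two
SET client_encoding = 'LATIN1';
SHOW client_encoding;
RESET client_encoding;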

Hm.  Maybe a topic for a blog post....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Methods to quiesce PostgreSQL DB without configuring log archival

2016-11-23 Thread Kevin Grittner
On Wed, Nov 23, 2016 at 11:53 AM, Scott Mead <sco...@openscg.com> wrote:
> On Nov 22, 2016, at 01:23, MEERA <meeranai...@gmail.com> wrote:

>> If archive_mode is not configured and I use a snapshot solution for backups of
>> the server, how can I ensure data consistency? Is there a way to quiesce all
>> the connections to the DB?
>
> If your snapshot solution is atomic, then you are *probably* okay. I would
> do a few restores to test, but atomic snapshots give you a point in time and
> should be consistent.

You should be OK *if all files in the backup are part of a single
atomic snapshot*.  You can't put pg_xlog or selected tablespaces on
a separate filesystem that has a separate snapshot and expect the
various snapshots to be combined and used as a coherent backup
unless you are archiving WAL and use pg_start_backup() (before the
first snapshot is initiated) and pg_stop_backup() (after the last
snapshot completes) to do a PITR-style recovery.

Be sure to follow all the rules for PITR-style backup and recovery,
like deleting the postmaster.pid file and all files under pg_xlog
before starting the recovery.  And of course, do NOT delete the
backup_label file created by pg_start_backup().
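
The sequence, roughly (9.x-era function names; the label is
arbitrary):

-- before the first filesystem snapshot is initiated
SELECT pg_start_backup('fs-snapshot');

-- ...take the snapshots of every filesystem involved...

-- after the last snapshot has completed
SELECT pg_stop_backup();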

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity

2016-11-23 Thread Kevin Grittner
On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher
<clavadetsc...@swisspug.org> wrote:
> From: Kevin Grittner [mailto:kgri...@gmail.com]

>> Is it possible to upgrade?  You are missing over a year's worth
>> of fixes for serious bugs and security vulnerabilities.
>
> Yes. Actually it is foreseen to move to 9.6 (and RedHat 7.x).
>
>> https://www.postgresql.org/support/versioning/

An upgrade from 9.3.x to 9.6.x is a major release upgrade, which
gets you new features and usually gets you improvements in
performance and scalability.  The 9.3 major release will be
supported for almost 2 more years, so I wasn't so concerned about
that as being on 9.3.10 when the latest bug fix version of 9.3 is
9.3.15.  To avoid hitting bugs that others have already hit and
reported, with fixes published, it is wise to try to upgrade to the
latest minor release fairly quickly.  If the minor release fixes a
serious security vulnerability, I think it is a good idea to update
within a day or two of release.

> I will have to check with our hosting people how many cores we
> have or can have on the new environment.  I have seen that there
> is pgBouncer and pgPool. Would you recommend one of those?

pgBouncer is more lightweight, so if you don't need any of the
features present only in pgPool, I would go with pgBouncer.  Depending
on your application software environment, it may be even better to
use a pool built into the application development framework.  There
are several very good pools available to Java environments.  I can
personally attest to the quality of Apache dbcp, but others have
mentioned that they like other Java connection pools even better.
You might want to search the archives, or maybe someone will
mention the others again on this thread.

>>> SELECT count(*) FROM pg_stat_activity; watch 1;
>>
>> At the times when the resources are overloaded by more
>> connections than the resources can efficiently service -- well
>> that's precisely the time that a sleeping "monitoring" process
>> is least likely to be given a time slice to run. If you can
>> manage to get pgbadger to run on your environment, and you turn
>> on logging of connections and disconnections, you will be able
>> to get far more accurate information.
>
> Yes, it sounds reasonable. I assumed that these kinds of
> measurements have a higher priority or reserved slots for them.
> Those are the occasions when they are most needed.

There is no such feature in PostgreSQL.  It might be worthwhile,
although how that would be implemented is not obvious, short of a
sysadmin looking for the monitoring backend process and running
"nice" against it.

> And thank you for the hint to pgbadger. I will take a look into
> it, but an installation on the server completely depends on our
> hosting service people. I am not sure this is feasible, but I can
> imagine an alternative scenario, using a local installation for
> tuning and then pass the parameters to the hosters for
> implementation.

As long as you can control the PostgreSQL configuration (to set the
right logging options) and can retrieve the log files, you should
be able to use it.
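
For reference, the logging pgBadger needs for connection analysis
can be switched on like this (a sketch; ALTER SYSTEM needs 9.4 or
later -- on 9.3 set the same parameters in postgresql.conf):

ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();  -- these settings only need a reload, not a restart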

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity

2016-11-22 Thread Kevin Grittner
On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher
<clavadetsc...@swisspug.org> wrote:

> We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).

Is it possible to upgrade?  You are missing over a year's worth of
fixes for serious bugs and security vulnerabilities.

https://www.postgresql.org/support/versioning/

> Among other things, the database is the backend for a web application that
> expects a load of some hundred users at a time (these are participants
> in online surveys that we use for computing economic indicators; they
> access the system every month). The total number of people expected is
> above 5000, but we don't expect very high concurrent access to the
> database -- as mentioned, a few hundred at the beginning of the surveys.
>
> To be sure that we won't have problems at peak times, we created a
> load test using Gatling that ramps up to 1000 users in 5 minutes in
> bunches of 10. At the beginning we had problems with the web server
> response, which we were able to correct. Now we face a problem with the
> max_connections limit of PostgreSQL. Currently it is set to the default
> of 100. We are going to look into it and either increase that limit or
> consider connection pooling.

On a web site with about 3000 active users, I found (through
adjusting the connection pool size on the production database and
monitoring performance) that we got best performance with a pool of
about 40 connections.  This was on a machine with 16 cores (never
count HT "threads" as cores), 512GB RAM, and a RAID with 40 drives
of spinning rust.

http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html

> What bothers me, however, is that running a query on pg_stat_activity with
> a watch of 1 second never shows any value higher than 37 concurrent
> active connections.
>
> SELECT count(*) FROM pg_stat_activity; watch 1;

At the times when the resources are overloaded by more connections
than the resources can efficiently service -- well that's precisely
the time that a sleeping "monitoring" process is least likely to be
given a time slice to run.  If you can manage to get pgbadger to
run on your environment, and you turn on logging of connections and
disconnections, you will be able to get far more accurate
information.

> Increasing max_connections has repercussions on the configuration
> of work_mem (if I remember well)

Each connection can allocate one work_mem allocation per node which
requires a sort, hash, CTE, etc.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] pg_class (system) table increasing size.

2016-11-21 Thread Kevin Grittner
On Mon, Nov 21, 2016 at 11:34 AM, dhaval jaiswal <dhava...@hotmail.com> wrote:

> Due to business impact auto vacuum is off.

You have now discovered some of the negative business impact of
turning it off.  If you leave it off, much worse is likely to
follow.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] pg_class (system) table increasing size.

2016-11-17 Thread Kevin Grittner
On Thu, Nov 17, 2016 at 3:16 PM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:

> Looks like you lost the stat data a while ago (probably due to a server
> crash, or pg_stat_reset()) and it never got updated.  I suggest doing
> "ANALYZE pg_class" to create initial stats; that might prompt autovacuum
> to vacuum the table.  If the bloat is excessive, vacuuming might take a
> very long time, in which case perhaps consider VACUUM FULL (but be very
> aware of its consequences first).
>
> I think it's likely that this has happened to other catalogs as well, so
> check the pg_stat_sys_tables view for other entries with all zeroes in
> the n_tup_* columns.

+1

Also, you may want to review your autovacuum settings to make sure
they are aggressive enough.  You didn't describe your machine, your
workload, or your vacuum regimen, but if it's a large machine you
would probably need to raise autovacuum_vacuum_cost_limit.  And if
autovacuum somehow got turned *off* you are likely to have all
kinds of problems with bloat, and may need to schedule some down
time to get it cleaned up.
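
Putting that into something you can paste (a sketch):

-- system catalogs whose statistics look lost (all-zero tuple counters)
SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_sys_tables
 WHERE n_tup_ins = 0 AND n_tup_upd = 0 AND n_tup_del = 0;

-- prime the stats so autovacuum starts considering pg_class again
ANALYZE pg_class;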

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Locking question

2016-10-27 Thread Kevin Grittner
On Thu, Oct 27, 2016 at 1:37 AM, Frank Millman <fr...@chagford.com> wrote:

> As mentioned previously, I am trying to avoid using PostgreSQL-specific
> techniques, as I need to support sqlite3 and SQL Server as well.

The SERIALIZABLE transaction isolation level is portable.  It is
part of the SQL standard (and has been since the beginning), and is
supported by just about every database product, including SQLite
and SQL Server.  (In fact, you have to go well out of your way for
SQLite transactions *not* to be SERIALIZABLE --
https://www.sqlite.org/isolation.html )

> For costing purposes, I want to run a FIFO system. This means I have to
> maintain separate entries for each receipt of stock, and allocate any sales
> of stock against the receipts ‘oldest first’.

The two ways of doing this which spring to mind are window
functions (supported by PostgreSQL and SQL Server, but not SQLite)
and cursors (supported by most database products, including the
three you mention).
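
As a rough sketch of the window-function approach (the table and
columns are invented for the example):

-- receipts for one item, oldest first, with a running total
SELECT receipt_id, received_at, qty,
       sum(qty) OVER (ORDER BY received_at, receipt_id) AS cumulative_qty
  FROM stock_receipt
 WHERE item_id = 42
 ORDER BY received_at, receipt_id;
-- a sale of N units consumes, oldest first, every receipt whose
-- cumulative_qty - qty is still below N; the last such receipt may
-- be only partially consumed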

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-26 Thread Kevin Grittner
On Wed, Oct 26, 2016 at 3:20 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Mon, Oct 24, 2016 at 8:07 AM, Kevin Grittner <kgri...@gmail.com> wrote:
>> My initial thought is that since reducing the false positive rate
>> would only help when there was a high rate of conflicts under the
>> existing patch, and it would add code complexity and cost for the
>> case where conflict rate is low, that we might want to just leave
>> the current fix and see whether there are complaints from the field
>> about the false positive rate.
>>
>> Reducing the rate of false positive serialization failures is a
>> worthy goal, but it's gotta make sense from a cost/benefit
>> perspective.
>
> What are your thoughts on the back-and-forth between myself and Tom
> concerning predicate locks within heap_fetch_tuple() path last
> weekend? I now think that there might be an outstanding concern about
> ON CONFLICT DO NOTHING + SSI here.

As far as I can see, Tom's fixes are all spot-on.

It *might* be possible to call some special custom variant of
heap_fetch() instead of the usual one here, but it would be very
hard to prove that it was safe against introducing a bug in the
serialization logic in all cases, and it seems to me that it would be
fragile against future changes.  What's more, if we do decide that
the false positive rate with the existing fix is too high, we would
need to use the standard heap_fetch() function in implementing the
more targeted logic.

By the way, if we do get complaints from the field about
performance problems from the false positives in this area, they
would need to show that the hit was pretty severe to justify
back-patching to any stable branches.  We are talking about a
performance tuning change, and one that is not necessarily a winner
in all cases; such tuning is rarely back-patched to stable
branches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] Locking question

2016-10-26 Thread Kevin Grittner
On Wed, Oct 26, 2016 at 3:42 AM, Frank Millman <fr...@chagford.com> wrote:

> I am designing an inventory application, and I want to ensure
> that the stock level of any item cannot go negative.

One way to do this is to use only transactions at the SERIALIZABLE
transaction isolation level to maintain and query this data.  When
you do that, you can write the transactions as though each would
only ever be run by itself, and if a concurrent transaction would
cause incorrect behavior you will get an error with a SQLSTATE
starting with "40", and you can retry the transaction from the
start.  For applications like you describe, this often performs
better than approaches which use blocking locks (assuming proper
configuration and reasonable indexes).

https://www.postgresql.org/docs/current/static/transaction-iso.html

Logically, the problem is similar to the overdraft protection
example here:

https://wiki.postgresql.org/wiki/SSI#Overdraft_Protection

Basically, you need some way to catch serialization failure errors
and retry the failed transaction from the start, and that frees you
from worrying about where race conditions exist and covering each
one individually.

If you want to use a less strict isolation level, you need to
either promote the conflict from read-write to write-write by using
SELECT FOR UPDATE or you need to materialize the conflict.  The
latter could be accomplished by maintaining a total within any
transactions modifying the detail (either from triggers or
application code), which will cause a write conflict if two
transactions try to update the same total at the same time, or by
using explicit locking controlled from the application.
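
To make that last option concrete, a sketch (names invented, and it
assumes a stock_movement detail table with positive qty for receipts
and negative qty for sales) of materializing the conflict with a
per-item total maintained by trigger, so that two transactions
moving stock for the same item collide on the total row:

CREATE TABLE stock_total (
  item_id     int PRIMARY KEY,
  qty_on_hand int NOT NULL CHECK (qty_on_hand >= 0)
);

CREATE FUNCTION apply_stock_movement() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- assumes the stock_total row for this item already exists
  UPDATE stock_total
     SET qty_on_hand = qty_on_hand + NEW.qty
   WHERE item_id = NEW.item_id;
  RETURN NEW;
END;
$$;

CREATE TRIGGER apply_stock_movement
  AFTER INSERT ON stock_movement
  FOR EACH ROW EXECUTE PROCEDURE apply_stock_movement();

-- the row-level write on stock_total serializes concurrent movements
-- for the same item, and the CHECK constraint rejects a negative balance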

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-24 Thread Kevin Grittner
On Thu, Oct 13, 2016 at 5:26 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:

> (1)  postgres=# create table bank_account (id int primary key, cash int);
> (1)  CREATE TABLE
> (1)  postgres=# begin transaction isolation level serializable ;
> (1)  BEGIN
>
> (2)  postgres=# begin transaction isolation level serializable ;
> (2)  BEGIN
>
> (1)  postgres=# select * from bank_account where id = 1;
> (1)  ┌────┬──────┐
> (1)  │ id │ cash │
> (1)  ├────┼──────┤
> (1)  └────┴──────┘
> (1)  (0 rows)
>
> (2)  postgres=# insert into bank_account values (1, 100);
> (2)  INSERT 0 1
>
> (1)  postgres=# insert into bank_account values (1, 200) on conflict do nothing;
> (1)  ...waits for tx2...
>
> (2)  postgres=# commit;
> (2)  COMMIT
>
> (1)  INSERT 0 0
> (1)  postgres=# commit;
> (1)  COMMIT

This is a really diabolical example.  (Thanks for that!)

Without the ON CONFLICT option, using standard statements in
SERIALIZABLE transactions, there would be no serialization anomaly.
If both transactions first tested for the existence of the row with
a SELECT statement, tx1 would get a serialization failure; in the
example as shown tx1 would get a duplicate key error (even though
we would like to get to the point of having it get a serialization
failure).

If we took out the existing check and modified INSERT under
SERIALIZABLE transactions to acquire predicate locks during initial
insertion of the index key in the index used by the ON CONFLICT
clause (like the ones which would be acquired by a SELECT which
used the index), we would handle many cases, but not this one
(since the INSERT in tx2 does not use the ON CONFLICT clause).
This example would cause a rw-conflict from tx1 to tx2, but not
vice versa, since tx2 doesn't read.  So, no "dangerous structure"
in the SSI sense, and no serialization failure, even though the
results are not consistent with any serial execution of the
transactions.

I *think* that to generate the correct rw-conflicts to allow
dropping the existing check (recently proposed by Thomas Munro and
implemented by Tom Lane), we would need to acquire predicate locks
during the descent to insert into each unique index during any
INSERT under SERIALIZABLE transaction isolation.  The obvious
question is whether the overhead of doing that would be made up by
the work saved through reduced false positive serialization
failures.  It does not seem obvious which would win on overall
performance; in fact it seems very likely that it would depend on
the workload.

My initial thought is that since reducing the false positive rate
would only help when there was a high rate of conflicts under the
existing patch, and it would add code complexity and cost for the
case where conflict rate is low, that we might want to just leave
the current fix and see whether there are complaints from the field
about the false positive rate.

Reducing the rate of false positive serialization failures is a
worthy goal, but it's gotta make sense from a cost/benefit
perspective.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] out-of-order XID insertion in KnownAssignedXids

2016-10-24 Thread Kevin Grittner
On Mon, Oct 24, 2016 at 8:10 AM,  <fred...@huitfeldt.com> wrote:

> This was actually introduced some time back, and I am not completely certain
> how it crept into our codebase. I think that at least part of the
> explanation lies in the fact that we are experiencing a fair amount of
> growth in the database size and use on some of our installations. This could
> be the reason why extensive testing did not show the issue back then and why
> we are seeing it now.

If there is no checkpoint during the backup, you dodge the
corruption.  Higher update volume increases the frequency of
checkpoints and larger cluster size makes the backup take longer --
either of which would make corruption more likely.

> Would it make sense to log a warning in the case of a missing backup_label
> file, or would it be difficult to identify that situation in the code?

The problem is, without a backup_label file things look exactly
like a crash recovery, which is why it just goes to the last usable
checkpoint; that's the correct behavior for crash recovery.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] out-of-order XID insertion in KnownAssignedXids

2016-10-21 Thread Kevin Grittner
On Thu, Oct 20, 2016 at 8:21 AM,  <fred...@huitfeldt.com> wrote:

> Version : 9.2.13

You are missing over a year's worth of bug fixes.

https://www.postgresql.org/support/versioning/

> - remove a file called backup_label

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-14 Thread Kevin Grittner
On Thu, Oct 13, 2016 at 5:26 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Fri, Oct 14, 2016 at 2:04 AM, Kevin Grittner <kgri...@gmail.com> wrote:

>> Where do you see a problem if REPEATABLE READ handles INSERT/ON
>> CONFLICT without error?

> I think the ON CONFLICT
> equivalent might be something like the following (rather contrived)
> schedule, which happily commits if you comment out Peter's check:
>
> (1)  postgres=# create table bank_account (id int primary key, cash int);
> (1)  CREATE TABLE
> (1)  postgres=# begin transaction isolation level serializable ;
> (1)  BEGIN
>
> (2)  postgres=# begin transaction isolation level serializable ;
> (2)  BEGIN
>
> (1)  postgres=# select * from bank_account where id = 1;
> (1)  ┌────┬──────┐
> (1)  │ id │ cash │
> (1)  ├────┼──────┤
> (1)  └────┴──────┘
> (1)  (0 rows)
>
> (2)  postgres=# insert into bank_account values (1, 100);
> (2)  INSERT 0 1
>
> (1)  postgres=# insert into bank_account values (1, 200) on conflict do nothing;
> (1)  ...waits for tx2...
>
> (2)  postgres=# commit;
> (2)  COMMIT
>
> (1)  INSERT 0 0
> (1)  postgres=# commit;
> (1)  COMMIT
>
> If tx1 ran before tx2, then it would have succeeded in inserting (1,
> 200), and tx2 would have failed with unique_violation.  If tx2 ran
> before tx1, then tx1's SELECT command would have seen (1, 100) and
> possibly taken a different course of action.  So this schedule is
> non-serializable, right?

Right.  This is a case that needs something done if we take out the
rather overzealous check that is there now.  Thanks for finding an
example.  The trick now is to generalize to find the boundaries of
what is a problem and what isn't, so we can know what we are aiming
for as an "ideal" solution, and compare possible solutions for how
close they come.

> If you remove ON CONFLICT DO NOTHING, then tx1 gets a unique_violation
> after tx2 commits, which is similar to the last case in
> read-write-unique-4.spec.  To be able to produce a cycle that SSI can
> detect, perhaps an INSERT containing an implicit uniqueness check
> would need to be modelled as a read followed by a write.  I couldn't
> make that work, but I'm not sure if it's sensible anyway: wouldn't
> overlapping transactions consisting of just a single INSERT with the
> same key then produce a false positive, instead of unique_violation in
> one transaction?

If two transactions simultaneously attempted an INSERT of the same
key, one would block (as it would now) and if the other
successfully committed the blocked transaction would then get a
serialization failure error.  If the transactions did not overlap
you would get a duplicate key error.  That would arguably be nicer
behavior than we have now.  I think that if, within a serializable
transaction, we internally add a predicate lock for each page as we
descend to the point of insertion on a unique index, we might get
exactly that behavior.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-13 Thread Kevin Grittner
On Thu, Oct 13, 2016 at 3:16 PM, Kevin Grittner <kgri...@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 2:16 PM, Peter Geoghegan <p...@bowt.ie> wrote:

>> We must still determine if a fix along the lines of the one proposed
>> by Thomas is basically acceptable (that is, that it does not clearly
>> break any documented guarantees, even if it is overly strict).
>> Separately, I'd be interested in seeing how specifically we could do
>> better with the patch that you have in the works for this.
>
> Basically, rather than just failing, I think we should call
> CheckForSerializableConflictOut() (which determines whether the
> tuple we are reading causes a rw-conflict between our current
> transaction and the transaction which last wrote that tuple) and
> PredicateLockTuple() (which tells later updates or deletes that
> we've read the tuple).

I'm wondering whether the error is appropriate on the INSERT ... ON
CONFLICT UPDATE case.  For example, with
ExecCheckHeapTupleVisible() commented out, this does not violate
the SERIALIZABLE requirements (which is that the behavior of any
set of concurrent serializable transactions which successfully
commit must be consistent with running them one at a time in some
order):

CREATE TABLE with_pk (i integer PRIMARY KEY, v text);

-- T1:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO with_pk VALUES (4) ON CONFLICT DO NOTHING;

-- T2:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO with_pk AS t VALUES (8)
  ON CONFLICT (i) DO UPDATE SET v = 'updated';
-- T2 blocks, waiting for T1

-- T1:
COMMIT;
-- T2 unblocks and does the "ON CONFLICT ... UPDATE"

-- T2:
COMMIT;

It seems to me that the result is consistent with T1 -> T2.  There
is no cycle in the apparent order of execution, and no error is
needed.  Can you show a case where there is a problem?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-13 Thread Kevin Grittner
On Thu, Oct 13, 2016 at 2:16 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Thu, Oct 13, 2016 at 6:19 AM, Kevin Grittner <kgri...@gmail.com> wrote:

>> Every situation that generates a false positive hurts performance;
>> we went to great lengths to minimize those cases.

>> To generate a
>> serialization failure on a single transaction has to be considered
>> a bug, because a retry *CAN NOT SUCCEED*!  This is likely to break
>> many frameworks designed to work with serializable transactions.
>
> It sounds like you're talking about the original complaint about a
> multi-value INSERT. It took me a minute to decide that that's probably
> what you meant, because everyone already agrees that that isn't okay
> -- you don't need to convince me.

That second part, yeah -- that's about generating a serialization
failure with one transaction.  It's pretty bad if you can't get a
set that contains one transaction to behave as though the
transactions in that set were run one at a time.  ;-)

> We must still determine if a fix along the lines of the one proposed
> by Thomas is basically acceptable (that is, that it does not clearly
> break any documented guarantees, even if it is overly strict).
> Separately, I'd be interested in seeing how specifically we could do
> better with the patch that you have in the works for this.

Basically, rather than just failing, I think we should call
CheckForSerializableConflictOut() (which determines whether the
tuple we are reading causes a rw-conflict between our current
transaction and the transaction which last wrote that tuple) and
PredicateLockTuple() (which tells later updates or deletes that
we've read the tuple).

> In general, I see value in reducing false positives, but I don't
> understand why your concern here isn't just about preferring to keep
> them to a minimum (doing our best).

That's exactly what I want to do, rather that what is the easiest
and first thing to come to mind.

> In other words, I don't understand
> why these false positives are special, and I'm still not even clear on
> whether you are actually arguing that they are special. (Except, of
> course, the multi-value case -- that's clearly not okay.)
>
> So, with the fix proposed by Thomas applied, will there be any
> remaining false positives that are qualitatively different to existing
> false positive cases? And, if so, how?

The INSERT ... ON CONFLICT DO NOTHING case does not write the
tuple, so this would be the first place we would be generating a
"write conflict" when we're not writing a tuple.  (You might argue
that "behind the scenes we write a tuple that disappears
automagically," but that's an implementation detail that might
someday change and should not be something users need to think
about a lot.)  We put a lot of effort into minimizing false
positives everywhere we could, and I'm not sure why you seem to be
arguing that we should not do so here.  If it proves impractical to
"do it right", we would not destroy logical correctness by using
the patch Thomas proposed, but we would increase the number of
transaction rollbacks and retries, which has a performance hit.

BTW, feel free to post a fix for the locking issue separately
when/if you have one.  I'm not looking at that for the moment,
since it sounded like you had already looked at it and were working
on something.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-13 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 8:32 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Oct 12, 2016 at 6:06 PM, Thomas Munro <thomas.mu...@enterprisedb.com> 
> wrote:
>> But yeah, the existing code raises false positive serialization
>> failures under SERIALIZABLE, and that's visible in the isolation test
>> I posted: there is actually a serial order of those transactions with
>> the same result.
>
> I was under the impression that false positives of this kind are
> allowed by SSI. Why focus on this false positive scenario in
> particular?

Every situation that generates a false positive hurts performance;
we went to great lengths to minimize those cases.  In addition, we
made sure that at the point a serialization failure is
returned, retrying the transaction from the start could not
fail on the same combination of transactions, by ensuring that at
least one transaction in the set had successfully committed, and
that it was a transaction which had done writes.  To generate a
serialization failure on a single transaction has to be considered
a bug, because a retry *CAN NOT SUCCEED*!  This is likely to break
many frameworks designed to work with serializable transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-13 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 8:06 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Thu, Oct 13, 2016 at 10:06 AM, Kevin Grittner <kgri...@gmail.com> wrote:
>> On Wed, Oct 12, 2016 at 3:02 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>>
>>> I agree that the multi-value case is a bug.
>>
>>> I think that it should be pretty obvious to you why the check exists
>>> at all, Kevin. It exists because it would be improper to decide to
>>> take the DO NOTHING path on the basis of some other committed tuple
>>> existing that is not visible to the original MVCC snapshot, at higher
>>> isolation levels.
>>
>> That's only true if it causes a cycle in the apparent order of
>> execution.  If we rip out the check what we have is behavior
>> completely consistent with the other transaction executing first;
>> in other words, it creates a read-write dependency (a/k/a
>> rw-conflict) from the current transaction to the concurrent
>> transaction which succeeds in its insert.  That may or may not
>> cause a cycle, depending on what else is happening.
>
> The "higher isolation levels" probably shouldn't be treated the same way.
>
> I think Peter's right about REPEATABLE READ.  We should definitely
> raise the error immediately as we do in that level, because our RR
> (SI) doesn't care about write skew and all that stuff, it just
> promises that you can only see data in your snapshot.

But the whole point of the special code for both RI and INSERT/ON
CONFLICT is to get "underneath" that and provide a "primitive" that
can see things an application statement can't, for better
performance and error handling.  What SERIALIZABLE promises is that
it runs exactly the same as REPEATABLE READ, but with some
additional monitoring for serialization failure errors in some
places that REPEATABLE READ does not generate them -- this would be
the first and only place that SERIALIZABLE would break that model.
The idea seems completely wrong and arbitrary.

Where do you see a problem if REPEATABLE READ handles INSERT/ON
CONFLICT without error?  In many cases it would actually be
providing a result consistent with a serial execution of the
transactions; and where it doesn't, the anomalies would be the same
ones that are possible with anything else under REPEATABLE READ.

> We can't allow you to take a different course of action based on
> data that your snapshot can't see,

But that is exactly what INSERT/ON CONFLICT always does!  That is
the only way to avoid the so-called "unprincipled deadlocks" the
feature aims to avoid.

> so the only reasonable thing to do is abandon ship.

I disagree.

> But yeah, the existing code raises false positive serialization
> failures under SERIALIZABLE, and that's visible in the isolation test
> I posted: there is actually a serial order of those transactions with
> the same result.

Exactly.  The error based on the write conflict with ON CONFLICT DO
NOTHING in your patch is really a false positive.  That doesn't
break correctness, but it hurts performance, so it should be
avoided if possible.

> When working on commit fcff8a57 I became suspicious of the way ON
> CONFLICT interacts with SSI, as I mentioned in passing back then[1],
> thinking mainly of false negatives.  I failed to find a
> non-serializable schedule involving ON CONFLICT that was allowed to
> run, though I didn't spend much time on it.   One thing that worries
> me is the final permutation of read-write-unique-4.spec, which
> produces an arguably spurious UCV, that is, a transaction that doesn't
> commit but raises a UCV instead of the serialization failure you might
> expect.  The ON CONFLICT equivalent might be a transaction that takes
> the ON CONFLICT path and then commits, even though it should be
> considered non-serializable.  I would really like to understand that
> case better, and until then I wouldn't bet my boots that it isn't
> possible to commit anomalies using ON CONFLICT under SERIALIZABLE
> without Peter's check (or even with it), despite the fact that it
> reaches predicate locking code via heap_fetch etc.

Hm.  With the duplicate key error I fail to see how any anomaly
could make it to a committed state in the database, although I
agree it is unfortunate that there is that one case where it really
should be considered a serialization failure that we haven't yet
coerced to yield that instead of the duplicate key error.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-13 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 5:21 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Oct 12, 2016 at 2:06 PM, Kevin Grittner <kgri...@gmail.com> wrote:
>> If the "proper" fix is impossible (or just too freaking ugly) we
>> might fall back on the fix Thomas suggested, but I would like to
>> take advantage of the "special properties" of the INSERT/ON
>> CONFLICT DO NOTHING code to avoid false positives where we can.
>
> Do you intend to propose a patch to do that?

Yes, I'm working on that.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 3:55 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Oct 12, 2016 at 1:41 PM, Kevin Grittner <kgri...@gmail.com> wrote:
>> Aren't these two completely separate and independent bugs?
>
> Technically they are, but they are both isolated to the same small
> function. Surely it's better to fix them both at once?

If the code is hopelessly intertwined, maybe.  Generally it is
better to fix two independent bugs with two independent patches.
It documents things better and makes each fix easier to understand.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 3:02 PM, Peter Geoghegan <p...@bowt.ie> wrote:

> I agree that the multi-value case is a bug.

> I think that it should be pretty obvious to you why the check exists
> at all, Kevin. It exists because it would be improper to decide to
> take the DO NOTHING path on the basis of some other committed tuple
> existing that is not visible to the original MVCC snapshot, at higher
> isolation levels.

That's only true if it causes a cycle in the apparent order of
execution.  If we rip out the check what we have is behavior
completely consistent with the other transaction executing first;
in other words, it creates a read-write dependency (a/k/a
rw-conflict) from the current transaction to the concurrent
transaction which succeeds in its insert.  That may or may not
cause a cycle, depending on what else is happening.

Now, generating a write conflict serialization failure will prevent
serialization anomalies, but unless I'm missing something it is a
cruder test than is required, and will generate false positives in
simple cases like Thomas created.  What I think would be more
appropriate is to record the dependency and test for a
serialization failure.

> There is a similar consideration for DO UPDATE.

But in DO UPDATE the current transaction is writing something that
the other transaction attempted to read, so that *does* create
write skew and its attendant anomalies.  There I think we *do* need
to throw a serialization failure error.

> I'm slightly surprised that you're contemplating just ripping the check out.

The lack of any regression tests to demonstrate the failure the
code is preventing puts the burden on others to figure it out fresh
each time, which had me a little miffed.  Note that you are
mis-quoting me a bit -- I said "if I can't find some justification
for this function I will rip it (and the calls to it) out of the
code."  I was still looking.  I don't think the write conflict
justifies it, but the rw-conflict and attendant risk of a cycle in
apparent order of execution does.

If the "proper" fix is impossible (or just too freaking ugly) we
might fall back on the fix Thomas suggested, but I would like to
take advantage of the "special properties" of the INSERT/ON
CONFLICT DO NOTHING code to avoid false positives where we can.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 3:07 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Oct 12, 2016 at 1:05 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> Here's a patch that shows one way to fix it.  I think it does make
>> sense to change this, because otherwise automatic
>> retry-on-serialization-failure strategies will be befuddled by this
>> doomed transaction.  And as you and Vitaly have said, there is
>> literally no concurrent update.
>
> I think that you have the right idea, but we still need to fix that
> buffer lock bug I mentioned...

Aren't these two completely separate and independent bugs?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 10:06 AM, Kevin Grittner <kgri...@gmail.com> wrote:

> The test in ExecCheckHeapTupleVisible() seems wrong to me.  It's
> not immediately obvious what the proper fix is.

To identify what cases ExecCheckHeapTupleVisible() was meant to
cover I commented out the body of the function to see which
regression tests failed.  None did.  The failures shown on this
thread are fixed by doing so.  If there is really a need for this
function, speak up now and provide a test case showing what is
broken without it; otherwise if I can't find some justification for
this function I will rip it (and the calls to it) out of the code.
If you do have some test case showing what breaks without the
function, let's get it added to the regression tests!

I'm currently running `make check-world` with TAP tests enabled,
just in case there is some test there which demonstrates the need
for this.  It seems unlikely that such a test would be under the
TAP tests, but I'm making sure...

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
[adding Peter Geoghegan to the email addresses]

On Wed, Oct 12, 2016 at 7:11 AM, Vitaly Burovoy <vitaly.buro...@gmail.com>
wrote:
> On 10/12/16, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:

>> This happens in both SERIALIZABLE and REPEATABLE READ when a single
>> command inserts conflicting rows with an ON CONFLICT cause, and it
>> comes from the check in ExecCheckHeapTupleVisible whose comment says:
>>
>> /*
>>  * ExecCheckHeapTupleVisible -- verify heap tuple is visible
>>  *
>>  * It would not be consistent with guarantees of the higher isolation
>>  * levels to proceed with avoiding insertion (taking speculative
>>  * insertion's alternative path) on the basis of another tuple that is
>>  * not visible to MVCC snapshot.
>>  * Check for the need to raise a serialization failure, and do so as
>>  * necessary.
>>  */
>>
>> So it seems to be working as designed.  Perhaps someone could argue
>> that you should make an exception for tuples inserted by the current
>> command.
>
> I disagree. It is designed to prevent updating a tuple which was
> updated in a parallel transaction which has been just committed.
> This case is a little bit different: this tuple has been inserted in
> the same transaction in the same command.
> I think it is an obvious bug because there is no "concurrent update".
> There is no update at all.

Yeah, the concept of a serialization failure is that the
transaction would have succeeded except for the actions of a
concurrent transaction.  It is supposed to indicate that a retry
has a chance to succeed, because the conflicting transaction is
likely to have completed.  To generate a "serialization failure"
(or, IMO, any error with a SQLSTATE in class "40") from the actions
of a single transaction is completely wrong, and should be
considered a bug.

> ON CONFLICT handling just does not cover all possible ways which can
> happen.
> Normally (without "ON CONFLICT" clause) INSERT raises "duplicate key
> value violates unique constraint" and doesn't run to
> "ExecCheckHeapTupleVisible" check.
> The "ExecInsert" handles constraint checks but not later checks like
> ExecCheckHeapTupleVisible.

The test in ExecCheckHeapTupleVisible() seems wrong to me.  It's
not immediately obvious what the proper fix is.  Peter, do you have
any ideas on this?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-12 Thread Kevin Grittner
On Wed, Oct 12, 2016 at 2:50 AM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Kevin Grittner wrote:

>> I don't see that on development HEAD.  What version are you
>> running?  What is your setting for default_transaction_isolation?
>
> The subject says SERIALIZABLE, and I can see it on my 9.5.4 database:

Oh, I see -- it doesn't happen if the row already exists at the
start of the transaction, which it does if you run the OP's entire
sample.  If you skip the transaction in the middle of his sample,
the error occurs.

> test=> CREATE TABLE with_pk (i integer PRIMARY KEY);
> CREATE TABLE
> test=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> START TRANSACTION
> test=> INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
> ERROR:  could not serialize access due to concurrent update

Or that, as a nice, self-contained test case.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SERIALIZABLE and INSERTs with multiple VALUES

2016-10-11 Thread Kevin Grittner
On Tue, Oct 11, 2016 at 2:29 PM, Jason Dusek <jason.du...@gmail.com> wrote:

> I notice the following oddity:

>  =# CREATE TABLE with_pk (i integer PRIMARY KEY);
> CREATE TABLE

>  =# BEGIN;
> BEGIN
>  =# INSERT INTO with_pk VALUES (2), (2) ON CONFLICT DO NOTHING;
> ERROR:  could not serialize access due to concurrent update
>  =# END;
> ROLLBACK

I don't see that on development HEAD.  What version are you
running?  What is your setting for default_transaction_isolation?
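
Both can be reported straight from the failing session; for example:

SELECT version();
SHOW default_transaction_isolation;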

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Transactional-DDL DROP/CREATE TABLE

2016-10-06 Thread Kevin Grittner
On Thu, Oct 6, 2016 at 9:31 AM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:

> This is how I can trigger the ERROR:
>
> Session 1:
>
> test=# begin ;
> BEGIN
> test=# drop table if exists ddl_test;
> NOTICE:  table "ddl_test" does not exist, skipping
> DROP TABLE
> test=# create table ddl_test(id int);
> CREATE TABLE
> test=# commit ;
> COMMIT
>
> Session 2 (concurrent to session1):
>
> test=# begin ;
> BEGIN
> test=# drop table if exists ddl_test;
> NOTICE:  table "ddl_test" does not exist, skipping
> DROP TABLE
> test=# create table ddl_test(id int);
> ERROR:  duplicate key value violates unique constraint
> "pg_type_typname_nsp_index"
> DETAIL:  Key (typname, typnamespace)=(ddl_test, 2200) already exists.
> test=# commit ;
> ROLLBACK

I recommend using a transactional advisory lock to serialize these.
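
As a minimal sketch of that approach (the lock key 42 is arbitrary; any
value both sessions agree on works), each session would wrap its DDL
like this, so the second session simply waits instead of hitting the
duplicate-key error:

BEGIN;
SELECT pg_advisory_xact_lock(42);   -- both sessions use the same key
DROP TABLE IF EXISTS ddl_test;
CREATE TABLE ddl_test(id int);
COMMIT;                             -- the advisory lock is released here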

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Time travel?

2016-09-29 Thread Kevin Grittner
On Thu, Sep 29, 2016 at 2:06 PM, Melvin Davidson <melvin6...@gmail.com> wrote:

> I am using PostgreSQL 9.4.0 (Yes, I know 9.4.9 is most recent but
> it's out of my control)

As long as the decision-maker is aware that 9.4.0 has known bugs
(fixed in later minor releases) that can render the database
unusable without warning...

> How is it possible for the WAL file to be accessed BEFORE it was
> created?

Perhaps renaming it counts as "creation" without affecting access
time.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] isnull() function in pgAdmin3

2016-09-28 Thread Kevin Grittner
On Tue, Sep 27, 2016 at 3:12 PM, dudedoe01 <marsala...@gmail.com> wrote:
> The tables I migrated from MySQL into postgreSQL have exactly the same amount
> of rows of data so the problem is inside the view being created.

Have you actually confirmed that there are any cases where
isnull(expression) yields a different result than (expression) is
null when the expression gives the same value?  I'm suspicious that
an expression could be yielding a different result, perhaps based
on join conditions handling comparisons between null values
differently.  Remember, for example, that in PostgreSQL NULL = NULL
does not evaluate to TRUE.
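
For example:

SELECT NULL = NULL;                     -- yields NULL, not TRUE
SELECT NULL IS NOT DISTINCT FROM NULL;  -- yields TRUE

A join condition written with = will therefore never match two rows on
a column where both sides are NULL, while IS NOT DISTINCT FROM treats
the NULLs as equal.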

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Update two tables returning id from insert CTE Query

2016-09-27 Thread Kevin Grittner
On Tue, Sep 27, 2016 at 3:33 PM, Patrick B <patrickbake...@gmail.com> wrote:

> 4 - Get the table_1.c_id from the STEP NUMBER 2 and put it into table_2.c_id
> - This is the problem.. how can I get the inserted id from STEP2 and put it
> into c_id respecting the order?

For DML you need to think of the data as being unordered sets, not
ordered lists.  The whole concept of a relational database is that
related rows can be associated through their common data values.
You are splitting them apart and then trying to match them up again
to link them back together.  You will be better off if you can
leave the relationship intact all the way through -- perhaps by
adding name_last to table_1.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Update two tables returning id from insert CTE Query

2016-09-27 Thread Kevin Grittner
On Tue, Sep 27, 2016 at 2:59 PM, Patrick B <patrickbake...@gmail.com> wrote:

[sel is a relation which can have multiple rows; the fact that it
is being generated in a CTE isn't relevant for purposes of the
error.]

>>>> UPDATE table_2 SET c_id =
>>>> (
>>>>   SELECT c_id
>>>>   FROM sel
>>>>   ORDER BY c_id
>>>> )
>>>> WHERE clientid = 124312;

>>>> ERROR:  more than one row returned by a subquery used as an expression

> isn't clear what I'm trying to achieve?

Nope.

> That's what I need, I just want a way to do that, as the way I'm
> doing isn't working.

You are specifying that you want to assign all the "c_id" values
from the "sel" relation to the "c_id" column in "table2" for any
and all rows which have a "clientid" value of 124312.  Effectively
the database is complaining that it can only store one value, not a
set of values.  I can only guess at what you might be intending to
ask the database to do.  Can you explain what you are trying to do?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] isnull() function in pgAdmin3

2016-09-27 Thread Kevin Grittner
On Tue, Sep 27, 2016 at 12:04 PM, dudedoe01 <marsala...@gmail.com> wrote:

> I am trying to emulate the isnull() function used in MySQL into PostgreSQL.

In the future, please describe the results you are trying to achieve
or at least describe the semantics of the function from elsewhere that
you are trying to emulate.  I had no idea what the ISNULL() function of
MySQL does; on a quick web search, it looks like you can replace:
  isnull(expression)

with:

  (expression) IS NULL


> In MySQL:
>
> (case
> when
> ((`s`.`Funding_Date` = '')
> and (isnull(`s`.`Actual_Close_Date`)
> or (`s`.`Actual_Close_Date` = '')))
> then
> 'RPG_INV'
> when
> ((isnull(`s`.`Funding_Date`)
> or (`s`.`Funding_Date` <> ''))
> and ((`s`.`Actual_Close_Date` = '')
> or isnull(`s`.`Actual_Close_Date`)))
> then
> 'Builder_Inventory'
> else 'Owner_Inventory'
> end) AS `Lot_Status`,

In PostgreSQL perhaps:

(case
when
(("s"."Funding_Date" = '')
and (("s"."Actual_Close_Date") is null
or ("s"."Actual_Close_Date" = '')))
then
'RPG_INV'
when
((("s"."Funding_Date") is null
or ("s"."Funding_Date" <> ''))
and (("s"."Actual_Close_Date" = '')
    or ("s"."Actual_Close_Date") is null))
then
'Builder_Inventory'
else 'Owner_Inventory'
end) AS "Lot_Status",

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incrementally refreshed materialized view

2016-09-26 Thread Kevin Grittner
On Mon, Sep 26, 2016 at 3:16 PM, Adam Brusselback
<adambrusselb...@gmail.com> wrote:

> Well I feel like I've learned a ton already reading through the links you
> provided earlier and that example above.

Yeah, I know that example can really help show what will happen
"under the covers", and make it more concrete.  The theory that
it's based on seems brilliant to me.  That fact that it's based on
relational algebra (RA) means that it is much more likely to be
implemented in a complete and correct manner than something ad hoc.
 I know I started at this with a try (similar to yours) at
analyzing from scratch, then went and searched the literature.
When I came back to my ad hoc notes, the RA pointed out some holes
in the logic where corner cases would have been wrong because of
missed details.  RA is well thought out and solid; it seems to me
to be the perfect thing to underlie a solution to this problem.

> I'm very interested in getting this into core. I'll look into what I need to
> do to review. Not crazy familiar with C, as I mainly do Java development.
> I'll see if I can help in any way though.

Just testing it and reviewing the documentation for errors,
omissions, or just plain hard-to-follow language would be a big
help.  Please post back about any problems getting things patched
and built.

> The main reason I was working on an alternative is because I need something
> now rather than in a couple years, but I've been dealing with manually
> creating the few I do need for my database. What I proposed above was just
> me thinking about what could be done with things as they are. Obviously it's
> junk compared to a real solution in-core.  Would you consider my approach
> even worth trying, or should I just suck it up and do things manually for
> now and put that effort into getting incremental refresh into core?

Oh, I've used plain tables and triggers many times myself.  If you
need something now, you kind of have to go that route.  The
algorithms I cited do provide an interesting potential alternative
for how to go about that, although operating a row at a time you
probably won't approach the speed of statement-level set logic for
statements that affect very many rows.  :-(

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incrementally refreshed materialized view

2016-09-26 Thread Kevin Grittner
 (
   SELECT delta.src, before.dst, delta."count(t)" * 1
 FROM "Δ(hop)" delta
 JOIN link1 before ON (before.src = delta.dst)
   UNION ALL
   SELECT after.src, delta.dst, 1 * delta."count(t)"
 FROM hop2 after
 JOIN "Δ(link)" delta ON (delta.src = after.dst)
 ) x(src, dst, "count(t)")
GROUP BY src, dst
HAVING sum("count(t)") <> 0;
SELECT * FROM "Δ(tri_hop)" ORDER BY 1, 2;

-- Now we're done with snapshots and all but the highest-level delta.
DROP TABLE link1, link2, "Δ(link)";
DROP TABLE hop1, hop2, "Δ(hop)";

-- At some later time the MV delta is processed "behind the scenes".
-- We can't do the demonstration maintenance against the MV, so copy it.
CREATE TEMP TABLE tri_hop2 AS SELECT * FROM tri_hop;
BEGIN;
DELETE FROM tri_hop2 t1
  USING "Δ(tri_hop)" t2
  WHERE t1.src = t2.src
AND t1.dst = t2.dst
AND t1."count(t)" + t2."count(t)" = 0;
UPDATE tri_hop2 t1
  SET "count(t)" = t1."count(t)" + t2."count(t)"
  FROM "Δ(tri_hop)" t2
  WHERE t1.src = t2.src
AND t1.dst = t2.dst;
INSERT INTO tri_hop2
  SELECT * FROM "Δ(tri_hop)" t1
WHERE "count(t)" > 0
  AND NOT EXISTS (SELECT * FROM tri_hop2 t2 WHERE t2.src = t1.src
AND t2.dst = t1.dst);
COMMIT;

-- Show that we got to the same result with incremental maintenance as
with REFRESH.
REFRESH MATERIALIZED VIEW tri_hop;
SELECT * FROM tri_hop ORDER BY 1, 2;
SELECT * FROM tri_hop2 ORDER BY 1, 2;

-- Now we're done with the MV delta.
DROP TABLE tri_hop2, "Δ(tri_hop)";


-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Incrementally refreshed materialized view

2016-09-26 Thread Kevin Grittner
On Mon, Sep 26, 2016 at 1:22 PM, Melvin Davidson <melvin6...@gmail.com> wrote:
> On Mon, Sep 26, 2016 at 1:56 PM, Adam Brusselback <adambrusselb...@gmail.com> 
> wrote:
>>
>> I am working on a plan to implement incrementally refreshed
>> materialized "views" with the existing functionality in
>> Postgres.
>>
>> Below is the plan for doing that:

>> [design from scratch, incomplete]

>> I am however stuck on: How do we know what to refresh?

>> Pretty much, I need to figure out how to follow the joins in the
>> view back to whatever key was defined as the "refresh key" for
>> each dependent table.  I know about the
>> information_schema.view_column_usage, but I don't think that'll
>> get me everything I need.
>>
>> I'd really appreciate any help with this, as i'd love a better
>> way to get eagerly refreshed materialized views in Postgres
>> rather than doing everything manually as I have to now.
>>
>> If I can provide any more info please let me know.

> I am a bit curious. Why are you reinventing the wheel?
> What is wrong with:
>
> REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name
> [ WITH [ NO ] DATA ]
>
> https://www.postgresql.org/docs/9.4/static/sql-refreshmaterializedview.html
>
> Can't you do that in a cron job?

Well, that is not *incremental* maintenance -- the entire query is
executed, with the resulting relation either replacing the previous
contents of the matview or "diffed" against the previous contents
(so that the difference can be applied with transactional
semantics), depending on whether CONCURRENTLY was specified.

The OP is still reinventing the wheel though.  A summary of
available techniques as of the mid-90s can be found here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf

With some detail for what to me look like the two most promising
techniques here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208&rep=rep1&type=pdf

The first step in using either of those techniques (counting or
DRed) is to capture a delta relation to feed into the relational
algebra used by these techniques.  As a first step in that
direction I have been floating a patch to implement the
SQL-standard "transition tables" feature for AFTER triggers.

https://commitfest.postgresql.org/10/778/

If you want to help that effort, reading the thread and reviewing
the patch would be good.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COL unique (CustomerID) plus COL unique (COUNT) inside CustomerID

2016-09-01 Thread Kevin Grittner
On Thu, Sep 1, 2016 at 4:28 AM, Michelle Konzack
<linux4miche...@gmail.com> wrote:

> I need a table with an UNIQUE CustomerID which is working fine...
>
> ...BUT I need also a second column with a count, which  must  be  UNIQUE
> inside the CustomerID.

Just to be clear, you probably have a customer table with "CustomerID"
as its key and are talking about a "child" of that where you want a
2-column key?

> In clear this:
>
> CustID   Count
>  1   1
>  1   2
>  1   3
>
>  2   1
>  2   2
>
>  3   1
>  3   2
>  3   3
>  ...
>
> How to do this?

If, for example, the child table is a list of customer contacts,
you might add a "LastContactNo" column to the customer table,
defaulting to zero on customer insert, and which you increment to
get values for the second key column in the contact table.
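
A rough sketch of that pattern, with made-up table and column names:

CREATE TABLE customer (
    cust_id          int PRIMARY KEY,
    last_contact_no  int NOT NULL DEFAULT 0
);
CREATE TABLE contact (
    cust_id     int NOT NULL REFERENCES customer(cust_id),
    contact_no  int NOT NULL,
    PRIMARY KEY (cust_id, contact_no)
);

-- Allocate the next per-customer number and insert the child row; the
-- row lock taken by the UPDATE keeps concurrent sessions from drawing
-- the same number for the same customer.
WITH n AS (
    UPDATE customer
       SET last_contact_no = last_contact_no + 1
     WHERE cust_id = 1
 RETURNING last_contact_no
)
INSERT INTO contact (cust_id, contact_no)
SELECT 1, last_contact_no FROM n;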

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Understanding Postgres Memory Usage

2016-08-25 Thread Kevin Grittner
On Thu, Aug 25, 2016 at 1:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Hmm.  I find it mighty suspicious that the USS, PSS, and RSS numbers are
> all increasing to pretty much the same tune, ie from very little to circa
> 100MB.  I think there is a decent chance that smem is not doing what it
> says on the tin, and in fact is including shared memory consumption in
> "USS".  In which case the apparent leak just corresponds to the process
> gradually touching more and more of the shared buffer arena.  (If your
> shared_buffers settings is not somewhere near 100MB, then this theory
> breaks down.)

I can't speak to every implementation of smem, but I have used it
quite a bit under SLES and Ubuntu, and it always seemed to do what
it says -- USS is unshared (process-local) memory and PSS is that
plus the process's portion of shared memory.  (The sum of
differences between PSS and USS == total shared memory.)  RSS has
the usual meaning.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

2016-08-22 Thread Kevin Grittner
On Mon, Aug 22, 2016 at 3:02 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
> <tatsuki.kadom...@proceranetworks.com> wrote:
>> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
>> so easy due to circumstances on my side.
>
> Well, I guess it depends on how much you care about your data.

Right.  Make sure that whoever is responsible for this decision
knows that until they upgrade they are running with known bugs
which could render the database unusable without warning.  It
should at least be an informed decision so that the decision-maker
can stand behind it and feel as good as possible about
circumstances should that happen.

You might want to keep a copy of the email or memo in which you
point this out, in case anyone's memory gets foggy during such a
crisis.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Serializable read and blocking

2016-08-11 Thread Kevin Grittner
On Thu, Aug 11, 2016 at 7:11 AM, Rakesh Kumar
<rakeshkumar46...@gmail.com> wrote:

> The question is re the following claim:
>
> - Readers do not block readers.
> - Readers do not block writers.
> - Writers do not block readers.
> - Writers may block writers.
>
>
> Are the above statements true even with SET TRANSACTION SERIALIZEABLE mode.
> I am specifically interested in the 3rd condition (- Writers do not
> block readers.)

Yes.

https://wiki.postgresql.org/wiki/SSI

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Column order in multi column primary key

2016-08-09 Thread Kevin Grittner
On Mon, Aug 8, 2016 at 5:59 PM, Craig Boucher <cr...@wesvic.com>
wrote:

> Thanks Kevin for your response.  I've Googled and debated natural
> vs surrogate keys and I just find surrogate keys easier to work
> with (maybe I'm just being lazy).  It just seems that a
> description or name is most often the natural key.  I just can't
> see, In my case, using a department description as part of the
> primary key in the department table and having it repeated in
> millions of rows.

I agree that would not make sense, but most organizations I've
worked with already have user-visible mnemonic or numeric codes for
such things.  Those make great keys, or columns within multi-column
keys.

Anyway, the synthetic key discussion was peripheral to my main
point, which was that you were looking at moving from something
that looked like 2nd or 3rd normal form down to 1st normal form,
which will open you up to whole new classes of data integrity
problems.  I strongly recommend you don't do that.  It looked like
the reason was to try to introduce more meaning into the key, which
is why I ventured into the synthetic key discussion in spite of it
being such an unfortunate trigger for rehashing old flame-wars.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Materialized view auto refresh

2016-08-09 Thread Kevin Grittner
On Tue, Aug 9, 2016 at 4:50 AM, hari.prasath <hari.pras...@zohocorp.com> wrote:

>   Is there any tentative schedule for real-time or incremental(only
> applying delta changes) refresh of materialized views.?.

There is work in progress, but no hard schedule.  Unfortunately, it
has often been set aside to address more immediate issues for
particular end users; but I expect to get back to it Real Soon Now.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Column order in multi column primary key

2016-08-08 Thread Kevin Grittner
On Mon, Aug 8, 2016 at 4:01 PM, Craig Boucher <cr...@wesvic.com> wrote:
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]

>> I'm pretty skeptical of the notion of redefining what your PK
>> is on performance grounds.  With this definition, you'd allow
>> two entries with the same work_session_id, if they chanced to
>> have different customer_ids.  Is that really OK?

> It could actually be beneficial if we need to migrate a customer
> from one database to another because wouldn't have to worry about
> pk constraint violations.

Isn't "wouldn't have to worry about pk constraint violations"
another way of saying "we're OK with allowing the same logical row
to be present multiple times in the table with inconsistent data
among the various copies"?  You may want to read up on data
normalization and shoot for 3rd normal form and its benefits.

The short version you can use as a memory device when mentally
testing whether a key is right is that every non-key column should
be functionally dependent on the key, the whole key, and nothing
but the key ("so help you Codd").  That means that:
 (1) given the key values you can determine the values of all other
columns (1st normal form),
 (2) if you omit any of the key columns from the key you cannot
determine the values of all other columns (2nd normal form), and
 (3) no column's value may be determined from non-key columns, and
therefore only determined from the key indirectly (3rd normal form).

You are talking about degrading your normalization to 1st normal
form.  There is a wealth of literature on the problems that can
introduce.  What would be in line with well-established theory and
practice is looking for a "natural key" in each table -- some
combination of columns which naturally occur in the table which
uniquely identify the rows.  In some cases it is necessary to add
some "tie-breaker" column to the end of the key when you do this --
like a sequence within the group or a timestamp.
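
As a purely illustrative sketch (not your schema), a child table keyed
by the parent's natural key plus such a tie-breaker might look like:

CREATE TABLE work_session (
    customer_code  text NOT NULL REFERENCES customer (customer_code),
    session_no     int  NOT NULL,   -- tie-breaker within the customer
    started_at     timestamptz NOT NULL,
    PRIMARY KEY (customer_code, session_no)
);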

If you search the archives you will find periodic discussions of
the relative merits of this approach versus adding a meaningless
synthetic key (often called "id" in every table) to use by itself
as the primary key.  This is essentially the same as adding
"pointers" among the various records and constraining how
"navigation" among tables can happen.  It has a big performance
downside in generating statistics and large reports because it
requires a lot of pointer chasing.  Proponents will point out how
convenient it is to be able to change human-visible identifying
values, potentially on a very large scale, by modifying one column
of one row.  That is, of course, a double-edged sword -- in
discussing design alternatives with the CPAs who were going to be
auditing financial data stored in a database, they didn't tend to
see that as nearly as much of a plus as some programmers do.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Detecting if current transaction is modifying the database

2016-08-08 Thread Kevin Grittner
On Fri, Aug 5, 2016 at 4:24 PM, Christian Ohler <oh...@shift.com> wrote:

Your check for an exclusive self-lock on transactionid should work.
It may be possible to find a way to do it that is less expensive,
so I would definitely encapsulate that in a function; but off-hand
I'm not thinking of a better way.

You might be tempted to use the txid_current() function, but note
that it assigns a transaction ID if there is not yet one assigned.
That has prevented that function from being useful to me in every
case I've considered it so far; I wish we had a function that told
the current transaction ID and just returned NULL if none has yet
been assigned.  I'm not sure what the best name would be for such a
function when we already have a function called txid_current()
which does something different from that.
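
A sketch of one way to encapsulate the pg_locks test you described: a
transaction that has had an XID assigned holds an ExclusiveLock on its
own transactionid, so something like this returns true only after the
transaction has started modifying the database, without forcing an XID
to be assigned the way txid_current() does (the function name is just
illustrative):

CREATE OR REPLACE FUNCTION xact_is_modifying_db() RETURNS boolean
LANGUAGE sql STABLE AS $$
    SELECT EXISTS (
        SELECT 1
          FROM pg_locks
         WHERE locktype = 'transactionid'
           AND pid = pg_backend_pid()
           AND mode = 'ExclusiveLock'
           AND granted
    );
$$;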

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Allowing multiple versions of PG under NetBSD

2016-07-29 Thread Kevin Grittner
On Fri, Jul 29, 2016 at 3:22 PM, Larry Rosenman <l...@lerctr.org> wrote:

> I'm willing to help on the FreeBSD side.

One more tip -- if you are running multiple clusters (same version
or not) on the same machine, it is best to run each cluster under a
separate OS user.  It's not *required*, but it makes a restart
after a crash less problematic and it is generally better from a
security standpoint, so you might want to look for a way to allow
it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-29 Thread Kevin Grittner
On Fri, Jul 29, 2016 at 3:18 PM, Condor <con...@stz-bg.com> wrote:
> On 29-07-2016 20:33, Jerry Sievers wrote:

>> I've done several ~7TB pg_upgrades and with the hard link option and a
>> framework that parallelizes the post-analyzer phase...
>>
>> ...45 minutes till completion.

> GL to you

Luck has nothing to do with anything.  You really might want to
post with more details and see whether people can help sort out why
you have seen such slow performance where so many others have not.
I hope it's not just a matter of saying "I have a 1TB database and
upgrade is slow, therefore it is slow because it is 1TB."  That
would be roughly the equivalent of saying "I have a blue car and it
is slow, therefore it is slow because it is blue."  It just might
be the flat tire that actually matters.  If your upgrade is slow
because you have 10 million database objects, that might be a hard
one to overcome, but it might be something with an easy solution in
the pg_upgrade options or server configuration.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-27 Thread Kevin Grittner
On Wed, Jul 27, 2016 at 4:34 PM, Patrick B <patrickbake...@gmail.com> wrote:
> We can't use the pg_upgrade in our 3TB database just does not work..
> that's the main reason we're still using 9.2.

I think it's safe to say that that has absolutely nothing to do
with the size being 3TB.  The symptoms you report are a little
thin to diagnose the actual cause.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] question on parsing postgres sql queries

2016-07-27 Thread Kevin Grittner
On Tue, Jul 26, 2016 at 4:20 AM, Jerome Wagner <jerome.wag...@m4x.org> wrote:

> I am doing some research on postgres sql query parsing.

> I was wondering what people think of the conformance with regards to the
> real parser of the documentations on
>  - https://www.postgresql.org/docs/current/static/sql-select.html
>  - https://www.postgresql.org/docs/current/static/sql-copy.html
> ... and more generally sgmls in
> https://github.com/postgres/postgres/tree/master/doc/src/sgml/ref
>
> Would it make sense to use these sgml synopsis as some kind of source of
> truth, parse them, and automatically generate a parser for a specifc
> language ?

It might be interesting to do as an academic exercise or to audit
the accuracy of the synopses, but I don't think it's practical for
generating production-quality parsers -- at least in the short
term.  Besides issues mentioned by others (e.g., parser support for
legacy syntax we don't want to document or encourage), we sometimes
allow things through the parser so that we can code more
user-friendly messages off of the parse tree than a generated
parser would provide.

I also don't remember seeing anyone mention the problems with
forward references and metadata from system catalogs.  These either
need to be handled by a "rewind and try again" approach or (better
IMO) an additional pass or two walking the parse tree to emit a
version where generic "place-holders" are replaced by something
more specific.  See the "parse analysis" and "rewrite" steps in
PostgreSQL for how that is currently handled.  Before working in
the PostgreSQL source I had helped develop a SQL parser in ANTLR,
where the same basic parser generator is used for lexer, parser,
and tree-walker phases (using pretty much the same grammar
specifier for all of them), just taking characters, tokens, or
parse tree nodes as input -- automatic generation of "main" parser
might be feasible in such an environment (possibly with some sort
of annotations or hand-written light initial parsing phase), but I
think the later tree walkers would need to be hand-coded.

> I feel like the conformance level of the documentation is high and that the
> sgml synopis seem to be nearly programmatically sufficient to create
> parsers.
>
> what do you think ?

Nearly.

> Could the parser commiters share some lights on how the documentation
> process interacts with the parser commits ?

There is no automated interaction there -- it depends on human
attention.  On the other hand, try connecting to a database with
psql and typing:

\h create index

... (or any other command name).  The help you get there is fished
out of the docs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-27 Thread Kevin Grittner
On Wed, Jul 27, 2016 at 9:22 AM, Scott Mead <sco...@openscg.com> wrote:
> On Wed, Jul 27, 2016 at 3:34 AM, Achilleas Mantzios 
> <ach...@matrix.gatewaynet.com> wrote:

>> Our last 1TB upgrade from 9.0 -> 9.3 went like a charm in something like
>> seconds. (with the -k option)
>> However, be warned that the planing and testing took one full week.

Agreed -- you should spend a lot more time on planning and testing
than the actual upgrade will take; but that will probably be true
with any product.

> That being said, it doesn't really provide a back-out plan.

Until you get to the end of the upgrade and *start the cluster
under the new version*  you can fall back to the old version.  I
remember a couple times that we saw something during a pg_upgrade
--link run that we weren't expecting, and did exactly that so we
could investigate and try again later.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Uber migrated from Postgres to MySQL

2016-07-27 Thread Kevin Grittner
On Wed, Jul 27, 2016 at 2:15 AM, Condor <con...@stz-bg.com> wrote:

> They are right for upgrades.
> It's a hard to shutdown 1 TB database and wait couple of days pg_upgrade to
> finish upgrade and meanwhile database is offline.

What?  I had a cluster over 3TB and it was offline for only 10
minutes doing a major version upgrade using pg_upgrade's --link
option.  Of course, that was with minimal statistics -- just enough
to keep from getting truly stupid plans.  If it takes *days* to run
pg_upgrade on a 1TB cluster either you have an insane number of
database objects or you are not taking advantage of the available
features.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] unique constraint with several null values

2016-07-20 Thread Kevin Grittner
On Wed, Jul 20, 2016 at 1:48 PM, David G. Johnston
<david.g.johns...@gmail.com> wrote:
> On Wed, Jul 20, 2016 at 2:14 PM, Mark Lybarger <mlybar...@gmail.com> wrote:

>> Another solution I can think of is to just use a trigger to
>> prevent the duplicate rows.

If you go that route you will need to use serializable
transactions, explicit locking, or trigger-based update of some
otherwise-unneeded column to avoid race conditions.  See:

https://www.postgresql.org/docs/current/static/mvcc.html

That can be a perfectly valid option as long as you handle race
conditions somehow.

>> Any thoughts are certainly appreciated.  I can't do much about
>> the data model itself right now, I need to protect the integrity
>> of the data.

Rather than unique constraints, you could add a unique index on the
COALESCE of each column with some impossible value.
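
A minimal sketch with made-up names; the sentinel just has to be a
value (of each column's type) that can never occur in real data:

CREATE UNIQUE INDEX my_table_logical_key
    ON my_table (col_a,
                 COALESCE(col_b, '~none~'),
                 COALESCE(col_c, '~none~'));

With that index, two rows that differ only by having NULL in col_b are
treated as duplicates, which is usually the behavior people are after
in this situation.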

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-02 Thread Kevin Grittner
On Fri, Jul 1, 2016 at 7:15 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
> OK.  Will change our query generation code to not use the view.
> (I have tried the LEFT JOIN approach, but it just does not seem to perform.)

> PS: Here are the numbers for the real production query (will not provide 
> details):
> Original query:  300s
> Query on a manually optimized view:1ms
> Using left joins:200s

Please show a self-contained case (i.e., one that can be run
against an empty database to demonstrate the problem).  You might
start from this one and modify it until you see the problem that
you describe:

create table a (id int primary key, name varchar(128));
create table b (id int primary key, name varchar(128));
create table c (id int primary key,
a_id int not null references a(id),
b1_id int not null references b(id),
b2_id int not null references b(id),
b3_id int not null references b(id));

create view v as
  select
  c.id, c.a_id, c.b1_id, c.b2_id, c.b3_id, a.name a_name,
  b1.name b1_name, b2.name b2_name, b3.name b3_name
  from c
  left join a on a.id = c.a_id
  left join b b1 on b1.id = c.b1_id
  left join b b2 on b2.id = c.b2_id
  left join b b3 on b3.id = c.b3_id;

insert into a values (1, 'a1');
insert into b values (1, 'b1'), (2, 'b2'), (3, 'b3');
insert into c values (1, 1, 1, 2, 3);

vacuum analyze a;
vacuum analyze b;
vacuum analyze c;

select id, b1_name from v;
explain (analyze, buffers, verbose) select id, b1_name from v;

I'm seeing the unreferenced tables pruned from the plan, and a 1ms
execution time for the select from the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: --EXTERNAL--Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Kevin Grittner
On Fri, Jul 1, 2016 at 3:33 PM, Sfiligoi, Igor <igor.sfili...@ga.com> wrote:
> No, I don't want to use LEFT JOINS.
> I want to use regular joins.
>
> But (as mentioned in my other follow-up), all the fields are not
> null (was not in the original email, sorry), and are foreign
> keys, so it is guaranteed to always match.

In that case there is no difference between the inner join and the
left join except that the left join currently supports an
optimization that makes your query faster if the optional table is
not referenced.  Whether you want to take advantage of that is up to
you.

> The key part (in my mind) is that I am not filtering on any of
> the useless tables, and I am not returning any columns from those
> tables either.
> Both is known at planning time.

The fact that something can be determined at planning time doesn't
mean that checking for it is free.

> is my logic still broken?

Your logic seems OK with the table definitions you are now showing.

Whether we ever decide it is OK to omit tables which use an inner
join rather than only considering omitting them when the query
specifies that the join is optional is anybody's guess.  If it is
important enough to you, you could submit a patch or fund
development of such a feature; but since it would add at least some
small amount of planning time to every inner join just to avoid
specifying that the join is an optional one when writing the query,
it seems to me unlikely to be accepted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PSQL does not remove obvious useless joins

2016-07-01 Thread Kevin Grittner
s=3 loops=1)
 Planning time: 0.177 ms
 Execution time: 0.044 ms
(8 rows)

Note the difference in results using inner joins versus left outer joins.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sequences, txids, and serial order of transactions

2016-06-14 Thread Kevin Grittner
On Sat, Jun 11, 2016 at 9:03 PM, Christian Ohler <oh...@shift.com> wrote:

> we have a use case similar to auditing packages like pgMemento or Audit
> Trigger 91plus – we are looking to keep an ordered history of certain write
> transactions.  I'm trying to understand the trade-offs between different
> ways of getting that order, i.e., assigning numbers to transactions (ideally
> strictly monotonic, modulo concurrency).  All of our transactions are
> serializable (for now).

The guarantee that serializable transactions provide is that for
any group of concurrent serializable transactions which
successfully commit, there is some serial (one-at-a-time) order in
which they could have been run which would provide the same
results.  Note that in PostgreSQL that order is not necessarily
commit order.  So the first question is whether you want the order
of the numbers to match the apparent order of execution of the
serializable transactions which committed or the commit order.
Those almost certainly won't always be the same.

If you are satisfied with the commit order, there is a way to do
that with minimal loss of concurrency.  As the very last thing
before commit, take out an exclusive transactional advisory lock
(pg_advisory_xact_lock):

https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS-TABLE

Under cover of that lock, assign the number.  You may need to write
some custom code for assigning that across multiple backends with
the right characteristics (e.g., the database may need to make a
request of some external service for the number).  There is some
actual serialization of this small bit at the end of the
transaction, but if you're careful it can be a very small window of
time.
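
A bare-bones sketch of that shape, with a plain sequence standing in
for whatever number source you end up using (all names here are
illustrative):

CREATE TABLE audit_txn (commit_no bigint PRIMARY KEY,
                        noted_at  timestamptz NOT NULL);
CREATE SEQUENCE commit_order_seq;

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... the transaction's real work goes here ...
SELECT pg_advisory_xact_lock(0);  -- one well-known key; serializes only this tail end
INSERT INTO audit_txn VALUES (nextval('commit_order_seq'), now());
COMMIT;                           -- the advisory lock is released here

Because the lock is held until commit, no later transaction can draw
its number before this one has committed, so the numbers follow commit
order.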

If you want the numbers to be assigned in the apparent order of
execution of the serializable transactions, I'm afraid that I don't
know of any good solution for that right now.  There has been some
occasional talk of providing a way to read the AOoE, but nothing
has come of it so far.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] dumb question

2016-06-02 Thread Kevin Grittner
On Thu, Jun 2, 2016 at 3:23 PM, David G. Johnston
<david.g.johns...@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 4:11 PM, John R Pierce <pie...@hogranch.com> wrote:
>> Thanks all the below seem to do the trick.

I doubt it -- using NOT IN requires (per the SQL specification)
handling NULLs in a way that probably does not give you the answer
you want.  (NOT IN also is often much slower than the NOT EXISTS
test which will actually give you the answer you want.)

test=# create table t (id int not null primary key, ref_id int, sts
int not null default 0);
CREATE TABLE
test=# insert into t values
(1,null,0),(2,1,1),(3,null,0),(4,null,0),(5,4,1),(6,null,0),(7,6,1);
INSERT 0 7
test=# select max(id) from t where sts=0 and id not in (select ref_id from t);
 max
-----

(1 row)

test=# select max(id) from t t1 where sts = 0 and not exists (select *
from t t2 where t2.ref_id = t1.id);
 max
-----
   3
(1 row)

Note that providing minimal setup (like the above) helps in getting
good answers quickly.

>> do note, this is whats known as an 'anti-join', and these can be pretty
>> expensive on large tables.
>
> +1

*Can* be.  Proper indexing can make them very reasonable.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plql and or clausule

2016-05-31 Thread Kevin Grittner
On Tue, May 31, 2016 at 4:18 PM,  <car...@lpis.com> wrote:

> ERROR: el operador no existe: character varying == character varying
> LINE 1: SELECT OLD.Peticionario == NEW.Peticionario or OLD.interlocc...

Perhaps you want the = operator?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full text search index

2016-05-26 Thread Kevin Grittner
You don't provide much context, like PostgreSQL version or machine
characteristics.

https://wiki.postgresql.org/wiki/SlowQueryQuestions

On Wed, May 25, 2016 at 11:04 PM, Patrick Baker
<patrickbake...@gmail.com> wrote:

>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>>AND j0_.id = 5)
>>   AND j0_.id = 5
>
> The query is taking ages to run.
>
> I read about wildcards and it seems I have to use a function with
> to_tsvector ?

I very much doubt that full text search is going to be helpful here
-- perhaps trigrams with an appropriate gist or gin index could
help.  Depending on table sizes and data present, picking out rows
based on the OR of scanning for a sequence of characters in a
couple character string columns might not be your fastest query to
run.
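
If you do want to try the trigram route, something along these lines
(assuming the pg_trgm extension is available on your installation):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX customers_name_first_trgm
    ON customers USING gin (lower(name_first) gin_trgm_ops);
CREATE INDEX customers_name_last_trgm
    ON customers USING gin (lower(name_last) gin_trgm_ops);

The planner can then consider those indexes for the
LOWER(...) LIKE '%some%' predicates instead of scanning every row.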

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Timestamp with timezone output

2016-05-25 Thread Kevin Grittner
On Wed, May 25, 2016 at 9:47 AM, Arnaud Inovia Team
<arnaud.bec...@inovia-team.com> wrote:

> While using "psql", when selecting a column timestamp with
> timezone, I get results with different timezones:
>
> -[ RECORD 6 ]---+---
> expiration_date | 2015-09-07 00:00:00+02
> -[ RECORD 7 ]---+---
> expiration_date | 2015-11-27 00:00:00+01
>
> Shouldn't all value be converted to the same timezone ?

Perhaps your local time zone ends Daylight Saving Time between
those dates, so the offset from UTC is different on those dates?
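
For example (using Europe/Paris purely to illustrate a zone with DST):

SET timezone = 'Europe/Paris';
SELECT '2015-09-07 00:00:00'::timestamptz AS during_dst,
       '2015-11-27 00:00:00'::timestamptz AS after_dst;
-- during_dst displays with +02, after_dst with +01; both store the
-- same kind of absolute instant, only the display offset differs.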

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Increased I/O / Writes

2016-05-23 Thread Kevin Grittner
On Mon, May 23, 2016 at 4:32 PM, Lucas Possamai <drum.lu...@gmail.com> wrote:

> The spikes were happening because, after the reboot, the HUGE PAGES were
> enabled.
>
> After disabling them, all got back to normal.

Since you said earlier you were on PostgreSQL version 9.2, you must
be talking about transparent huge pages.  Yeah, be sure those are
configured to be disabled in a way that "sticks" on your OS.  When
you get to version 9.4 you will notice that we support huge pages
directly.  That would be expected to work without problems even
though TRANSPARENT huge pages are debilitating.
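
For example, on 9.4 and later the explicit feature is controlled by
the huge_pages setting (a quick sketch; "try" is the default):

test=# show huge_pages;
 huge_pages
------------
 try
(1 row)

Setting it to "on" in postgresql.conf makes startup fail if huge
pages cannot be allocated, while "try" falls back to normal pages.
Transparent huge pages, by contrast, are an operating system
setting, not something PostgreSQL controls.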

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Increased I/O / Writes

2016-05-17 Thread Kevin Grittner
On Tue, May 17, 2016 at 12:11 AM, Lucas Possamai <drum.lu...@gmail.com> wrote:
> This is my postgresql.conf at the moment:
>
> shared_buffer(51605MB) +
> effective_cache_size(96760MB) +
> work_mem(32MB) +
> max_connections(200)
>
> = 148397.08 MB

You are comparing some very dissimilar settings.  effective_cache_size
does not allocate memory, it tells the planner how much cache you have
allocated (i.e., the sum of shared_buffers and the OS cache).
work_mem can be allocated zero to a large number of times per active
query.  Every open connection will use some RAM, but the amount is
hard to predict exactly.
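
As a very rough sketch (the multipliers are assumptions, not
measurements): if all 200 connections were active and each backend
happened to use two or three work_mem allocations at once, that
would be 200 * 3 * 32MB, or roughly 19GB, on top of the ~50GB of
shared_buffers -- while effective_cache_size adds nothing to that,
because it only describes cache that shared_buffers and the OS
already hold.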

You might want to go over this page:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

... and then read the documentation of any setting you are thinking of
adjusting.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Invalid data read from synchronously replicated hot standby

2016-05-11 Thread Kevin Grittner
On Wed, May 11, 2016 at 5:44 AM,
<martin.kamp.jen...@schneider-electric.com> wrote:

>> We are getting invalid data when reading from a synchronously
>> replicated hot standby node in a 2-node setup. To better understand
>> the situation, we have created a document that provides an overview.
>> We are hoping that someone might be able to confirm whether or not
>> the setup makes sense, i.e., whether we are using PostgreSQL
>> correctly and experiencing a bug, or if we are using PostgreSQL
>> incorrectly.
>>
>> Link to document that contains a step-by-step description of the
>> situation:
>> https://docs.google.com/document/d/1MuX8rq1gKw_WZ-HVflqxFslvXNTRGKa77A4NHto4ue0/edit?usp=sharing

Please include such information in your post or as an attachment.
Who knows whether that link will still be usable and unchanged 20
years from now?

>> If the setup is sane (and expected to work),

I didn't see anywhere that you correctly handled WAL in setting up
your standby.  I am not surprised by there being corruption,
including duplicate keys in a unique index.  You might try -x or -X
when you run pg_basebackup, or use archiving.  Whatever you do, do
NOT delete the backup_label file!

> In the mean time, we are preparing a new platform on 9.5.2 where
> I have not been able to reproduce the issue (however, we have
> introduced a lot of changes besides upgrading PostgreSQL).

We would need a lot more detail to be able to even guess at whether
you have actually solved the flaws in your process or have just
been lucky so far.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] MVIEW refresh consistently faster then insert ... select

2016-05-04 Thread Kevin Grittner
On Wed, May 4, 2016 at 1:46 AM, Thomas Kellerer <spam_ea...@gmx.net> wrote:

> I have a table that is an aggregation of another table.
> This aggregation reduces an input of ~14 million rows to ~4
> million rows.

> The refresh takes approx 2 minutes (fastest was 1:40) on our
> development server (CentOS, Postgres 9.5.0)

> However, when I create a materialized view:

> Subsequent refreshes using "REFRESH MATERIALIZED VIEW mv_stock"
> are consistently much faster: between 40 seconds and 1 minute
>
> I have run both refreshes about 10 times now, so caching effects
> should not be there.
>
> My question is: what is refresh mview doing differently than a
> plain insert ... select that makes it so much faster?

Just to confirm, is this with or without the CONCURRENTLY keyword
on the REFRESH command?

If *without*, I would guess the difference is probably in creating
the index "from scratch" with sort and load versus retail insertion
of index entries.  You could approximate this by dropping the index
before the TRUNCATE and INSERT and creating it again after it is
loaded.
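
A sketch of that approximation, with made-up table and index names
just to show the shape of it:

begin;
drop index if exists stock_agg_product_idx;
truncate stock_agg;
insert into stock_agg (product_id, total_qty)
  select product_id, sum(qty) from stock group by product_id;
create index stock_agg_product_idx on stock_agg (product_id);
commit;

Whether that beats the retail index maintenance depends on the
data, of course.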

If *with*, I would guess that it is because most of the work is
done in temporary files and workspace, with just the delta applied
to the table and index in permanent storage.

It's hard to guess which way will be faster for the use case you
describe -- it will probably depend on what percentage of rows
remain unchanged on each REFRESH.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Proper relational database?

2016-04-23 Thread Kevin Grittner
On Sat, Apr 23, 2016 at 1:53 AM, Guyren Howe <guy...@gmail.com> wrote:
> On Apr 22, 2016, at 18:56 , <da...@andl.org> <da...@andl.org> wrote:

> Why schema-on-demand? Can you explain what you mean by that?
>
> Something that is attractive, for beginners or perhaps when prototyping is
> that you don't have to declare a table. You can just insert tuples into a
> predicate whose name you provide and they go in and you've defined a
> relation just by using it.

test=# \d
No relations found.
test=# select * into people from (values (1,'Fred'), (2, 'Bob')) x(id, name);
SELECT 2
test=# select * from people;
 id | name
----+------
  1 | Fred
  2 | Bob
(2 rows)

test=# \d
 List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+---------
 public | people | table | kgrittn
(1 row)

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Add relcreated (timestamp) column to pg_class catalog to record the time an object was created

2016-04-20 Thread Kevin Grittner
On Wed, Apr 20, 2016 at 4:40 PM, Melvin Davidson <melvin6...@gmail.com>
wrote:

> As for what I want from the community, I would like other users
> and DBAs to weigh in on this request and its usefulness.

When I was a DBA on a team responsible for hundreds of
geographically distributed databases, initially using products with
this feature and then moving to PostgreSQL, I occasionally found
this feature to be a minor convenience when it was present.  We
kept the DDL for recreating everything under source control, and
each new release contained the DDL to move from one state to the
next, so such a column didn't give us anything we couldn't get by
consulting the "official" DDL.  But, as an example of where it
could save a few minutes, if someone had been allowed to run ad hoc
reports or data cleanup on a database it was a quick way to look
for stray tables they may have generated to keep intermediate
results or exceptions, so we could follow up on disposition of
those tables.

It would take a lot of such incidents to add up to enough time to
add this as a proper feature, which is probably why nobody with
resources to devote to adding features has prioritized it to the
point of developing a proposed patch.  That and the fact that there
is no guarantee that the community as a whole would feel that the
feature "carried its own weight" in terms of benefit / maintenance
cost, so it might not make it in, anyway.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [GENERAL] pgpool-II: cannot use serializable mode in a hot standby

2016-04-13 Thread Kevin Grittner
On Wed, Apr 13, 2016 at 5:54 PM, Kevin Grittner <kgri...@gmail.com> wrote:

> See this example, and imagine that
> the transaction generating the list of receipts for the closed
> batch is run on the standby before the transaction adding the last
> receipt commits.  Or test it.

https://wiki.postgresql.org/wiki/SSI#Deposit_Report

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pgpool-II: cannot use serializable mode in a hot standby

2016-04-13 Thread Kevin Grittner
On Wed, Apr 13, 2016 at 1:30 AM, Alexander Pyhalov <a...@rsu.ru> wrote:
> Tatsuo Ishii wrote on 13.04.2016 02:36:
>>>
>>> On 04/12/2016 16:50, Adrian Klaver wrote:
>>>>
>>>> On 04/12/2016 01:06 AM, Alexander Pyhalov wrote:
>>>
>>> I understand. I mean perhaps pgpool shouldn't forward these statements
>>> to slaves.

It should not forward transactions which are requested to be
SERIALIZABLE to standbys.  If you just suppress the SET statement
(or substitute REPEATABLE READ), queries in that transaction can
return incorrect results.

>> Yeah, PostgreSQL used to accept the command on standbys (at least in
>> 9.0). The restriction was added later on.

... in 9.1, for a reason.

>> It woule be nice if you send
>> a bug report to the pgpool-II bug tracker to not forget it.
>>
>> http://pgpool.net/mediawiki/index.php/Bug_tracking_system
>
> Filed http://www.pgpool.net/mantisbt/view.php?id=191

As the entry stands at the moment, the suggestions for fixes will
allow incorrect query results.  See this example, and imagine that
the transaction generating the list of receipts for the closed
batch is run on the standby before the transaction adding the last
receipt commits.  Or test it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to quote the COALESCE function?

2016-04-04 Thread Kevin Grittner
On Tue, Mar 29, 2016 at 12:38 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:

> The coalesce is one of the few functions implemented by a special
> rule in the PostgreSQL parser.

In the SQL standard the COALESCE feature is not listed as a
function; it is listed as one of the short forms of CASE
expression.  While it has function-like syntax, thinking of it as a
function is semantically incorrect.

  COALESCE(a, b)

is supposed to be semantically equivalent to:

  CASE WHEN a is not null THEN a ELSE b END

Among other things, that means that this statement should not
generate a divide by zero error:

  SELECT COALESCE(1, 1/0);
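
For example, that is easy to verify in psql:

test=# SELECT COALESCE(1, 1/0);
 coalesce
----------
        1
(1 row)

If COALESCE were an ordinary function, both arguments would be
evaluated before the call and the division by zero would fail.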

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

2016-03-10 Thread Kevin Grittner
On Thu, Mar 10, 2016 at 1:50 PM, Alexandru Lazarev
<alexandru.laza...@gmail.com> wrote:

> One more "offtop" question - What kind of frameworks do
> automatic retries for failed transactions? Are
> Hibernate/Spring in that list?

I have seen that done in Hibernate/Spring using dependency
injection to create a transaction manager with the necessary logic.
I was told by the developer that doing so was not trivial, but not
outrageously hard, either.

Every framework may have a different way to do this; I would just
say that any framework which does not provide a reasonable
mechanism for implementing such behavior is not one I would
consider to be mature enough for "prime time" -- although others
might feel differently.

Kevin Grittner


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)

2016-03-09 Thread Kevin Grittner
On Wed, Mar 9, 2016 at 1:39 PM, Alexandru Lazarev
<alexandru.laza...@gmail.com> wrote:

Jeff's answer is entirely correct; I'm just going to go into more
detail -- just in case you're interested enough to work through it.

> `CREATE TABLE public.foo (id serial PRIMARY KEY, mynum integer);`
>
> and following data
>
>  id | mynum
> ----+-------
>   1 |    10
>   2 |    10
>   3 |    10
>   4 |    10
> (4 rows)
>
> I run 2 serialize transactions in parallel (2 `psql` consoles):
>
> -- both transactions
> mydb=# begin;
> BEGIN
> mydb=# set transaction isolation level serializable;
> SET
>
> -- tx1
> mydb=# select * from foo where mynum < 100;
>  id | mynum
> ----+-------
>   1 |    10
>   2 |    10
>   3 |    10
>   4 |    10
> (4 rows)
> --tx1: Shouldn't freeze data visible for tx1 select?

Yes, tx1 does have a snapshot which will guarantee that it sees a
repeatable view of the set of data for this predicate.

> --tx2
> mydb=# insert into foo (mynum) values (10);
> INSERT 0 1
> -- tx2 will insert next row with id 5 in foo table
> -- Shouldn't insert of tx2 broke data snapshot visible for tx1?

The snapshot tx1 has guarantees that overlapping changes won't
change its view of things, and there is no reason for anything to
be blocked or canceled here.  The insert creates what is called a
read-write dependency (or rw-conflict for short) that establishes
that in any serial ordering of a set of transactions which includes
tx1 and tx2, tx1 must precede tx2 in the apparent order of
execution.

> --tx1
> mydb=# update foo set mynum = 20 where id < 100;
> UPDATE 4
> -- Shouldn't here appear serialization fail or at least on tx1 commit?

No, there is no cycle in the apparent order of execution.  The
snapshot for tx1 still limits it to the same set of rows, and there
is nothing visible that is inconsistent with tx1 running before
tx2.

> --tx2
> mydb=# commit;
> COMMIT
>
> --tx1
> mydb=# commit;
> COMMIT
> -- tx1 Commit is OK - no any error

According to the SQL standard, and in the PostgreSQL implementation
of SERIALIZABLE transactions, commit order does not, by itself,
establish apparent  order of execution.

> -- implicit tx
> mydb=# select * from foo;
>  id | mynum
> ----+-------
>   1 |    20
>   2 |    20
>   3 |    20
>   4 |    20
>   5 |    10
> (5 rows)

As Jeff said, this is consistent with the implicit transaction
running last, so tx1 -> tx2 -> implicit_tx.

Now, you are pretty close to a situation which does need to trigger
a serialization failure -- just switch the commit of tx1 and the
implicit transaction.  If tx2 has committed but tx1 has not yet
committed:

mydb=# select * from foo;
 id | mynum
----+-------
  1 |    10
  2 |    10
  3 |    10
  4 |    10
  5 |    10
(5 rows)

*Now* we have a problem -- this only makes sense if the implicit tx
was run after tx2 and before tx1.  So apparent order of execution
is tx1 -> tx2 -> implicit_tx -> tx1.  There is a cycle in the
apparent order of execution, which causes anomalies which can ruin
data integrity.  Now, if the implicit transaction is not
serializable, it is allowed to see such things, but if you make it
serializable (and let's call it tx3 now) it sees a state where only
tx2 ran; tx1 could not have run:

-- tx3
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo;
 id | mynum
----+-------
  1 |    10
  2 |    10
  3 |    10
  4 |    10
  5 |    10
(5 rows)

mydb=# commit;
COMMIT

So now, tx1 is not allowed to commit, or for that matter do
anything else -- it has been "doomed" by tx3:

mydb=# select * from foo;
ERROR:  could not serialize access due to read/write dependencies
among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
conflict out checking.
HINT:  The transaction might succeed if retried.

Hopefully you are using some framework to automatically detect this
SQLSTATE and retry the transaction from the start.  So on retry,
tx1 does this:

-- tx1 (retry)
mydb=# begin;
BEGIN
mydb=# set transaction isolation level serializable;
SET
mydb=# select * from foo where mynum < 100;
 id | mynum
----+-------
  1 |    10
  2 |    10
  3 |    10
  4 |    10
  5 |    10
(5 rows)

mydb=# update foo set mynum = 20 where id < 100;
UPDATE 5
mydb=# select * from foo;
 id | mynum
----+-------
  1 |    20
  2 |    20
  3 |    20
  4 |    20
  5 |    20
(5 rows)

mydb=# commit;
COMMIT

Now the result of all successfully committed serializable
transactions is consistent with the order tx2 -> tx3 -> tx1.  All
is good.

Kevin Grittner


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multicolumn index and setting effective_cache_size using human-readable-numbers

2016-02-29 Thread Kevin Grittner
On Mon, Feb 29, 2016 at 2:10 PM, Geoff Winkless <pgsqlad...@geoff.dj> wrote:

> I'm not really sure what changes I could make that would make one
> index that's ostensibly equivalent to the other not be attractive to
> the planner though. I can mess with those figures but as I said before
> the only one that flicks the switch is to change effective_cache_size
> to 8GB, which makes no sense to me.

effective_cache_size doesn't affect how memory is allocated; it
tells the optimizer what to assume about the combined cache space
(essentially shared_buffers + OS cache) so that it can estimate the
amount of random storage I/O needed to use an indexed plan.  If you
tell it that you only have 64MB between those two types of cache,
it will assume that the index (particularly if it is deep and/or
wide) will be very expensive.
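
One easy way to see its effect is to change it for just your
session and compare plans (a sketch; the table and column names are
made up):

set effective_cache_size = '64MB';
explain select * from some_big_table where some_indexed_column = 42;

set effective_cache_size = '8GB';
explain select * from some_big_table where some_indexed_column = 42;

The data and settings on disk are untouched; only the planner's
cost estimates (and therefore the chosen plan) change.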

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Let's Do the CoC Right

2016-01-22 Thread Kevin Grittner
On Fri, Jan 22, 2016 at 4:05 PM, David E. Wheeler <da...@justatheory.com> wrote:
> On Jan 22, 2016, at 11:47 AM, Luz Violeta <luz.stanc...@www.com.ar> wrote:
>
>> P.S → even now, I'm kinda terrified of a shitstorm in my first
>> mail to the mailing list ... but definitely this spark of hope
>> made me come forward and say something, dunno.

Welcome!

I do wonder what it is that made you terrified of a shitstorm, and
what it is that you're hoping for that you don't feel is already
present.

> the better the likelihood of getting something that creates the
> safe environment I firmly believe we all want.

Not only do I want that, but I thought we had it.  I have still not
seen anything to show me otherwise; the hypothetical examples I can
remember seeing on these recent threads bear no resemblance to
anything I can remember ever seeing on the PostgreSQL lists.  Can
you point to something as an example of the kind of behavior that
you think a Code of Conduct would have prevented?

Regarding the question of the Code of Conduct having short, general
statements versus listing "protected groups", etc. -- I would like
to see everyone protected.  Any list, by its nature, is going to
make someone feel excluded and unprotected.  In my view, the closer
it is to a statement of "The Golden Rule"[1], the better.

In particular, I think that if (hypothetically) someone who is part
of the community makes some idiotic, offensive, insensitive
statement of blathering idiocy *outside PostgreSQL forums*, they
should enjoy the same right to respect and prevention of attack *on
the PostgreSQL forums* as everyone else.  They just better not
repeat the idiocy here.  I would hope that major contributors would
keep in mind the impact that such statements in other venues might
have on the public perception of the community.  I've come around
to the point of view that encouraging such consideration is outside
the scope of what a Code of Conduct should formally address.

The PostgreSQL forums should be a safe place, and rancor engendered
elsewhere should not be brought in.  Problems should be resolved in
a way that minimizes the chance of escalation, recognizing that
there could be miscommunication.[2]

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://en.wikipedia.org/wiki/Golden_Rule

[2] 
http://www.khou.com/story/news/local/2016/01/21/brown-gay-sign-causes-amusing-misunderstanding/79116720/


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CoC [Final]

2016-01-20 Thread Kevin Grittner
On Wed, Jan 20, 2016 at 12:47 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 18 January 2016 at 18:02, Joshua D. Drake <j...@commandprompt.com> wrote:

>> * We are tolerant of people’s right to have opposing views.
>>
>> * Participants must ensure that their language and actions are free
>> of personal attacks and disparaging personal remarks.
>>
>> * When interpreting the words and actions of others, participants
>> should always assume good intentions.
>>
>> * Participants who disrupt the collaborative space, or participate in a
>> pattern of behaviour which could be considered harassment will not be
>> tolerated.

> I suggest we remove point 3 entirely. Point 2 is sufficient to limit what is
> said.

That came about because of the point made by someone for whom
English is a second language, who attempted to compliment someone
by saying the work was "gross" (meaning "a big thing"), when that
was initially taken as an insult (thinking "disgusting" was meant).
Perhaps it belongs more in the preamble or could be omitted, but
it was an attempt to recognize that simple miscommunication due to
language or cultural differences can turn into flame wars if people
don't give each other some benefit of the doubt.

> Who will decide how this code is enacted? Rules imply rulers, so what is the
> constitution of the governing body?

It has been stated several times on this thread by multiple people
that we should settle on the code to implement before talking about
enforcement processes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] CoC [Final]

2016-01-18 Thread Kevin Grittner
On Mon, Jan 18, 2016 at 12:02 PM, Joshua D. Drake <j...@commandprompt.com> 
wrote:

> * Participants who disrupt the collaborative space, or participate in a
> pattern of behaviour which could be considered harassment will not be
> tolerated.

Personally, I was comfortable with the rest of it, but this one
made me squirm a little.  Could we spin that to say that those
behaviors will not be tolerated, versus not tolerating the people?
Maybe:

* Disruption of the collaborative space or any pattern of
behaviour which could be considered harassment will not be
tolerated.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Data Packaging/Data Unpacking

2016-01-13 Thread Kevin Grittner
On Wed, Jan 13, 2016 at 2:57 PM, oleg yusim <olegyu...@gmail.com> wrote:

> Say, I got a network package. The package was decrypted by OpenSSL. Where is
> this data, physically, at this moment?

Process-local memory for the PostgreSQL backend process associated
with the database connection.

> Opposite situation: we are about to send results of SELECT statement over
> the OpenSSL tunnel. Where are the data, physically, between the moment the DB
> engine returned results for my query and the moment the package was encrypted by
> OpenSSL?

Process-local memory for the PostgreSQL backend process associated
with the database connection.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Data Packaging/Data Unpacking

2016-01-13 Thread Kevin Grittner
On Tue, Jan 12, 2016 at 10:00 PM, oleg yusim <olegyu...@gmail.com> wrote:

> Important: let's assume data at rest is encrypted using EFS and data at
> transit is encrypted using ciphers, provided by OpenSSL.
>
> So, with that in mind, please, help me to understand movement and location
> of the data between the moment when it is pulled from file system and
> encrypted as network package going through the SSL tunnel.
>
> And reversing it - between the moment network package arrived through the
> SSL tunnel is decrypted and the moment its content is placed into the file
> system.

At all times the data is present only in files owned by the OS user
which runs the database server or in RAM allocated to processes run
by that user.  Files and RAM are freed without overwrite; we count
on the OS to not gratuitously show the old values to processes
making new allocations.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Data Packaging/Data Unpacking

2016-01-13 Thread Kevin Grittner
On Wed, Jan 13, 2016 at 3:54 PM, oleg yusim <olegyu...@gmail.com> wrote:

> Answer "postgres" would suffice.

But the user would not always be "postgres".  To be accurate, it is
the user which owns the files for the "cluster" (database instance)
and which runs the database service.  If a machine contains
multiple clusters it is (IMO) best practice, for both security and
operational reasons, to use a separate OS user for each cluster.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] WIP: CoC V5

2016-01-13 Thread Kevin Grittner
On Tue, Jan 12, 2016 at 10:04 PM, Joshua D. Drake <j...@commandprompt.com> 
wrote:
> On 01/12/2016 07:10 PM, Tom Lane wrote:
>> Kevin Grittner <kgri...@gmail.com> writes:

>>> * To maintain a safe, respectful, productive and collaborative
>>> environment all participants must ensure that their language and
>>> actions are free of personal attacks and disparaging remarks of any
>>> kind.
>>
>> The "disparaging remarks" part of this could easily be taken to forbid
>> technical criticism of any sort, eg "this patch is bad because X,Y, and
>> Z", even when X,Y, and Z are perfectly neutral technical points.  "Of any
>> kind" doesn't improve that either.  I'm on board with the "personal
>> attacks" part.  Maybe "disparaging personal remarks" would be better?
>
> Hrm, I see your point but the definition of disparaging is:
>
> expressing the opinion that something is of little worth; derogatory.

Below is a modified version of what I posted, attempting to improve
it based on further thoughts of my own as well as suggestions from
Tom, JD, and Bill.  I see a lot to like in the variation proposed
by Chris, but wasn't sure quite how to meld that with this.  I've
left off the enforcement part for now.

I still feel it is more productive to discuss a proposed document
than proposed language for some "motion to adopt".  (I'm not sure
where such a motion would be made and adopted.)



== PostgreSQL Community Code of Conduct (CoC) ==

This document is intended to provide community guidelines for
creating and enforcing a safe, respectful, productive, and
collaborative place for any person who is willing to contribute in
a safe, respectful, productive and collaborative way.  It applies
to all "collaborative space", which is defined as community
communications channels (such as mailing lists, IRC, submitted
patches, commit comments, etc.) and to public events (such as
meetings and conferences) which are associated with the PostgreSQL
community.  Private communications which result from words or
actions in the collaborative space should also conform to the
standards stated here.

* Participants must ensure that their language and actions are free
of personal attacks and disparaging personal remarks.  Critical
remarks regarding patches and/or technical work are necessary to
ensure a quality product; however, critical remarks directed at
individuals are not constructive and therefore not acceptable.

* When interpreting the words and actions of others, participants
should always assume good intentions.  Consider that due to
language and cultural differences, something may be intended in a
benign or helpful way, even if some participants initially see a
possible interpretation which is otherwise.

* Participants must avoid sustained disruption of the collaborative
space, or any pattern of behavior which could reasonably be
considered harassment.

There is a distinction between words and actions taken within the
community and words and actions outside community communication
channels and events, but there is a gray area when using public
forums or social media where a person identifies as a member of
this community.  Members of the community, especially those with a
high profile within the community, should be mindful of this and
avoid saying or doing anything in such venues which might create an
unwelcoming or hostile attitude toward the community.



-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] WIP: CoC V5

2016-01-12 Thread Kevin Grittner
I think that this is fairly close to something that would make
sense, but there is sort of a weirdness in the CoC referring to
itself in the 3rd person.  It sounds more like an argument for
*having* a CoC than the document itself.

I'm not the greatest word-smith, but I'll attempt to rework Josh's
draft to something that seems more "natural" to me.  At the same
time, I'll try to incorporate other comments, like Tom's comment
about enforcement mechanisms and the gray areas in members of the
community communicating in public forums.


On Tue, Jan 12, 2016 at 3:37 PM, Joshua D. Drake <j...@commandprompt.com> wrote:

> PostgreSQL Global Development Group (PGDG) Code of Conduct (CoC):
>
> 1. The CoC is to provide community guidelines for creating and enforcing a
> safe, respectful, productive, and collaborative place for any person who is
> willing to contribute in a safe, respectful, productive and collaborative
> way.
>
> 2. A safe, respectful, productive and collaborative environment is free of
> personal attacks and disparaging remarks of any kind.
>
> 3. The CoC is not about being offended. One should always assume good
> intentions. As with any diverse community, anyone can get offended at
> anything.
>
> 4. Any sustained disruption of the collaborative space (mailing lists, IRC
> etc..) or other PostgreSQL events shall be construed as a violation of the
> CoC and appropriate action will be taken by the CoC committee.
>
> 5. The CoC is only about interaction with the PostgreSQL community. Your
> private and public lives outside of the PostgreSQL community are your own.



== PostgreSQL Community Code of Conduct (CoC) ==

This document is intended to provide community guidelines for
creating and enforcing a safe, respectful, productive, and
collaborative place for any person who is willing to contribute in
a safe, respectful, productive and collaborative way.

* To maintain a safe, respectful, productive and collaborative
environment all participants must ensure that their language and
actions are free of personal attacks and disparaging remarks of any
kind.

* When interpreting the words and actions of others, participants
should always assume good intentions.  Consider that due to
language and cultural differences, something may be intended in a
benign or helpful way, even if some participants initially see a
possible interpretation which is otherwise.

* All participants must avoid sustained disruption of the
collaborative space (mailing lists, IRC etc..) or other PostgreSQL
events.

* There is a distinction between words and actions taken inside the
community and words and actions outside community communication
channels and events, but there is a gray area when using public
forums or social media where a person identifies as a member of
this community.  Members of the community, especially those with a
high profile within the community, should be mindful of this and
avoid anything which might create an unwelcoming or hostile
attitude toward the community in such venues.

* Participants who feel that they have not been treated in
accordance with this Code of Conduct may want to try to sort things
out in the forum where there was a perception of a problem; asking
for a clarification or an apology either in a public discussion
context or privately can often resolve an issue quickly to
everyone's satisfaction.  Where this fails, the Core Team is
responsible for determining what, if any, action is appropriate.
The core team is listed, with a link to the purpose of team, at the
top of the community's "Contributor Profiles" page:
http://www.postgresql.org/community/contributors/



To me, this reads more like the document itself.  I hope I have
done justice to Josh's points as well as Tom's, although I would
bet there are a number of people on the list that can improve on my
effort here.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code of Conduct: Is it time?

2016-01-11 Thread Kevin Grittner
On Mon, Jan 11, 2016 at 3:42 PM, FarjadFarid(ChkNet)
<farjad.fa...@checknetworks.com> wrote:

Five days (and I don't know how many posts) ago, there was this:

http://www.postgresql.org/message-id/20160106184818.gt21...@crankycanuck.ca

Which said in part:

> The other thing I note is that the IETF got
> most of these documents because someone thought the problem was
> important enough to write a draft proposal first.  As I said in a
> recent IETF plenary, the organization works partly because at the IETF
> you don't need anyone's permission to try something; you don't even
> need forgiveness.  The worst that can happen is that people reject the
> proposal.  It always seemed to me that the Postgres project worked in
> a similar way, so I'd encourage those who think there is a problem to
> be solved to make a scratch proposal and see whether it flies.  It's
> always easier to discuss a concrete proposal than to try to figure out
> whether something is a good idea in the abstract.

I'm going to give this a belated +1, and ignore any further posts on
this thread.

If someone wants to take the step of posting a concrete proposal,
please start a new thread with a different subject line.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code of Conduct: Is it time?

2016-01-11 Thread Kevin Grittner
On Mon, Jan 11, 2016 at 4:10 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgri...@gmail.com> writes:

>> If someone wants to take the step of posting a concrete proposal,
>> please start a new thread with a different subject line.
>
> I thought we were already at that point; see Regina Obe's posts.

Oh, are you referring to this:?

http://www.postgresql.org/message-id/001201d14c96$fc26ed70$f474c850$@pcorp.us

For some reason that shows up as a quote of a quote in my gmail, so
I skipped over it without noticing it.  Apologies.

Even after finding it, formatting is mangled, and I see an
amendment was just posted.  Can we get this into a more readable
format somehow, where changes can be reflected without sequentially
scanning the thread?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code of Conduct: Is it time?

2016-01-11 Thread Kevin Grittner
On Mon, Jan 11, 2016 at 2:58 PM, Regina Obe <l...@pcorp.us> wrote:

> How would you feel about the original thread that started this.
>
> https://github.com/opal/opal/issues/941

I'm not interested in opal, and don't have time to read a thread
with (when I looked) 374 messages, but if the gist of it is that
they have a code of conduct that attempts to control the speech or
actions of contributors outside of the venue of the lists or events
of the project, count me as -1, regardless of how offensive I might
find said speech or actions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Re: [BUGS] BUG #13847: WARNING: skipping "pg_toast_" --- cannot vacuum indexes, views, or special system tables VACUUM

2016-01-06 Thread Kevin Grittner
Please quote only enough to remind readers of context and respond
below the quoted text.  This is the conventional style for the
PostgreSQL lists, and saves a lot of time for the thousands who
will read this.  Thanks!

On Wed, Jan 6, 2016 at 12:36 PM, Navaneethakrishnan Gopal
<gn...@yahoo.co.in> wrote:

>  PostgreSQL 8.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3

Ouch!  Not only did the 8.2 major release go out of support in
2011, but there were 20 minor releases over about a five year
period after 8.2.3, each of which fixed serious bugs and/or
security vulnerabilities!  Please read this, and try to stay more
current with minor releases for any major release you are running,
and not get so far out of support for the major release:

http://www.postgresql.org/support/versioning/

> 2) This is the current error customer is facing; which I think
> same as this
> https://wiki.postgresql.org/wiki/20110408pg_upgrade_fix#What_is_the_underlying_cause_of_this_bug.3F

It simply cannot be that bug which caused the problem, because that
was a bug that only existed in early minor releases of the 8.4 and
9.0 major releases.  You may be getting the same error message, but
it is not due to the bug described on the page you cite.

> org.postgresql.util.PSQLException: ERROR: could not access status of 
> transaction 192282624
>   Detail: Could not open file "pg_clog/00B7": No such file or directory.

So, an internal file necessary for database integrity went missing.
There were bugs in early versions of 9.3 and 9.4 which could cause
this, but outside of that the most common cause that I've seen is
that someone tried to free disk space by deleting files, without
realizing their importance.  I can't rule out a bug, but since
you're missing five years of bug fixes for 8.2 on a major release
that went out of support more than four years ago, I don't think
anyone will want to put a lot of time into looking for such a
possible bug.

> 3) We tried applying the fix in two ways
> https://wiki.postgresql.org/wiki/20110408pg_upgrade_fix#What_is_the_underlying_cause_of_this_bug.3F
> 3.i) Just copy and paste the text from this page in the psql window
> 3.ii) By running like this  "psql -U postgres -a -f pg_upgrade_fix.sql 
> unicorn >> error.txt 2>&1"
>Have attached both "pg_upgrade_fix.sql" and error.txt with this mail
>
> Please help us on solving these errors.

I strongly recommend that you stop the database and copy the data
directory structure before attempting any recovery, in case it makes
things worse.

You might want to go back to your most recent good backup.  If you
can't do that, you might want to dummy up the missing clog file(s)
(perhaps using the `dd` utility).  Any in-place recovery attempt is
likely to leave some corruption, so I would recommend using pg_dump
and/or pg_dumpall to save the data and restore it into a fresh
cluster (created from initdb).  If you can still find a copy of
8.2.23 you might want to install that.

>  PostgreSQL 9.4.1 on x86_64-mv-linux-gnu, compiled by 
> i686-montavista-linux-gnu-gcc (MontaVista Linux G++ 4.4-1311130628) 4.4.1, 
> 64-bit

9.3 and 9.4 had serious bugs in early releases which could cause
database corruption and lost data.  Please use the latest 9.4
minor release.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] A unique pairs version of UNNEST() ?

2016-01-05 Thread Kevin Grittner
On Tue, Jan 5, 2016 at 1:59 AM, Alban Hertroys <haram...@gmail.com> wrote:

> with list_of_ids as (
>   select unnest(list_of_ids) as id from table
> )
> select a.id, b.id
>   from list_of_ids a, list_of_ids b
>  where b.id > a.id;

Or, to morph this to array output (which the OP seemed to want):

test=# with list_of_ids as (
test(#   select unnest('{1,2,3,4,5}'::int[]) as id
test(# )
test-# select array [a.id, b.id]
test-#   from list_of_ids a, list_of_ids b
test-#  where b.id > a.id;
 array
---
 {1,2}
 {1,3}
 {1,4}
 {1,5}
 {2,3}
 {2,4}
 {2,5}
 {3,4}
 {3,5}
 {4,5}
(10 rows)

Nothing in that which wasn't already mentioned; just putting it all
together.

The OP mentioned wanting a count, but that wasn't too clear to me;
using a window function to number the rows and changing the
comparison from > to >= while excluding self-matches should make
that pretty easy.
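
Just as a sketch of what I had in mind (the sample array is made
up, and equal values are paired by row number so that each
unordered pair is counted exactly once):

with list_of_ids as (
  select id, row_number() over () as rn
    from unnest('{1,2,2,3}'::int[]) as u(id)
)
select array[a.id, b.id] as pair, count(*) as cnt
  from list_of_ids a
  join list_of_ids b
    on b.id > a.id
    or (b.id = a.id and b.rn > a.rn)
 group by array[a.id, b.id]
 order by 1;

 pair  | cnt
-------+-----
 {1,2} |   2
 {1,3} |   1
 {2,2} |   1
 {2,3} |   2
(4 rows)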

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Secret Santa List

2015-12-23 Thread Kevin Grittner
On Tue, Dec 22, 2015 at 9:49 PM, Lou Duchez <l...@paprikash.com> wrote:
> I have a company with four employees who participate in a Secret Santa
> program, where each buys a gift for an employee chosen at random.  (For now,
> I do not mind if an employee ends up buying a gift for himself.)  How can I
> make this work with an SQL statement?
>
> Here is my Secret Santa table:
>
> --
> create table secretsanta
> (giver text,
> recipient text,
> primary key (giver));
>
> insert into secretsanta (giver) values ('Frank'), ('Joe'), ('Steve'),
> ('Earl');
> --

with
  g as (select giver, row_number() over () as rownum from secretsanta),
  r as (select giver, row_number() over () as rownum from (select
giver from secretsanta order by random()) as x)
update secretsanta
  set recipient = r.giver
  from g join r on g.rownum = r.rownum
  where secretsanta.giver = g.giver;

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] connections not getting closed on a replica

2015-12-11 Thread Kevin Grittner
On Fri, Dec 11, 2015 at 3:37 PM, Carlo Cabanilla <ca...@datadoghq.com> wrote:

> 16 cores

> a default pool size of 650, steady state of 500-600 server
> connections

With so many more connections than resources to serve them, one
thing that can happen is that just by happenstance enough processes
become busy at one time that they start context switching a lot
before they finish, leaving spinlocks blocked and causing other
contention that slows all query run times.  That causes bloat to
increase, because some database transactions are left active for
longer.  If the client software and/or pooler don't queue requests
at that point, still more connections are made, because the
existing connections have not been freed due to the
contention-induced slowness -- which exacerbates the problem and
leads to a downward spiral.  That can become so bad that there is
no recovery until either the client software is stopped or the
database is restarted.

>> I don't suppose you have vmstat 1 output from the incident?  If
>> it happens again, try to capture that.
>
> Are you looking for a stat in particular?

Not really; what I like about `vmstat 1` is how many useful pieces
of information are on each line, allowing me to get a good overview
of what's going on.  For example, if system CPU time is high, it is
very likely to be a problem with transparent huge pages, which is
one thing that can cause these symptoms.  A "write glut" can also
do so, which can be controlled by adjusting checkpoint and
background writer settings, plus the OS vm.dirty_* settings (and
maybe keeping shared_buffers smaller than you otherwise might).
NUMA problems are not at issue, since there is only one memory
node.

Without more evidence of what is causing the problem, suggestions
for a solution are shots in the dark.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [JDBC] plpgsql function with RETURNS SETOF refcursor in JAVA

2015-12-11 Thread Kevin Grittner
On Fri, Dec 11, 2015 at 2:11 PM, Corradini, Carlos
<ccorrad...@correoargentino.com.ar> wrote:

> with your and Mr. Kevin explanations, the Java
> program have worked fine and have printed the data obtained from a two
> cursors inside a PostgreSQL Database Stored Function.
>
> Then, I can confirm that this version of DB ( 9.4 ) use the OUT
> parameter with refcursors and works fine. The JDBC interface provided by
> the Server Postgresql can read the data inserted into these two cursors
> via a callablestatement.registeroutparameter.

For the benefit of others who may later have a similar problem and
find this thread, it would be great if you could provide a little
self-contained example of a Java program which uses the technique
that you settled on.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

