Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Dave Page


> On 2 Mar 2014, at 05:20, Noah Misch  wrote:
> 
>> On Sat, Mar 01, 2014 at 05:51:46PM -0500, Andrew Dunstan wrote:
>>> On 03/01/2014 05:10 PM, Tom Lane wrote:
>>> One other thought here: is it actually reasonable to expend a lot of effort
>>> on the Windows case?  I'm not aware that people normally expect a Windows
>>> box to have multiple users at all, let alone non-mutually-trusting users.
>> 
>> As Stephen said, it's fairly unusual. There are usually quite a few
>> roles, but it's rare to have more than one "human" type role
>> connected to the machine at a given time.
> 
> I, too, agree it's rare.  Rare enough to justify leaving the vulnerability
> open on Windows, indefinitely?

It's not that rare in my experience - certainly there are far more single-user 
installations, but Terminal Server configurations are common for deploying apps 
"Citrix-style" or via VDI. The one and only Windows server maintained by the EDB 
infrastructure team is a terminal server, for example.

>  I'd say not.  Windows itself has been pushing
> steadily toward better multi-user support over the past 15 years or so.
> Releasing software for Windows as though it were a single-user platform is
> backwards-looking.  We should be a model in this area, not a straggler.

Definitely.

> 
>> I'd be happy doing nothing in this case, or not very much, e.g.
>> provide a password but not with great cryptographic strength.
> 
> One option that would simplify things is to fix only non-Windows in the back
> branches, via socket protection, and fix Windows in HEAD only.  We could even
> do so by extending HAVE_UNIX_SOCKETS support to Windows through named pipes.
> 
> Using weak passwords on Windows alone would not simplify the effort.
> 
> -- 
> Noah Misch
> EnterpriseDB http://www.enterprisedb.com
> 
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windows exit code 128 ... it's baaack

2014-03-01 Thread Amit Kapila
On Fri, Feb 28, 2014 at 5:44 AM, Tom Lane  wrote:
> I looked at the postmaster log for the ongoing issue on narwhal
> (to wit, that the contrib/dblink test dies the moment it tries
> to do anything dblink-y), and looky here what the postmaster
> has logged:
>
> [530fc965.bac:2] LOG:  server process (PID 2144) exited with exit code 128
> [530fc965.bac:3] DETAIL:  Failed process was running: SELECT *
> FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a 
> int, b text, c text[])
> WHERE t.a > 7;
> [530fc965.bac:4] LOG:  server process (PID 2144) exited with exit code 0
> [530fc965.bac:5] LOG:  terminating any other active server processes
>
>
> Now, back in the 2010 thread where we agreed to put in the ignore-128
> kluge, it was asserted that all known cases of this exit code were
> irreproducible Windows flakiness occurring at process start or exit.
> This is evidently neither start nor exit, but likely is being provoked
> by trying to load libpq into the backend.

Most of the information on the net regarding this error code indicates
that it is related to particular Windows versions, and there are even a
few hotfixes for it, for example:
http://support.microsoft.com/kb/974509

I'm not sure how relevant such hotfixes are to the current case, as most
reports indicate the error occurs during CreateProcess(), but the current
failure doesn't seem to be related to CreateProcess().

I have tried to reproduce it on my local Windows machine (Windows 7), but
wasn't able to. I think looking into the Event Viewer logs from that time
might give some clue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Noah Misch
On Sat, Mar 01, 2014 at 09:43:19PM -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 03/01/2014 05:10 PM, Tom Lane wrote:
> >> BTW, a different problem with the proposed patch is that it changes
> >> some test cases in ecpg and contrib/dblink, apparently to avoid session
> >> reconnections.  That seems likely to me to be losing test coverage.
> >> Perhaps there is no alternative, but I'd like to have some discussion
> >> around that point as well.

connect/test5.pgc has the same number of session reconnections before and
after the patch.  The change is to assign passwords to the connection test
accounts and use those passwords during the tests.  Test coverage actually
increased there; before, it did not matter whether ecpg conveyed each password
in good order.  The dblink change does replace a non-superuser reconnection
with a SET SESSION AUTHORIZATION.
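
In other words, rather than reconnecting as the unprivileged role, the test
switches identity within the existing session, roughly like this (a sketch;
the surrounding test statements are elided):

```sql
-- Switch the session's effective identity to the test role
-- instead of opening a fresh connection as that role.
SET SESSION AUTHORIZATION dblink_regression_test;
-- ... run the dblink test statements as that role ...
RESET SESSION AUTHORIZATION;  -- back to the original (superuser) identity
```

Only a superuser can issue SET SESSION AUTHORIZATION, which is why the
reconnection could be replaced this way without needing a second password.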

> I believe the point of those changes is that the scaffolding
> only sets up a password for the original superuser, so that connecting
> as anybody else will fail if the test postmaster is configured for
> password auth.

Essentially, yes.  You can connect as another user if you assign a password;
the ecpg suite does so.  (That user had better be unprivileged.)  The dblink
change has a second point.  Since the dblink_regression_test role has use of
the dblink_connect_u() function, it needs to be treated as a privileged
account.  (I'll add a comment about that.)
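
The privilege in question is conferred by a grant along these lines (a
sketch; the exact signature granted in the test may differ):

```sql
-- dblink_connect_u() is the "untrusted" connect variant: it lets the
-- caller open libpq connections without the usual security checks, so
-- any role granted it must be treated as privileged.
GRANT EXECUTE ON FUNCTION dblink_connect_u(text) TO dblink_regression_test;
```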

> Another issue is that (I presume, haven't checked) "make installcheck"
> on contrib or ecpg will currently fail against a server that's not
> configured for trust auth.  Ideally you should be able to do "make
> installcheck" against a reasonably-configured server.

No, I had verified "make installcheck-world" under md5 auth.  The setup I
recommend, which I mentioned in the initial message of this thread, is to put
the same PGPASSFILE in the postmaster environment as you put in the "make
installcheck" environment.  (It's contrib/dblink and contrib/postgres_fdw that
would otherwise fail.)
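
Concretely, that means pointing both environments at one password file
(contents here are illustrative):

```
# File referenced by PGPASSFILE in both the postmaster environment and
# the "make installcheck" environment.
# Format: hostname:port:database:username:password
localhost:5432:*:*:trial_password
```

On non-Windows platforms, libpq ignores the file unless its permissions
disallow access by group and others (chmod 0600).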

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Noah Misch
On Sat, Mar 01, 2014 at 05:51:46PM -0500, Andrew Dunstan wrote:
> On 03/01/2014 05:10 PM, Tom Lane wrote:
> >One other thought here: is it actually reasonable to expend a lot of effort
> >on the Windows case?  I'm not aware that people normally expect a Windows
> >box to have multiple users at all, let alone non-mutually-trusting users.
> 
> As Stephen said, it's fairly unusual. There are usually quite a few
> roles, but it's rare to have more than one "human" type role
> connected to the machine at a given time.

I, too, agree it's rare.  Rare enough to justify leaving the vulnerability
open on Windows, indefinitely?  I'd say not.  Windows itself has been pushing
steadily toward better multi-user support over the past 15 years or so.
Releasing software for Windows as though it were a single-user platform is
backwards-looking.  We should be a model in this area, not a straggler.

> I'd be happy doing nothing in this case, or not very much, e.g.
> provide a password but not with great cryptographic strength.

One option that would simplify things is to fix only non-Windows in the back
branches, via socket protection, and fix Windows in HEAD only.  We could even
do so by extending HAVE_UNIX_SOCKETS support to Windows through named pipes.

Using weak passwords on Windows alone would not simplify the effort.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] commit fest status and release timeline

2014-03-01 Thread Fabrízio de Royes Mello
On Sun, Mar 2, 2014 at 12:56 AM, Michael Paquier  wrote:
>
> > On Sun, Mar 2, 2014 at 7:43 AM, Vik Fearing  wrote:
> > On 03/01/2014 07:50 PM, Josh Berkus wrote:
> >> On 03/01/2014 09:01 AM, Peter Eisentraut wrote:
> >>> Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
> >>> Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected: 4.
> >>> Total: 114.
> >>>
> >> We're still on track to achieve about 50% committed patches, which would
> >> be similar to the previous few commit fests.  So decent job so far.
> >> So, other than Hstore2/JSONB and Logical Changesets, what are the
> >> big/difficult patches left?
> >
> > For me, I'd really like to see the reduced locks on ALTER TABLE.
> The patch by Peter to improve test coverage for client programs. This
> is helpful for QE/QA teams evaluating Postgres, and, as far as I
> understood, it could be extended for other things like a replication
> test suite as well.
>

+1

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2014-03-01 Thread Fabrízio de Royes Mello
On Sat, Mar 1, 2014 at 7:39 PM, Tom Lane  wrote:
>
> Fabrízio de Royes Mello  writes:
> > On Sat, Mar 1, 2014 at 2:11 PM, Tom Lane  wrote:
> >> [ re schema upgrade scenarios ]
> >> Why wouldn't COR semantics answer that requirement just as well, if not
> >> better?
>
> > Just because it will replace the object content... and in some cases this
> > cannot happen because it will regress the schema to an old version.
>
> That argument seems awfully darn flimsy.

Sorry, I know my use case is very specific...

Is "we can implement COR instead" really a strong argument against having
this feature? Or is it just that we don't want to add more complexity to
the source code?

The complexity this feature adds to the source code is minimal, but the
result is very useful, and it could be used by many tools (e.g. Rails
migrations, Python's Alembic, Doctrine, and others).


> In any case, given the existence of DO it's simple to code up
> create-if-not-exists behavior with a couple lines of plpgsql; that seems
> to me to be a sufficient answer for corner cases.  create-or-replace is
> not equivalently fakable if the system doesn't supply the functionality.
>
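
The DO-block workaround Tom describes can be sketched in a couple of lines
of plpgsql; the role name here is illustrative:

```sql
-- Emulate CREATE ROLE ... IF NOT EXISTS, which the grammar lacks,
-- with a conditional existence check inside a DO block.
DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'app_reader') THEN
        CREATE ROLE app_reader;
    END IF;
END
$$;
```

(A concurrent session can still race the existence check; trapping the
duplicate_object exception instead is the more robust variant.)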

You are completely right.

But we already have "DROP ... IF EXISTS", so I think having
"CREATE ... IF NOT EXISTS" (the inverse behavior) would be very natural...
and I agree with implementing "CREATE OR REPLACE" too.

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] commit fest status and release timeline

2014-03-01 Thread Michael Paquier
On Sun, Mar 2, 2014 at 7:43 AM, Vik Fearing  wrote:
> On 03/01/2014 07:50 PM, Josh Berkus wrote:
>> On 03/01/2014 09:01 AM, Peter Eisentraut wrote:
>>> Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
>>> Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected: 4.
>>> Total: 114.
>>>
>>> We're still on track to achieve about 50% committed patches, which would
>>> be similar to the previous few commit fests.  So decent job so far.
>> So, other than Hstore2/JSONB and Logical Changesets, what are the
>> big/difficult patches left?
>
> For me, I'd really like to see the reduced locks on ALTER TABLE.
The patch by Peter to improve test coverage for client programs. This
is helpful for QE/QA teams evaluating Postgres, and, as far as I
understood, it could be extended for other things like a replication
test suite as well.
Regards,
-- 
Michael




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/01/2014 05:10 PM, Tom Lane wrote:
>> BTW, a different problem with the proposed patch is that it changes
>> some test cases in ecpg and contrib/dblink, apparently to avoid session
>> reconnections.  That seems likely to me to be losing test coverage.
>> Perhaps there is no alternative, but I'd like to have some discussion
>> around that point as well.

> Yeah. Assuming we make the changes you're suggesting, that should no 
> longer be necessary, right?

Not sure.  I believe the point of those changes is that the scaffolding
only sets up a password for the original superuser, so that connecting
as anybody else will fail if the test postmaster is configured for
password auth.  If so, the only way to avoid doing any work would be
if we don't implement any fix at all for Windows.

Whether or not you're worried about the security of "make check" on
Windows, there are definitely some things to be unhappy about here.
One being that the core regression tests are evidently not testing
connecting as anybody but superuser; and the proposed changes would remove
such testing from contrib and ecpg as well, which is surely not better
from a coverage standpoint.  (It's not terribly hard to imagine
permissions bugs that would cause postinit.c to fail for non-superusers.)

Another issue is that (I presume, haven't checked) "make installcheck"
on contrib or ecpg will currently fail against a server that's not
configured for trust auth.  Ideally you should be able to do "make
installcheck" against a reasonably-configured server.

I'm not real sure what to do about either of those points, but I am sure
that the proposed patch isn't moving the ball downfield.

regards, tom lane




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Kohei KaiGai
2014-03-02 10:38 GMT+09:00 Robert Haas :
> On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost  wrote:
>> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
>>> IIUC, his approach was integration of join-pushdown within FDW APIs,
>>> however, it does not mean the idea of remote-join is rejected.
>>
>> For my part, trying to consider doing remote joins *without* going
>> through FDWs is just nonsensical.
>
> That is, of course, true by definition, but I think it's putting the
> focus in the wrong place.  It's possible that there are other cases
> when a scan might be a plausible path for a joinrel even if there are no
> foreign tables in play.  For example, you could cache the joinrel
> output and then inject a cache scan as a path for the joinrel.
>
That might be a better way to demonstrate the usage of the custom-scan
node than the (ad-hoc) enhancement of postgres_fdw. As I have discussed
in another thread, it is possible to replace a heap reference with a
cache reference on the fly, so that would be a plausible use case for
the custom-scan node.

So, I'm inclined to drop the postgres_fdw portion of my submission and
focus on the custom-scan capability.

Thanks,
-- 
KaiGai Kohei 




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> For what it's worth, and I can't claim to have all the answers here,
> this doesn't match my expectation.  I think we'll do two kinds of
> parallelism.  One will be parallelism within nodes, like parallel sort
> or parallel seqscan.  Any node we parallelize this way is likely to be
> heavily rewritten, or else to get a sister that looks quite different
> from the original.  

Sure.

> The other kind of parallelism will involve pushing
> a whole subtree of the plan into a different node.  In this case we'll
> need to pass data between nodes in some different way (this was one of
> the major reasons I designed the shm_mq stuff) but the nodes
> themselves should change little if at all.

It's that "some different way" of passing data between the nodes that
makes me worry, but I hope you're right and we won't actually need to
change the interfaces or the nodes very much.

Thanks,

Stephen




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Kohei KaiGai
2014-03-02 10:29 GMT+09:00 Stephen Frost :
> * Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
>> As you mentioned, it is a headache for packagers, and does not make
>> sense for us if packager disabled the feature that requires proprietary
>> drivers.
>
> No, I disagree with that.  I don't expect this use-case to be very
> common to begin with and telling individuals that they have to compile
> it themselves is certainly not out of the question.
>
>> In fact, Fedora / RHEL does not permit distributing software
>> under non-open-source licenses.
>
> I'm pretty confident you can get RPMs for those distributions.
>
>> Obviously, NVIDIA's CUDA is a library distributed under a proprietary
>> license, and thus out of scope for the Linux distributors.
>
> This also doesn't make any sense to me- certainly the CUDA libraries are
> available under Debian derivatives, along with open-source wrappers for
> them like pycuda.
>
>> It also leads them to turn off features that must be linked with
>> proprietary drivers. All we can do is implement these features as
>> extensions, then offer users the option to use them or not.
>
> No, we can tell individuals who want it that they're going to need to
> build with support for it.  We don't offer OpenSSL as an extension (I
> certainly wish we did- and had a way to replace it w/ GNUTLS or one of
> the better licensed options).
>
I know there are some alternative ways. However, they require additional
knowledge and setup effort from users, and requiring alternative RPMs also
loses the benefit of using the distributor's Linux.
I don't want to recommend such a complicated setup procedure to users.

>> What I'd like to implement is GPU acceleration that can operate on
>> regular tables, not only foreign tables. Also, according to the backlog
>> from the developer meeting, a pluggable executor node is also a feature
>> required by the PG-XC folks to make their project work with upstream.
>> I think the custom-scan feature is capable of hosting these requirements,
>> and does not prevent enhancement of the FDW features.
>
> I think you're conflating things again- while it might be possible to
> use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown,
> *I* don't think it's the right approach.  Regarding the PG-XC
> requirement, I expect they're looking for FDW join/aggregate-pushdown
> and also see that it *could* be done w/ CustomScan.
>
The reason I submitted the part-3 patch (which enhances postgres_fdw to do
remote joins using custom-scan) is that it is an easy way to demonstrate
replacing a join with a special scan, while keeping the patch to be reviewed
minimal. If we have another way to demonstrate that, I don't insist on the
remote-join feature for foreign tables.
Regarding PG-XC, I don't know their exact needs because I didn't attend
the cluster meeting, but someone mentioned pluggable plan/exec nodes in
this context.

> We could punt on the whole thing and drop in hooks which could be used
> to replace everything done from the planner through to the executor and
> then anyone *could* implement any of the above, and parallel query too.
> That doesn't make it the right approach.
>
That is a problem I pointed out at the last developer meeting. Because
there is no way for an extension to enhance just part of the plan/exec
logic, an extension has to replace the whole planner/executor using hooks.
That is painful for extension authors.

Thanks,
-- 
KaiGai Kohei 




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> I don't see that parallelizing Append is any easier than any other
> problem in this space.  There's no parallel I/O facility, so you need
> a background worker per append branch to wait on I/O.  And you have
> all the problems of making sure that the workers have the same
> snapshot, making sure they don't self-deadlock, etc. that you have for
> any other case.

Erm, my thought was to use a select() loop which sends out I/O requests
and then loops around waiting to see who finishes it.  It doesn't
parallelize the CPU cost of getting the rows back to the caller, but
it'd at least parallelize the I/O, and if what's underneath is actually
a remote FDW running a complex query (because the other side is actually
a view), it would be a massive win to have all the remote FDWs executing
concurrently instead of serially as we have today.

Thanks,

Stephen




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Robert Haas
On Sat, Mar 1, 2014 at 8:49 PM, Stephen Frost  wrote:
>> This critique seems pretty odd to me.  I haven't had the time to look
>> at this patch set, but I don't see why anyone would want to use the
>> background worker facility for GPU acceleration, which is what
>> KaiGai's trying to accomplish here.
>
> Eh, that didn't come out quite right.  I had intended it to be more
> along the lines of "look at what Robert's doing".
>
> I was trying to point out that parallel query execution is coming soon
> thanks to the work on background worker and that parallel query
> execution might drive changes to the way nodes interact in the executor
> driving a need to change the API.  In other words, CustomScan could
> easily end up being broken by that change and I'd rather we not have to
> worry about such breakage.

I think the relation is pretty tangential.  I'm worried about the
possibility that the Custom Scan API is broken *ab initio*, but I'm
not worried about a conflict with parallel query.

>> I seriously doubt there's any real conflict with parallelism here.
>> Parallelism may indeed add more ways to scan a relation
>> (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
>> shouldn't have a Custom Scan node too.
>
> What about parallel execution through the tree itself, rather than just
> at specific end nodes like SeqScan and IndexScan?  Or parallel
> aggregates?  I agree that simple parallel SeqScan/IndexScan isn't going
> to change any of the interfaces, but surely we're going for more than
> that.  Indeed, I'm wishing that I had found more time to spend on just
> a simple select-based Append node which could parallelize I/O across
> tablespaces and FDWs underneath the Append.

Well, as I said in another recent reply that you probably got after
sending this, when you split between nodes, that mostly just has to do
with how you funnel the tuples from one node to another.  The nodes
themselves probably don't even need to know.  Or at least that's what
I'd hope.

I don't see that parallelizing Append is any easier than any other
problem in this space.  There's no parallel I/O facility, so you need
a background worker per append branch to wait on I/O.  And you have
all the problems of making sure that the workers have the same
snapshot, making sure they don't self-deadlock, etc. that you have for
any other case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost  wrote:
> > The line between a foreign table and a local one is becoming blurred
> > already, but still, if this is the goal then I really think the
> > background worker is where you should be focused, not on this Custom
> > Scan API.  Consider that, once we've got proper background workers,
> > we're going to need new nodes which operate in parallel (or some other
> > rejiggering of the nodes- I don't pretend to know exactly what Robert is
> > thinking here, and I've apparently forgotten it if he's posted it
> > somewhere) and those interfaces may drive changes which would impact the
> > Custom Scan API- or worse, make us deprecate or regret having added it
> > because now we'll need to break backwards compatibility to add in the
> > parallel node capability to satisfy the more general non-GPU case.
> 
> This critique seems pretty odd to me.  I haven't had the time to look
> at this patch set, but I don't see why anyone would want to use the
> background worker facility for GPU acceleration, which is what
> KaiGai's trying to accomplish here.

Eh, that didn't come out quite right.  I had intended it to be more
along the lines of "look at what Robert's doing".

I was trying to point out that parallel query execution is coming soon
thanks to the work on background worker and that parallel query
execution might drive changes to the way nodes interact in the executor
driving a need to change the API.  In other words, CustomScan could
easily end up being broken by that change and I'd rather we not have to
worry about such breakage.

> I seriously doubt there's any real conflict with parallelism here.
> Parallelism may indeed add more ways to scan a relation
> (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
> shouldn't have a Custom Scan node too.

What about parallel execution through the tree itself, rather than just
at specific end nodes like SeqScan and IndexScan?  Or parallel
aggregates?  I agree that simple parallel SeqScan/IndexScan isn't going
to change any of the interfaces, but surely we're going for more than
that.  Indeed, I'm wishing that I had found more time to spend on just
a simple select-based Append node which could parallelize I/O across
tablespaces and FDWs underneath the Append.

> Indeed, my principle concern
> about this patch set isn't that it's too specialized, as you seem to
> be worrying about, but that it's aiming to satisfy *too many* use
> cases.  I think FDW join pushdown is a fairly different problem from
> adding a custom scan type, and I doubt one patch should try to solve
> both problems.

Yeah, I've voiced those same concerns later in the thread also,
specifically that this punts on nearly everything and expects the
implementor to figure it all out.  We should be able to do better wrt
FDW join-pushdown, etc.

Thanks,

Stephen




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Robert Haas
On Fri, Feb 28, 2014 at 10:36 AM, Stephen Frost  wrote:
> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
>> * Stephen Frost (sfr...@snowman.net) wrote:
>> > I don't see how you can be when there hasn't been any discussion that I've
>> > seen about how parallel query execution is going to change things for us.
>> >
>> If parallel query execution changes whole of the structure of plan nodes,
>> it will also affect to the interface of custom-scan because it is a thin-
>> abstraction of plan-node. However, if parallel execution feature is
>> implemented as one of new plan node in addition to existing one, I cannot
>> imagine a scenario that affects to the structure of another plan node.
>
> Let's just say that I have doubts that we'll be able to implement
> parallel execution *without* changing the plan node interface in some
> way which will require, hopefully minor, changes to all of the nodes.
> The issue is that even a minor change would break the custom-scan API
> and we'd immediately be in the boat of dealing with complaints regarding
> backwards compatibility.  Perhaps we can hand-wave that, and we've had
> some success changing hook APIs between major releases, but such changes
> may also be in ways which wouldn't break in obvious ways or even
> possibly be changes which have to be introduced into back-branches.
> Parallel query is going to be brand-new real soon and it's reasonable to
> think we'll need to make bug-fix changes to it after it's out which
> might even involve changes to the API which is developed for it.

For what it's worth, and I can't claim to have all the answers here,
this doesn't match my expectation.  I think we'll do two kinds of
parallelism.  One will be parallelism within nodes, like parallel sort
or parallel seqscan.  Any node we parallelize this way is likely to be
heavily rewritten, or else to get a sister that looks quite different
from the original.  The other kind of parallelism will involve pushing
a whole subtree of the plan into a different node.  In this case we'll
need to pass data between nodes in some different way (this was one of
the major reasons I designed the shm_mq stuff) but the nodes
themselves should change little if at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Robert Haas
On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost  wrote:
> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
>> IIUC, his approach was integration of join-pushdown within FDW APIs,
>> however, it does not mean the idea of remote-join is rejected.
>
> For my part, trying to consider doing remote joins *without* going
> through FDWs is just nonsensical.

That is, of course, true by definition, but I think it's putting the
focus in the wrong place.  It's possible that there are other cases
when a scan might be a plausible path for a joinrel even if there are no
foreign tables in play.  For example, you could cache the joinrel
output and then inject a cache scan as a path for the joinrel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Robert Haas
On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost  wrote:
>> The custom-scan node is intended to perform on regular relations, not
>> only foreign tables. It means a special feature (like GPU acceleration)
>> can perform transparently for most of existing applications. Usually,
>> it defines regular tables for their work on installation, not foreign
>> tables. It is the biggest concern for me.
>
> The line between a foreign table and a local one is becoming blurred
> already, but still, if this is the goal then I really think the
> background worker is where you should be focused, not on this Custom
> Scan API.  Consider that, once we've got proper background workers,
> we're going to need new nodes which operate in parallel (or some other
> rejiggering of the nodes- I don't pretend to know exactly what Robert is
> thinking here, and I've apparently forgotten it if he's posted it
> somewhere) and those interfaces may drive changes which would impact the
> Custom Scan API- or worse, make us deprecate or regret having added it
> because now we'll need to break backwards compatibility to add in the
> parallel node capability to satisfy the more general non-GPU case.

This critique seems pretty odd to me.  I haven't had the time to look
at this patch set, but I don't see why anyone would want to use the
background worker facility for GPU acceleration, which is what
KaiGai's trying to accomplish here.  Surely you want, if possible, to
copy the data directly from the user backend into the GPU's working
memory.  What would the use of a background worker be?  We definitely
need background workers to make use of additional *CPUs*, but I don't
see what good they are in leveraging *GPUs*.

I seriously doubt there's any real conflict with parallelism here.
Parallelism may indeed add more ways to scan a relation
(ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we
shouldn't have a Custom Scan node too.  Indeed, my principal concern
about this patch set isn't that it's too specialized, as you seem to
be worrying about, but that it's aiming to satisfy *too many* use
cases.  I think FDW join pushdown is a fairly different problem from
adding a custom scan type, and I doubt one patch should try to solve
both problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
* Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
> As you mentioned, it is a headache for packagers, and it does us no good
> if packagers disable a feature that requires proprietary drivers.

No, I disagree with that.  I don't expect this use-case to be very
common to begin with and telling individuals that they have to compile
it themselves is certainly not out of the question.

> In fact, Fedora / RHEL does not permit distributing software
> under non-open-source licenses.

I'm pretty confident you can get RPMs for those distributions.

> Obviously, NVIDIA's CUDA
> is a library distributed under a proprietary license, and thus out of
> scope for those Linux distributors.

This also doesn't make any sense to me- certainly the CUDA libraries are
available under Debian derivatives, along with open-source wrappers for
them like pycuda.

> That also leads them to turn off
> features that must be linked with proprietary drivers.
> All we can do is implement these features as an extension, then offer
> users the option to use it or not.

No, we can tell individuals who want it that they're going to need to
build with support for it.  We don't offer OpenSSL as an extension (I
certainly wish we did- and had a way to replace it w/ GNUTLS or one of
the better licensed options).

> What I'd like to implement is GPU acceleration that can operate on
> regular tables, not only foreign tables. Also, according to the backlog
> from the developer meeting, a pluggable executor node is a feature the
> PG-XC folks also require to keep their project working with upstream.
> I think the custom-scan feature is capable of hosting these requirements,
> and does not prevent enhancement of the FDW features.

I think you're conflating things again- while it might be possible to
use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown,
*I* don't think it's the right approach.  Regarding the PG-XC
requirement, I expect they're looking for FDW join/aggregate-pushdown
and also see that it *could* be done w/ CustomScan.

We could punt on the whole thing and drop in hooks which could be used
to replace everything done from the planner through to the executor and
then anyone *could* implement any of the above, and parallel query too.
That doesn't make it the right approach.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Kohei KaiGai
2014-03-02 9:51 GMT+09:00 Stephen Frost :
> KaiGai,
>
> * Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
>> Now we have two options for GPU programming: CUDA or OpenCL.
>> Both the libraries and the drivers are provided under proprietary
>> licenses, so they do not fit into the core implementation of PostgreSQL,
>> only into extensions installed at the user's own responsibility.
>
> Being able to work with libraries which are not BSD licensed doesn't
> change the license under which PostgreSQL code is released.  Nor does it
> require PostgreSQL to be licensed in any different way from how it is
> today.  Where it would get a bit ugly, I agree, is for the packagers who
> have to decide if they want to build against those libraries or not.  We
> might be able to make things a bit easier by having a startup-time
> determination of if these nodes are able to be used or not.  This isn't
> unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible,
> a situation which causes a great deal of difficulty already due to the
> whole readline nonsense- but it's difficulty for the packagers, not for
> the PostgreSQL project, per se.
>
As you mentioned, it is a headache for packagers, and it does us no good
if packagers disable a feature that requires proprietary drivers. In
fact, Fedora / RHEL does not permit distributing software under
non-open-source licenses. Obviously, NVIDIA's CUDA is a library
distributed under a proprietary license, and thus out of scope for those
Linux distributors. That also leads them to turn off features that must
be linked with proprietary drivers. All we can do is implement these
features as an extension, then offer users the option to use it or not.

>> Given that background, I brought up a discussion about a pluggable
>> planner/executor node (as a basis for GPU acceleration) at the last
>> developer meeting, and have since worked on the custom-scan node interface.
>
> And I'm still unconvinced of this approach and worry that it's going to
> break more often than it works.  That's my 2c on it, but I won't get in
> the way if someone else wants to step up and support it.  As I mentioned
> up-thread, I'd really like to see FDW join push-down, FDW aggregate
> push-down, parallel query execution, and parallel remote-FDW execution
> and I don't see this CustomScan approach as the right answer to any of
> those.
>
It's the right approach for enhancing FDW functionality; I have never opposed it.

What I'd like to implement is GPU acceleration that can operate on
regular tables, not only foreign tables. Also, according to the backlog
from the developer meeting, a pluggable executor node is a feature the
PG-XC folks also require to keep their project working with upstream.
I think the custom-scan feature is capable of hosting these requirements,
and does not prevent enhancement of the FDW features.

Thanks,
-- 
KaiGai Kohei 




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
KaiGai,

* Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
> Now we have two options for GPU programming: CUDA or OpenCL.
> Both the libraries and the drivers are provided under proprietary
> licenses, so they do not fit into the core implementation of PostgreSQL,
> only into extensions installed at the user's own responsibility.

Being able to work with libraries which are not BSD licensed doesn't
change the license under which PostgreSQL code is released.  Nor does it
require PostgreSQL to be licensed in any different way from how it is
today.  Where it would get a bit ugly, I agree, is for the packagers who
have to decide if they want to build against those libraries or not.  We
might be able to make things a bit easier by having a startup-time
determination of if these nodes are able to be used or not.  This isn't
unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible,
a situation which causes a great deal of difficulty already due to the
whole readline nonsense- but it's difficulty for the packagers, not for
the PostgreSQL project, per se.

> Given that background, I brought up a discussion about a pluggable
> planner/executor node (as a basis for GPU acceleration) at the last
> developer meeting, and have since worked on the custom-scan node interface.

And I'm still unconvinced of this approach and worry that it's going to
break more often than it works.  That's my 2c on it, but I won't get in
the way if someone else wants to step up and support it.  As I mentioned
up-thread, I'd really like to see FDW join push-down, FDW aggregate
push-down, parallel query execution, and parallel remote-FDW execution
and I don't see this CustomScan approach as the right answer to any of
those.

Thanks,

Stephen




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Kohei KaiGai
2014-03-01 22:38 GMT+09:00 Stephen Frost :
> KaiGai,
>
> * Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
>> BTW, this kind of discussion feels like talking to a ghost, because we
>> cannot see the new interface for parallel execution right now, so we
>> cannot concretely investigate whether it would really be a serious
>> backward incompatibility or not.
>
> Yeah, it would certainly be nice if we had all of the answers up-front.
> What I keep hoping for is that someone who has been working on this area
> (eg: Robert) would speak up...
>
I'd also like to see his opinion.

>> My bet is that it will be minor. I cannot imagine a plan-node interface
>> that does not support the existing non-parallel SeqScan, NestLoop, and so on.
>
> Sure you can- because once we change the interface, we're probably going
> to go through and make everything use the new one rather than have to
> special-case things.  That's more-or-less exactly my point here because
> having an external hook like CustomScan would make that kind of
> wholesale change more difficult.
>
I think we should follow the general rule in the custom-scan case as well.
In other words, it is the extension author's responsibility to keep up
with the latest specification of the interfaces.
For example, I have an extension module that no longer works on the
latest PG code because of interface changes to ProcessUtility_hook.
Is that a matter of backward incompatibility? Definitely not. Keeping
up is my job.

> That does *not* mean I'm against using GPUs and GPU optimizations.  What
> it means is that I'd rather see that done in core, which would allow us
> to simply change that interface along with the rest when doing wholesale
> changes and not have to worry about backwards compatibility and breaking
> other people's code.
>
Let me also fill in some background from a previous discussion.
Now we have two options for GPU programming: CUDA or OpenCL.
Both the libraries and the drivers are provided under proprietary
licenses, so they do not fit into the core implementation of PostgreSQL,
only into extensions installed at the user's own responsibility.
Given that background, I brought up a discussion about a pluggable
planner/executor node (as a basis for GPU acceleration) at the last
developer meeting, and have since worked on the custom-scan node interface.

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-03-01 Thread Andrew Dunstan


On 01/29/2014 10:59 AM, Ian Lawrence Barwick wrote:

2014/1/29 Ian Lawrence Barwick :

2014-01-29 Andrew Dunstan :

On 01/28/2014 05:55 AM, Ian Lawrence Barwick wrote:


Hi Payal

Many thanks for the review, and my apologies for not getting back to
you earlier.

Updated version of the patch attached with suggested corrections.

On a very quick glance, I see that you have still not made adjustments to
contrib/file_fdw to accommodate this new option. I don't see why this COPY
option should be different in that respect.

Hmm, that idea seems to have escaped me completely. I'll get onto it forthwith.

Striking while the keyboard is hot... version with contrib/file_fdw
modifications
attached.






I have reviewed this. Generally it's good, but the author has made a 
significant error - the idea is not to force a quoted empty string to 
null, but to force a quoted null string to null, whatever the null 
string might be. The default case has these the same, but if you specify 
a non-empty null string they aren't.


That difference actually made the file_fdw regression results plain 
wrong, in my view, in that they expected a quoted empty string to be 
turned to null even when the null string was something else.
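To illustrate the intended semantics, here is a hypothetical session using the option syntax from the patch (the table and data are invented for the example, not taken from the regression tests). With the null string set to 'NULL', an unquoted NULL field is null already; FORCE_NULL additionally lets a *quoted* "NULL" be matched against the null string, while columns not listed keep the literal string:

```sql
CREATE TEMP TABLE t (a text, b text);

-- Column a is listed in FORCE_NULL, column b is not.
COPY t FROM STDIN WITH (FORMAT csv, NULL 'NULL', FORCE_NULL (a));
"NULL","NULL"
\.

-- a was quoted "NULL" but is forced back to null;
-- b keeps the literal string 'NULL'.
SELECT a IS NULL AS a_is_null, b IS NULL AS b_is_null FROM t;
```

A quoted empty string is unaffected in this setup, since it does not match the non-empty null string at all.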


I've adjusted this and the docs and propose to apply the attached patch 
in the next day or two unless there are any objections.


cheers

andrew



diff --git a/contrib/file_fdw/data/text.csv b/contrib/file_fdw/data/text.csv
index ed348a9..f55d9cf 100644
--- a/contrib/file_fdw/data/text.csv
+++ b/contrib/file_fdw/data/text.csv
@@ -1,4 +1,5 @@
-AAA,aaa
-XYZ,xyz
-NULL,NULL
-ABC,abc
+AAA,aaa,123,""
+XYZ,xyz,"",321
+NULL,NULL,NULL,NULL
+NULL,NULL,"NULL",NULL
+ABC,abc,"",""
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5639f4d..7fb1dbc 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -48,9 +48,9 @@ struct FileFdwOption
 
 /*
  * Valid options for file_fdw.
- * These options are based on the options for COPY FROM command.
- * But note that force_not_null is handled as a boolean option attached to
- * each column, not as a table option.
+ * These options are based on the options for the COPY FROM command.
+ * But note that force_not_null and force_null are handled as boolean options
+ * attached to a column, not as table options.
  *
  * Note: If you are adding new option for user mapping, you need to modify
  * fileGetOptions(), which currently doesn't bother to look at user mappings.
@@ -69,7 +69,7 @@ static const struct FileFdwOption valid_options[] = {
 	{"null", ForeignTableRelationId},
 	{"encoding", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
-
+	{"force_null", AttributeRelationId},
 	/*
 	 * force_quote is not supported by file_fdw because it's for COPY TO.
 	 */
@@ -187,6 +187,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	Oid			catalog = PG_GETARG_OID(1);
 	char	   *filename = NULL;
 	DefElem*force_not_null = NULL;
+	DefElem*force_null = NULL;
 	List	   *other_options = NIL;
 	ListCell   *cell;
 
@@ -243,10 +244,10 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * Separate out filename and force_not_null, since ProcessCopyOptions
-		 * won't accept them.  (force_not_null only comes in a boolean
-		 * per-column flavor here.)
+		 * Separate out filename and column-specific options, since
+		 * ProcessCopyOptions won't accept them.
 		 */
+
 		if (strcmp(def->defname, "filename") == 0)
 		{
 			if (filename)
@@ -255,16 +256,42 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		 errmsg("conflicting or redundant options")));
 			filename = defGetString(def);
 		}
+		/*
+		 * force_not_null is a boolean option; after validation we can discard
+		 * it - it will be retrieved later in get_file_fdw_attribute_options()
+		 */
 		else if (strcmp(def->defname, "force_not_null") == 0)
 		{
 			if (force_not_null)
 ereport(ERROR,
 		(errcode(ERRCODE_SYNTAX_ERROR),
-		 errmsg("conflicting or redundant options")));
+		 errmsg("conflicting or redundant options"),
+		 errhint("option \"force_not_null\" supplied more than once for a column")));
+			if(force_null)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 errhint("option \"force_not_null\" cannot be used together with \"force_null\"")));
 			force_not_null = def;
 			/* Don't care what the value is, as long as it's a legal boolean */
 			(void) defGetBoolean(def);
 		}
+		/* See comments for force_not_null above */
+		else if (strcmp(def->defname, "force_null") == 0)
+		{
+			if (force_null)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 errhint("option \"force_null\" supplied more than once for a column")));
+			if(force_not_null)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 errhint("option \"force_null\" cannot be used together with 

Re: [HACKERS] proposal: new long psql parameter --on-error-stop

2014-03-01 Thread Fabrízio de Royes Mello
On Sat, Mar 1, 2014 at 5:37 AM, Pavel Stehule 
wrote:
>
> Hello
>
> here is a prototype:
>
> bash-4.1$ /usr/local/pgsql/bin/psql --help-variables
> List of some variables (options) for use from the command line.
> The complete list is in the psql section of the PostgreSQL documentation.
>
> psql variables:
> Usage:
>   psql --set=NAME=VALUE
>   or \set NAME VALUE in interactive mode
>
>   AUTOCOMMIT         when on, successful SQL commands are automatically committed
>   COMP_KEYWORD_CASE  determines which letter case to use when completing an SQL key word
>   ECHO               all lines from input can be written to standard output
>   ECHO_HIDDEN        display queries for internal commands (same as the -E option)
>   FETCH_COUNT        how many rows to fetch per page (default 0: unlimited)
>   HISTFILE           file name used to store the history list
>   HISTSIZE           the number of commands to store in the command history
>   ON_ERROR_ROLLBACK  when on, issue an automatic ROLLBACK on error
>   ON_ERROR_STOP      when set, batch execution stops immediately after an error
>   VERBOSITY          controls verbosity of error reports [default, verbose, terse]
>
> Printing options:
> Usage:
>   psql --pset=NAME[=VALUE]
>   or \pset NAME [VALUE] in interactive mode
>
>   border             number of the border style
>   fieldsep           specify the field separator for unaligned output
>   fieldsep_zero      field separator in unaligned mode will be the zero byte
>   format             set output format [unaligned, aligned, wrapped, html, latex, ..]
>   linestyle          sets the border line drawing style [ascii, old-ascii, unicode]
>   null               sets the string to be printed in place of a null value
>   pager              when the pager option is off, the pager program is not used
>   recordsep          specifies the record (line) separator to use in unaligned output format
>   recordsep_zero     record separator in unaligned output format will be the zero byte
>   title              sets the table title for any subsequently printed tables
>   tuples_only        in tuples-only mode, only actual table data is shown
>
> Environment options:
> Usage:
>   NAME=VALUE, [NAME=VALUE] psql ...
>   or \setenv NAME [VALUE] in interactive mode
>
>   COLUMNS            number of columns for wrapped format
>   PAGER              pager to use
>   PGHOST             same as the host connection parameter
>   PGDATABASE         same as the dbname connection parameter
>   PGUSER             same as the user connection parameter
>   PGPASSWORD         possibility to set a password
>   PSQL_EDITOR, EDITOR, VISUAL  editor used by the \e and \ef commands
>   PSQL_EDITOR_LINE_NUMBER_ARG  how a line number argument is passed to the editor
>   PSQL_HISTORY       alternative location for the command history file
>   PSQL_RC            alternative location of the user's .psqlrc file
>   SHELL              command executed by the \! command
>   TMPDIR             directory for storing temporary files
>
> For more information consult the psql section in the PostgreSQL
> documentation.
>

The patch is OK (applies cleanly to master, without errors).

Maybe we should also show the possible values for each variable/option.

Thinking more about it, it would be nice if we could show help for SQL
commands too. Something like this:

$ psql -H vacuum
Command: VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, ...] ) ] ]

$ psql --help-command=vacuum
Command: VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, ...] ) ] ]

It's only an idea that occurred to me reading this thread!

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Andrew Dunstan


On 03/01/2014 05:10 PM, Tom Lane wrote:


One other thought here: is it actually reasonable to expend a lot of effort
on the Windows case?  I'm not aware that people normally expect a Windows
box to have multiple users at all, let alone non-mutually-trusting users.



As Stephen said, it's fairly unusual. There are usually quite a few 
roles, but it's rare to have more than one "human" type role connected 
to the machine at a given time.


I'd be happy doing nothing in this case, or not very much. e.g. provide 
a password but not with great cryptographic strength.




BTW, a different problem with the proposed patch is that it changes
some test cases in ecpg and contrib/dblink, apparently to avoid session
reconnections.  That seems likely to me to be losing test coverage.
Perhaps there is no alternative, but I'd like to have some discussion
around that point as well.





Yeah. Assuming we make the changes you're suggesting, that should no
longer be necessary, right?


cheers

andrew




Re: [HACKERS] commit fest status and release timeline

2014-03-01 Thread Vik Fearing
On 03/01/2014 07:50 PM, Josh Berkus wrote:
> On 03/01/2014 09:01 AM, Peter Eisentraut wrote:
>> Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
>> Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected: 4.
>> Total: 114.
>>
>> We're still on track to achieve about 50% committed patches, which would
>> be similar to the previous few commit fests.  So decent job so far.
> So, other than Hstore2/JSONB and Logical Changesets, what are the
> big/difficult patches left?

For me, I'd really like to see the reduced lock levels for ALTER TABLE.

-- 
Vik





Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2014-03-01 Thread Fabrízio de Royes Mello
On Sat, Mar 1, 2014 at 2:11 PM, Tom Lane  wrote:
>
> Fabrízio de Royes Mello writes:
> > On Sat, Jan 18, 2014 at 11:12 PM, Stephen Frost wrote:
> >> Fabrízio, can you clarify the use-case for things like CREATE AGGREGATE
> >> to have IF NOT EXISTS rather than OR REPLACE, or if there is a reason
> >> why both should exist?  Complicating our CREATE options is not something
> >> we really wish to do without good reason and we certainly don't want to
> >> add something now that we'll wish to remove in another version or two.
>
> > Well, I have a scenario with many servers to deploy DDL scripts to, and
> > most of them we must run without transaction control, because of tasks
> > like CREATE INDEX CONCURRENTLY, DROP/CREATE DATABASE, CLUSTER, etc.
>
> > When an error occurs the script stops, but the previous commands were
> > already committed, so we must review the script to comment out the parts
> > that were already executed and then run it again. Until now that has not
> > really been a problem, but in some cases we must deploy another DDL
> > script that contains a new version of some object before we have
> > finished fixing the previous version in production, and if we had CINE
> > for all CREATE objects this task would be easier, because we could just
> > run the script again without caring whether it would replace the content
> > or produce an error.
>
> Why wouldn't COR semantics answer that requirement just as well, if not
> better?
>

Just because it would replace the object's content... and in some cases
this cannot happen, because it would regress the schema to an older version.

I know it's a very specific use case, but in a scenario with many servers
and many automated tasks in different pipelines, CINE would be very useful.
I have this kind of trouble mostly with functions (we use COR), and
sometimes we discover that the production version of a function is wrong
only after a user notifies us, and in those situations we often spend a
lot of effort fixing the resulting damage.

Grettings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2014-03-01 Thread Tom Lane
Fabrízio de Royes Mello writes:
> On Sat, Mar 1, 2014 at 2:11 PM, Tom Lane  wrote:
>> [ re schema upgrade scenarios ]
>> Why wouldn't COR semantics answer that requirement just as well, if not
>> better?

> Just because it would replace the object's content... and in some cases
> this cannot happen, because it would regress the schema to an older version.

That argument seems awfully darn flimsy.  On what grounds would you argue
that the script you're sourcing contains versions you want of objects that
aren't there, but not versions you want of objects that are there?  If
the script is out of date, it seems more likely that you'd end up with
back-rev versions of the newly created objects, which very possibly won't
interact well with the newer objects that were already in the database.

In any case, given the existence of DO it's simple to code up
create-if-not-exists behavior with a couple lines of plpgsql; that seems
to me to be a sufficient answer for corner cases.  create-or-replace is
not equivalently fakable if the system doesn't supply the functionality.
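As a rough illustration of the DO-block approach Tom mentions (the aggregate name and definition here are invented for the example, not taken from the thread):

```sql
DO $$
BEGIN
    -- invented example: a trivial integer-sum aggregate
    CREATE AGGREGATE my_sum (int) (SFUNC = int4pl, STYPE = int);
EXCEPTION
    WHEN duplicate_function THEN
        -- aggregates live in pg_proc, so a duplicate raises
        -- duplicate_function; swallowing it gives IF NOT EXISTS behavior
        RAISE NOTICE 'aggregate my_sum already exists, skipping';
END
$$;
```

Running the block twice succeeds both times; the second run just emits the notice instead of an error.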

regards, tom lane




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Tom Lane
Magnus Hagander  writes:
> For a one-off password used locally only, we could also consider just using
> a guid, and generate it using
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx.

Not sure if that API is intended to create an unpredictable UUID, rather
than just a unique one.

> Obviously windows only though - do we have *any* Unix platforms that can't
> do unix sockets?

I'm not aware of any.  A look into the git history of pg_config_manual.h
shows that QNX and BEOS used to be marked as not having Unix sockets,
but of course we dropped support for those in 2006.  I'd rather bet on
"all non-Windows platforms have Unix sockets" than "all non-Windows
platforms have /dev/urandom"; the former standard is far older than
the latter.

One other thought here: is it actually reasonable to expend a lot of effort
on the Windows case?  I'm not aware that people normally expect a Windows
box to have multiple users at all, let alone non-mutually-trusting users.

BTW, a different problem with the proposed patch is that it changes
some test cases in ecpg and contrib/dblink, apparently to avoid session
reconnections.  That seems likely to me to be losing test coverage.
Perhaps there is no alternative, but I'd like to have some discussion
around that point as well.

regards, tom lane




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> In the case of Unix systems, there is a *far* simpler and more portable
> solution technique, which is to tell the test postmaster to put its socket
> in some non-world-accessible directory created by the test scaffolding.

Yes, yes, yes.

> Of course that doesn't work for Windows, which is why we looked at the
> random-password solution.  But I wonder whether we shouldn't use the
> nonstandard-socket-location approach everywhere else, and only use random
> passwords on Windows.  That would greatly reduce the number of cases to
> worry about for portability of the password-generation code; and perhaps
> we could also push the crypto issue into reliance on some Windows-supplied
> functionality (though I'm just speculating about that part).

Multi-user Windows build systems are *far* rarer than their unix
equivalents (though even those are semi-rare these days w/ all the
VMs running around; still, you may have shared university unix
systems with students building PG- the same just doesn't exist in my
experience on the Windows side).
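The non-Windows scheme under discussion can be sketched as follows (the pg_ctl invocation is illustrative of the idea, not lifted from the actual patch):

```shell
# Create a private directory to hold the test server's Unix socket.
# mktemp -d creates the directory with mode 0700, so no other local
# user can traverse it to reach a socket placed inside it.
SOCKDIR=$(mktemp -d)
stat -c '%a' "$SOCKDIR"    # 700 on GNU systems

# The test scaffolding would then start its postmaster roughly like:
#   pg_ctl -D "$TESTDATADIR" \
#     -o "-c listen_addresses='' -c unix_socket_directories='$SOCKDIR'" start
#   psql -h "$SOCKDIR" -d postgres -c 'SELECT 1'
rmdir "$SOCKDIR"
```

With listen_addresses empty and the socket confined to a 0700 directory, only the invoking user can connect, which is the whole point of the proposal.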

Thanks,

Stephen




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-01 Thread Vik Fearing
On 03/01/2014 12:06 PM, Simon Riggs wrote:
> On 27 February 2014 08:48, Simon Riggs  wrote:
>> On 26 February 2014 15:25, Andres Freund  wrote:
>>> On 2014-02-26 15:15:00 +, Simon Riggs wrote:
 On 26 February 2014 13:38, Andres Freund  wrote:
> Hi,
>
> On 2014-02-26 07:32:45 +, Simon Riggs wrote:
>>> * This definitely should include isolationtester tests actually
>>>   performing concurrent ALTER TABLEs. All that's currently there is
>>>   tests that the locklevel isn't too high, but not that it actually 
>>> works.
>> There is no concurrent behaviour here, hence no code that would be
>> exercised by concurrent tests.
> Huh? There's most definitely new concurrent behaviour. Previously no
> other backends could have a relation open (and locked) while it got
> altered (which then sends out relcache invalidations). That's something
> that should be tested.
 It has been. High volume concurrent testing has been performed, per
 Tom's original discussion upthread, but that's not part of the test
 suite.
>>> Yea, that's not what I am looking for.
>>>
 For other tests I have no guide as to how to write a set of automated
 regression tests. Anything could cause a failure, so I'd need to write
 an infinite set of tests to prove there is no bug *somewhere*. How
 many tests are required? 0, 1, 3, 30?
>>> I think some isolationtester tests for the most important changes in
>>> lock levels are appropriate. Say, create a PRIMARY KEY, DROP INHERIT,
>>> ...  while a query is in progress in another session.
>> OK, I'll work on some tests.
>>
>> v18 attached, with v19 coming soon
> v19 complete apart from requested comment additions

I've started to look at this patch and re-read the thread.  The first
thing I noticed is what seems like an automated replace error.  The docs
say "This form requires only an SHARE x EXCLUSIVE lock." where the "an"
was not changed to "a".

Attached is a patch-on-patch to fix this.  A more complete review will
come later.

-- 
Vik

*** a/doc/src/sgml/ref/alter_table.sgml
--- b/doc/src/sgml/ref/alter_table.sgml
***
*** 157,163  ALTER TABLE [ IF EXISTS ] name
table to change.
   
   
!   This form requires only an SHARE ROW EXCLUSIVE lock.
   
  
 
--- 157,163 
table to change.
   
   
!   This form requires only a SHARE ROW EXCLUSIVE lock.
   
  
 
***
*** 188,194  ALTER TABLE [ IF EXISTS ] name
.
   
   
!   This form requires only an SHARE UPDATE EXCLUSIVE lock.
   
  
 
--- 188,194 
.
   
   
!   This form requires only a SHARE UPDATE EXCLUSIVE lock.
   
  
 
***
*** 223,229  ALTER TABLE [ IF EXISTS ] name
planner, refer to .
   
   
!   This form requires only an SHARE UPDATE EXCLUSIVE lock.
   
  
 
--- 223,229 
planner, refer to .
   
   
!   This form requires only a SHARE UPDATE EXCLUSIVE lock.
   
  
 
***
*** 344,350  ALTER TABLE [ IF EXISTS ] name
created. Currently only foreign key constraints may be altered.
   
   
!   This form requires only an SHARE ROW EXCLUSIVE lock.
   
  
 
--- 344,350 
created. Currently only foreign key constraints may be altered.
   
   
!   This form requires only a SHARE ROW EXCLUSIVE lock.
   
  
 
***
*** 366,372  ALTER TABLE [ IF EXISTS ] name
does not prevent normal write commands against the table while it runs.
   
   
!   This form requires only an SHARE UPDATE EXCLUSIVE lock
on the table being altered. If the constraint is a foreign key then
a ROW SHARE lock is also required on
the table referenced by the constraint.
--- 366,372 
does not prevent normal write commands against the table while it runs.
   
   
!   This form requires only a SHARE UPDATE EXCLUSIVE lock
on the table being altered. If the constraint is a foreign key then
a ROW SHARE lock is also required on
the table referenced by the constraint.
***
*** 411,417  ALTER TABLE [ IF EXISTS ] name
fire regardless of the current replication mode.
   
   
!   This form requires only an SHARE ROW EXCLUSIVE lock.
   
  
 
--- 411,417 
fire regardless of the current replication mode.
   
   
!   This form requires only a SHARE ROW EXCLUSIVE lock.
   
  
 
***
*** 439,445  ALTER TABLE [ IF EXISTS ] name
operations.  It does not actually re-cluster the table.
   
   
!   This form requires only an SHARE UPDATE EXCLUSIVE lock.
   
  
 
--- 439,445 
operations.  It does not actually re-cluster the table.
   
   
!   This form requires only a SHARE UPDATE EXCLUSIVE lock.

[HACKERS] [PATCH] `pg_dump -Fd` doesn't check write return status...

2014-03-01 Thread Sean Chittenden
The attached patch fixes the case when `pg_dump -Fd …` is called on a partition 
where write(2) fails for some reason or another. In this case, backup jobs were 
returning a successful exit code even though most of the files in the dump 
directory were zero length.

I haven’t tested this patch’s failure conditions but the fix seems simple 
enough: cfwrite() needs to have its return status checked everywhere and 
exit_horribly() upon any failure. In this case, callers of _WriteData() were 
not checking the return status and were discarding the negative return status 
(e.g. ENOSPC).

I made a cursory pass over the code and found one other instance where write 
status wasn’t being checked and also included that.
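The pattern the patch applies — never discard a write's return status, and treat partial progress or errors (e.g. ENOSPC) as fatal — can be sketched in Python. This is an illustrative analogue, not the actual C change to cfwrite(); the helper name is mine:

```python
import os
import tempfile

def checked_write(fd, data):
    """Write all of `data` to fd, failing loudly on short or failed writes.

    Sketch of the checked-write discipline: os.write() raises OSError on
    failure (e.g. ENOSPC), and we also refuse to silently accept a short
    write by looping until everything is out or treating zero progress
    as an error.
    """
    view = memoryview(data)
    while view:
        n = os.write(fd, view)  # raises OSError on failure
        if n == 0:
            raise OSError("write returned 0 bytes; aborting dump")
        view = view[n:]

# Example: write a small payload to a temporary file and verify its size.
fd, path = tempfile.mkstemp()
try:
    checked_write(fd, b"pg_dump payload")
finally:
    os.close(fd)
print(os.path.getsize(path))  # 15
os.unlink(path)
```

On a full partition, the very first os.write() would raise instead of letting the job finish "successfully" with zero-length files, which is the failure mode described above.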

-sc




pg_dump_check_write.patch
Description: Binary data


--
Sean Chittenden
s...@chittenden.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest status and release timeline

2014-03-01 Thread Josh Berkus
On 03/01/2014 09:01 AM, Peter Eisentraut wrote:
> Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
> Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected: 4.
> Total: 114.
> 
> We're still on track to achieve about 50% committed patches, which would
> be similar to the previous few commit fests.  So decent job so far.

So, other than Hstore2/JSONB and Logical Changesets, what are the
big/difficult patches left?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Noah Misch
On Sat, Mar 01, 2014 at 12:29:38PM -0500, Tom Lane wrote:
> There are two big problems with the lets-generate-a-random-password
> approach.  Noah acknowledged the portability issue of possibly not having
> a strong entropy source available.  The other issue though is whether
> doing this doesn't introduce enough crypto dependency into the core code
> to create an export-restrictions hazard.  We've kept contrib/pgcrypto
> out of core all these years in order to give people a relatively
> straightforward solution if they are worried about export laws: just omit
> contrib/pgcrypto.  But "just omit regression testing" isn't palatable.

If pgrand.c poses an export control problem, then be-secure.c poses a greater
one.  pgrand.c is just glue to an OS-provided CSPRNG.

> In the case of Unix systems, there is a *far* simpler and more portable
> solution technique, which is to tell the test postmaster to put its socket
> in some non-world-accessible directory created by the test scaffolding.
> 
> Of course that doesn't work for Windows, which is why we looked at the
> random-password solution.  But I wonder whether we shouldn't use the
> nonstandard-socket-location approach everywhere else, and only use random
> passwords on Windows.  That would greatly reduce the number of cases to
> worry about for portability of the password-generation code;

Further worrying about systems lacking /dev/urandom is a waste of time.  They
will only get less common, and they are already rare enough that we have zero
of them on the buildfarm.  I provided them with a straightforward workaround:
point the PGTESTPWFILE environment variable to a file containing a password.
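The generate-a-strong-password technique under discussion — read raw entropy from the OS CSPRNG and encode it printably — can be sketched like this. This is a Python analogue of what the patch does in C via /dev/urandom or the Windows crypto API, not the patch's actual code; the function name is mine:

```python
import secrets
import string

def generate_test_password(nbytes=16):
    """Generate a throwaway password from the OS-provided CSPRNG.

    secrets.token_bytes() draws from the operating system's entropy
    source (/dev/urandom or equivalent).  The slight modulo bias in the
    encoding is acceptable for a one-off test-cluster password.
    """
    raw = secrets.token_bytes(nbytes)
    alphabet = string.ascii_letters + string.digits
    return "".join(alphabet[b % len(alphabet)] for b in raw)

pw = generate_test_password()
print(len(pw))  # 16
```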

That said, I can appreciate the value of tightening the socket mode for
a bit of *extra* safety:

--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2299,4 +2299,5 @@ regression_main(int argc, char *argv[], init_function 
ifunc, test_function tfunc
fputs("\n# Configuration added by pg_regress\n\n", pg_conf);
fputs("max_prepared_transactions = 2\n", pg_conf);
+   fputs("unix_socket_permissions = 0700\n", pg_conf);

Taking the further step of retaining trust authentication on Unix would make
it easier to commit tests guaranteed to fail on Windows.  I see no benefit
cancelling that out.

> and perhaps
> we could also push the crypto issue into reliance on some Windows-supplied
> functionality (though I'm just speculating about that part).

Every version of the patch has done it that way.  It has used OS-supplied
cryptography on every platform.

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Magnus Hagander
On Sat, Mar 1, 2014 at 7:09 PM, Andrew Dunstan  wrote:

>
> On 03/01/2014 12:29 PM, Tom Lane wrote:
>
>
>> In the case of Unix systems, there is a *far* simpler and more portable
>> solution technique, which is to tell the test postmaster to put its socket
>> in some non-world-accessible directory created by the test scaffolding.
>>
>
>
> +1 - I'm all for KISS.
>
>
>
>> Of course that doesn't work for Windows, which is why we looked at the
>> random-password solution.  But I wonder whether we shouldn't use the
>> nonstandard-socket-location approach everywhere else, and only use random
>> passwords on Windows.  That would greatly reduce the number of cases to
>> worry about for portability of the password-generation code; and perhaps
>> we could also push the crypto issue into reliance on some Windows-supplied
>> functionality (though I'm just speculating about that part).
>>
>
>
> See for example <http://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx>
>

For a one-off password used locally only, we could also consider just using
a guid, and generate it using
http://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx.
Obviously windows only though - do we have *any* Unix platforms that can't
do unix sockets?
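Magnus's GUID idea, sketched in Python rather than via the Win32 UuidCreate API linked above: a version-4 UUID carries 122 random bits, which is plenty for a one-off, locally used password. This is only an illustration of the amount of entropy involved, not proposed implementation code:

```python
import uuid

# A random (version 4) UUID: 122 random bits rendered as
# xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, 36 characters in all.
pw = str(uuid.uuid4())
print(len(pw))  # 36
```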

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Andrew Dunstan


On 03/01/2014 12:29 PM, Tom Lane wrote:



In the case of Unix systems, there is a *far* simpler and more portable
solution technique, which is to tell the test postmaster to put its socket
in some non-world-accessible directory created by the test scaffolding.



+1 - I'm all for KISS.



Of course that doesn't work for Windows, which is why we looked at the
random-password solution.  But I wonder whether we shouldn't use the
nonstandard-socket-location approach everywhere else, and only use random
passwords on Windows.  That would greatly reduce the number of cases to
worry about for portability of the password-generation code; and perhaps
we could also push the crypto issue into reliance on some Windows-supplied
functionality (though I'm just speculating about that part).



See for example <http://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx>



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Tom Lane
Noah Misch  writes:
> As announced with last week's releases, use of trust authentication in the
> "make check" temporary database cluster makes it straightforward to hijack the
> OS user account involved.  The prerequisite is another user account on the
> same system.  The solution we discussed on secur...@postgresql.org was to
> switch to md5 authentication with a generated, strong password.

Noah is leaving the impression that there was consensus on that approach;
there was not, which is one reason that this patch didn't simply get
committed last week.

There are two big problems with the lets-generate-a-random-password
approach.  Noah acknowledged the portability issue of possibly not having
a strong entropy source available.  The other issue though is whether
doing this doesn't introduce enough crypto dependency into the core code
to create an export-restrictions hazard.  We've kept contrib/pgcrypto
out of core all these years in order to give people a relatively
straightforward solution if they are worried about export laws: just omit
contrib/pgcrypto.  But "just omit regression testing" isn't palatable.

In the case of Unix systems, there is a *far* simpler and more portable
solution technique, which is to tell the test postmaster to put its socket
in some non-world-accessible directory created by the test scaffolding.
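The non-world-accessible socket directory technique can be sketched as follows. This is a hedged illustration of the idea, not pg_regress's actual code; the postgres invocation in the comment is hypothetical:

```python
import os
import stat
import tempfile

# Create a private directory for the test postmaster's socket.
# tempfile.mkdtemp() creates the directory with mode 0700, so other
# local users cannot reach any socket placed inside it.
sockdir = tempfile.mkdtemp(prefix="pg_regress.")
mode = stat.S_IMODE(os.stat(sockdir).st_mode)
print(oct(mode))

# A test harness would then start the server with something like
# (hypothetical invocation, for illustration only):
#   postgres -D $PGDATA -c unix_socket_directories=<sockdir> \
#            -c listen_addresses=''
os.rmdir(sockdir)
```

Because the directory itself is unsearchable by other users, trust authentication over the socket remains safe even without a password.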

Of course that doesn't work for Windows, which is why we looked at the
random-password solution.  But I wonder whether we shouldn't use the
nonstandard-socket-location approach everywhere else, and only use random
passwords on Windows.  That would greatly reduce the number of cases to
worry about for portability of the password-generation code; and perhaps
we could also push the crypto issue into reliance on some Windows-supplied
functionality (though I'm just speculating about that part).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add support of "IF NOT EXISTS" to others "CREATE" statements

2014-03-01 Thread Tom Lane
Fabrízio de Royes Mello  writes:
> On Sat, Jan 18, 2014 at 11:12 PM, Stephen Frost  wrote:
>> Fabrízio, can you clarify the use-case for things like CREATE AGGREGATE
>> to have IF NOT EXISTS rather than OR REPLACE, or if there is a reason
>> why both should exist?  Complicating our CREATE options is not something
>> we really wish to do without good reason and we certainly don't want to
>> add something now that we'll wish to remove in another version or two.

> Well, I have a scenario with many servers to deploy DDL scripts to, and most
> of the scripts must run without transaction control because of tasks like
> CREATE INDEX CONCURRENTLY, DROP/CREATE DATABASE, CLUSTER, etc.

> When an error occurs the script stops, but the previous commands were
> already committed, so we must review the script, comment out the parts that
> already ran, and run it again. Until now this has not been a real problem,
> but in some cases we must deploy another DDL script containing a new version
> of some object before we have finished fixing the previous version in
> production. If we had CINE for all CREATE statements, this task would be
> easier: we could simply run the script again without caring whether it would
> replace the content or produce an error.

Why wouldn't COR semantics answer that requirement just as well, if not
better?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] commit fest status and release timeline

2014-03-01 Thread Peter Eisentraut
Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected: 4.
Total: 114.

We're still on track to achieve about 50% committed patches, which would
be similar to the previous few commit fests.  So decent job so far.

Which brings us to important news.  The core team has agreed on a
release timeline:

- Mar 15 end commit fest
- Apr 15 feature freeze
- May 15 beta

This is similar to the last few years, so it shouldn't come as a shock
to anyone.

Let's use the remaining two weeks to give all patches in the commit fest
fair consideration and a decent review.  The time to reject or postpone
patches will inevitably come in the time between the end of the commit
fest and feature freeze.  Note that it is everyone's individual
responsibility to move their favorite patch forward.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gaussian distribution pgbench

2014-03-01 Thread Tom Lane
Alvaro Herrera  writes:
> Seems that in the review so far, Fabien has focused mainly in the
> mathematical properties of the new random number generation.  That seems
> perfectly fine, but no comment has been made about the chosen UI for the
> feature.  Per the few initial messages in the thread, in the patch as
> submitted you ask for a gaussian random number by using \setgaussian,
> and exponential via \setexp.  Is this the right UI?  Currently you get
> an evenly distributed number with \setrandom.  There is nothing that
> makes it obvious on \setgaussian by itself that it produces random
> numbers.  Perhaps we should simply add a new argument to \setrandom,
> instead of creating new commands for each distribution?  I would guess
> that, in the future, we're going to want other distributions as well.

+1 for an argument to \setrandom instead of separate commands.

> Not sure what it would look like; perhaps
> \setrandom foo 1 10 gaussian

FWIW, I think this style is sufficient; the others seem overcomplicated
for not much gain.  I'm not strongly attached to that position though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] trgm regex index peculiarity

2014-03-01 Thread Alexander Korotkov
On Mon, Feb 10, 2014 at 1:01 AM, Tom Lane  wrote:

> Alexander Korotkov  writes:
> > On Thu, Jan 16, 2014 at 3:34 AM, Tom Lane  wrote:
> >> I looked at this patch a bit.  It seems like this:
> >> +  *BLANK_COLOR_SIZE - How much blank character is more frequent
> than
> >> +  *   other character in average
> >> + #define BLANK_COLOR_SIZE  32
> >> is a drastic overestimate.
>
> > OK, I would like to make more reasoning for penalty.
> > Let's consider word of length n.
> > It contains n+1 trigrams including:
> > 1 __x trigram
> > 1 _xx trigram
> > 1 xx_ trigram
> > n - 2 xxx trigrams
>
> > Assume alphabet of size m those trigrams have following average
> frequencies:
> > __x 1/((n+1)*m)
> > _xx 1/((n+1)*m^2)
> > xx_ 1/((n+1)*m^2)
> > xxx (n-2)/((n+1)*m^3)
>
> Hmm, but you're assuming that the m letters are all equally frequent,
> which is surely not true in normal text.
>
> I didn't have a machine-readable copy of Hemingway or Tolstoy at hand,
> but I do have a text file with the collected works of H.P. Lovecraft,
> so I tried analyzing the trigrams in that.  I find that
> * The average word length is 4.78 characters.
> * There are 5652 distinct trigrams (discounting some containing digits),
> the most common one ('  t') occurring 81222 times; the average
> occurrence count is 500.0.
> * Considering only trigrams not containing any blanks, there are
> 4937 of them, the most common one ('the') occurring 45832 times,
> with an average count of 266.9.
> * There are (unsurprisingly) exactly 26 trigrams of the form '  x',
> with an average count of 19598.5.
> * There are 689 trigrams of the form ' xx' or 'xx ', the most common one
> (' th') occurring 58200 times, with an average count of 1450.1.
>
> So, we've rediscovered the fact that 'the' is the most common word in
> English text ;-).  But I think the significant conclusion for our purposes
> here is that single-space trigrams are on average about 1450.1/266.9 = 5.4
> times more common than the average space-free trigram, and two-space
> trigrams about 19598.5/266.9 = 73.4 times more common.
>
> So this leads me to the conclusion that we should be using a
> BLANK_COLOR_SIZE value around 5 or 6 (with maybe something other than
> a square-law rule for two-space trigrams).  Even using your numbers,
> it shouldn't be 32.
>

Next revision of the patch is attached. The changes are:
1) The notion of "penalty" is used instead of "size".
2) We try to reduce the total penalty to WISH_TRGM_PENALTY, with
MAX_TRGM_COUNT as the limit on the total trigram count.
3) Penalties are assigned to particular color trigram classes, i.e.
separate penalties for __a, _aa, _a_, aa_. This is based on an analysis of
trigram frequencies in Oscar Wilde's writings. We could end up with
different numbers, but I don't think they will be dramatically different.
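The trigram-frequency analysis quoted above can be reproduced with a short sketch. This generates pg_trgm-style padded trigrams (two leading blanks, one trailing blank) so that blank-containing and blank-free trigram counts can be compared over any corpus; it is an illustration of the counting method, not pg_trgm's actual C code:

```python
from collections import Counter

def word_trigrams(word):
    """pg_trgm-style trigrams: pad with two leading and one trailing blank.

    A word of length n yields n+1 trigrams, e.g. 'cat' -> '  c', ' ca',
    'cat', 'at '.
    """
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# Toy corpus; substitute Lovecraft, Wilde, etc. to redo the analysis above.
text = "the cat sat on the mat"
counts = Counter(t for w in text.split() for t in word_trigrams(w))
blank = sum(c for t, c in counts.items() if " " in t)
solid = sum(c for t, c in counts.items() if " " not in t)
print(word_trigrams("cat"))  # ['  c', ' ca', 'cat', 'at ']
```

Averaging the counts per distinct trigram in each class, as Tom did, then gives the blank-vs-solid frequency ratio that motivates the penalty values.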

--
With best regards,
Alexander Korotkov.


trgm-regex-optimize.3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Noah Misch
On Sat, Mar 01, 2014 at 12:48:08PM -0300, Alvaro Herrera wrote:
> I didn't check the patch in detail, but it seems to me that both the
> encode stuff as well as pgrand belong in src/common rather than
> src/port.

Since src/common exists only in 9.3 and up, that would mean putting them in
different libraries depending on the branch.  I did not think that worthwhile.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-01 Thread Alvaro Herrera
I didn't check the patch in detail, but it seems to me that both the
encode stuff as well as pgrand belong in src/common rather than
src/port.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gaussian distribution pgbench

2014-03-01 Thread Alvaro Herrera
Seems that in the review so far, Fabien has focused mainly in the
mathematical properties of the new random number generation.  That seems
perfectly fine, but no comment has been made about the chosen UI for the
feature.  Per the few initial messages in the thread, in the patch as
submitted you ask for a gaussian random number by using \setgaussian,
and exponential via \setexp.  Is this the right UI?  Currently you get
an evenly distributed number with \setrandom.  There is nothing that
makes it obvious on \setgaussian by itself that it produces random
numbers.  Perhaps we should simply add a new argument to \setrandom,
instead of creating new commands for each distribution?  I would guess
that, in the future, we're going to want other distributions as well.

Not sure what it would look like; perhaps
\setrandom foo 1 10 gaussian
or 
\setrandom foo 1 10 dist=gaussian
or
\setrandom(gaussian) foo 1 10
or
\setrandom(dist=gaussian) foo 1 10

I think we could easily support

\set distrib gaussian
\setrandom(dist=:distrib) foo 1 10

so that it can be changed for a bunch of commands easily.

Or maybe I'm going overboard, everybody else is happy with \setgaussian,
and should just use that?
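What a gaussian-flavoured \setrandom might compute can be sketched as follows. The parameter names and the truncation-by-threshold scheme are assumptions for illustration, not the submitted patch's actual code:

```python
import random

_rng = random.Random(42)  # seeded for reproducibility in this sketch

def setrandom_gaussian(lo, hi, threshold=4.0):
    """Gaussian-distributed integer in [lo, hi], midpoint most likely.

    Draws a standard normal, rejects values outside +/- threshold, then
    rescales the accepted value into the requested range.
    """
    while True:
        g = _rng.gauss(0.0, 1.0)
        if -threshold <= g <= threshold:
            break
    frac = (g + threshold) / (2.0 * threshold)   # now in [0, 1]
    return min(hi, lo + int(frac * (hi - lo + 1)))

vals = [setrandom_gaussian(1, 10) for _ in range(1000)]
assert all(1 <= v <= 10 for v in vals)
```

A larger threshold concentrates the distribution more tightly around the midpoint, which is the kind of knob a `\setrandom foo 1 10 gaussian`-style syntax would eventually have to expose.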

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-01 Thread Stephen Frost
KaiGai,

* Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
> BTW, this kind of discussion looks like talking with a ghost, because
> the new interface for parallel execution does not exist yet, so we
> cannot tangibly investigate whether it would create a really serious
> backward incompatibility or not.

Yeah, it would certainly be nice if we had all of the answers up-front.
What I keep hoping for is that someone who has been working on this area
(eg: Robert) would speak up...

> My bet is minor one. I cannot imagine plan-node interface that does
> not support existing non-parallel SeqScan or NestLoop and so on.

Sure you can- because once we change the interface, we're probably going
to go through and make everything use the new one rather than have to
special-case things.  That's more-or-less exactly my point here because
having an external hook like CustomScan would make that kind of
wholesale change more difficult.

That does *not* mean I'm against using GPUs and GPU optimizations.  What
it means is that I'd rather see that done in core, which would allow us
to simply change that interface along with the rest when doing wholesale
changes and not have to worry about backwards compatibility and breaking
other people's code.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] psql: show only failed queries

2014-03-01 Thread Pavel Stehule
Hello

I was asked how one can show only failed queries in psql.

I believe it is not possible now, but the implementation is very simple.

What do you think about it?

bash-4.1$ psql postgres -v ECHO=error -f data.sql
INSERT 0 1
Time: 27.735 ms
INSERT 0 1
Time: 8.303 ms
psql:data.sql:3: ERROR:  value too long for type character varying(2)
insert into foo values('bbb');
Time: 0.178 ms
INSERT 0 1
Time: 8.285 ms
psql:data.sql:5: ERROR:  value too long for type character varying(2)
insert into foo values('ss');
Time: 0.422 ms

Regards

Pavel
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 3a820fa..8354f60 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -958,6 +958,12 @@ SendQuery(const char *query)
 		results = NULL;			/* PQclear(NULL) does nothing */
 	}
 
+	if (!OK && pset.echo == PSQL_ECHO_ERROR_QUERIES)
+	{
+		puts(query);
+		fflush(stdout);
+	}
+
 	/* If we made a temporary savepoint, possibly release/rollback */
 	if (on_error_rollback_savepoint)
 	{
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 3e8328d..ed80179 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -30,6 +30,7 @@
 typedef enum
 {
 	PSQL_ECHO_NONE,
+	PSQL_ECHO_ERROR_QUERIES,
 	PSQL_ECHO_QUERIES,
 	PSQL_ECHO_ALL
 } PSQL_ECHO;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 1061992..666261f 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -712,6 +712,8 @@ echo_hook(const char *newval)
 {
 	if (newval == NULL)
 		pset.echo = PSQL_ECHO_NONE;
+	else if (strcmp(newval, "error") == 0)
+		pset.echo = PSQL_ECHO_ERROR_QUERIES;
 	else if (strcmp(newval, "queries") == 0)
 		pset.echo = PSQL_ECHO_QUERIES;
 	else if (strcmp(newval, "all") == 0)
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 1d69b95..641cff3 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3397,6 +3397,23 @@ psql_completion(char *text, int start, int end)
 	{
 		matches = complete_from_variables(text, "", "");
 	}
+	else if (strcmp(prev2_wd, "\\set") == 0)
+	{
+		if (strcmp(prev_wd, "ECHO") == 0)
+		{
+			static const char *const my_list[] =
+			{"none", "error", "queries", "all", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+		else if (strcmp(prev_wd, "ECHO_HIDDEN") == 0)
+		{
+			static const char *const my_list[] =
+			{"noexec", "off", "on", NULL};
+
+			COMPLETE_WITH_LIST_CS(my_list);
+		}
+	}
 	else if (strcmp(prev_wd, "\\sf") == 0 || strcmp(prev_wd, "\\sf+") == 0)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_functions, NULL);
 	else if (strcmp(prev_wd, "\\cd") == 0 ||

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: new long psql parameter --on-error-stop

2014-03-01 Thread Pavel Stehule
Hello

here is a prototype:

bash-4.1$ /usr/local/pgsql/bin/psql --help-variables
A list of some variables (options) for use from the command line.
The complete list is in the psql section of the PostgreSQL documentation.

psql variables:
Usage:
  psql --set=NAME=VALUE
  or \set NAME VALUE in interactive mode

  AUTOCOMMIT         when on, a successful SQL command is automatically
                     committed
  COMP_KEYWORD_CASE  determines which letter case to use when completing an
                     SQL key word
  ECHO               all lines from input can be written to standard output
  ECHO_HIDDEN        display queries for internal commands (same as -E option)
  FETCH_COUNT        how many rows to fetch per page (default 0, unlimited)
  HISTFILE           file name used to store the history list
  HISTSIZE           the number of commands to store in the command history
  ON_ERROR_ROLLBACK  when on, issues ROLLBACK on error automatically
  ON_ERROR_STOP      when set, batch execution stops immediately after an error
  VERBOSITY          control verbosity of error reports [default, verbose,
                     terse]

Printing options:
Usage:
  psql --pset=NAME[=VALUE]
  or \pset NAME [VALUE] in interactive mode

  border             number of the border style
  fieldsep           specify field separator for unaligned output
  fieldsep_zero      field separator in unaligned mode will be the zero byte
  format             set output format [unaligned, aligned, wrapped, html,
                     latex, ...]
  linestyle          sets the border line drawing style [ascii, old-ascii,
                     unicode]
  null               sets the string to be printed in place of a null value
  pager              when the pager option is off, the pager program is not
                     used
  recordsep          specifies the record (line) separator to use in
                     unaligned output format
  recordsep_zero     record separator in unaligned output format is the zero
                     byte
  title              sets the table title for any subsequently printed tables
  tuples_only        in tuples-only mode, only actual table data is shown

Environment options:
Usage:
  NAME=VALUE, [NAME=VALUE] psql ...
  or \setenv NAME [VALUE] in interactive mode

  COLUMNS            number of columns for wrapped format
  PAGER              pager program to use
  PGHOST             same as the host connection parameter
  PGDATABASE         same as the dbname connection parameter
  PGUSER             same as the user connection parameter
  PGPASSWORD         sets the connection password
  PSQL_EDITOR, EDITOR, VISUAL  editor used by the \e and \ef commands
  PSQL_EDITOR_LINE_NUMBER_ARG  how to specify a line number when invoking
                     the editor
  PSQL_HISTORY       alternative location for the command history file
  PSQL_RC            alternative location of the user's .psqlrc file
  SHELL              command executed by the \! command
  TMPDIR             directory for storing temporary files

For more information consult the psql section in the PostgreSQL
documentation.

Regards

Pavel



2014-02-28 23:01 GMT+01:00 Andrew Dunstan :

>
> On 02/28/2014 04:38 PM, Tom Lane wrote:
>
>> Andrew Dunstan  writes:
>>
>>> Well, then we just have to add more info to --help

>>> +1 for at least doing that. I found it annoying just the other day not
>>> to find it in psql's --help output, in a moment of brain fade when I
>>> forgot how to spell it. So it's not just beginners who can benefit, it's
>>> people like me whose memory occasionally goes awry.
>>>
>> No objection in principle, but what are we talking about exactly?
>> Adding some new backslash command that lists all the variables that have
>> special meanings?
>>
>
>
> That's a pretty good idea, especially if we give that command a command
> line option too, so something like
>
>psql --special-variables
>
> would run that command and exit.
>
> Maybe I'm over-egging the pudding a bit ;-)
>
> cheers
>
> andrew
>
commit 82a6a61a11bdb76c00481abeccff42b4b532762e
Author: Pavel Stehule 
Date:   Sat Mar 1 09:34:47 2014 +0100

initial

diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index baa9417..fb77132 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -78,12 +78,13 @@ usage(void)
 	printf(_("  -f, --file=FILENAME  execute commands from file, then exit\n"));
 	printf(_("  -l, --list   list available databases, then exit\n"));
 	printf(_("  -v, --set=, --variable=NAME=VALUE\n"
-			 "   set psql variable NAME to VALUE\n"));
+			 "   set psql variable NAME to VALUE e.g.: -v ON_ERROR_STOP=1\n"));
 	printf(_("  -V, --versionoutput version information, then exit\n"));
 	printf(_("  -X, --no-psqlrc  do not read startup file (~/.psqlrc)\n"));
 	printf(_("  -1 (\"one\"), --single-transaction\n"
 			 "   execute as a single transaction (if non-interactive)\n"));
 	printf(_("  -?, --help   show this help, then exit\n"));
+	printf(_("  --help-variables list of available configuration variables (options), then exit\n"));
 
 	pri

Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire

2014-03-01 Thread Amit Kapila
On Thu, Feb 27, 2014 at 4:14 PM, Christian Kruse
 wrote:
> On 25/02/14 16:11, Robert Haas wrote:
>> Reading this over, I'm not sure I understand why this is a CONTEXT at
>> all and not just a DETAIL for the particular error message that it's
>> supposed to be decorating.  Generally CONTEXT should be used for
>> information that will be relevant to all errors in a given code path,
>> and DETAIL for extra information specific to a particular error.
>
> Because there is more than one scenario where this context is useful,
> not just log_lock_wait messages.
>
>> If we're going to stick with CONTEXT, we could rephrase it like this:
>>
>> CONTEXT: while attempting to lock tuple (1,2) in relation with OID 3456
>>
>> or when the relation name is known:
>>
>> CONTEXT: while attempting to lock tuple (1,2) in relation "public"."foo"
>
> Accepted. Patch attached.

With new patch, the message while updating locked rows will be displayed
as below:

LOG:  process 364 still waiting for ShareLock on transaction 678 after 1014.000 ms
CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation "public"."t1" of database postgres

LOG:  process 364 acquired ShareLock on transaction 678 after 60036.000 ms
CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation "public"."t1" of database postgres

Now I am not sure the second message is an improvement, as it reads like
"while attempting to lock a tuple, it got a share lock on a transaction".
If you, Robert, or others feel it is okay, then we can retain it as it is
in the patch; otherwise I think we either need to rephrase it or find some
other way (a global variable) so that it appears only in the required case.
I feel the approach Robert suggested, i.e. making it a DETAIL of the
particular message (we might need a global variable to pass certain info),
is the better way and will have minimal impact on the cases where this
additional information needs to be displayed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers