Re: [HACKERS] elegant and effective way for running jobs inside a database

2012-03-05 Thread David Johnston
 
  Keep in mind that it's not about coding in C but mostly about figuring
  out what a sane design ought to look like.
 
 

While I can straddle the fence pretty well, my first reaction is that we are
talking about application functionality that falls outside what belongs in
core PostgreSQL.  I'd rather see pgAgent be improved and act as a basic
implementation while, for more complex use-cases, letting the
community/marketplace provide solutions.

Even with simple use-cases you end up having a separate process continually
running anyway.  The main benefit of linking with core would be the ability to
start that process after the server starts and shut it down before the
server shuts down.  That communication channel is something to consider
outside this specific application and, if done, could be used to talk with
whatever designated pgAgent-like application the user chooses.  Other
applications could also be communicated with in this way.  Basically, it would
be some form of API where, in the postgresql.conf file, you specify which IP
addresses and ports you wish to synchronize on and which executable to launch
just prior to communicating on said port.  If the startup routine succeeds,
then Postgres will, within reason, attempt to communicate with and wait for
these external processes to finish before shutting down.  If the external
application closes it should proactively notify Postgres that it is doing so
AND if you start a program manually it can look for and talk with a running
Postgres instance.

David J.


 




Re: [HACKERS] Parser - Query Analyser

2012-11-17 Thread David Johnston
On Nov 17, 2012, at 9:18, Michael Giannakopoulos miccagi...@gmail.com wrote:

 Hello guys,
 
 My name is Michail Giannakopoulos and I am a graduate student at University 
 of Toronto. I have no previous experience in developing a system like 
 postgreSQL before.
 
 What I am trying to explore is if it is possible to extend postgreSQL in 
 order to accept queries of the form:
 
 Select function(att1, att2, att3) AS output(out1, out2, ..., outk) FROM 
 [database_name];
 

Anything is possible, but what you are trying to do makes little sense
generally and would take a tremendous amount of work to do in PostgreSQL.  The
two main limitations are that you are creating an entirely new query language
format and that the name of the database is constant and determined at the
time of connection to the database.

From a practical perspective I do not believe it (as written exactly above)
can be done without breaking existing functionality and/or introducing
ambiguities.
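
For what it's worth, something close to that shape is already expressible
today with a function returning a composite; a rough sketch (the table,
column, and function names here are all invented):

CREATE FUNCTION f(a int, b int, c int, OUT out1 int, OUT out2 text)
  AS $$ SELECT $1 + $2, $3::text $$ LANGUAGE sql;

SELECT (f(att1, att2, att3)).* FROM some_table;

The FROM item, though, must be a relation inside the database you are already
connected to - not a database name.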

As I am not a PostgreSQL developer myself I cannot be of much more help, but
ISTM that providing more why and less what would get you better advice.  As to
learning how to contribute to the project I will let others point you to the
existing resources that are out there.  It would, however, probably help to
explain what skills and background you already possess.

David J.






Re: [HACKERS] Materialized views WIP patch

2012-11-27 Thread David Johnston
On Nov 27, 2012, at 5:25, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

 
 So my proposal for the current feature would be:
 
  ALTER MATERIALIZED VIEW mv UPDATE [ CONCURRENTLY ];
  UPDATE MATERIALIZED VIEW mv;
 
 The choice of keywords and syntax here hopefully clearly hint the user
 about the locking behavior of the commands, too. And as we said, the
 bare minimum for this patch does *not* include the CONCURRENTLY option,
 which we still all want to have (someday). :)
 

I dislike using ALTER syntax to perform a data-only action.

The other advantage of non-function-call syntax is that you could more easily
supply some form of WHERE clause should you want to perform only a partial
refresh.  With a function call that becomes more obtuse.
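
To make the contrast concrete, here is purely invented syntax for both
styles (neither exists; this is only to illustrate the point):

UPDATE MATERIALIZED VIEW mv WHERE region = 'EU';     -- hypothetical statement form
SELECT pg_refresh_matview('mv', 'region = ''EU''');  -- hypothetical function form

In the statement form the predicate can be parsed and validated against the
view's columns; in the function form it has to travel as an opaque string.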

David J.



Re: [HACKERS] Materialized views WIP patch

2012-11-27 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Dimitri Fontaine
 Sent: Tuesday, November 27, 2012 10:03 AM
 To: Kevin Grittner
 Cc: Pavel Stehule; Peter Eisentraut; Pgsql Hackers
 Subject: Re: [HACKERS] Materialized views WIP patch
 
 Kevin Grittner kgri...@mail.com writes:
  changing the structure of the table. Somehow I don't find that
  persuasive as an argument that ALTER MATERIALIZED VIEW should
  rescan the source relations and build a whole new set of data for
  exactly the same MV definition.
 
 Fair enough.
 
  Consider that in relational theory a table is considered a relation
  variable. ALTER is supposed to change the definition of the variable
  in some way. Other statements are used to change the value contained
  in the variable. Sure there are some grey areas already, but I don't
  see where we need to muddy the waters in this case.
 
 Under that light, using ALTER is strange indeed. I still don't like using
 LOAD that much, allow me to try a last syntax proposal. Well all I can find
 just now would be:
 
   UPDATE MATERIALIZED VIEW mv FOR EACH ROW;
   UPDATE MATERIALIZED VIEW mv FOR EACH STATEMENT [ CONCURRENTLY ];
 
 The only value of such a proposal is that it's not LOAD and it's still not
 introducing any new keyword. Oh it's also avoiding to overload the
 SNAPSHOT keyword. Well, it still does not look like the best candidate.
 
 Regards,

Just a thought but how about something like:

DO REFRESH OF MATERIALIZED VIEW mat_view;

In effect we begin overloading the meaning of DO to not only mean
anonymous code blocks but to also call pre-defined internal routines that
can be executed without having to use function-call syntax.  MATERIALIZED
VIEW can be made more generic (e.g., TABLE) if the need arises, the REFRESH
action is generic, and additional clauses can be added after the object
name (FOR, CONCURRENTLY, WHERE, etc...).

David J.












Re: [HACKERS] enhanced error fields

2012-12-10 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Peter Geoghegan
 Sent: Monday, December 10, 2012 3:29 PM
 To: Pavel Stehule
 Cc: PostgreSQL Hackers; Alvaro Herrera; Tom Lane
 Subject: Re: [HACKERS] enhanced error fields
 
 
 Now, there are one or two places where these fields are not actually
 available even though they're formally required according to a literal
 reading of the above. This is only because there is clearly no such field
 sensibly available, even in principle - to my mind this cannot be a
 problem, because the application developer cannot have any reasonable
 expectation of a field being set. I'm really talking about two cases in
 particular:
 
 * For ERRCODE_NOT_NULL_VIOLATION, we don't actually provide
 schema_name and table_name in the event of domains. This was previously
 identified as an issue. If it is judged better to not have any
 requirements there at all, so be it.
 
 * For the validateDomainConstraint() ERRCODE_CHECK_VIOLATION ereport
 call, we may not provide a constraint name iff a Constraint.conname is
 NULL. Since there isn't a constraint name to give even in principle, and
 this is an isolated case, this seems reasonable.
 

Just skimming this topic, but if these enhanced error fields are going to be
used by software, and we have 99% adherence to a standard, then my first
reaction is: why not just supply Not Applicable (or Not Available as
appropriate) instead of suppressing the field altogether in these (and
possibly other, future) cases and make adherence for these fields 100%?

From an ease-of-use aspect for the API, if I can simply always query each
of those fields and know I will be receiving a string it does at least seem
theoretically easier to interface with.  If I am expecting special string
values (enclosed in symbols making them invalid identifiers) I can then
handle those as desired without either receiving an error or a NULL when I
go to poll the missing field in those couple of instances.
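
To illustrate the kind of consumer code I have in mind, a minimal PL/pgSQL
sketch - the table is invented and the diagnostics item names are assumed to
be whatever this patch ends up exposing:

DO $$
DECLARE
  tbl text;
  col text;
BEGIN
  INSERT INTO example_t (required_col) VALUES (NULL);  -- example_t is invented
EXCEPTION WHEN not_null_violation THEN
  -- item names (TABLE_NAME, COLUMN_NAME) assumed here
  GET STACKED DIAGNOSTICS tbl = TABLE_NAME, col = COLUMN_NAME;
  RAISE NOTICE 'not-null violation on %.%', tbl, col;
END $$;

Under the 100%-adherence idea, tbl and col would always come back as strings
- perhaps (not available) - and never as NULL.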

I may be paranoid or mistaken regarding how this works but figured I'd at
least throw it out for consideration.

David J.








Re: [HACKERS] Review of Row Level Security

2012-12-19 Thread David Johnston
  The more secure behavior is to allow entry of data which will not be
  visible by the person doing the entry.
 
 I don't think it is that simple. Allowing inserts without regard for row
 level restrictions makes it far easier to probe for data. E.g. by inserting
 rows and checking for unique violations.
 

So the PK column(s) are not as secure as, say, the address-related column.
Vice-versa I may know that someone lives at a given address (because my
attempt to place someone else there failed) but I would have no way of
knowing who that other person is.  My recourse would be to escalate the
data-entry request to someone with higher security permissions who could
read and write to the appropriate tables and resolve the conflict.  In both
cases the direct write-only situation necessitates that some level of
exposure occurs.  The work-around if that is unacceptable would be to accept
all data but any entries that cannot be directly inserted into the table
would remain in a staging area that someone with higher security would have
to monitor and clear as needed.  The same intervention is required but in
the first situation you can at least avoid coding the special logic and
instead trade security for ease-of-use.

As a default level of security we could throw a generic secure DML rejected
for ROW(...) error and not tell the user anything about the cause.  If that
person knows all unique indexes and constraints defined on the table they
could use trial-and-error to discover information about stored records but
even then, if they get an error on two different columns, they still have no
way of knowing if those errors belong to the same record.

Beyond that level you provide the user with some information as to the cause
so that they have a reasonable chance to catch typos and other mistakes
instead of escalating a benign issue.

Lastly is the custom solution whereby the developers accept ALL data entered
as being correct but save it to a staging table.  A review process by someone
with higher security clearances would then process and clear out that table
as necessary.  If the user is write-only then regardless of whether the
entry succeeded or failed they are considered to be done with their task
at that point and no meaningful results from the system can be supplied to
them.
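
A minimal sketch of that staging arrangement, assuming a base table named
people and a write-only role named data_entry (all names invented):

CREATE TABLE people_staging (LIKE people INCLUDING DEFAULTS);
GRANT INSERT ON people_staging TO data_entry;  -- the write-only role
-- later, a reviewer with broader rights resolves conflicts and promotes rows:
INSERT INTO people SELECT * FROM people_staging;
TRUNCATE people_staging;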

None of these options disallows the presence of non-security related check
constraints to be checked, enforced, and communicated to the user.

I've probably lost sight of the bigger picture as my response is mostly
informed by these last couple of messages.

David J.

 Greetings,
 
 Andres Freund
 






Re: [HACKERS] PL/PgSQL STRICT

2012-12-21 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Marko Tiikkaja
 Sent: Friday, December 21, 2012 10:53 AM
 To: Tom Lane
 Cc: PostgreSQL-development
 Subject: Re: [HACKERS] PL/PgSQL STRICT
 
 On 12/21/12 4:49 PM, I wrote:
  On 12/21/12 4:39 PM, Tom Lane wrote:
  What is the use-case for this?
 
  Currently, the way to do this would be something like:
 
 I realize I didn't really answer the question.
 
 The use case is when you're UPDATEing or DELETEing a row and you want to
 quickly assert that there should be exactly one row.  For example, if
 you've previously locked a row with SELECT .. FOR UPDATE, and now you want
 to UPDATE or DELETE it, it better be there (or you have a bug somewhere).

There had better be exactly one row - but who cares whether that is the row
we were actually expecting to delete/update...

I've recently had the experience of missing a WHERE pk = ... clause in an
UPDATE statement inside a function, so I do see the value in having an
easy-to-implement safety idiom along these lines.
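
For reference, the manual idiom such a feature would replace looks roughly
like this (the table and key values are invented):

DO $$
DECLARE
  n bigint;
BEGIN
  UPDATE orders SET status = 'done' WHERE pk = 42;
  GET DIAGNOSTICS n = ROW_COUNT;
  IF n <> 1 THEN
    RAISE EXCEPTION 'expected exactly 1 row, got %', n;
  END IF;
END $$;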

Along the lines of EXPLAIN (options) CMD, would something like
UPDATE|DELETE (STRICT) identifier work?

David J.






Re: [HACKERS] count(*) of zero rows returns 1

2013-01-14 Thread David Johnston
Tom Lane wrote:
 Gurjeet Singh singh.gurjeet@ writes:
 Can somebody explain why a standalone count(*) returns 1?
 postgres=# select count(*);
  count
 ---
  1
 (1 row)
 
 The Oracle equivalent of that would be SELECT count(*) FROM dual.
 Does it make more sense to you thought of that way?
 
 I agree it's an odd thing for someone to query, but I feel it should
 return
 0, and not 1.
 
 For that to return zero, it would also be necessary for SELECT 2+2
 to return zero rows.  Which would be consistent with some views of the
 universe, but not particularly useful.  Another counterexample is
 
 regression=# select sum(42);
  sum
 -----
   42
 (1 row)
 
 which by your argument would need to return NULL, since that would be
 SUM's result over zero rows.

Given that:

SELECT *;

Results in: 

SQL Error: ERROR:  SELECT * with no tables specified is not valid

then an aggregate over an error should not magically cause the error to go
away.

I am curious on some points:

Is there something in the standard that makes SELECT count(*) valid?
What does SELECT * FROM dual in Oracle yield?
Is there a meaningful use case for SELECT sum(42), or more specifically
any aggregate query where there are no table/value inputs?  

I get the SELECT 2+2 and its ilk as there needs to be some way to evaluate
constants.

I get that the horse has already left the barn on this one but neither 0
nor 1 seems a particularly sound answer to the question SELECT count(*).
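
For contrast, an aggregate over an input that is provably empty does behave
the way I would expect:

SELECT count(*) FROM generate_series(1, 0);  -- returns 0: zero input rows
SELECT count(*);                             -- returns 1: one implicit zero-column row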

David J.









Re: [HACKERS] Covering Indexes

2012-07-17 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of David E. Wheeler
 Sent: Tuesday, July 17, 2012 11:55 AM
 To: Simon Riggs
 Cc: Pg Hackers
 Subject: Re: [HACKERS] Covering Indexes
 
 On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:
 
   CREATE INDEX ON foo (a, b, c, d);
 
  allows
 
   SELECT c, d FROM foo WHERE a = ? AND b = ?
 
  to use an index only scan.
 
  The phrase unindexed seems misleading since the data is clearly in
  the index from the description on the URL you gave. And since the
  index is non-unique, I don't see any gap between Postgres and
  SQLite4.
 
 Yeah, but that index is unnecessarily big if one will never use c or d in
 the search. The nice thing about covering indexes as described for SQLite 4
 and implemented in MSSQL is that you can specify additional columns that
 just come along for the ride, but are not part of the indexed data:
 
 CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);
 
 Yes, you can do that by also indexing c and d as of 9.2, but it might be
 nice to be able to include them in the index as additional row data without
 actually indexing them.
 
 Best,
 
 David

Concretely, I would presume that the contents of a covering index could then
look like the following (a,b,c,d):

(2,1,2,A)
(2,1,5,A) -- the 5 is out of natural order but exists in the covering part
(2,1,3,A)

Whereas PostgreSQL would be forced to have the index ordered as such:

(2,1,2,A)
(2,1,3,A)
(2,1,5,A)

Either way the data in c and d are IN THE INDEX; otherwise, in neither
case could the data values be returned while strictly querying the index.

So the question that needs to be asked is what kind of performance increase
can be had during DML (insert/update) statements and whether those gains are
worth pursuing.  Since these other engines appear to allow both cases you
should be able to get at least a partial idea of the performance gains
between index (a,b,c,d) and index (a,b) covering (c,d).
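
A sketch of the comparison I have in mind, using the SQLite4-style COVERING
syntax for the second index (not valid PostgreSQL; MSSQL spells it INCLUDE):

CREATE INDEX idx_keyed ON foo (a, b, c, d);            -- all columns key-ordered
CREATE INDEX idx_cover ON foo (a, b) COVERING (c, d);  -- hypothetical here

Timing INSERT/UPDATE workloads against each, in an engine that supports both,
should give at least a rough sense of the maintenance savings.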

Vik's concurrent point regarding non-indexable values makes some sense but
the use case there seems specialized as I suspect that in the general case
values that are non-indexable (if there truly are any) are generally those
that would be too large to warrant sticking into an index in the first
place.  But, XML values do ring true in my mind (particularly frequently
used fragments that are generally quite small).  But again whether that is a
reasonable use case for a covering index I do not know.  It feels like
trying to solve the remaining 10% when it took a long while to even muster
up enough support and resources to solve the 90%.

David J.





Re: [HACKERS] strange evaluation Window function and SRF functions?

2012-07-30 Thread David Johnston
On Jul 30, 2012, at 12:33, Thom Brown t...@linux.com wrote:

 On 30 July 2012 17:19, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello
 
 I seen nice trick based on window function 
 http://stackoverflow.com/questions/11700930/how-can-i-trim-a-text-array-in-postgresql
 
 but isn't it example of wrong evaluation? Result of row_number is not correct
 
 Looks right to me.  I guess the way to get the row_number they're after out 
 of the result set would involve changing OVER () to OVER (ORDER BY 
 unnest(myTextArrayColumn))
 

The better way would be to perform the unnest in a sub-select then attach the 
row number in the outer select.
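
Roughly like this, keeping the column name from the linked example (the
table name is invented):

SELECT row_number() OVER () AS rn, t.elem
FROM (SELECT unnest(myTextArrayColumn) AS elem FROM myTable) AS t;

That way the set-returning function has already been expanded into rows
before the window function is applied.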

David J.

Re: [HACKERS] temporal support patch

2012-08-20 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Monday, August 20, 2012 5:04 PM
 To: Jeff Davis
 Cc: Vlad Arkhipov; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] temporal support patch
 
 On Sun, Aug 19, 2012 at 6:28 PM, Jeff Davis pg...@j-davis.com wrote:
  The other issue is how to handle multiple changes of the same record
  within the transaction. Should they be stored or not?
 
  In a typical audit log, I don't see any reason to. The internals of a
  transaction should be implementation details; invisible to the
  outside, right?
 
 I'm not convinced.

Ideally the decision of whether to do so could be a client decision.  Not
storing intra-transaction changes is easier than storing all changes.  At
worst you could stage up all changes and then simply not store the
intermediate results within a given relation.  In that case you gain nothing
in execution performance but save both storage and interpretative resources.
So the question becomes: is it worth doing without the ability to store
intermediate results?  If you were to offer both, which setup would be the
default?  If the default is the harder one (all statements) to implement,
then to avoid upgrade issues the syntax should specify that it is logging
transactions only.

Random, somewhat related, thought:  I do all my work on a temporary
staging table and then, as my final action, insert the resultant records
onto a separate live table and drop the temporary table.  Further changes to
said record I perform by deleting the original then inserting a new record
(from staging again) with all the values changed.  Obviously this has
limitations with respect to foreign keys and such but it is possible.  What
happens to the audit log if the PK changes and if it does not change?  Are
there any other implications that need to be addressed, or is it like giving
a loaded gun to someone and trusting them to use it responsibly?

 
  I'm not sure that the database user is the proper thing to be stored
  in the history table. Many applications usually connect to a database
  using some virtual user and have their own users/roles tables to
  handle with privileges. There should be some way to substitute the
  stored user in the history table with the application's one. It's
  also helpful to store transaction id that inserted/updated/deleted the
 record.
 
  If the system is recording it for audit purposes, then it better be
  sure that it's true. You can't allow the application to pick and
  choose what gets stored there.
 
 That position would render this feature useless for every application for
 which I would otherwise have used it.  I think it's just nonsense to talk
 about what we can or can't let the user do.  The user is in charge, and
 our job is to allow him to do what he wants to do more easily, not to
 dictate what he must do.
 
 --

I see the user element as having two components:

Client - what device/channel/user was used to connect to the database -
PostgreSQL Role
User - relative to that client which actual user performed the action
- Application Specified

A PostgreSQL role would correspond to client whereas the application would
be allowed to have full control of what User value is stored.

This gets a little complicated with respect to SET ROLE but gets close to
the truth.  The idea is that you look at the client to determine the
namespace over which the user is defined and identified.

So, a better way to phrase the position is that:

You cannot allow the application to choose what is stored to identify
itself (client) - i.e., its credentials identify who it is and those are
stored without consulting the application

At that point you've basically shifted responsibility for the correctness of
the audit log onto that application and away from the database.  However,
you do provide a place for the application to store an identifier that it is
able to resolve to a user if necessary.
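
As a sketch of the mechanics, the application could hand its user identifier
to the audit layer through a custom setting (the setting name is invented):

SET myapp.username = 'jdoe';
-- an audit trigger could then capture both components:
SELECT session_user AS client, current_setting('myapp.username') AS app_user;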

This is an arbitrary two-layer hierarchy and while conceptually anything
with two layers may want more, I am not sure whether the extra complexity
that would entail would be worth the effort.  Depending on what kinds of
information you allowed to be stored for User it becomes something that
can be modeled when desired and ignored otherwise.

The issue with adding the PostgreSQL role to the database in this way is
that you now can never delete that role or reassign it to another entity.  I
guess with temporal you could do so and basically have the identity-role
relationship defined over specific periods of time...  I can imagine (and
have imagined) a whole level of indirection and association to be able to
reasonably handle assigning and storing permanent identities while allowing
logon credentials to remain outside of permanent storage.

David J.





Re: [HACKERS] temporal support patch

2012-08-25 Thread David Johnston
 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Saturday, August 25, 2012 12:46 PM
 To: David Johnston
 Cc: Jeff Davis; Vlad Arkhipov; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] temporal support patch
 
 On Mon, Aug 20, 2012 at 7:17 PM, David Johnston pol...@yahoo.com
 wrote:
  Ideally the decision of whether to do so could be a client decision.
  Not storing intra-transaction changes is easier than storing all
changes.
 
 Not really.  If you don't care about suppressing intra-transaction changes,
 you can essentially just have a trigger that fires on every update and adds
 information to the side table.  If you do care about suppressing them, you
 have to do something more complicated.  Or so it seems to me.
 

My internals knowledge is basically zero but it would seem that if you
simply wanted the end-of-transaction result you could just record nothing
during the transaction and then copy whatever values are present at commit
to whatever logging mechanism you need.  If you are recording
intra-transaction values you could do so to a temporary storage area and
then, at commit, decide whether only the most recent value for a given
relation/attribute is going to be retained in the final log or whether you
end up persisting all of the intermediate values as well.


  You cannot allow the application to choose what is stored to identify
  itself (client) - i.e., its credentials identify who it is and those
  are stored without consulting the application
 
 I don't think we can violate the general principle that the database
 super-user or table owner can do whatever they want.  If one of those
 folks wants to falsify their history, are we really going to tell them no?
 To me that has I'm sorry, Dave, I can't do that written all over it, and I
 think we'll get about the same reaction that Hal did.
 Now, if user A is inserting into user B's table, and is not the
 super-user, then, of course, we can and should ensure that no
 falsification is possible.
 

With respect to the physical log file there is no way for the super-user to
currently falsify (at time of statement execution) the user/role that they
are using.  Even a SET ROLE doesn't change the session user (I forget the
exact mechanics but I am pretty sure on the general point).  I do not see how
this is that much different.
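
A quick illustration of that point (role names invented):

SELECT session_user, current_user;  -- alice | alice
SET ROLE bob;
SELECT session_user, current_user;  -- alice | bob : session identity persists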

I agree that it is pointless to even try to maintain true in-database
auditing in the presence of god-like super-users so most of what I envision
relates to limited-permission users that are forced to rely upon the
standard mechanisms provided by the database.  As a matter of principle
those wanting a secure and auditable environment should not be using
ownership-level roles.

Since these temporal/audit tables are intended to be maintained by the
system, if you do not ask the users to identify themselves but instead take
the information directly from the environment, you never have to give an I'm
sorry, Dave response because Dave is never given the chance to submit a
proposed value.

David J.






Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 There really needs to be some way to specify that when an expression is
 evaluated for each row in a set, a function used within that expression is
 not optimized away for some rows.  Fortunately we have a way:
 
 http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html
 
 | VOLATILE indicates that the function value can change even within a
 | single table scan, so no optimizations can be made. Relatively few
 | database functions are volatile in this sense; some examples are
 | random(), [...]
 
 The behavior in the OP's query would certainly be sane if the function
 were not VOLATILE; as it is, I have a hard time seeing this as anything
 but a bug.

What are the arguments against adding a 4th identifier - call it PER_ROW for
the sake of argument?  The main reason VOLATILE is broken is that it is the
default and, in order to minimize the beginner's penalty, it is not treated
as such in some situations.  The new one could behave just like VOLATILE but
would never be optimized away and would always evaluate once for each row in
its context.

Then the question is whether you write a new random() function or break
backwards compatibility and alter the existing version.

David J.






Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 
  | VOLATILE indicates that the function value can change even within a
  | single table scan, so no optimizations can be made.
  | Relatively few database functions are volatile in this sense; some
  | examples are random(), [...]
 
  What are the arguments against adding a 4th identifier - call it
  PER_ROW for this argument?  The main reason VOLATILE is broken is that
  it is the default and in order to minimize beginner's penalty it is
  not treated as such in some situations.  The new one could behave just
  like VOLATILE but would never be optimized away and would always
  evaluate once for each row in its context.
 
 So how would you document that?  It sounds like the proposed level would
 behave exactly as the VOLATILE level is currently documented to behave; so
 I guess we could shift the documentation of VOLATILE to PER_ROW (or
 whatever).  How would you then describe the behavior of VOLATILE?
 

I'm not sure but however we would describe it we might as well make the
change now regardless of whether another level is added.

The main distinguishing characteristic is that VOLATILE is not guaranteed to
evaluate once-per-row if it is not dependent upon particular values within a
given row.  

VOLATILE: A Volatile function used in an ORDER BY or WHERE clause without
referencing any columns from the query itself (i.e., no parameters or all
constants) will be evaluated a single time and the result treated as a
constant (i.e., all rows will have identical values) for that part of the
query.

PER_ROW: A per_row function will be evaluated once for every row that is
visible to the function and will be treated as a virtual column of said
relation, with each cell having its own value as a result of the function
call.

Using random() as an example of the two possible behaviors should further
clarify the differences quite nicely.
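
To make that concrete, taking the two behaviors exactly as described above
(this illustrates the proposed distinction, not a claim about any particular
release):

SELECT * FROM tbl WHERE random() < 0.5;
-- VOLATILE (as described): random() evaluated once; all rows kept, or none
-- PER_ROW: random() evaluated once per row; roughly half the rows kept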

Quick pass - hopefully, a) this inspires someone else, and b) this is the
correct understanding in the first place.

David J.






Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
 Sent: Wednesday, September 19, 2012 5:51 PM
 To: k...@rice.edu; David Johnston
 Cc: 'Florian Schoppmann'; 'Robert Haas'; pgsql-hackers@postgresql.org;
'Tom
 Lane'
 Subject: RE: [HACKERS] Invalid optimization of VOLATILE function in WHERE
 clause?
 
 David Johnston pol...@yahoo.com wrote:
 
  VOLATILE: A Volatile function used in an ORDER BY or WHERE clause
  without referencing any columns from the query itself (i.e., no
  parameters or all constants) will be evaluated a single time and the
  result treated as a constant (i.e., all rows will have identical
  values) for that part of the query.
 
 I hope you're wrong about the ORDER BY part of that.  A quick test
 confirms that it works in ORDER BY, at least for some cases.  If there are
 any exceptions to that, I would sure like to know about it -- and really
 soon.
 
 select * from generate_series(1, 1) s(n)
   order by random() limit 10;
 -Kevin

I'd rather have someone who knows the code assert one way or the other; I
tossed it in there because I thought I've seen people complain that random()
doesn't work as expected with ORDER BY but that may just be faulty memory.
It may or may not depend on whether LIMIT/OFFSET are involved...?  Used in
the SELECT-list it gets evaluated for each row and I guess the ORDER BY
could have that behavior as well (I would expect it to anyway), so is it
strictly limited to WHERE clause evaluation that this discrepancy manifests?

David J.








Re: [HACKERS] Oid registry

2012-09-27 Thread David Johnston
 
  I did like the alternative idea upthread of UUIDs for types which
  would give them a virtually unlimited space.
 
 Yeah, me too.  That doesn't require a centralized authority (hence, no
 debates here about whether a given extension is important enough to merit
 an allocation of a given size), doesn't move us further in the direction
 of exposing the database's internal identifiers as a concept that users
 have to care about, and provides essentially infinite address space.
 There's more engineering work involved but sometimes more engineering work
 means a better result.
 

Random thought from the sideline...

Git is able to provide assurances as to content because it creates a hash.
Now, for a function, PostgreSQL could hash the catalog entry (including the
function body) and return that as proof that said function is the same as
one installed in some other database or published publicly.  I do not know
enough about the other objects to know if something similar is possible but
maybe this will spark someone else's thoughts.
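
A crude sketch of the idea against the existing catalogs (the function name
is invented):

SELECT md5(prosrc) FROM pg_proc WHERE proname = 'my_func';
-- or fingerprint the full canonical definition:
SELECT md5(pg_get_functiondef('my_func(int)'::regprocedure));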

David J.








Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?

2012-10-02 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Noah Misch
 Sent: Tuesday, October 02, 2012 3:02 PM
 To: Craig Ringer
 Cc: PostgreSQL Hackers
 Subject: Re: [HACKERS] Raise a WARNING if a REVOKE affects nothing?
 
 On Tue, Aug 21, 2012 at 02:31:29PM +0800, Craig Ringer wrote:
  It'd really help if REVOKE consistently raised warnings when it didn't
  actually revoke anything.
 
 +1
 
 This will invite the same mixed feelings as the CREATE x IF NOT EXISTS
 notices, but I think it's worthwhile.
 
  Even better, a special case for REVOKEs on objects that only have
  owner and public permissions could say:
 
  WARNING: REVOKE didn't remove any permissions for user blah. This
  table/db/whatever has default permissions, so there were no GRANTs
  for user blah to revoke. See the documentation for REVOKE for more
  information.
 
 The extra aid from saying those particular things is not clear to me.
 
 It might be overkill, but we could report any other roles indirectly
 conveying access to the named role.
 

Having been bitten by this myself I do see the value in such a warning.  It
is not uncommon for someone using REVOKE to believe they are installing a
block instead of removing an allowance, especially as it interacts with
default permissions.
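
The trap, sketched (role and table names invented):

GRANT SELECT ON t TO PUBLIC;        -- done long ago, perhaps by someone else
REVOKE SELECT ON t FROM some_user;  -- succeeds silently but removes nothing:
                                    -- some_user still reads t via PUBLIC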

That said, and this is an off-the-cuff thought, the entire UI for
permissions, and its treatment in the documentation, seems to be
fact-oriented.  The system is well documented, but actually getting up to
speed to learn and use it is still a matter of reading the documentation and
figuring out how everything fits together.  I haven't given it that much
thought but I am curious if others are of the same opinion.

IOW, this proposal is an attempt to fix a symptom without addressing the
root cause.

Food for thought.

David J.







Re: [HACKERS] Deprecating RULES

2012-10-11 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Andrew Dunstan
 Sent: Thursday, October 11, 2012 8:52 PM
 To: Daniel Farina
 Cc: Joshua D. Drake; Josh Berkus; Simon Riggs;
pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Deprecating RULES
 
 
 On 10/11/2012 08:20 PM, Daniel Farina wrote:
  On Thu, Oct 11, 2012 at 5:07 PM, Joshua D. Drake
 j...@commandprompt.com wrote:
  On 10/11/2012 03:59 PM, Josh Berkus wrote:
 
  I'm also not real keen on the idea that someone could dump a 9.2
  database and be unable to load it into 9.3 because of the DDL
  trigger, especially if they might not encounter it until halfway
  through a restore.  That seems rather user-hostile to me.
 
  Also, how would you picture that working with pg_upgrade?
 
  RULEs are a major feature we've had for over a decade.
 
  That nobody in the right mind would use in production for YEARS. That
  said there is a very real problem here. For a very, very long time
  the recommended way (wrong way in fact) to do partitioning was based
  on rules. Now, those in the know immediately said, WTF but I bet you
  that a lot of people that we don't know about are using rules for
  partitioning.
 
  We definitely need a warning period that this is going away. That
  said, I don't know that we need a whole release cycle. If we start
  announcing now (or before the new year) that in 9.3 we will not have
  rules, that gives people 9-10 months to deal with the issue and that
  is assuming that we are dealing with early adopters, which we aren't
  because early adopters are not going to be using rules.
  My experience suggests that only ample annoyance for at least one full
  release cycle will provide a low-impact switch.  This annoyance must
  not be able to be turned off.
 
 
 
 Spot on. All our experience is that just announcing things, especially in
 places other than release notes and similar, is ineffective as a way of
 communicating with our user base.
 
 I'm with Tom and Josh and Daniel on this, and to be honest I'm somewhat
 surprised at the willingness of some people to spring surprises on users.
 I still come across uses of rules in the wild, and not just for
 partitioning either. Personally I think if we start now the earliest we
 should even consider removing the support is 9.4.
 
 cheers
 
 andrew

Deprecation means that existing code will no longer work without
refactoring.  If CREATE RULE was a security hazard or unstable that may
justify such an action but simply because using it properly (or at least
safely) is difficult doesn't mean that those who have managed should be
punished for their expertise.

Late night rambling here but the risk mitigation that we seem to be caring
about is new users searching for and using algorithms that they find on
the web without understanding the intricacies of how those algorithms work.
Do we really want to build something into the database to deal with this (by
disallowing it outright) or do we do our best to provide authoritative and
useful documentation so that when users go looking for the CREATE RULE
command in our documentation they are provided with reasoning and
alternatives to its use?

RULEs may be difficult but maybe there are some rare use-cases where they
would be appropriate.  No one here is all-knowing and just maybe someone in
the future will have an idea and decide to further improve them or at the
least recognize a situation where the current implementation is useful.

So, what actual harms are there to using CREATE RULE, and are there less
invasive means, via a more nuanced restriction implementation of CREATE RULE
or simply via documentation, to mitigate those harms?  Maybe there would not
be enough benefits to CREATE RULE at this point in time to consider
implementing it from scratch, but given that it already exists it should be
worth some effort to keep it functioning, even if only for
forward-compatibility reasons.  And regardless, the what do you use
instead of CREATE RULE documentation needs to be created no matter the
eventual decision on fully removing the feature from the system.
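
As one concrete example of the sort of harm worth documenting, the classic
multiple-evaluation gotcha (names invented):

CREATE TABLE t (id serial, v text);
CREATE TABLE t_log (id int, v text);
CREATE RULE t_audit AS ON INSERT TO t
  DO ALSO INSERT INTO t_log VALUES (NEW.id, NEW.v);
INSERT INTO t (v) VALUES ('x');
-- t gets id = 1 while t_log records id = 2: NEW.id is macro-expanded to the
-- column's default, so nextval() runs a second time in the rule action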

David J.







Re: [HACKERS] Potential autovacuum optimization: new tables

2012-10-12 Thread David Johnston
On Oct 12, 2012, at 22:13, Stephen Frost sfr...@snowman.net wrote:

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 [ shrug... ]  You're attacking a straw man, or more precisely putting
 words into my mouth about what the percentage-based thresholds might be.
 Notice the examples I gave involved update percentages quite far north
 of 100%.  It's possible and maybe likely that we need a sliding scale.
 
 I was just discussing such a sliding scale approach w/ Josh on IRC, my
 thinking was that we could use a logarithmic approach based on table
 size.
 
 Also, I don't necessarily accept the conclusion you seem to be drawing,
 that it's okay to have complete turnover of a small table and not redo
 its stats.  If you don't like the current behavior when there's no
 stats, why would you like the behavior when there are some stats but
 they no longer have the remotest relationship to reality?
 
 Josh's concern is about autovacuum causing lots of stats churn, which is
 understandable, we don't want it constantly rescanning a table, but
 perhaps we could use some kind of threshold for preventing autovac from
 rescanning a table it just scanned?  Note that I did *not* say 'GUC',
 but I don't know what the 'right' answer is for how frequently is
 good-but-not-too-frequent.  I'd also like to try and avoid adding GUCs.
 
 

Instead of a global setting could you attach an interface function to the
table and have the auto-analyzer call that function to basically ask the
table whether it needs to be analyzed?  You still need to deal with defaults
and provide a decent supply of built-in algorithms, but at least the system
can be made tunable.  The default algorithm could maybe just hand off to a
table-size-specific handler.  The CREATE TABLE and ALTER TABLE commands could
be used to change the assigned algorithm if desired and new ones could be
supplied via extensions.
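
Purely hypothetical syntax, just to sketch the shape of such an interface
(no such hook exists today):

CREATE FUNCTION analyze_now(tbl regclass) RETURNS boolean
  LANGUAGE sql AS $$ SELECT true $$;  -- the decision logic would live here
ALTER TABLE foo SET (analyze_decision_function = 'analyze_now');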

The 1000 row default seems unusual at first glance and contributes to the 
problem described.

It is likely that the first INSERT sent following the CREATE TABLE is going
to be a bulk load if the table is going to have many rows.  In the case where
rows are inserted individually it is likely that the expected row count will
be closer to 1 than 1000.

One useful algorithm to provide the user is analyze on insert and, though
maybe less so, analyze on update, so that any insert/update causes the table
to be re-analyzed.  Not a good default but, combined with delayed analyze
logic to establish a minimum frequency, it is a possible option for some use
cases.

Temporary table creation should have special attention given if changes are 
going to be made here.

Another idea is to have system after [command] trigger(s) that can be used
to call analyze without waiting for the auto-vacuum process, and to provide
some way for CREATE/ALTER TABLE and maybe auto-vacuum to enable and disable
the trigger.

David J.





Re: [HACKERS] [WIP] pg_ping utility

2012-10-15 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Tom Lane
 Sent: Monday, October 15, 2012 7:13 PM
 To: Andres Freund
 Cc: pgsql-hackers@postgresql.org; Thom Brown; Phil Sorber
 Subject: Re: [HACKERS] [WIP] pg_ping utility
 
 Andres Freund and...@2ndquadrant.com writes:
  Why not add a pg_ctl subcommand for that? For me that sounds like a
  good place for it...
 
 I think that's a bad fit, because every other pg_ctl subcommand requires
 access to the data directory.  It would be very confusing if this one
 subcommand worked remotely when the others didn't.
 
 There was also some discussion of wedging it into psql, which would at
 least have the advantage that it'd typically be installed on the right
 side of the client/server divide.  But I still think wedging into is the
 appropriate verb there: psql is a tool for making a connection and
 executing some SQL commands, and ping is not that.
 
 Yeah, I know a whole new executable is kind of a pain, and the amount of
 infrastructure and added maintenance seems a bit high compared to what
 this does.  But a lot of the programs in src/bin/scripts are not much
 bigger.  (In fact that might be the best place for this.)
 
   regards, tom lane
 

This seems to be begging for a canonical pg_monitor command where
pg_ping would be one sub-command.  A bit much for a single command but it
would provide a frame onto which additional user interfaces could be hung -
though I am lacking for concrete examples at the moment.  pg_monitor would
be focused on database monitoring and not cluster monitoring generally
but pg_ping would be a necessary pre-requisite since if the cluster is not
available database monitoring doesn't make any sense.

With the recent focus on pg_stat_statements and the current WIP on
pg_lwlocks, having an official UI for accessing much of this kind of data has
merit.  Encapsulating the queries into commands makes actually using them
easier, and there can be associated documentation discussing how to interpret
those specific commands and some level of consistency when asking for data
for bug and performance reports.  It may be that psql already does much of
this as I am just not that familiar with the program, but if that is the case
then classifying it as making a connection and executing some SQL commands
is a limited description.  pg_ping is arguably doing at least the first part
of that.

David J.







Re: [HACKERS] Re: [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows

2012-10-21 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Abhijit Menon-Sen
 Sent: Sunday, October 21, 2012 5:45 AM
 To: Tom Lane
 Cc: P. Christeas; pgsql-hackers@postgresql.org
 Subject: [HACKERS] Re: [PATCH] Enforce that INSERT...RETURNING preserves
 the order of multi rows
 
 At 2012-10-17 09:56:22 -0400, t...@sss.pgh.pa.us wrote:
 
   Clarify that in the documentation, and also write a test case that
   will prevent us from breaking the rule in the future.
 
  I don't believe this is a good idea in the slightest.  Yeah, the
  current implementation happens to act like that, but there is no
  reason that we should make it guaranteed behavior.
 
 I always thought it *was* guaranteed, and I've encountered code written by
 other people who were obviously under the same impression: take some
 strings (e.g. flag names), use insert … returning id, map the ids back to 
 the
 names, and use the values in further inserts into other tables (flag_id
 foreign key references flags).
 
 I know one could say returning id, name, but there's certainly code out
 there that doesn't do this.
 
 I personally think the return order should be guaranteed; and if not, then the
 documentation urgently needs some prominent warnings to tell people that
 they should not assume this (for any variant of RETURNING).
 
 -- Abhijit
 

Order is never guaranteed unless an ORDER BY clause is involved in processing 
the data immediately prior to its use.
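
The defensive version of the pattern described above carries the mapping
through explicitly rather than relying on position (names invented):

WITH ins AS (
  INSERT INTO flags (name) VALUES ('red'), ('blue')
  RETURNING id, name
)
SELECT id, name FROM ins ORDER BY name;  -- map ids back by value, not by order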

I could see this being in a rules that you must always remember listing,
but to include it in every location where people might be inclined to rely
upon ordering is just going to clutter the documentation.

That said, I'm not personally opposed to this documentation suggestion.  But
while the idea is acceptable, the actual changes proposed by someone's patch
are what need to be approved and applied.

As to the order of RETURNING I do not see an overly compelling reason to 
enforce such a limitation; and in general implicit guarantees like this are 
undesirable since there is no way to turn them off.  For sorting in particular 
the action itself can be expensive and not always needed.  While we are not 
talking strictly sorting here (just maintained order) the concept still applies.

David J.






Re: [HACKERS] Deprecating RULES

2012-10-22 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Merlin Moncure
 Sent: Monday, October 22, 2012 6:54 PM
 To: Robert Haas
 Cc: Andrew Dunstan; Josh Berkus; Daniel Farina; pgsql-
 hack...@postgresql.org
 Subject: Re: [HACKERS] Deprecating RULES
  
 Good point on the CTE (and it's correct).  I think by any reasonable
 definition rules are in fact already de facto deprecated: they are not
 being extended to interact with other features and the community is
 advising against their use.  I don't think anybody would complain if/when
 a hypothetical MERGE feature was advanced without rule interaction.
 
 That said, I don't think there is any reasonable argument to remove rules.
 Backwards compatibility should only be broken when it *must* be broken.
 Any 'developer interest only' standards ('grotty code', 'inelegant', 'ill
 advised for new code', etc) of removal are completely specious and thus
 are IMSNHO irrelevant.
 
 merlin

While I agree with this sentiment to some degree, in order for the community
to thrive new developer blood needs to be introduced periodically.  Not that
this feature is particularly an issue, but making the codebase easier to
learn and maintain has considerable value in its own right.

To put a different spin on things, it is like CREATE RULE is a specialty
tool.  Taken that way we should strictly describe the use-cases where
CREATE RULE behavior is well-defined and problem free.  If the end-user
isn't trying to use RULEs in exactly those cases then they are advised to
attempt another solution or send an e-mail to the list to get some expert
opinions on that particular use-case.  Known problematic uses can also be
listed to minimize the number of not listed, what do y'all think? e-mails
sent to the list.  In this setup there is some developer obligation to try
not to break those well-defined use-cases; but that exists today even if
it is not explicitly mentioned.

David J.








Re: [HACKERS] Creating indexes in the background

2012-10-29 Thread David Johnston
On Oct 27, 2012, at 19:20, David Lee davidomu...@gmail.com wrote:

 Hey folks,
 
 It seems like right now when you want to create an index concurrently, the 
 index creation will get canceled if you cancel the statement (i.e. you must 
 keep your statement open).
 
 Is there a way to launch an index creation in the background so that the 
 statement doesn't need to be kept open?
 
 --David
 

No, there is not.

David J.



Re: [HACKERS] Add contrib module functions to docs' function index

2012-11-13 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Craig Ringer
 Sent: Tuesday, November 13, 2012 6:15 PM
 To: PostgreSQL Hackers
 Subject: [HACKERS] Add contrib module functions to docs' function index
 
 Hi all
 
 Andreas Heiduk on -bugs suggested that we add the functions provided by
 contrib modules to the function index in the docs, so it's easier to go
 from, say, what the heck is idx(...)? to finding it in the intarray
 contrib module.
 
 This seems like a good idea and I'd like to pop it in the TODO until I get
 time to check it out, flagged as a minor/newbie-friendly problem.
 Any objections?
 

For clarity, does this proposal refer to Chapter 9 of the documentation, the
Index, or both?

If modifying Chapter 9, the function and operator tables should be extended
to include a source column with values of base or contrib: module
name or something similar.

As to the desirability of such a change, I concur that it would be a nice
usability enhancement to consider beyond just updating the actual Index.

David J.








Re: [HACKERS] Switching to Homebrew as recommended Mac install?

2012-04-01 Thread David Johnston
On Apr 1, 2012, at 21:50, Jay Levitt jay.lev...@gmail.com wrote:

 Tom Lane wrote:
 While you might not like the EDB installer, at least those
 folks are active in the lists and accountable for whatever problems
 their code has.  Who in heck is responsible for the homebrew
 packaging, and do they answer questions in the PG lists?

 
 Just for general knowledge... Who's responsible is whoever wants to be; 
 homebrew is open source, and with a github-based workflow, it's trivial for 
 them to accept pull requests.  On the 1967 formulas (packages) in the repo, 
 there have been 1759 contributors.  I was volunteering to be the maintainer 
 and liaison if we did this; I'm pretty good at email and IRC.
 
 It's actually pretty clever and elegant - homebrew itself uses git and github 
 for formula updates and distribution, and is written in ruby1.8 which ships 
 with all Macs. /usr/local is a git repo, brew update is mostly git pull, 
 and brew search checks for new pull requests if it doesn't find a matching 
 formula. The whole thing's all of 1500 lines of code, and you saw what 
 formulas look like.
 
 Jay
 
 

You seem highly approving of homebrew and seem willing to develop and support 
it.  I guess the question to be asked is what requirements you would expect to 
have to meet before the Mac Downloads section would list your installer routine 
along with the three already present?  Aside from that, unless you are really 
intent on trying to prove yourself to be the best, if you are trying to overcome 
shortcomings of the existing installers it would still be nice to let them know 
how you feel things could be improved for the community/user sub-set you belong 
to.

As a Windows developer (though production is on Linux) I get where you are 
coming from with respect to user permissions and the like - what is desirable 
in development and in production does differ, and so having different 
installation routines for them makes some sense.  That is, until your 
developers go to install on the production server and do not realize that 
they should be doing something different in order to make the server more 
secure than their development environment.

From what I follow I think you have really good ideas and sound reasoning.  
You do not need permission to contribute to the community in the way you seek, 
so what is it that you are really asking for?  From the sound of things your 
primary focus is not in supporting the PostgreSQL community via providing 
services to others or developing new tools.  When brew is replaced by 
something more popular, do you think you will continue to maintain the recipe 
or is it going to end up stuck showing us how to install version 9.3 or 
earlier?  I'm beyond my element here, but the current installer maintainers are 
doing so in addition to their other, more regular, contributions.  That said, 
the contribution, even if it did stall in the future, would still be welcomed 
and if it is found to be very useful someone would likely pick up the torch as 
long as it is released under the same terms as PostgreSQL itself.

Just trying to bridge an apparent gap since the original e-mail seems to have 
come across as so adversarial that the underlying thoughts have been 
overlooked.  Trying to contribute in my own way with my current resources.

David J.


Re: [HACKERS] Switching to Homebrew as recommended Mac install?

2012-04-01 Thread David Johnston
On Apr 1, 2012, at 13:14, Jay Levitt jay.lev...@gmail.com wrote:

 The Mac installation docs currently recommend the EDB one-click installer as 
 the first choice. While this does install pgadmin and some other refinements, 
 it also is fairly confusing to troubleshoot:
 

The items are not numbered and it is impossible to avoid some implied ordering.  
There are three options - and yes EDB is listed at the top of the page - but 
nothing else implies any kind of order, and given that they are dealing with 
different ways to package, if someone prefers MacPorts or Fink the fact they are 
listed lower shouldn't induce them to pick the unfamiliar one first.

David J.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2 release notes, beta time?

2012-04-27 Thread David Johnston
On Apr 27, 2012, at 21:24, Tom Lane t...@sss.pgh.pa.us wrote:

 Bruce Momjian br...@momjian.us writes:
 On Fri, Apr 27, 2012 at 09:10:54PM -0400, Tom Lane wrote:
 How would that help?  The bottleneck is packaging, which is the same
 work whatever we call it.
 
 We release the alpha with no packaging.
 
 Think we'd lose a lot of potential testers that way.
 

It seems that even if you published before PGCon, a reasonable number of 
potential testers would be at the conference, and responses to any feedback 
would be delayed since the developers will be there as well.

I could see the goal being a package-ready commit before PGCon, with the 
official release scheduled for after.  No matter what you call it, those 
who want a head start can self-compile while others will just wait until the 
packagers are done.  Either way the same codebase will be in the wild (so 
we probably don't want to call it alpha).

The original reason for pre-conference is so the developers can feel less bad 
talking about 9.3 features (and 9.2 post-mortem) since the beta for 9.2 will be 
completed.

David J.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] precision and scale functions for numeric

2012-04-30 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Monday, April 30, 2012 2:20 PM
 To: Peter Eisentraut
 Cc: pgsql-hackers
 Subject: Re: [HACKERS] precision and scale functions for numeric
 
 
 I think you could test for integer-ness by testing whether val % 0 = 0.
 

Either I am missing something here or you are.  Since modulus is a division
function, anything % 0 results in a division-by-zero ERROR - division has
to occur before a remainder can be obtained.

Maybe val % 2 NOT IN (0,1) ...
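
For illustration, a minimal sketch of both points (plain numeric modulus; any
recent PostgreSQL should behave this way):

-- val % 1 yields the fractional part, so it can test integer-ness:
SELECT 12.00 % 1 = 0;   -- true
SELECT 12.50 % 1 = 0;   -- false
-- whereas % 0 must perform the division first and therefore errors out:
SELECT 12.00 % 0;       -- ERROR:  division by zero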

David J.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: additional error fields

2012-05-01 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Peter Geoghegan
 Sent: Tuesday, May 01, 2012 4:37 PM
 To: Tom Lane
 Cc: Pavel Stehule; PostgreSQL Hackers
 Subject: Re: [HACKERS] proposal: additional error fields
 
 On 1 May 2012 21:14, Tom Lane t...@sss.pgh.pa.us wrote:
  Peter Geoghegan pe...@2ndquadrant.com writes:
  Maybe no one is convinced by any of this, but the fact is that the
  SQLSTATE argument falls down when one considers that we aren't using
  it in many cases of errors that clearly are severe.
 
  The reason that argument isn't convincing is that we *are* using a
  SQLSTATE for every such message; it's just defaulted to XX000.
  AFAICT, it would be reasonable to treat all XX000 as alarm conditions
  until proven different.  If a given message is, in fact, not supposed
  to be can't happen, then it shouldn't be going through elog().  We'd
  probably be needing to fix some places that were lazily coded as
  elogs, but under your proposal we would also have to touch every such
  place ... and thousands more besides.
 
 Fair enough. Adjusting all of those elog calls may be excessive. The
argument
 could be made that what I've characterised as severe (which is, as I've
said,
 not entirely clear-cut) could be deduced from SQLSTATE if we were to
 formalise the can't happen errors are only allowed to use elog()
convention
 into a hard rule. However, I think it's critically important to make all
of this
 easy and well-documented. Severity should probably be part of the default
 log_line_prefix.
 
 Sorry for high-jacking your thread, Pavel.
 

So the apparent desire is to promote proper usage of SQLSTATE but
simultaneously add and encode a default SQLSTATE_PG_SEVERITY value for each
class/code that can be used for external monitoring and notification.
Ideally customization could be done so that differing opinions on such
severity classifications could be applied on a client-per-client basis without
having to resort to outputting the SQLSTATE code itself and then requiring
external software to maintain such an association.  To that end, any
severity on the class itself would act as a default: specific codes
that share the class severity can be skipped, while those needing a
different severity can have an override specified.  Since the codes are neither
exhaustive nor mandatory, such a default would also apply to any user-chosen
code not previously defined.

Simply adding in more high-level categories avoids the issue that the
current system has insufficient information encoded to facilitate desired
reporting requirements.  If we encode our messages with a sufficient level
of detail then internally or externally adding categories and meta-data on
top of those layers is simple and new ideas and techniques can be tried
without having to modify the system in the future.

Supplemental context information such as table and constraint names can be
useful if the cost of recording such data is low enough and the value
sufficient.  That said, knowing the SQL that caused the error and which
process it is implementing should be sufficient to identify possible causes
and resolutions without requiring the specific columns and tables involved.
Since constraint violations already expose the name of the violated
constraint that particular situation seems to have a sufficient solution.
Given that you should not be giving end-users that kind of implementation
artifact anyway the developer and DBA should be able to identify the root
cause and either avoid it themselves or code an application interface to
present to the end-user.  So, at least from my perspective, the bar to move
this forward is pretty high - either it must be fairly simple to implement
(which it is not) or there needs to be more value to it than I am seeing
currently.  This ignores whether normal runtime performance costs will be a
significant factor.

Looking at the SQLSTATE error classes I am somewhat concerned with the
number of items found under HV and the apparent intermixing of client
and internal error types.

As for an upgrade path how about something along the lines of:

1) Make a best-attempt effort at identifying existing elog and ereport calls
and modifying them to output specific SQLSTATE codes
2) Modify elog/ereport to catch and log (stack trace) any calls that do not
set SQLSTATE to a specific value.
3) During development, beta, and RC phases keep such code in place and ask
people to look at their logs for missed elog/ereport calls
4) Remove the stack trace (logging) within ereport/elog from the final
released code

David J.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

2012-05-01 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Hannu Krosing
 Sent: Tuesday, May 01, 2012 5:29 PM
 
 The reason I am whining now is that with minor adjustments in
 implementation it could all be made much more powerful (try cast to ::json
 for values before cast to ::text) and much more elegant thanks to
 PostgreSQL's built in casting.
 
 If we allowed json to hold any JSON value and tried ::json when generating
 json for compound types than we would be able to claim that PostgreSQL
 supports JSON everywhere, defaulting to representing officially unsupported
 types as strings, but allowing users to convert these to their preferred
 conventions.

I get that a JSON Text is always also a JSON Value but the reverse is not true. 
 Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded 
value is a possible JSON Text - the most important property for purposes of 
data interchange.

 
 I'd also prefer to have default conversions already included for some of our
 sexier types, like intervals (just a two element array) and hstore (an object)
 etc.

Interval is not simply 2 values but also denotes whether the particular value 
is inclusive or exclusive; you would have to use an object unless you transmit 
in a text format and let the target perform the necessary interpretation of the 
string.

 
 Suddenly we would be the best-match database for Web development and
 all things Ajax, and we would also have a widely used, built-in and adjustable
 interchange format to the outside world.
 format to outer world.
 
  Second, RFC 4627 is absolutely clear: a valid JSON value can only be
  an object or an array, so this thing about converting arbitrary datum
  values to JSON is a fantasy.
 
 Probably a typo on your part - valid JSON _text_ is object or array, valid
 JSON value can also be number, text, true, false and null
 
 What I am arguing for is interpreting our json type as representing a JSON
 value not JSON text, this would enable users to adjust and extend the
 generation of json values via defining casts for their specific types - most
 notably Date* types but also things like hstore, which has a natural JSON
 representation as object (a list of key:value pairs for non-js users, 
 a.k.a. a
 dictionary, hash, etc.)

Aside from the fact it is likely too late to change the interpretation I would 
argue against doing so in any case.

Currently, the idea is to get your result all lined up and ready to go and then 
ship it off to the caller as valid JSON so that the caller does not have to do 
so itself.  Answering the question what would this value look like if it was 
part of a json output? is good; however, production use is likely to mostly 
care about the entire json interchange construct (i.e., JSON Text)

So: json -> JSON Text; 

A JSON Value always has a textual representation but if we were to have an 
actual type it would make sense to encode it such that (strings, objects and 
arrays) are delimited while (numbers, false, true, and null) are not.

Type Name: json_value

Output Representations (all output surrounded by double-quotes since all are 
string-like) - 
String: 'VALUE' (single-quote delimiter)
Object: {...}
Array: []
Number: 0.00
Other: false, true, null

JSON is fundamentally an interchange format (especially from a database's 
perspective).  JSON Values only really have meaning if they are attached 
explicitly to a JSON Text structure; if you wanted to store one independently 
you should convert it into a native representation first.  The few exceptions 
to this would be sufficiently handled via plain text with meta-data indicating 
that the stored value is structured in directly JSON-compatible syntax.  In 
short, the default context for JSON in PostgreSQL should be JSON Text (not JSON 
Value) and thus the unadorned json should reflect this default (which it 
does).

 
  If anything, we should adjust the JSON input routines to disallow
  anything else, rather than start to output what is not valid JSON.
 
 I tested python, ruby and javascript in firefox and chrome; all their JSON
 generators generate 1 for the standalone integer 1 and "a" for the
 standalone string a, and none refused to convert either to JSON.
 

Assume that we keep the meaning of json to be JSON Text; what would you 
suggest occurs if someone attempts a datum -> json cast?  Given that we are 
working in a strongly-typed environment the meaning of JSON cannot be changed, 
and so either the cast has to output valid JSON Text or it has to fail.  My 
personal take is to have it fail, since any arbitrary decision to cast to JSON 
Text is going to make someone unhappy and supposedly they can modify their 
query so that the result generates whatever format they desire.


I haven't followed the JSON development in 9.2 too closely but exposing 
whatever conversion mechanism is currently used to generate JSON makes sense 
from a ease-of-development standpoint.  But even then, during 

Re: [HACKERS] proposal: additional error fields

2012-05-01 Thread David Johnston
On May 1, 2012, at 20:05, Robert Haas robertmh...@gmail.com wrote:

 On Tue, May 1, 2012 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I continue to maintain that the SQLSTATE is a much better basis for
 solving this problem.  Its categories are already pretty close to
 what Peter needs: basically, IIUC, he wants to know about classes
 53, 58, maybe F0, and XX.
 
 This is really too mushy, IMHO.  ERRCODE_TOO_MANY_CONNECTIONS isn't
 what I'd call an oh-shit condition even though it's in class 53, but
 this "could not create archive status file \"%s\"" is definitely an
 oh-shit regardless of what errcode_for_file_access() returns.
 
 Also, the fact is that most people do not log SQLSTATEs.  And even if
 they did, they're not going to know to grep for 53|58|maybe F0|XX.
 What we need is an easy way for people to pick out any log entries
 that represent conditions that should never occur as a result of any
 legitimate user activity.  
 Like, with grep.  And, without needing to
 have a PhD in Postgresology.
 

If you want something really simple why not output all elog calls to one file 
and ereport calls to the current log?

If you recognize the need to fix existing code so that you can determine the 
severity levels you desire, then go all the way: use SQLSTATE at the call 
level and then add meta-data about those codes higher up.  That meta-data is 
then customizable, so those who want the "too many connections" error can see 
it while those that do not can turn it off.

With the addition of the PostgreSQL-specific severity category, both that value 
and the SQLSTATE upon which it is based should be considered best practice to 
output (and the default), and future attention should be given to ensuring that 
the assigned codes are as accurate as possible.  Since existing log formats 
would still be valid, upgrades should not be an issue.

David J.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

2012-05-01 Thread David Johnston
On May 1, 2012, at 20:41, Hannu Krosing ha...@2ndquadrant.com wrote:
 
 Most people don't work in strongly-typed environment, and thus would
 work around such restriction if they need a simple JSON value at the
 other end of the interchange.
 
 
 My personal take it is have it fail since any arbitrary decision to cast 
 to JSON Text is going to make someone unhappy and supposedly they can 
 modify their query so that the result generates whatever format they desire.
 
 Do you actually have such an experience or is it just a wild guess ?
 
 

So, even given the semantic differences between an object and a scalar, I am 
better understanding where interpreting JSON as JSON Value makes sense.  
However, if I convert a record or array to JSON I expect to get a JSON Text 
even if there is only a single column or value in the input.  

I guess my take is that record -> JSON Text while anything else is JSON Value.  
Whether it is worth making a special case for record I really do not know, 
but the semantic difference does exist; and record output is a 
significant aspect of PostgreSQL output.
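
As a concrete sketch of that distinction using functions that do exist in 9.2
(row_to_json and array_to_json; the commented output is what I would expect):

SELECT row_to_json(t) FROM (SELECT 1 AS a, 'x'::text AS b) t;
-- {"a":1,"b":"x"}    (an object, i.e. valid JSON Text)

SELECT array_to_json(ARRAY[1,2,3]);
-- [1,2,3]            (an array, i.e. valid JSON Text)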

I get the ease-of-use aspect but also recognize that sometimes being slightly 
harder to use is worthwhile if you eliminate ambiguities or limit the 
possibility to make mistakes.

FWIW my background on this topic is more theoretical than experiential though I 
am an web-application developer by trade and do use some JSON in that capacity.

David J.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

2012-05-04 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Hannu Krosing
 Sent: Friday, May 04, 2012 4:40 PM
 To: Robert Haas
 Cc: Tom Lane; Andrew Dunstan; PostgreSQL-development; Merlin Moncure
 Subject: Re: [HACKERS] JSON in 9.2 - Could we have just one to_json()
 function instead of two separate versions ?
 
 On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
  On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing ha...@krosing.net
 wrote:
   On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
   Hannu Krosing ha...@2ndquadrant.com writes:
On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
So given that do we do anything about this now, or wait till 9.3?
  
I'd like the json support in 9.2 updated as follows
  
   I think it's too late to be entertaining proposals for such changes
   in 9.2.  If we had concluded that the existing functions were
   actively wrong or a bad idea,
  
   I think that hard-coding postgresql text representation as our
   json representation without a possibility for the user tio easily
   fix it without rewriting foll xx_to_json() functions is borderline
   actively wrong.
  
   Can we at least have the xxx_to_json() functions try cast to json
   first and fall back to text if the cast fails.
 
  I think the idea that you can involve the casting machinery in this is
  misguided.  sometextval::json has got to mean that sometextval is
  expected to be in the form of a syntactically correct JSON value - and
  NOT that we wrap it in a JSON string.  We can have constructors for
  JSON, but they've got to be separate from the casting machinery.
 
 on the contrary - the string representation of the textual value a is "a"
 
 casting should _not_ mean syntax check; casting is by definition a conversion.
 
 if we cast text to int, we return a value of type int; if we cast int to
 numeric(5,2) we return a value of type numeric(5,2)
 
 why should casting to json work differently?
 

What is the distinction between what you are thinking regarding JSON and this 
example?

SELECT '1a'::integer;
SQL Error: ERROR:  invalid input syntax for integer: "1a"
LINE 1: SELECT '1a'::integer

As a user if I cast something to something else I want the result to be of the 
correct type and deterministic; otherwise throw me some kind of invalid input 
format exception (or syntax exception).  Casting vs. Constructors is really a 
meaningless distinction to a lay person.  When I cast I do so by constructing a 
new value using my existing value for input.  When I use an explicit CAST I am 
unable to supply additional parameters to configure the casting whereas a 
constructor function gives me that possibility.  But a constructor function 
without any additional parameters is not semantically different than a cast.

I guess the concern to address is something like:

SELECT '{"key": "value"}'::json or SELECT '[1.25]'::json;  Do you interpret these as 
already being valid JSON and thus output object/array constructs (i.e., JSON 
Text), or do you treat them as string literals and output scalars (i.e., JSON 
Value)?  Even if you feel these are artificial constructs, the concept holds 
that there may be ambiguous data that can be interpreted in multiple ways (this 
applies even to function forms, though in the function form you could specify 
which one you want to output using a separate DEFAULTed parameter). 



I can see the primary use-case for JSON Value casting as being queries of the 
following forms (since the record and array forms are going through the 
record/array_to_json function):

SELECT COUNT(*)::json FROM table [WHERE ...];
SELECT single_col::json FROM table WHERE id = ?;

Where the single provided value can be sent directly back to the web-caller 
JavaScript and used as-is because it is valid JSON.  Though, honestly, both 
SELECT to_json(single_col) and SELECT to_json(COUNT(*)) are equally usable so 
any distinction between them is a pure technical issue to me.

Am I correct in assuming the following expected behavior (and forgive the 
blatantly wrong syntax but you should get the point)?

RAISE NOTICE '%', SELECT 'A'::text  =  A
RAISE NOTICE '%', SELECT 'A'::json  =  "A"

David J



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible error in psql or Postgres?

2012-06-12 Thread David Johnston
On Jun 12, 2012, at 15:21, Dusan Misic promi...@gmail.com wrote:

 Is this normal Postgres / psql behavior?
 
 griffindb=# \d system.user;
              Table "system.user"
    Column   |         Type          |                     Modifiers
  -----------+-----------------------+-----------------------------------------------------
   username  | character varying(20) | not null
   password  | character varying(32) | not null
   firstname | character varying(40) | not null default 'nema ime'::character varying
   lastname  | character varying(40) | not null default 'nema prezime'::character varying
 Indexes:
     "SystemUser_PK" PRIMARY KEY, btree (username) CLUSTER
 
 normal query: 
 
 griffindb=# select * from system.user where username = 'root';
  username | password | firstname |   lastname
 --+--+---+---
  root | 1e7db545fccbf4e03abc6b71d329ab4f | Super | administrator
 (1 row)
 
 error query:
 
 griffindb=# select * from system.user where user = 'root';
  username | password | firstname | lastname
 --+--+---+--
 (0 rows)
 
  "column user does not exist" - it should throw an error!
 
 PostgreSQL / psql version: 9.1.3 on Windows 7 64-bit
 
 Should Postgres or psql report an error because column used in WHERE clause 
 does not exist? 

http://www.postgresql.org/docs/9.0/interactive/functions-info.html

"user" is actually a function that returns the current_user.  It is an SQL 
special function and thus does not require the use of () after the function 
name.  So basically you are saying "where current_user = 'root'", which is 
either constant true or constant false for the entire statement.
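
A quick way to see this in psql (user and current_user are SQL special
functions, so no parentheses are needed):

SELECT user, current_user;   -- both return the session user name

-- so the problem query is effectively:
SELECT * FROM system.user WHERE current_user = 'root';
-- which returns zero rows whenever the session user is not 'root'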

David J.

Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread David Johnston
On Jun 27, 2012, at 22:00, Josh Berkus j...@agliodbs.com wrote:

 Folks,
 
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.
 
 The basic issue is that autovacuum_max_workers is set by most users
 based on autovac's fairly lightweight action most of the time: analyze,
 vacuuming pages not on the visibility list, etc.  However, when XID
 wraparound kicks in, then autovac starts reading entire tables from disk
 ... and those tables may be very large.
 
 This becomes a downtime issue if you've set autovacuum_max_workers to,
 say, 5 and several large tables hit the wraparound threshold at the same
 time (as they tend to do if you're using the default settings).  Then
 you have 5 autovacuum processes concurrently doing heavy IO and getting
 in each others' way.
 
 I've seen this at two sites now, and my conclusion is that a single
 autovacuum_max_workers isn't sufficient if to cover the case of
 wraparound vacuum.  Nor can we just single-thread the wraparound vacuum
 (i.e. just one worker) since that would hurt users who have thousands of
 small tables.
 
 

Would there be enough benefit to setting up separate small/medium/large 
thresholds with user-changeable default table-size boundaries, so that you 
could configure 6 workers where 3 handle the small tables, 2 handle the medium 
tables, and 1 handles the large tables?  Or, alternatively, a small worker 
consumes 1, a medium 2, and a large 3 'units' from whatever size pool has been 
defined, so you could have 6 small tables or two large tables in-progress 
simultaneously.

David J.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries

2011-06-17 Thread David Johnston
 
 On Wed, Feb 9, 2011 at 4:50 AM, Thom Brown t...@linux.com wrote:
  On 9 February 2011 02:11, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan
 and...@dunslane.net wrote:
  Quite right, but the commitfest manager isn't meant to be a
  substitute for one. Bug fixes aren't subject to the same restrictions
of
 feature changes.
 
  Another option would be to add this here:
 
  http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items
 
  I've removed it from the commitfest because it really doesn't belong
  there, and I've added it to the open items list.
 
 So, I finally got around to look at this, and I think there is a simpler
solution.
 When an overflow occurs while calculating the next value, that just means
 that the value we're about to return is the last one that should be
generated.
 So we just need to frob the context state so that the next call will
decide
 we're done.  There are any of number of ways to do that; I just picked
what
 looked like the easiest one.
 

Tangential comment, but have you considered emitting a warning (and/or log
entry) when you are 10,000-50,000 away from issuing the last available
number in the sequence, so that some recognition exists that any code
depending on the sequence is going to fail soon?

Also, during sequence creation you know the integer type being used so that
maximum value is known and an overflow should not need to come into play (I
guess the trade-off is the implicit try-catch [or whatever mechanism C
uses] performance hit versus the need to store another full integer in the
data structure).

You could also give access to the warning threshold value so that the
developer can change it to whatever value is desired (with a meaningful
default of course).
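
As a point of reference, the boundary case the quoted fix targets can be
exercised like this (an int4 series ending at the type maximum; my
understanding is that before the fix the overflow could keep the series from
terminating):

SELECT count(*) FROM generate_series(2147483645, 2147483647);
-- expected: 3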

David J.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-11 Thread David Johnston
Christopher Browne wrote:
 Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1, 
 there's a pretty good reason NOT to support that, namely that this 
 breaks relational handling of tables.  PostgreSQL is a *relational* 
 database system, hence it's preferable for structures to be 
 relational, as opposed to hierarchical, which is what any of the 
 suggested nestings are.

Organizing the tables into a multi-level namespace, either fixed-depth or 
variable-depth, rather than using a flat namespace, does not make the database 
any less relational, because the above definition and any others still hold.

The less relational argument above is a red herring or distraction.  One can 
argue against namespace nesting just fine without saying that.

-- Darren Duncan


I agree with Darren.

One thought that came to my mind was to use a different separator between two 
namespaces and/or between the database identifier and the rest of the path.
Examples:

ns1!ns2.table

OR

database@ns1.table

OR

database@ns1!ns2.table

I've been following only some of the discussion but it seems that much 
ambiguity would be lost by using different separators.  Schemas themselves are 
already non-standard so it isn't like we are constrained here in what is chosen.

Just some quick thoughts I've had, without having fully considered how they 
would fit into the existing setup.  But is there any major reason why choosing 
different separators would not work?

Also, within search_path, some form of wild-card selector would be desirable:  
ns1!*.  I'm not opposed to having to be explicit about the search_path in order 
to avoid name collisions; though it would be nice if VIEWS had some kind of 
SET syntax, like functions do, so that the definer can specify the 
search_path that the view will resolve against.
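
For comparison, a minimal sketch of the existing per-function syntax that a
view analogue might mirror (assumes schemas ns1/ns2 and a table t(val)
visible in one of them):

CREATE FUNCTION lookup() RETURNS integer
    LANGUAGE sql
    SET search_path = ns1, ns2
    AS 'SELECT val FROM t LIMIT 1';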

David J.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions

2011-07-11 Thread David Johnston
On Mon, Jul 11, 2011 at 10:12 AM, Florian Pflug f...@phlo.org wrote:
 On Jul11, 2011, at 07:08 , Darren Duncan wrote:
 Christopher Browne wrote:
 Vis-a-vis the attempt to do nested naming, that is ns1.ns2.table1, 
 there's a pretty good reason NOT to support that, namely that this 
 breaks relational handling of tables.  PostgreSQL is a *relational* 
 database system, hence it's preferable for structures to be 
 relational, as opposed to hierarchical, which is what any of the 
 suggested nestings are.

Rather, the argument is that it was intentional for the structuring of
table naming to, itself, be relational, and changing that definitely has
some undesirable characteristics.

The need for recursive queries is the most obvious undesirable, but it's
not the only undesirable thing, by any means.

I do not see how recursive queries (really iteration of records) even enter
the picture...

Right now I can emulate a hierarchical schema structure via a naming scheme
- for example schemabase_sub1_sub2_etc.  I am simply looking for a formal
way to do the above AND also tell the system that I want all schemas under
schemabase to be in the search path.  Heck, I guess just allowing for
simple pattern matching in search_path would be useful in this case
regardless of the presence of an actual schema hierarchy.  Using LIKE
syntax, say: SET search_path TO schemabase_sub1_% or something similar.
The only missing ability becomes a way for graphical tools to represent the
schema hierarchy using a tree-structure with multiple depths.

I can see how adding . and .. and relative paths would confuse the issue, but
those are not necessary features of a multi-level schema depth.

The above, combined with a different separator for intra-level
namespace/schema delineation, would allow for an unambiguous way to define
and use a hierarchical schema with seemingly minimal invasion into the
current way of doing things.  You could almost implement it just by requiring
a specific character to act as the separator and then constructing the actual
schema using single-level literals and supporting functions that can convert
them into a hierarchy.  In other words, the schema table would still only
contain one field with the full parent!child as opposed to (schema,
parent) with (VALUES('parent',null),('child','parent')).

In other words, if we use ! as the separator, any schema named
parent!child could be stored and referenced as such, but then if you run a
getChildren('parent') function it would return child along with any other
schemas of the form parent!%.  In this case the % sign could maybe only
match everything except !, while the * symbol could be used to match ! as
well.
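
A hedged sketch of what such a getChildren('parent') helper could do against
the existing catalogs today, treating ! as an ordinary character inside
schema names:

SELECT nspname AS child
FROM pg_namespace
WHERE nspname LIKE 'parent!%'
  AND nspname NOT LIKE 'parent!%!%';   -- direct children only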

I could give more examples but I hope the basic idea is obvious.  The main 
thing is that the namespace hierarchy usage is standardized in such a way 
that pgAdmin and other GUI tools can reliably use it for display purposes, and 
that search_path can be constructed in a more compact format so that every 
schema and sub-schema is still absolutely referenced (you can even have the 
SET command resolve search_path at execution time and then remain static, 
just like CREATE VIEW ... AS SELECT * FROM table).

David J.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unsigned integer types

2013-05-28 Thread David Johnston
Maciej Gajewski wrote
 I'm also afraid that with
 the extension I'd be left on my own maintaining it forever. While if
 this could go into the core product, it would live forever.

Clarification from the gallery: are we talking about an extension or a custom
PostgreSQL build/fork?

If it is an extension then stick it up on GitHub and let whoever finds it
valuable help contribute to keeping it relevant.

No use letting perfection stand in the way of usability.  If the current
solutions are too slow then exploring the extension aspect - even if it
falls short - is worthwhile.  At minimum you learn from the experience and
maybe someone else (or even yourself) can build on that foundation.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Unsigned-integer-types-tp5756994p5757234.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bad error message on valuntil

2013-06-07 Thread David Johnston
Tom Lane-2 wrote
 Joshua D. Drake <jd@> writes:
 I had a customer pulling their hair out today because they couldn't 
 login to their system. The error was consistently:
 
 2013-06-07 08:42:44 MST postgres 10.1.11.67 27440 FATAL:  password
 authentication failed for user user
 
 However the problem had nothing to do with password authentication. It 
 was because the valuntil on the user had been set till a date in the 
 past. Now technically if we just removed the word password from the 
 error it would be accurate but it seems it would be better to say, 
 FATAL: the user user has expired.
 
 I think it's intentional that we don't tell the *client* that level of
 detail.  I could see emitting a log message about it, but it's not clear
 whether that will help an unsophisticated user.
 
   regards, tom lane

I presume that "password" in this context refers to the method by which
identity is checked; some alternatives being "trust" and "ident"?

Using the same logic of why you would not expose the fact that the user is
expired versus the user having provided invalid credentials, exposing "password"
is a security leak as well.  And then, to top it off, it provides a red herring
to the user trying to figure out why their username/password combination
isn't working.

Something like:

'Authentication for user "user" failed.  Update and try again or contact the
administrator to confirm "user" is authorized to log onto the system.'

David J.







--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Bad-error-message-on-valuntil-tp5758369p5758383.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bad error message on valuntil

2013-06-07 Thread David Johnston
Joshua D. Drake wrote
 On 06/07/2013 12:31 PM, Tom Lane wrote:
 Joshua D. Drake <jd@> writes:
 On 06/07/2013 11:57 AM, Tom Lane wrote:
 I think it's intentional that we don't tell the *client* that level of
 detail.

 Why? That seems rather silly.

 The general policy on authentication failure reports is that we don't
 tell the client anything it doesn't know already about what the auth
 method is.  We can log additional info into the postmaster log if it
 seems useful to do so, but the more you tell a client, the more you
 risk undesirable info leakage to a bad guy.  As an example here,
 reporting the valuntil condition would be acking to an attacker that
 he had the right password.
 
 So security by obscurity? Alright, without getting into that argument 
 how about we change the error message to:
 
 FATAL: Authentication failed: Check server log for specifics
 
 And then we make sure we log proper info?
 
 Sincerely,
 
 Joshua D. Drake
 

  regards, tom lane


In a password login situation you should not indicate to the client why the
login attempt failed.  If you say that the password expired they know the
username supplied has to be correct (otherwise how would you know the
password is expired).

However, echoing back the supplied user identifier (without otherwise
implying that it exists or does not exist on the server) provides a quick
verification spot for the user to see whether the expected user name was
being sent - especially since the location of the error message is probably
significantly removed from the location of the user name string on the
client.

"Please check server log for specifics" is not a good message for something
sent to a client that in many normal situations would have no access to said
logs.

I'd suggest:

Authentication Failed: the user (role_name) and password combination was not
found or is expired.

How a particular user is to go about resolving the issue is an
organizational (and individual) policy best ignored in the error message. 
For a stressed-out, administrator-capable user who sees this message, they
are at least reminded that even if the combination exists it is possible
that it has somehow been disabled.  Hopefully they will then remember that
password expiration is possible and will check that along with the presence
of the role/user.

David J.







--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Bad-error-message-on-valuntil-tp5758369p5758398.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why can't I use windowing functions over ordered aggregates?

2013-06-21 Thread David Johnston
Cédric Villemain-2 wrote
 And also, first_value is a *window* function, not a simple aggregate 
 function...

Per the documentation, any aggregate function can be used with a WINDOW
declaration.  The logical question is: why are window aggregates special such
that the reverse cannot be true?  In other words, why is not every function
simply defined as a normal aggregate that can be used in both contexts?


 See this example:
 # create table foo (i int, t timestamptz);
 # insert into foo select n, now() from generate_series(1,10) g(n);
 # select i, first_value(i) over (order by t desc) from foo;
 # select i, first_value(i) over (order by t desc ROWS between 0 PRECEDING
 and 
 UNBOUNDED FOLLOWING) from foo;
 
 What do you expect SELECT first(val order by ts desc) to output ? 

Undefined, due to insufficient specificity of the ORDER BY definition.  The
window version has the same issue.

The window aggregates should simply treat the entire input set as the
relevant frame - basically the same output as would result from
(simplistically):

SELECT agg.val
FROM (
    SELECT id, window_agg(...) OVER (ORDER BY id ASC) AS val
    FROM source_table
) agg
ORDER BY agg.id DESC
LIMIT 1;

Admittedly this really only makes sense for first_value, last_value, and
nth_value; the other window aggregates can return valid values but to have
meaning they really need to be output in a windowing context.
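
For what it's worth, the ordered-aggregate syntax that does exist today (9.0
and later) can already emulate first(val ORDER BY ts DESC) against the foo
table from the quoted example:

SELECT (array_agg(i ORDER BY t DESC))[1] FROM foo;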

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Why-can-t-I-use-windowing-functions-over-ordered-aggregates-tp5760233p5760358.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Kudos for Reviewers -- straw poll

2013-06-25 Thread David Johnston
Brendan Jurd wrote
 On 26 June 2013 03:17, Josh Berkus <josh@> wrote:
 How should reviewers get credited in the release notes?

 a) not at all
 b) in a single block titled Reviewers for this version at the bottom.
 c) on the patch they reviewed, for each patch

I think some consideration toward a commit-and-review summary (outside the
release notes, and ideally graphical/interactive in nature) for each major
release is worthwhile.  With regards to the release notes
I'd lean toward (b); significant contributions getting bumped to co-author
on specific patches covers (c) fairly well.  I am unsure whether release-note
mentions are significant enough motivation... see other thoughts below.


 Should there be a criteria for a creditable review?

 a) no, all reviews are worthwhile
 b) yes, they have to do more than it compiles
 c) yes, only code reviews should count

Ideally (a); though (b) conceptually makes sense, it is too generic.


 Should reviewers for 9.4 get a prize, such as a t-shirt, as a
 promotion to increase the number of non-submitter reviewers?

 a) yes
 b) no
 c) yes, but submitters and committers should get it too

One low-cost prize that I've pondered is, on an ongoing basis, the ability
to post a link and/or message to the PostgreSQL front page with a
significantly less stringent barrier to acceptance than is required for
current content.  Basically, except for topics or presentations deemed of
poor taste or detrimental to the project, anything would be allowed - some
kind of "this message was allowed because so-and-so has recently made the
following significant contributions to the project".  There are probably
quite a few logistics to deal with down this path, but a sponsored platform
for shameless self-promotion by the people making the project successful -
something visible on an ongoing basis and not just once a year in a release
note - is likely very valuable to the contributor while fairly inexpensive
to the project (i.e., some risk of reputation and some cost to set up the
infrastructure).


David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Kudos-for-Reviewers-straw-poll-tp5760952p5761031.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread David Johnston
Benedikt Grundmann wrote
 A third party application we use generates SQL queries.  Here is a query it
 generated that broke today, and for which I have a hard time arguing that
 the postgres behavior is correct (minimally, the error message is
 confusing):
 
 =# create temporary table foo (b double precision );
 CREATE TABLE
 Time: 40.368 ms
 =# insert into foo select min(NULL);
 ERROR:  column b is of type double precision but expression is of type
 text
 LINE 1: insert into foo select min(NULL);
^
 HINT:  You will need to rewrite or cast the expression.
 
 So why does min(NULL) have type text?  According to the docs it has the
 type of the input.  The value is itself NULL which is a valid member of
 all
 types in SQL isn't it?
 
 So what is going on?
 
 Thanks,
 
 Bene

Ideally PostgreSQL would be smart enough to recognize that min(NULL) is of
an unknown type and thus would use the definition of foo to coerce NULL to
the desired type.  I cannot explain why it does not do this, but the example
shows that it does not.

Using a literal NULL without an explicit type-cast is not recommended, as the
system cannot always accurately figure out what type you mean for it to use. 
Being a valid value for all types does not mean it magically switches to fit
whatever usage is required.  Columns are typed, not values per se, and so
NULL can belong in any column, but once it is part of that column it takes on
that column's type.
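
A minimal illustration of that rule, following the original example (an
explicit cast on the NULL avoids the default-to-text resolution):

CREATE TEMPORARY TABLE foo (b double precision);
INSERT INTO foo SELECT min(NULL);                     -- fails: text vs double precision
INSERT INTO foo SELECT min(NULL::double precision);   -- works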

The query you show is pretty pointless, since the intent of min is to take
a column over which to aggregate, not a literal which will only ever return
itself.

In short, the SELECT query is trying its best to execute, and so in the
presence of an unadorned NULL - and being unable to infer the type from
context - it simply uses the default type, which is text.  The SELECT
executes just fine and outputs a min column of type text, which when
supplied to the table foo causes the type mismatch for column b on
foo.

The PostgreSQL behavior is simple in that it does not infer the type of
NULL from the column in foo, but since it is not required to do so, its failure
is not wrong.  The error message, given what does occur, makes perfect sense
and is easy enough to trace (i.e., what column is feeding foo.b from the
SELECT statement; then, why is that column being seen as text).

PostgreSQL is, in the opinion of some, too verbose in its requirement to be
explicit regarding types, but it does make for less buggy code overall.  This
particular use-case may be solvable, but I'd argue that your example is not
likely to convince anyone that it is a serious enough problem to be worth the
effort it would take to do so.

David J.






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/column-b-is-of-type-X-but-expression-is-of-type-text-tp5763586p5763587.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] column b is of type X but expression is of type text

2013-07-12 Thread David Johnston
Josh Berkus wrote
 On 07/12/2013 07:28 AM, Benedikt Grundmann wrote:
 Thanks David,
 
 I like the fact that postgres is explicit in its types.  All I'm arguing
 is that the error message is misleading, and that I had a hard time
 understanding why what happened happened.  The part I was missing is that
 despite supporting an "any" type the necessary type inference is very, very
 local and quickly resorts to the default type.
 
 2. you call min().  Min() works for many datatypes.  Min() says: can I
 work for text?  The answer is yes, so at this point the NULL which
 was default text becomes *really* text.
 
 .
 .
 .
 
 The alternative would be to disallow unadorned NULLs entirely, which
 would break thousands of applications.

In the absence of the function call the system is able to delay resolving
the type until later in the query:

SELECT *
FROM (VALUES ('2013-02-01'::date), ('2013-01-01'), (NULL) ) vals (col1);
--works


SELECT *
FROM (VALUES ('2013-02-01'::date), ('2013-01-01'), (min(NULL)) ) vals
(col1); --fails

I have no idea how this mechanism works but ISTM that the planner could, for
anyelement, look at where the result of the function call is used and add
a cast to the function input value to match the desired result type if the
input type is undefined.

I'm curious what you would consider to be a more apropos error message in
this situation; regardless of how difficult it would be to implement.

I am also curious if you can think of a better example of where this
behavior is problematic.  The query for this thread is not something that I
would deem to be good SQL.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/column-b-is-of-type-X-but-expression-is-of-type-text-tp5763586p5763615.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A general Q about index

2013-07-16 Thread David Johnston
soroosh sardari wrote
 Hi
 
 I want to know how an index is created and used.
 actually if you can show to me a simple start point, it would be great.
 
 Regards,
 Soroosh Sardari

In the documentation there is a table of contents, and listed in that table
is a section named "Indexes".
David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/A-general-Q-about-index-tp5763912p5763926.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A general Q about index

2013-07-16 Thread David Johnston
David Johnston wrote
 
 soroosh sardari wrote
 Hi
 
 I want to know how an index is created and used.
 actually if you can show to me a simple start point, it would be great.
 
 Regards,
 Soroosh Sardari
 In the documentation there is a table of contents and listed on that table
 is a section named Indexes.
 
 David J.

Since you posted this to -hackers you may mean that you wish to know how to
program them, as opposed to using them in SQL.  I have no clue in that regard.
If you did mean use in SQL then the documentation is your friend, and you
should not have posted this question to -hackers but to -general instead; you
probably should have posted there anyway to begin with, and been more clear
as to what you mean by "created and used".

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/A-general-Q-about-index-tp5763912p5763928.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: How to configer the pg_hba record which the database name with \n ?

2013-08-01 Thread David Johnston
huxm wrote
  where there is a
 newline(\n) in the name.

I can't imagine why you would want to use non-printing characters in a name,
especially a database name.  Even if the pg_hba.conf file were able to interpret
it (which it probably cannot, though I do not know for certain), client
interfaces are likely to have problems as well.  Most of these would not
think of interpreting a database identifier in that manner but would instead
treat the name as a literal value.  Even when line-continuations are allowed
they are often cosmetic in nature and the resultant newline is discarded
during the pre-execution phase of the command interpreter.

Arguably having a check constraint on the catalog to prohibit such a name
would be more useful than trying to make such a construct functional.

I'd guess in the immediate term the users accessing this database would need
to have "all" as their database target in pg_hba.conf, and then you use
role-based authorization to limit which specific databases are accessible.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/How-to-configer-the-pg-hba-record-which-the-database-name-with-n-tp5765847p5765889.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-01 Thread David Johnston
Minor request: could someone enlighten me as to why making the directory
location a compile-time option is undesirable?  Packagers could then set up
whatever structure they desire when they compile their distributions.  In
that case the discussion becomes what a reasonable default is, and that can
be decided with respect to the other defaults that are in place for people
who self-compile.

I get the "supporting users - telling them where to go to find these files"
aspect, but I believe that ship has already sailed.  The goal should be to
make it as easy as possible for distributions and/or individual users
to integrate PostgreSQL into their normal routine.  It isn't
like we are adding unneeded complexity, since it is obvious from the
discussion that where files/directories are placed in the file system is a
major variable.  Enforcing $PGDATA when we know Debian is going to be upset
doesn't seem to be that great an idea - it isn't like we are going to
suddenly make them realize they have been doing things incorrectly all this
time.  I am not familiar with all of the configurations, but I do recall that
the location of postgresql.conf and related files is already distribution
specific, so why shouldn't these extensions be as well?

Sorry if this was discussed previously; I'll go look deeper in the thread if
someone confirms that indeed it is.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Proposal-for-Allow-postgresql-conf-values-to-be-changed-via-SQL-tp5729917p5765892.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-01 Thread David Johnston
Andres Freund-3 wrote
 Even trying to do this completely will guarantee that this patch will
 never, ever, suceed. There simply is no way to reliably detect problems
 that have complex interactions with the rest of the system.
 
 We can improve the detection rate of problems after some real world
 experience. Don't make this unnecessarily complex.

Instead of prevention, some thought should be given to recovery.

How about some form of persistence mechanism so that, before making these
kinds of changes, the admin can save the current configuration?  Then, in
a worst-case scenario, they could run something like "pg_ctl
--restore-persisted-configuration ..." to reset everything back to the last
known good configuration.

A single-version save-restore routine for the configuration.  When restoring
you would want to keep the current/non-working configuration and
associated logging information - maybe archived somewhere along with a
copy of the last known working version.  This would provide some level of
audit capability as well as a convenient way for someone to take that
archive and send it off to someone more knowledgeable for assistance. 
Having it auto-run at boot time - possibly to a different archive area than
when run manually - would be possible as well; so you'd have both the last
good boot configuration and whatever point-in-time configurations you
wish to save.







--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Proposal-for-Allow-postgresql-conf-values-to-be-changed-via-SQL-tp5729917p5765968.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: File-per-GUC WAS: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-05 Thread David Johnston
Josh Berkus wrote
 (this discussion concerns issue (D), file-per-setting vs. one-big-file)
 
 So the case of two sessions both modifying ALTER SYSTEM SET, and one
 succeeding for some-but-not-all-GUCs, and the other succeeding for
 some-but-not-all-GUCs, would not be user-friendly or pretty, even if
 each setting change succeeded or failed atomically.

Can the final file write occur only at COMMIT, with anything inside a
transaction simply staged for later saving (or rollback)?  The file-write
phase as a whole then needs to be atomic, and not just a single GUC-file.

Could the system read the last-update timestamp of each GUC-file when the
original statement is executed and then re-read all of them at commit,
failing with some kind of serialization error if the last-update timestamp
on any of the files has changed?

I dislike the idea of any kind of automatic reload.  That said, some kind of
"have there been any configuration changes since the last reload?"
query/function makes sense.  It could be plugged into Nagios or similar to
warn if these changes are occurring but not being made live.
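
As an aside, a query of roughly the shape being asked for would look like the
following; note that the pending_restart column in pg_settings shown here
did not exist at the time of this thread and is used illustratively:

SELECT name, setting
FROM pg_settings
WHERE pending_restart;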

David J.






[HACKERS] Re: Doc Patch: Subquery section to say that subqueries can't modify data

2013-08-06 Thread David Johnston
Instead of simply expanding the section on sub-queries, which may still be
worthwhile, it seems that we have effectively introduced a new kind of
query - namely one that mixes both querying and data modification into a
hybrid query.  An entire section describing the means to implement these
queries and the limitations thereof would seem advisable as the current
material is spread throughout the documentation.

Some areas to address would be:

Select queries that cause/utilize:

function-based modifications
CTE-based modifications
FDW/dblink-based modifications
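
As one illustration, a data-modifying CTE (available since 9.1) lets a
single statement both write and read - a minimal sketch with assumed
table/column names:

    WITH archived AS (
        DELETE FROM orders WHERE shipped RETURNING *
    )
    INSERT INTO orders_archive SELECT * FROM archived;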

I guess the main question is if someone were to put this together would it
likely be included in the queries section of the documentation.  Also, are
there any other thoughts to add; and is something like this documented in a
ToDo somewhere already?

The proposed patch, while warranting a technical review (namely that the
presence of functions in a sub-select can cause the sub-query to update the
database), seems to add one more place to go to find this information without
adding a central index or summary that someone learning the system could
directly comprehend/learn from, as opposed to it being some allowed/disallowed
side-effect of something else.

David J.







Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread David Johnston
Tom Lane-2 wrote
 Pavel Stehule <pavel.stehule@> writes:
 I was one who sent a bug report - this error is not too dangerous, but it
 is hidden, and difficult to find, if you don't know what can be happen.
 Same as bug with plpgsql and SQL identifier collisions. If you
 understand,
 then you can protect self well and  simply. If not, then it is a magic
 error. So still I am thing so best solution is
 
 a) a warning when detect ORDER BY in variadic aggregates
 
 Such a warning would never be tolerated by users, because it would appear
 even when the query is perfectly correct.
 
 b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
 it only in WITHIN GROUP syntax where is safe ,
 
 And we're *not* inventing randomly different syntax for variadic
 aggregates.  That ship sailed when we did it this way for regular
 functions.

In the example case the problem is that ORDER BY constant is a valid, if
not-very-useful, construct.  Can we warn on this specific usage and thus
mitigate many of the potential avenues of misuse?

If we alter syntax for mitigation purposes I'd want to consider requiring
parentheses around the columns that belong to the ORDER BY instead of using
the full extended syntax of WITHIN GROUP.
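
To make the hazard concrete - a sketch using the built-in string_agg as a
stand-in (table and column names assumed; today the second form simply fails
to resolve, but a variadic aggregate would accept it silently):

    SELECT string_agg(name, ', ' ORDER BY name) FROM t;  -- ', ' is the separator
    SELECT string_agg(name ORDER BY name, ', ') FROM t;  -- ', ' parses as a sort key

With a variadic signature the second call resolves to a one-argument
invocation and the constant becomes a useless sort key rather than an error.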

David J.






Re: [HACKERS] Variadic aggregates vs. project policy

2013-08-30 Thread David Johnston
Andres Freund-3 wrote
 On 2013-08-30 06:34:47 -0700, David Johnston wrote:
 Tom Lane-2 wrote
  I was one who sent a bug report - this error is not too dangerous, but
 it
  is hidden, and difficult to find, if you don't know what can be
 happen.
  Same as bug with plpgsql and SQL identifier collisions. If you
  understand,
  then you can protect self well and  simply. If not, then it is a magic
  error. So still I am thing so best solution is
 
  a) a warning when detect ORDER BY in variadic aggregates
 
  Such a warning would never be tolerated by users, because it would
 appear
  even when the query is perfectly correct.
 
  b) disallow ORDER BY in variadic aggregates in classic syntax, and
 enable
  it only in WITHIN GROUP syntax where is safe ,
 
  And we're *not* inventing randomly different syntax for variadic
  aggregates.  That ship sailed when we did it this way for regular
  functions.

 In the example case the problem is that ORDER BY constant is a valid, if
 not-very-useful, construct.  Can we warn on this specific usage and thus
 mitigate many of the potential avenues of mis-use?
 
 That doesn't help against something like »SELECT string_agg(somecol
 ORDER BY bar, separator)« where separator is a column.
 
 If we alter syntax for mitigation purposes I'd want to consider requiring
 parentheses around the columns that belong to the ORDER BY instead of
 using
 the full extended syntax of WITHIN GROUP.
 
 I think that ship has sailed. The syntax is there and it's not going
 away. Requiring different syntaxes for variadic/nonvariadic usages is
 going to be a way much bigger pitfall for users.

Neither suggestion (nor any suggestion I would imagine) is going to solve
the problem.  The goal is to minimize the size of the exposure.

For the second suggestion - ORDER BY (col1, col2) - it would be added and
recommended so those using that syntax would have less to worry about.  This
would apply to ALL invocations, not just variadic ones.

David J.








Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-02 Thread David Johnston
Jeff Davis-8 wrote
 Is there any semantic difference between marking a constraint as
 DISABLED and simply dropping it? Or does it just make it easier to
 re-add it later?

I cannot answer the question, but if there is none then the main concern I'd
have is capturing meta-information about WHY such a constraint has been
disabled instead of dropped.

I guess this whole feature extends from the trigger disable feature that
already exists.  Given we have the one, adding this seems symmetrical...

I cannot really see using either feature on a production system (if
following best practices) but I can imagine where they could both be helpful
during development.  Note with this usage pattern the meta-information about
why becomes considerably less important.

David J.






Re: [HACKERS] 9.3 RC1 psql encoding reporting inconsistently?

2013-09-02 Thread David Johnston
Tom Lane-2 wrote
 Michael Nolan <htfoot@> writes:
 This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding
 as SQL_ASCII and \set report it as UTF8?
 
 psql sets client_encoding based on its environment (LANG or related
 variables).  That's been true for some time --- since 9.1, according
 to a quick check.
 
   regards, tom lane

My knowledge of encoding is minimal but to expand on the comment:

Client and server (or, more specifically, database) encodings can and often
do differ just as you are seeing here.

I'm guessing that somewhere deep inside psql and/or postgres encoding
conversion is performed if the client and server do not match.  While I
guess it is possible to try and auto-adapt the client encoding to match the
server/database, the current policy is to require the user to explicitly (so
to speak) declare the encoding they are using on their client.
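
That explicit declaration can be made at the SQL level - for example:

    SET client_encoding = 'UTF8';

(psql's \encoding meta-command and the PGCLIENTENCODING environment variable
are other routes to the same setting.)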

I guess a counter-question would be: what would you expect \set to report
and why?

David J.








Re: [HACKERS] Question regarding Sync message and unnamed portal

2013-09-10 Thread David Johnston
Tatsuo Ishii-4 wrote
 
 From these statements, I would think #4 will fail in the following
 sequence of commands because #3 closes transaction and it destroys
 unnamed portal: 1)Parse/Bind creates unnamed portal,
 2)Parse/Bind/Execute creates named portal and executes, 3)Send Sync
 message (because it is required in extended protocol), 4)Execute
 unnamed portal created in #1.
 
 If this is true, that means unnamed portal execution and named portal
 execution cannot be mixed unless they are inside an explicit
 transaction. IMO this should be described in the document.

My relatively uninformed conclusions from reading this thread and
documentation excerpts:

Correct.

In the absence of an explicit transaction only a single logical query can be
executed per transaction.  #2 should be disallowed, though, as opposed to
#4 failing, since the second PARSE implies a second logical query being
formed, which should not occur outside an explicit transaction.

At completion of each series of extended-query messages, the frontend
should issue a Sync message.

I take this phrase to mean that a logical sequence is defined as:

PARSE - (BIND - EXECUTE){1,} - SYNC

If you are within an explicit transaction you are brought back to the point
where the next PARSE can be issued and thus a new statement - within the
same transaction - can be issued.  If you are not within an explicit
transaction the SYNC effectively causes a COMMIT; to be issued on the
back-end thus the unnamed statement and unnamed portal are both closed.  If
the resultant portal is named and defined WITH HOLD, the named portal
will continue to exist while the named (or unnamed for that matter)
statement will be closed.

The documentation makes the assumption that you understand the single
statement nature of implicit transactions.  Given the level of knowledge
needed to operate with the frontend protocol this is not necessarily
unreasonable.  I guess my question is whether the backend fails in the
situation of an implicit transaction and two PARSE messages without an
intervening SYNC.  If not then when is it reasonable to issue two PARSE
statements in this manner?

David J.








Re: [HACKERS] Question regarding Sync message and unnamed portal

2013-09-10 Thread David Johnston
Continuing my novice interpretation from before...


Tatsuo Ishii-4 wrote
 It would be nice if something like unnamed portal will be destroyed
 by a Sync message if you are in an explicit transaction is in our
 manual.

I do not believe this to be true from what I've scanned.  Inside an
implicit transaction it will be, but in an explicit transaction the SYNC
does not cause the transaction to close, so any named and unnamed portals
should still exist after the SYNC (or FLUSH) sequence is completed.  This
allows for cursors to be maintained over the entire lifetime of the explicit
transaction.

I do not follow where Tom's FLUSH comment is helpful here.  In an explicit
transaction their behavior  should be semantically identical.  In the
implicit transaction case the use of FLUSH should not enable you to change
the statement that is being executed inside the
single-statement-transaction. In the example the error should come at #2 and
you'd never get to #3 where you'd issue SYNC/FLUSH (or rather when you get
to three you are already dead).

SYNC vs. FLUSH in the documentation is confusing as to how they interact
with implicit vs. explicit transactions.  In the explicit transaction since
SYNC issues a FLUSH anyway, the only difference seems to be the presence of
the RFQ message.

How does this (SQL) sequence work - knowing each COMMAND is a sequence of
PARSE-BIND-EXECUTE extended commands that either succeeds or fails:

1) BEGIN;
2) SUCCESSFUL COMMAND;
3) SYNC;
4) ERROR COMMAND;
5) SUCCESSFUL (in theory) COMMAND;
6) SYNC;
7) SUCCESSFUL (in theory) COMMAND;
8) COMMIT;

The second SYNC resets and returns to normal message processing but the
transaction as a whole is in a failure mode.  The interesting items are #5
and #7.


 So if we are not in an explicit transaction we cannot reuse unnamed portal
 because Sync closes the transaction

Correct.  In an implicit transaction all un-held portals are closed upon
SYNC - named or otherwise.

Note that the use of "not explicit" instead of "implicit" makes reading this
discussion a little more difficult.

However, because of:


 An unnamed prepared statement lasts only until the next Parse statement
 specifying the unnamed statement as destination is issued.

the unnamed_statement should still exist.  I can see this being useful if
you are inserting 10,000 records via the extended protocol and want to
save every 1,000.  After SYNCing the first 1,000 BIND/EXECUTEs you can
continue BINDing the original unnamed_statement to execute the next
1,000, and so forth.

David J.











Re: [HACKERS] Pending query cancel defeats SIGQUIT

2013-09-10 Thread David Johnston
Noah Misch-2 wrote
 The errfinish() pertaining to that WARNING issues CHECK_FOR_INTERRUPTS(),
 and
 the query cancel pending since before the SIGQUIT arrived then takes
 effect.
 This is less bad on 9.4, because the postmaster will SIGKILL the backend
 after
 5s.  On older releases, the backend persists indefinitely.

9.4 == head, or is this a typo?

Your feelings on how far to back-patch?

David J.





Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

2013-09-11 Thread David Johnston
Noah Misch-2 wrote
  I'm thinking to preserve postmaster.pid at immediate shutdown in all
 released
  versions, but I'm less sure about back-patching a change to make
  PGSharedMemoryCreate() pickier.  On the one hand, allowing startup to
 proceed
  with backends still active in the same data directory is a corruption
 hazard.
 
 The corruption risk, imv anyway, is sufficient to backpatch the change
 and overrides the concerns around very fast shutdown/restarts.
 
 Making PGSharedMemoryCreate() pickier in all branches will greatly
 diminish
 the marginal value of preserving postmaster.pid, so I'm fine with dropping
 the
 postmaster.pid side of the proposal.

It's probably still worth a fresh look at the immediate shutdown process to
see whether the current location where postmaster.pid is removed is
acceptable.  It may not be necessary to leave it in place always but:

1) if there is a section of shared memory that can only be reached/found if
one knows the pid, and
2) postmaster.pid is removed before that area is secured from future
clobbering

then there may be a risk that can still be mitigated by moving its removal
without having to go to the extreme.  

David J.






Re: [HACKERS] Questions about checksum feature in 9.3

2013-09-16 Thread David Johnston
Ants Aasma-2 wrote
 So, has anyone compiled checksum vectorized on OS X? Are there any
 performance data that would indicate whether or not I should worry with
 this in the first place?
 
 Even without vectorization the worst case performance hit is about
 20%. This is for a workload that is fully bottlenecked on swapping
 pages in between shared buffers and OS cache. In real world cases it's
 hard to imagine it having any measurable effect. A single core can
 checksum several gigabytes per second of I/O without vectorization,
 and about 30GB/s with vectorization.

Thoughts on how/where to provide guidance as to this kind of concern?  The
single paragraph in the initdb documentation seems to be lacking.  Would a
destination page on the wiki, linked to from the documentation, where
current knowledge regarding benchmarks and caveats can be stored, be
appropriate?

To that end, Ants, do you actually have some resources and/or benchmarks
which support your claim and that you can provide links to?

The single core aspect is interesting.  Does the implementation have a
dedicated core to perform these calculations or must the same thread that
handles the relevant query perform this work as well?  How much additional
impact/overhead does having to multitask have on the maximum throughput of a
single core in processing checksums?

This whole vectorization angle also doesn't seem to be in the
documentation...though I didn't look super hard.

David J.





Re: [HACKERS] Not In Foreign Key Constraint

2013-09-18 Thread David Johnston
Misa Simic wrote
 I guess that rule can be achieved with triigers on TableA and TableC - but
 the same is true for FK (and FK constraint is more effective then trigger
 -
 that is why I wonder would it be useful/achievable to create that kind of
 constraint)
 
 Thoughts, ideas?

You create a common "keys in use" table and only insert a record into the
main tables if you can successfully add the desired key to the shared keys
table (as a unique value).  Set up a normal FK to that table to help
enforce that valid records must exist on the keys table.  Not fool-proof, but
you only need to worry about insertions - delete from the pk table to remove
the record from the main table and free up the key.
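
A minimal sketch of the idea (all names hypothetical):

    CREATE TABLE keys_in_use (
        id int PRIMARY KEY
    );
    CREATE TABLE table_a (
        id int PRIMARY KEY REFERENCES keys_in_use ON DELETE CASCADE
    );
    CREATE TABLE table_c (
        id int PRIMARY KEY REFERENCES keys_in_use ON DELETE CASCADE
    );

    -- claim the key first; a key already claimed for table_c fails here
    INSERT INTO keys_in_use VALUES (1);
    INSERT INTO table_a VALUES (1);

    -- deleting from keys_in_use cascades to the main table and frees the key
    DELETE FROM keys_in_use WHERE id = 1;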

David J.






Re: [HACKERS] Not In Foreign Key Constraint

2013-09-19 Thread David Johnston
Misa Simic wrote
 Hi hackers,
 
 I just wonder how hard would be to implement something like Not In FK
 Constraint or opposite to FK...

A more useful next couple of sentences would be along the lines of:

I have this problem... I've approached it by doing... but it seems that an
actual database-enforced constraint would be a better solution.  Is that
something that has been considered?  Are there other ways of attacking this
problem I have not considered?

You took quite a bit of time to try and start a discussion, and I get that
you don't necessarily know where it is going to lead, but Not In FK
constraint, with a descriptive sentence or two, likely would have been
enough to get the ball rolling.  Instead you devoted more space to technical
clarification; that space would have been better spent explaining what the
problem is and how current approaches to dealing with it are
limited.

A more specific end-question would also help solicit better responses.

I say all this because 3 days later nothing more substantial than why is
this feature necessary has been put forth.  The general idea likely has
some merit but you've not provided anything for people to sink their teeth
into.

David J.








Re: [HACKERS] Documentation for SET var_name FROM CURRENT

2013-09-30 Thread David Johnston
Amit Kapila-2 wrote
 While reading documentation for SET command, I observed that FROM
 CURRENT syntax and its description is missing from SET command's
 syntax page (http://www.postgresql.org/docs/devel/static/sql-set.html).
 
 Do you think that documentation should be updated for the same or is
 there any reason why it is not documented?

It is documented as part of CREATE FUNCTION since its use is only valid in
that context. The paragraph with the link to CREATE FUNCTION seems
sufficient to notify and direct people to the needed description for this.

David J.






Re: [HACKERS] Documentation for SET var_name FROM CURRENT

2013-10-01 Thread David Johnston
Amit Kapila-2 wrote
 On Tue, Oct 1, 2013 at 10:25 AM, David Johnston <polobo@> wrote:
 Amit Kapila-2 wrote
 While reading documentation for SET command, I observed that FROM
 CURRENT syntax and its description is missing from SET command's
 syntax page (http://www.postgresql.org/docs/devel/static/sql-set.html).

 Do you think that documentation should be updated for the same or is
 there any reason why it is not documented?

 It is documented as part of CREATE FUNCTION since its use is only valid
 in
 that context.
 
Not only as part of CREATE FUNCTION, but as part of ALTER
 DATABASE, ALTER ROLE, ALTER FUNCTION syntax as well. In all these
 places other syntax of SET is also used and described.
  I think you are right that syntax SET .. FROM CURRENT is mainly used
 in context with few other SQL statements, but as it is part of SET
 command, so isn't it better to mention the syntax on SET page and may
 be explain a bit about its usage?
 
 The paragraph with the link to CREATE FUNCTION seems
 sufficient to notify and direct people to the needed description for
 this.
 
After explaining the usage in short, may be can provide links to
  all other statements where it can be used, but I think syntax SET .. FROM
  CURRENT should be there with SET command's other syntax.

FROM CURRENT is only valid as part of the SET sub-command attached to CREATE
FUNCTION.
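
For reference, a minimal sketch of that usage (object names assumed):

    SET search_path = myschema;

    CREATE FUNCTION f() RETURNS int
        LANGUAGE sql
        SET search_path FROM CURRENT  -- captures 'myschema' at creation time
        AS $$ SELECT 1 $$;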

Yes, a number of SQL commands take a sub-command called SET.  These are not
the same as the top-level SQL SET command and have their own rules and
syntax defined on the parent command's page.  They share a keyword to make
the grammar and usage saner but these are semantically different statements.

A paragraph cross-referencing where SET sub-commands exist has merit but
since the main SET command does not accept FROM CURRENT it (FC) should not
be included in its page directly.

If you want to put forth an actual documentation change and get a concrete
opinion then by all means.  If you want someone else to do it I'm detailing
why that is unlikely to happen.

The link to section 18.1 from the SET command documentation covers most of
the other relevant info about how to go about configuring the system.  The
SQL command reference is generally a much more narrow scope focusing on the
syntax of the specific command listed.  The current documentation for SET
conforms to this reality.

We both know how the settings sub-system works.  I don't see many novice
questions about this kind of thing - though my only exposure is the mailing
list - so the documentation seems effective.  Other than supposed
completeness, are there other reasons you feel the documentation regarding
FROM CURRENT, or SETtings in general, needs modification?

David J.








Re: [HACKERS] Documentation for SET var_name FROM CURRENT

2013-10-01 Thread David Johnston
David Johnston wrote
 A paragraph cross-referencing where SET sub-commands exist has merit but
 since the main SET command does not accept FROM CURRENT it (FC) should not
 be included in its page directly.

It is strange that this actually does work - at least in 9.0 - given that
SET ... FROM CURRENT does not seem to have any usage outside of its
interaction as part of the CREATE FUNCTION command.

Is there some use-case I am not seeing?

Since the command works I would agree that it should be provided in the
syntax section for SET and that a comment be added that says generally
that its presence is an historical artifact and has no real use as part of
the top-level command.  Its intended use is in conjunction with the CREATE
FUNCTION command.  Alternative wordings to describe uses I am not seeing are
good too.

David J.






[HACKERS] Re: dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Johnston
Robert Haas wrote
 Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 Permission denied error, and on smew (Debian GNU/Linux 6.0), it
 fails with Function not implemented, which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.
 
 What shall we do about this?  I see a few options.

Is this something that rightly falls into being a distro/package specific
setting?  If so then the first goal should be to ensure the maximum number
of successful basic installation scenarios - namely someone installing
PostgreSQL and connecting to the running postgres database without encountering
an error.  

As a default I would presume the current System V behavior is sufficient to
accomplish this goal.  If package maintainers can then guarantee that
changing the default will improve the user experience they should be
supported and encouraged to do so but if they are at all unsure they should
leave the default in place.  

As long as a new user is able to get a running database on their machine
if/when they run up against the low defaults of System V memory they will at
least be able to focus on that single problem as opposed to having a failed
initial install and being unsure exactly what they may have done wrong.

Thus option #2 seems sufficient.  I do think that having some kind of
shared-memory-manager utility could have value but I'd rather see that be a
standalone utility as opposed to something magical done inside the bowels of
the database.  While probably harder to code and learn such a utility would
provide for a much greater UX if implemented well.

David J.





[HACKERS] Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions

2013-10-23 Thread David Johnston
Sameer Kumar wrote
 edb=# explain analyze select max(score) from student_score group by
 course;

This query returns 6 records.  The window one returns 123,000.  Why do you
expect these to have anywhere near the same performance or plan?
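
For reference, the two shapes being compared (column names assumed from the
example):

    -- aggregate form: collapses to one row per course
    SELECT course, max(score)
    FROM student_score
    GROUP BY course;

    -- window form: ranks every row, then discards all but the top one
    SELECT * FROM (
        SELECT s.*, row_number() OVER (PARTITION BY course
                                       ORDER BY score DESC) AS rn
        FROM student_score s
    ) t
    WHERE rn = 1;

The window form must sort and emit the full row set before the rn = 1 filter
is applied.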

You can enable/disable indexes/scans to see what alternative plans may
provide but nothing here stands out as being obviously incorrect.

I'm not really clear on what your question is.  Generally it sounds as if
you are wondering if there are any plans to improve the algorithms behind
window function processing.  Are you just looking at symptoms and thus
possibly have unreasonable expectations or do you actually see an avenue for
improvement in the engine?


 QUERY PLAN |   Sort Method: external merge  Disk: 7576kB

Work memory: the sort spills to disk, and that I/O is killing your
performance on this query.  The window form is more flexible but you pay a
price for that.
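
As a quick experiment one could raise work_mem for the session - the plan
line above shows an ~7.5MB spill, so (the value is illustrative):

    SET work_mem = '32MB';

and then re-run the EXPLAIN ANALYZE to see whether the sort stays in memory.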


 Another thing, (I may be stupid and naive here) does PostgreSQL re-uses
 the
 hash which has been already created for sort. In this case the inner query
 must have created a hash for windoing aggregate. Can't we use that same
 one
 while applying the the filter rn=1 ?

Probably, but others more knowledgeable will need to answer authoritatively.

David J.







Re: [HACKERS] Additional information on log_line_prefix

2013-10-25 Thread David Johnston
emanuel_calvo wrote
  %E = estimated rows

How would you expect this to work?  This information seems mostly useless
without the context of a full EXPLAIN output.


  %T = temporal tables used

I am guessing you mean temporary, not temporal - the latter also being
known as "time oriented".

David J.






Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?

2013-10-29 Thread David Johnston
Tom Lane-2 wrote
 Craig Ringer lt;

 craig@

 gt; writes:
 During my testing of Kohei KaiGai's row-security patches I've been
 looking into how foreign keys should be and are handled. There are some
 interesting wrinkles around FK cascades, the rights under which FK
 checks execute, and about the consistency effects of changing or
 applying an RLS policy.
 
 As I recall, I've been saying since day one that row-level security cannot
 sensibly coexist with foreign-key constraints, and I've been told that the
 potential users of such a feature don't care.  I'm glad to see somebody
 else complaining.
 
 As far as the points you're making go, I think we must say that RLS checks
 are not applied during FK trigger queries, ie the FK triggers can always
 see everything even though they don't run as superuser.  

Is there some way to enforce that the PK and FK hosting tables have
compatible RLS definitions?  The examples that come to mind are:

1) both tables have RLS filters on at least one of the FK relationship
columns so in a multi-tenant situation a given user is likely (hard to
enforce perfectly) to be restricted to at least checking only the subset of
rows in the PK table belonging to their tenant.

2) the PK table has no filter AND the FK table does not have an RLS filter
on any of the columns being used in the FK.  This covers shared lookup
tables.
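
A sketch of case 1 (all table, column, and setting names are assumptions,
and the policy syntax shown here is illustrative - the patch's actual
grammar may differ):

    CREATE POLICY by_tenant ON invoice
        USING (tenant_id = current_setting('app.tenant_id')::int);
    CREATE POLICY by_tenant ON invoice_line
        USING (tenant_id = current_setting('app.tenant_id')::int);

    -- with a composite FK invoice_line (tenant_id, invoice_id)
    -- REFERENCES invoice (tenant_id, id), both policies filter on an FK
    -- column, so each tenant can only probe its own PK rows.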

I see no serious problem with DELETE FK-triggers but the ability to probe
the PK table by inserting into a FK table does seem to need limitation.  Of
course the normal direct-insert RLS checks will help (and maybe totally) to
cover #1 above.

The other question is whether such a hidden relationship constitutes a
mis-configuration of RLS.  This goes back to compatibility - is there some
algorithm that can be applied to FK constraints and the associated tables
that can measure compatibility and generate warnings when a constraint or
RLS definition is added or changed on those tables?  An error is probably too
severe, especially at first.

Lacking a use-case for when two incompatible tables need to have a FK-PK
relationship I'm more inclined to force the application of RLS across the
relationship constraint and consider these trigger errors to be symptoms of
a mis-configuration of the RLS policy that need to be fixed by the DBA.  In
the presence of a mis-configured policy the ability to provide security
guarantees is shot and the examples so far all prove that.  Table B should
have the PK record visible for corresponding visible FK records on table A;
otherwise there would have been no way to insert the table A row initially,
which means there was a time when an (invalid) constraint was added that
broke the relationship - and at that point an error should have been raised.

Hopefully this all sparks some thoughts from others much more familiar with
RLS than I.

David J.












Re: [HACKERS] Feature request: Optimizer improvement

2013-11-01 Thread David Johnston
Jim Nasby-2 wrote
 Should that really matter in this case? ISTM we should always handle LIMIT
 before moving on to the SELECT clause…?

SELECT generate_series(1,10) LIMIT 1

Here the set-returning function in the target list is what produces the rows
in the first place, so the LIMIT cannot be applied before the SELECT clause
is evaluated.

David J.





Re: [HACKERS] logical column order and physical column order

2013-11-03 Thread David Johnston
David Rowley wrote
 I'm sure in the real world there are many cases where a better choice in
 column ordering would save space and save processing times, but is this
 something that we want to leave up to our users?

Right now there is little visibility, from probably 99% of people, that this
is even something to be concerned with.  I have no qualms with making a
person run a routine to change the physical ordering of their tables - and
if they really care about logical order in the output it is best to list the
column names anyway.  So the problem worth solving is providing a
way for the system to tell the user, for a given table, what the most
efficient physical order would be - ideally in the form of a CREATE TABLE AS
statement - and letting the user manually effect the change.
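
The effect is easy to demonstrate today - a minimal sketch using
pg_column_size on row values with the same fields in two orders:

    SELECT pg_column_size(ROW(true, 1::bigint, true))  AS bool_first,
           pg_column_size(ROW(1::bigint, true, true))  AS bigint_first;

bool_first comes out larger: the bigint must start on an 8-byte boundary,
so padding bytes follow the leading boolean.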

So invent the algorithm to identify the best physical order and make it
accessible to users for manual use.  If the benefits seem great enough after
its use for a couple of releases a more informed decision can be made as to
whether to try and automate its application.

David J.





Re: [HACKERS] TABLE not synonymous with SELECT * FROM?

2013-11-11 Thread David Johnston
Colin 't Hart wrote
 Methinks we should fix the documentation, something like:
 
 The command
 
 TABLE name
 
 is equivalent to
 
 SELECT * FROM name
 
 It can be used as a top-level command or as a space-saving syntax
 variant in parts of complex queries. Only the WITH, ORDER BY, LIMIT,
 and Locking clauses and set operations can be used with TABLE; the
WHERE clause and any form of aggregation cannot be used.

The paragraph is unnecessary if the Synopsis section of the SELECT
documentation is updated to correctly reflect all the valid clauses that can
be attached to TABLE.  The current reading implies that you cannot attach
anything, so when you said LIMIT worked I was surprised.

Also, testing seems to confirm that the allowance of LIMIT implies that
OFFSET is allowed as well.
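
For example, both of these are accepted (table name assumed):

    TABLE my_table LIMIT 10;
    TABLE my_table ORDER BY id LIMIT 10 OFFSET 20;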

If TABLE is allowed as a top-level command why doesn't it get its own page
in the SQL commands section?  It really doesn't matter - and honestly while
I've known about it I've never actually thought to use it in actual queries
because as soon as you want to do something special you have to switch it
out for SELECT * FROM anyway - but it does seem inconsistent.

David J.






Re: [HACKERS] pg_dump and pg_dumpall in real life

2013-11-11 Thread David Johnston
Andrew Dunstan wrote
 A general ability to rename things would be good. In particular, 
 restoring schema x into schema y or table x into table y would be very 
 useful, especially if you need to be able to compare old with new.

compare old and new what?  I would imagine that schema comparisons would be
much easier if the only thing that is different is the database name and you
compare database old to database new.

Are there any existing threads or posts, that you recollect, that detail
solid use-cases for clone-and-rename mechanics?  I don't seem to recall
anything in the past year or so but my coverage is probably only about 70%
in that timeframe.

SQL seems particularly unfriendly to renaming and runtime name resolution in
general (largely due to caching effects).  Some kind of alias mechanism
makes sense conceptually but the performance hit for such isn't likely to be
worth incurring.

I could see having table name aliases so that raw data in a dump from one
database could be restored into another but I'd likely require that the user
be able to generate the target schema from source themselves.  That would
facilitate the use-case where the DBA/programmer is able to fully recreate
their schema from source and only require that actual data be restored into
the newly created database.  I can see where grants may fall into a grey
middle-area but functions/views/triggers and the like would need to be
synchronized with any schema naming changes and that should, IMO, be driven
from source and not facilitated by a dump/restore process.

David J.








Re: [HACKERS] pg_dump and pg_dumpall in real life

2013-11-11 Thread David Johnston
Josh Berkus wrote
 Well, then we just need pg_restore to handle the role already exists
 error message gracefully.  That's all.  Or a CREATE ROLE IF NOT EXISTS
 statement, and use that for roles.

My only qualm here is if the exists check is based off of role name only. 
If database A and database B came from different clusters but both have
a role david the actual identity of david is (or could be) different
because of the source cluster.
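
For reference, a name-only existence check is what the workaround looks
like today (role name assumed) - and it knows nothing about which cluster
the role originally came from:

    DO $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'david') THEN
            CREATE ROLE david;
        END IF;
    END
    $$;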

The risk of such occurring in a high-security situation is likely to be
small but some kind of --ignore-different-cluster-same-role flag may be
worthwhile such that pg_restore will error unless that flag is set (i.e.,
high security by default).  The error itself should be rare enough that most
people wouldn't even notice it is there, and anyone seeing such an error
(with a hint provided as well) would easily be able to disable it and
continue on with the restore.

David J.






Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks

2013-11-12 Thread David Johnston
Robert Haas wrote
 That's a sufficiently astonishing result that it wouldn't be
 surprising for this to get reported as a bug where a simple
 performance gap wouldn't be, and I think if we don't fix it the
 perception will be that we've left that bug unfixed.  Now, there are
 lots of things we don't fix just because there is not an infinitely
 large army of trained PostgreSQL hackers who love to fix other
 people's bugs for free, so I'm not going to say we HAVE to fix this or
 whatever - but neither do I think fixing it is useless and worthless.

Having had this same thought WRT the FOR UPDATE in LOOP bug posting, the
lack of a listing of outstanding bugs does leave some gaps.  I would imagine
people would appreciate something like:

Frequency: Rare
Severity: Low
Fix Complexity: Moderate
Work Around: Easy - create an actual function; create some form of loop
Status: Confirmed - Awaiting Volunteers to Fix

Even without a formal system it may not hurt for bug threads to have a
posting with this kind of information summarizing the thread.  As Tom is apt
to do - for the sake of the archives - though mostly I see those once
something has been fixed and not for items that are being left open.

Ideally these could also be migrated to the wiki, with links back to the
main thread, to provide a basic known open items interface - something
that I imagine would make corporate acceptance of PostgreSQL more likely.

I don't see where there is a considerable number of these unresolved
items - most things do indeed get fixed or explained away as normal user
learning.

Sorry for the digression but it seems relevant.

David J.









Re: [HACKERS] First patch -- somewhat trivial feature

2013-11-13 Thread David Johnston
Stephen Frost wrote
 * Robert Berry (berrydigital@) wrote:
 This is my first attempt at writing a patch, so it's pretty simple.
 
 Neat!
 
  seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01,
 cpu_ind: 0.005000, cpu_op: 0.002500
  amenabled: 111
 
 The bit vector is enable variables in the order listed in cost.h,
 though mainly provides a high level view on whether or not any strategies
 are disabled.
 
 While I like the general idea, I have to admit that I don't particularly
 like the format and I'm not sure why it makes sense to have this as part
 of 'explain'?  Why not do a 'show all;' ahead of the explain?

I kinda get the theory behind this but, WRT formatting, explain can output
multiple formats and any patch affecting said output should provide for
changing all of them.  Having each of the sample outputs in the post would
allow for comments from those who would not generally apply such patches.

David J.






Re: [HACKERS] additional json functionality

2013-11-14 Thread David Johnston
Hannu Krosing-5 wrote
 On 11/14/2013 08:17 PM, Merlin Moncure wrote:
 On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler <david@> wrote:
 On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@> wrote:

 This is exactly what needs to be done, full stop (how about: hstore).
 It really comes down to this: changing the serialization behaviors
 that have been in production for 2 releases (three if you count the
 extension) is bad enough, but making impossible some legal json
 constructions which are currently possible is an unacceptable
 compatibility break.  

The current json format is a minimally conforming (i.e., does not enforce
the "should not contain duplicates" suggestion) structured json validating
type that stores its input as-is once validated.  Its presence is probably
going to cause difficulties with function API for reasons already mentioned
but its place in the core type library is already firmly established.
Andrew's API additions seem like good things to have for this type.  I
haven't seen any comments on this but do these functions facilitate creating
json that can have duplicates and that maintains order?  Even if we accept
input to json with these limitations we are not obligated to make our own
json output minimally conforming - though we should at least maintain such
if it is already in place.
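
For concreteness, the behavior under discussion:

    SELECT '{"a": 1, "a": 2}'::json;
    -- stored and output as-is: {"a": 1, "a": 2}
    -- a normalizing binary type would presumably collapse duplicates,
    -- hstore-style (last key wins): {"a": 2}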


 So my suggestion is to upgrade existing json data type to
 text - or maybe json_text with format check - when upgrading
 to 9.4, to change current function which output json  to
 output text and have new json type which stores proper
 JavaScript Object - like structured data.

Technically a down-grade but anyway...

How does this work with a pg_dump/pg_restore upgrade?


If we want to have maximally conforming json type(s) we can still create
them.  I'd say we'd still want two versions, similar in a way to how we have
bytea and text even though any text can technically be stored like
bytea.  The constructor API for both would want to be identical with the
only real difference being that text-json_source would be layout preserving
(i.e., validation only) while text-json_binary would be a true parsing
conversion.  Likewise json_source-text would output the same input while
json_binary-text would output the canonical form (pretty-printing and such
would need to be initiated via functions).

If things are going to be a little more complex anyway why not just go and
toss in the kitchen sink too?  This way we provide maximal flexibility. 
From a development perspective some features (indexes, equality, in-place
updates and related modification API) may only make sense on a subset of the
available types but trade-offs are a fact of life.

David J.






Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Merlin Moncure-2 wrote
 I don't want to have two types, but I think I'd probably rather have two
 clean types than this. I can't imagine it being remotely acceptable to
 have
 behaviour depend in whether or not something was ever stored, which is
 what
 this looks like.
 
 Well, maybe so.  My main gripe with the 'two types' solutions is that:
 1) current type is already in core (that is, not an extension). In
 hindsight, I think this was a huge mistake.
 2) current type has grabbed the 'json' type name and the 'json_xxx' API.
 3) current type is getting used all over the place
 
 'Two types' means that (AIUI) you can't mess around with the existing
 API too much. And the new type (due out in 2016?) will be something of
 a second citizen.  The ramifications of dealing with the bifurcation
 is what makes *my* head hurt.  Every day the json stuff is getting
 more and more widely adopted.  9.4 isn't going to drop until 2014 best
 case and it won't be widely deployed in the enterprise until 2015 and
 beyond.  So you're going to have a huge code base operating on the
 'legacy' json type.
 
 merlin

The current type can store the exact same data as what a hash-like type
could store.  It can also store stuff a hash-like type would not be able to
store.  From my reading the main reason for adding the new hash-like type
would be to increase the performance characteristics of using said type. So:

1) if reasonable performance can be had with the current type the new type
would be unnecessary
2) if #1 is not possible then the new type trades off leniency in format for
performance improvements

One implication of #2 is that existing json that wants the improved
performance will need to undergo a full-table rewrite in order to be
converted.
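
The conversion itself is mechanical - e.g., assuming the new type ends up
being named jsonb and the column is named doc:

    ALTER TABLE t ALTER COLUMN doc TYPE jsonb USING doc::jsonb;

but it rewrites every row of the table.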

Both output textual representations are identical and function overloading
and API should be able to be maintained substantially identical between the two
types.

David J





Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Josh Berkus wrote
 On 11/15/2013 02:59 PM, Merlin Moncure wrote:
  On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@> wrote:
 I think you may be on to something here.  This might also be a way
 opt-in to fast(er) serialization (upthread it was noted this is
 unimportant; I'm skeptical).  I deeply feel that two types is not the
 right path but I'm pretty sure that this can be finessed.
 
 As far as I understand merlin is mostly ok with stored json being
 normalised and the problem is just with constructing extended
 json (a.k.a. processing instructions) to be used as source for
 specialised parsers and renderers.
 
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibilty break we'll want to call the new version 10.0.
 
 That's why naming old JSON as json_text won't work; it'll be a
 hardened roadblock to upgrading.

Agreed.  I can't imagine a use-case that would warrant breaking the current
behavior of json.  Either we live with just one, text-oriented, json type
and finesse whatever performance gains we can without breaking
compatibility; or we introduce additional types (I personally like adding 2
instead of one but just adding the binary one would be ok) which - barring
an overwhelming desire by -core to group-self-flagellate - means giving the
new type an as yet unused name.

From a marketing perspective having 3 types with the following properties is
an easy message to sell:

1) json - liberal interpretation w/ validation only; stored as text; output
as-is
2) json_text - strict interpretation w/ validation only; stored as text;
output as-is
3) json_binary - strict interpretation w/ validation  parsing; stored as
binary; output normalized

This way json seems less like a mistake but rather an intentional desire
to introduce a liberal type that meets data exchange needs in the short term
and now, later, a structured data storage mechanism similar to hstore.

Even if you have json_binary I can imagine that some people would want to be
able to store the original strict json as-is.  Sure, they can use text, but
this way intent is made clear and validation is attached directly to the
type as opposed to having to be done separately.  The use-cases described
for needing a liberal json prove this out.  That said json would be an
acceptable replacement for json_text in many cases and separate validation
for strict json prior to storing into json isn't that heinous.

David J.







Re: [HACKERS] additional json functionality

2013-11-15 Thread David Johnston
Looking at this a different way: could we just implement BSON and leave json
alone?

http://bsonspec.org/

David J.







Re: [HACKERS] additional json functionality

2013-11-17 Thread David Johnston
David E. Wheeler-3 wrote
 I like JSONB because:
 
 1. The B means binary
 2. The B means second
 3. It's short
 4. See also BYTEA.

json_strict:

Not sure about the bytea reference off-hand...

I was pondering jsons, which meets the short property just fine; the
trailing s would stand for strict, which is the user-visible semantic
that this type exhibits, rather than some less-visible binary attribute
which most users would not really care about.  I dislike the implication of
plural-ness that the s imparts, though.

Implication of second doesn't seem that important since both types provide
useful semantics.

I can imagine where the short aspect will lead people to accidentally type
json where they mean to use jsonb and having just a single extra
character will increase the likelihood they will not notice.  Knowing about
and having used json_strict previously it will be more probable that such
users will noticeably feel something is missing if they drop the whole
_strict suffix.

So, I'll toss out json_strict for my bikeshed contribution.

David J.
 





[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-11-18 Thread David Johnston
Bruce Momjian wrote
 Considering we are doing this outside of a transaction, and WARNING or
 ERROR is pretty much the same, from a behavioral perspective.
 
 Should we change this and LOCK to be a warning?

From the calling application's perspective an error and a warning are
definitely behaviorally different.

For this I'd vote for a warning (haven't pondered the LOCK scenario) as
using SET out of context means the user has a fairly serious
misunderstanding of the code path they have written (accidentally or
otherwise).  Notice makes sense (speaking generally and without much
research here) for cases where the ultimate outcome matches the statement
but the statement itself didn't actually do anything.  Auto-sequence and
index generation fell into this category, though even notice proved too
noisy there.  In this case we'd expect that the no-op statement was issued
in error and thus should be changed, making a warning the right level of
incorrectness to communicate.  A notice would be more appropriate if there
were valid use-cases for the user doing this and we just wanted to make
sure they are conscious of the unusualness of the situation.

I dislike error for backward compatibility reasons.  Saving the user
from this kind of mistake doesn't warrant breaking what could be properly
functioning code.  Just because PostgreSQL isn't in a transaction does not
mean the client isn't expecting the current code to work correctly - even
if by accident - as part of a sequence of queries.
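
For concreteness, the behavior under discussion:

-- outside any transaction block this has no lasting effect; the question
-- is whether that should be silent, a WARNING, or an ERROR
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- inside a block it does what it says
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
COMMIT;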

David J.







[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-11-18 Thread David Johnston
Bruce Momjian wrote
 On Mon, Nov 18, 2013 at 05:05:45PM -0800, David Johnston wrote:
 Bruce Momjian wrote
  Considering we are doing this outside of a transaction, and WARNING or
  ERROR is pretty much the same, from a behavioral perspective.
  
  Should we change this and LOCK to be a warning?
 
 From the calling application's perspective an error and a warning are
 definitely behaviorally different.
 
 For this I'd vote for a warning (haven't pondered the LOCK scenario) as
 using SET out of context means the user has a fairly serious
 mis-understanding of the code path they have written (accedentially or
 otherwise).  Notice makes sense (speaking generally and without much
 research here) for stuff where the ultimate outcome matches the statement
 but the statement itself didn't actually do anything.  Auto-sequence and
 index generation fell into this but even notice was too noisy.  In this
 case
 we'd expect that the no-op statement was issued in error and thus should
 be
 changed making a warning the level of incorrect-ness to communicate.  A
 notice would be more appropriate if there were valid use-cases for the
 user
 doing this and we just want to make sure they are conscious of the
 unusualness of the situation.
 
 I dislike error for backward compatibility reasons.  And saving the user
 from this kind of mistake doesn't warrant breaking what could be properly
 functioning code.  Just because PostgreSQL isn't in a transaction does
 not
 mean the client is expecting the current code to work correctly - even if
 by
 accident - as part of a sequence of queries.
 
 Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be
 WARNING, we should change LOCK too, so on backward-compatibility
 grounds, ERROR makes more sense.
 
 Personally, I am fine with changing them all to WARNING.

Error makes more sense if the goal is internal consistency.  That goal
should be subservient to backward compatibility.  Changing LOCK to warning
is less problematic since it seems improbable that current code would,
after the upgrade, begin working differently in the absence of an error.
Basically someone would have to be trapping on the error and conditionally
branching their logic.

That said, if this were a day 0 decision I'd likely raise an error.
Weakening LOCK doesn't make sense since it is day 0 behavior.  Document the
warning for SET as being weaker than ideal because of backward compatibility
and call it a day (i.e. leave LOCK at error).  The documentation, not the
code, then conveys that such usage is considered wrong, without
possibly breaking wrong-but-working code.
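
For reference, the day 0 LOCK behavior being used as precedent (any table
name will do):

LOCK TABLE accounts;
-- ERROR:  LOCK TABLE can only be used in transaction blocks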

David J.






[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-11-18 Thread David Johnston
Bruce Momjian wrote
 On Mon, Nov 18, 2013 at 06:30:32PM -0800, David Johnston wrote:
  Personally, I am fine with changing them all to WARNING.
 
 Error makes more sense if the goal is internal consistency.  That goal
 should be subservient to backward compatibility.  Changing LOCK to
 warning
 is less problematic since the likelihood of current code functioning in
 such
 a way that after upgrade it would begin working differently in the
 absence
 of an error does not seem probable.  Basically someone would have be
 trapping on the error and conditionally branching their logic. 
 
 That said, if this was a day 0 decision I'd likely raise an error. 
 Weakening LOCK doesn't make sense since it is day 0 behavior.  Document
 the
 warning for SET as being weaker than ideal because of backward
 compatibility
 and call it a day (i.e. leave LOCK at error).  The documentation, not the
 code, then enforces the feeling that such usage is considered wrong
 without
 possibly breaking wrong but working code.
 
 We normally don't approach warts with documentation --- we usually just
 fix them and document them in the release notes.  If we did, our docs
 would be a whole lot uglier.

That is a fair point - though it may be that this instance needs to be one
of the exceptions to that "usually".

For any sane use-case turning this into an error shouldn't cause any grief;
and those cases where there is grief should be evaluated and changed anyway.

I could honestly live with either change to SET TRANSACTION but regardless
would leave LOCK as-is.  The backward compatibility concern, while valid,
does indeed seem weak and worth overriding in order to maintain consistent
behavior going forward.

David J.






[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-11-19 Thread David Johnston
Robert Haas wrote
 On Mon, Nov 18, 2013 at 9:07 PM, Bruce Momjian <bruce@> wrote:
 Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be
 WARNING, we should change LOCK too, so on backward-compatibility
 grounds, ERROR makes more sense.

 Personally, I am fine with changing them all to WARNING.
 
 I don't think it's worth breaking backward compatibility.  I'm not
 entirely sure what I would have decided here in a vacuum, but at this
 point existing precedent seems determinative.

Well, at this point we have already broken backward compatibility by
releasing this.  With Tom's thread necromancy I missed the fact that this
got released in 9.3.

Now, given normal upgrade realities, the people likely to have this bite
them are probably a ways out from upgrading, so I wouldn't expect to have
seen many complaints yet - but at the same time I do not recall seeing any
complaints at all (limited to -bugs and -general).

The referenced patch:

- is released
- is documented
- is consistent with precedent established by similar codepaths
- causes an obvious error in what is considered broken code
- can be trivially corrected by a user willing and able to update their
  application

I'd say leave this as-is and only re-evaluate the decision if complaints are
brought forth.

David J.






[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block

2013-11-19 Thread David Johnston
Tom Lane-2 wrote
 David Johnston <polobo@> writes:
 Robert Haas wrote
 I don't think it's worth breaking backward compatibility.  I'm not
 entirely sure what I would have decided here in a vacuum, but at this
 point existing precedent seems determinative.
 
 Well, at this point we have already broken backward compatibility by
 releasing this.  With Tom's thread necromancy I missed the fact this got
 released in 9.3
 
 Uh, what?  The commit I'm objecting to is certainly not in 9.3.
 It's this one:
 
 Author: Bruce Momjian <bruce@>
 Branch: master [a54141aeb] 2013-10-04 13:50:28 -0400
 
 Issue error on SET outside transaction block in some cases
 
 Issue error for SET LOCAL/CONSTRAINTS/TRANSACTION outside a
 transaction
 block, as they have no effect.
 
 Per suggestion from Morten Hustveit
 
 I agree that it's too late to reconsider the behavior of pre-existing
 cases such as LOCK TABLE, but that doesn't mean I can't complain about
 this one.

My bad; I was relaying an assertion without checking it myself.  I believe
my source meant 9.4/head and simply mistyped 9.3, which I then copied.

David J.






Re: [HACKERS] additional json functionality

2013-11-19 Thread David Johnston
Andrew Dunstan wrote
 Given that, I'm not sure we shouldn't permit them in b) either. I think 
 I lost that argument back in the 9.2 dev cycle. I really don't want to 
 get to a situation where foo::json::jsonb can produce an error.

So what do you propose happens when the input json has duplicate keys?  

IMO a reasonable default cast function should error if the json contents
require anything more than a straight parse to be stored into jsonb.  If the
user still needs to make the conversion we should have a standard,
configurable parser function with json input and jsonb output.  In this case
the key-handling options would be keep first encountered, keep last
encountered, or fail on duplicate, the last of which would be the default.
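
A sketch of what such a parser's interface might look like (the function
name, parameter, and option values here are all hypothetical):

-- CREATE FUNCTION json_to_jsonb(input json,
--                               on_duplicate text DEFAULT 'fail')
-- RETURNS jsonb ...
-- expected semantics:
--   SELECT json_to_jsonb('{"a":1,"a":2}', 'keep_first');  -- {"a": 1}
--   SELECT json_to_jsonb('{"a":1,"a":2}', 'keep_last');   -- {"a": 2}
--   SELECT json_to_jsonb('{"a":1,"a":2}');                -- ERROR: duplicate key "a"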

I have not really pondered storing scalars into jsonb, but before pondering
usability: are there any technical concerns?  If the goal is to share the
backend with hstore then note that current hstore does not allow for this,
so the capability would either have to transfer back over to hstore or it
would need customized code on the json side.

David J.







Re: [HACKERS] additional json functionality

2013-11-19 Thread David Johnston
Gavin Flower-2 wrote
 More seriously, there are obviously variants in what people consider 
 useful human readable form of JSON output, but it is probably 
 inefficient to store white space.  

Enough to matter?  Maybe the extra whitespace pushes a marginal value into
being toasted but, IIUC, for a value that is going to be toasted anyway,
compression is going to make whitespace considerations insignificant for
both speed and space.
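
A quick way to eyeball the actual cost of stored whitespace:

SELECT pg_column_size('{ "a" : 1, "b" : 2 }'::json) AS with_whitespace,
       pg_column_size('{"a":1,"b":2}'::json)        AS minified;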

David J.






Re: [HACKERS] additional json functionality

2013-11-20 Thread David Johnston
Hannu Krosing-3 wrote
 On 11/18/2013 06:49 PM, Josh Berkus wrote:
 On 11/18/2013 06:13 AM, Peter Eisentraut wrote:
 On 11/15/13, 6:15 PM, Josh Berkus wrote:
 Thing is, I'm not particularly concerned about *Merlin's* specific use
 case, which there are ways around. What I am concerned about is that we
 may have users who have years of data stored in JSON text fields which
 won't survive an upgrade to binary JSON, because we will stop allowing
 certain things (ordering, duplicate keys) which are currently allowed
 in
 those columns.  At the very least, if we're going to have that kind of
 backwards compatibilty break we'll want to call the new version 10.0.
 We could do something like SQL/XML and specify the level of validity
 in a typmod, e.g., json(loose), json(strict), etc.
 Doesn't work; with XML, the underlying storage format didn't change.
 With JSONB, it will ... so changing the typemod would require a total
 rewrite of the table.  That's a POLS violation if I ever saw one
 We do rewrites on typmod changes already.
 
 To me having json(string) and json(hstore) does not seem too bad.

Three things:

1) How would this work in the face of functions that erase typemod
information?  (see the demonstration below)
2) json [no type mod] would have to effectively default to json(string)?
3) how would #1 and #2 interact?
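
Question #1 is already observable with today's typmods; function results
never carry one.  For example:

-- a pass-through function declared to return varchar
CREATE FUNCTION echo(v varchar) RETURNS varchar AS 'SELECT $1' LANGUAGE sql;
CREATE TABLE t (v varchar(10));
CREATE VIEW tv AS SELECT echo(v) AS ev FROM t;
-- \d tv shows ev as plain "character varying": the (10) typmod is erased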

I pondered the general idea but my (admittedly limited) gut feeling is that
using typemod may well be technically untenable and, from an end-user
perspective, would be even more confusing than having two types.

David J.








Re: [HACKERS] WITH ORDINALITY versus column definition lists

2013-11-20 Thread David Johnston
Tom Lane-2 wrote
 It seems to me that we don't really want this behavior of the coldeflist
 not including the ordinality column.  It's operating as designed, maybe,
 but it's unexpected and confusing.  We could either
 
 1. Reinsert HEAD's prohibition against directly combining WITH ORDINALITY
 with a coldeflist (with a better error message and a HINT suggesting that
 you can get what you want via the TABLE syntax).
 
 2. Change the parser so that the coldeflist is considered to include the
 ordinality column, for consistency with the bare-alias case.  We'd
 therefore insist that the last coldeflist item be declared as int8, and
 then probably have to strip it out internally.

#2, but I am hoping to be able to make the definition of the column
optional.  One possibility is that if you do want to provide an alias you
have to make it clear that the coldeflist item in question is only valid as
a WITH ORDINALITY column alias.  Otherwise the entire coldeflist is used to
alias the record-type output and the ordinality column is given its default
name.

Two options I came up with:

1) disallow any type specifier on the last item:  t(f1 int, f2 text, o1)
2) add a new pseudo-type, ord:  t(f1 int, f2 text, o1 ord)
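
Spelled out in a full query, option #2 might look like this (hypothetical
syntax; array_to_set stands in for any record-returning function):

SELECT * FROM array_to_set(ARRAY['one','two']) WITH ORDINALITY
         AS t(f1 int, f2 text, o1 ord);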

I really like option #2.  It makes it perfectly clear, entirely within the
coldeflist SQL, that the last column is different and in this case optional -
both in the sense of providing an alias and in that the user can drop the
whole ordinality aspect of the call as well.  The system does not need to be
told, by the user, the actual type of the ordinality column.  And given that
I would suppose most people would think to use int or bigint before using
int8, usability is improved: once they need to alias the ordinality column
they learn to use the ord type, which would internally resolve to the
necessary output type.

Option #1 is somewhat simpler, but the slight added verbosity of #2 makes
reading the SQL coldeflist easier, IMO: you are already scanning name-type
pairs, and recognizing a missing type is, for me, harder than reading off
ord and recalling its meaning.

David J.







Re: [HACKERS] WITH ORDINALITY versus column definition lists

2013-11-20 Thread David Johnston
Tom Lane-2 wrote
 David Johnston <polobo@> writes:
 Tom Lane-2 wrote
 It seems to me that we don't really want this behavior of the coldeflist
 not including the ordinality column.  It's operating as designed, maybe,
 but it's unexpected and confusing.  We could either
 
 1. Reinsert HEAD's prohibition against directly combining WITH
 ORDINALITY
 with a coldeflist (with a better error message and a HINT suggesting
 that
 you can get what you want via the TABLE syntax).
 
 2. Change the parser so that the coldeflist is considered to include the
 ordinality column, for consistency with the bare-alias case.  We'd
 therefore insist that the last coldeflist item be declared as int8, and
 then probably have to strip it out internally.
 
 Two options I came up with:
 
 1) disallow any type specifier on the last item:  t(f1 int, f2 text, o1)
 2) add a new pseudo-type, ord:  t(f1 int, f2 text, o1 ord)
 
 I really like option #2.
 
 I don't.  Pseudo-types have a whole lot of baggage.  #1 is a mess too.
 And in either case, making coldef list items optional increases the number
 of ways to make a mistake, if you accidentally omit some other column for
 instance.

I'll have to trust the baggage/mess conclusion, but if you can distinctly
and unambiguously identify the coldeflist item that is to be used for
ordinality-column aliasing then the mistakes related to the
function-record coldeflist are the same as now.  There may be more (but
still quite few, I would think) ways for the user to make a mistake, but
the syntax ones are handled anyway, and so if the others can be handled
reasonably well the UI for the feature becomes more friendly.

IOW, instead of adding int8 and ignoring it we inspect the last item,
conditionally discard it (like the int8 case), then handle the possibly
modified structure as planned.


 Basically the problem here is that it's not immediately obvious whether
 the coldef list ought to include the ordinality column or not.  The user
 would probably guess not (since the system knows what type ordinality
 should be).  

Yes, if the column is not made optional somehow then I dislike option #2


 The TABLE syntax is really a vastly better solution for this.  So I'm
 thinking my #1 is the best answer, assuming we can come up with a good
 error message.  My first attempt would be
 
 ERROR: WITH ORDINALITY cannot be used with a column definition list
 HINT: Put the function's column definition list inside TABLE() syntax.
 
 Better ideas?

Works for me if #1 is implemented.



Just to clarify, we are still allowing simple aliasing:

select * from generate_series(1,2) with ordinality as t(f1,f2);

It's only when the output of the function is record that the restriction of
placing the record-returning function call into TABLE (if you want ordinals)
comes into play.


select * from table(array_to_set(array['one', 'two']) as (f1 int,f2 text))
with ordinality as t(a1,a2,a3); 

If we could do away with having to re-specify the record-aliases in the
outer layer (a1, a2) then I'd be more understanding but I'm thinking that is
not possible unless you force a single-column alias definition attached to
WITH ORDINALITY to mean alias the ordinality column only.


On the plus side: anyone using record-returning functions is already dealing
with considerable verbosity, so this extra bit doesn't seem to add that
much overhead; and since the alias - t(a1,a2,a3) - is optional if you don't
care about aliasing the ordinality column, the default case is not that
verbose (just add the surrounding TABLE).

I feel like I need a flow-chart for #1...

With #2 (w/ optional) you can add an alias for the ordinality column
anyplace you would be specifying a coldeflist OR alias list.  Favoring the
pseudo-type solution is the fact that, given the prior sentence, if you
place o1 ord in the wrong spot it is possible to generate an error like
WITH ORDINALITY not present for aliasing.

#1 is simpler to implement and does not preclude #2 in the future.

Possible #3?

Not sure if this is possible at this point but really the alias for the
ordinality column would be attached directly to the ordinality keyword.

e.g., ...) with ordinality{alias} as t(a1, a2)

David J.










Re: [HACKERS] WITH ORDINALITY versus column definition lists

2013-11-20 Thread David Johnston
Tom Lane-2 wrote
 David Johnston <polobo@> writes:
 Just to clarify we are still allowing simple aliasing:
 
 select * from generate_series(1,2) with ordinality as t(f1,f2); 
 
 Right, that works (and is required by spec, I believe).  It's what to
 do with our column-definition-list extension that's at issue.
 
 Not sure if this is possible at this point but really the alias for the
 ordinality column would be attached directly to the ordinality keyword.
 
 e.g., ...) with ordinality{alias} as t(a1, a2)
 
 This has no support in the standard.

Now I'm just spinning some thoughts:

) with ordinality AS t(a1 text, a2 text | ord1)  -- type-less, but a
different separator

) with ordinality AS t(a1 text, a2 text)(ord1) -- stick it in its own
section, type-less

) with ordinality AS t(a1 text, a2 text) ordinal(ord1) --name the section
too

would probably want to extend the alias syntax to match...

Is there any precedent in other RDBMS to consider?

I don't see any obvious alternatives to the ones you listed, and syntax is
really not a huge barrier.  If the implementation of an optionally specified
alias is a barrier then either someone needs to feel strongly enough to
implement it, or we just default to #1 for the time being.

But others really haven't had a chance to read and respond yet so I'm gonna
get off this train for a while.


David J.







Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

2013-11-20 Thread David Johnston
Robert Haas wrote
 select * from table(array(select generate_series(10,20,5)),
 array['fred','jim']);

Can we have our arrays and eat our functions too? (and is someone willing to
bake such a complicated cake...)

select * from table ( ARRAY | FUNCTION/SET [, ARRAY | FUNCTION/SET ]* )

The standard-compliant case is handled as required - and those who want to
write compliant code can use the array(select function) trick - while others
can avoid straining their eyes and fingers.

Since we would have to invent implicit unnesting anyway to conform, and the
function version is working currently, the suggested behavior would seem to
be the ideal target.
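
Under that grammar both spellings would be accepted and equivalent (the
mixed form is, of course, still only proposed):

-- compliant spelling, arrays only:
SELECT * FROM TABLE( ARRAY(SELECT generate_series(10,20,5)), ARRAY['fred','jim'] );
-- suggested shorthand, mixing a set-returning function with an array:
SELECT * FROM TABLE( generate_series(10,20,5), ARRAY['fred','jim'] );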


David J.






Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

2013-11-20 Thread David Johnston
Tom Lane-2 wrote
 Andrew Gierth <andrew@.org> writes:
 Tom == Tom Lane <tgl@.pa> writes:
  Tom and this would result in producing the array elements as a table
  Tom column.  There is nothing in there about a function returning
  Tom set.
 
 In the spec, there is no such thing as a function returning a set of
 rows in the sense that we use.
 
 Right, but they do have a concept of arrays that's similar to ours,
 and AFAICS the spec demands different behavior for an array-returning
 function than what we've got here.
 
 We could conceivably say that we'll implicitly UNNEST() if the function
 returns array, and not otherwise --- but that seems pretty inconsistent
 and surprise-making to me.  I'm not too sure what to do if a function
 returns setof array, either.

If a function returns a scalar array (RETURNS text[]) we would unnest the
array per-spec.  If it returns a set (RETURNS setof anything, including a
single array) we would not unnest it, since set-returning functions are
non-spec - instead we'd use our SRF processing routine.  If the function
returns a scalar non-array, the implicit single row returned by the function
would be output.
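
Applying those rules (function names hypothetical):

-- TABLE( f_returning_text_array() )        -- unnested: one row per element
-- TABLE( f_returning_setof_text_array() )  -- SRF path: one row per array
-- TABLE( f_returning_text() )              -- single row, single scalar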

How would the spec interpret:

CREATE FUNCTION f(IN text, OUT text[]) RETURNS record AS $$ ...

TABLE( f('id_123') )

If that is illegal because the result is not just a single array value then
we would not unnest the component array and would also output the implicit
single-row.

My $0.02, quickly gathered

David J.






Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

2013-11-20 Thread David Johnston
Tom Lane-2 wrote
 We could conceivably say that we'll implicitly UNNEST() if the function
 returns array, and not otherwise --- but that seems pretty inconsistent
 and surprise-making to me. 

The use-cases for putting a scalar-array-returning function call into a
TABLE construct and NOT wanting the array to be un-nested are likely few
and far between.

Neither the inconsistency nor the surprise-making is a serious deal-breaker
for me.

And if we do go with the screw the standard approach then we should just
state right now that we will never adhere to the standard, on inconsistency
grounds, and not even encourage others to make it work.  If TABLE(
array_scalar_func() ) ends up only returning a single row then nothing can
be done to make it unnest the array and conform with the syntax without
breaking backward compatibility.

I'd rather change TABLE to FUNCTION and leave the implementation of
TABLE open for future standards-compliance - which maybe you would prefer
as well and just haven't carried that sentiment into your more recent
responses.

David J.









Re: [HACKERS] Why is UPDATE with column-list syntax not implemented

2013-11-21 Thread David Johnston
AK wrote
 9.3 documentation says:
 
 According to the standard, the column-list syntax should allow a list of
 columns to be assigned from a single row-valued expression, such as a
 sub-select:
 
 UPDATE accounts SET (contact_last_name, contact_first_name) =
 (SELECT last_name, first_name FROM salesmen
  WHERE salesmen.id = accounts.sales_id);
 This is not currently implemented — the source must be a list of
 independent expressions.
 
 Why is this not implemented? Is it considered inconvenient to use, or
 difficult to implement. or not important enough, or some other reason?

I cannot answer why, but I too would like to see this.  I actually asked
about this a long while back but cannot seem to find my posting or recall
the response.
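
For reference, the usual workaround today is an UPDATE ... FROM join (using
the documentation's example tables):

UPDATE accounts a
   SET contact_last_name  = s.last_name,
       contact_first_name = s.first_name
  FROM salesmen s
 WHERE s.id = a.sales_id;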

David J.






Re: [HACKERS] why semicolon after begin is not allowed in postgresql?

2013-11-25 Thread David Johnston
AK wrote
 Kevin,
 
 I do see your logic now, but this thing is a common mistake - it means
 that this seems counter-intuitive to some people. What would happen if we
 applied Occam's razor and just removed this rule?
 
 All existing code would continue to work as is, and we would have one less
 rule to memorize. That would make PostgreSql a slightly better product,
 right?

I'm somewhat on the fence here but am leaning toward maintaining the
status quo, mostly because of the analogy with IF ... END IF; versus the
SQL BEGIN; command, which is an entirely separate construct.

I would maybe change the documentation so that instead of simply dictating a
rule we explain why the syntax is the way it is - like this thread is doing.
If users consciously omit the semicolon, hopefully they also understand that
what they are beginning is a code block in plpgsql as opposed to an SQL
transaction.
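
The distinction in miniature:

DO $$
DECLARE x int;
BEGIN        -- plpgsql: opens a code block; takes no semicolon
  x := 1;
END;
$$;

BEGIN;       -- SQL: the transaction-starting command, itself a statement
COMMIT;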

That said, technical purity isn't always a good answer.  I'd be inclined to
let someone passionate enough about the idea implement it and critique the
result, instead of disallowing it outright; though in the end that is likely
to lead to the same place.

David J.






Re: [HACKERS] why semicolon after begin is not allowed in postgresql?

2013-11-25 Thread David Johnston
Mark Kirkwood-2 wrote
 Postgres supports many procedural languages (e.g plperl, plpython) and all
 these have different 
 grammar rules from SQL - and from each other. We can't (and shouldn't) 
 try altering them to be similar to SQL - it would defeat the purpose of 
 providing a procedural environment where the given language works as 
 advertised.
 
 So in the case of plpgsql - it needs to follow the Ada grammar, 
 otherwise it would be useless.

I do not follow the useless conclusion - what, present day, does Ada have
to do with it?  And the request is to alter only plpgsql, not all the other
languages.  To the casual end-user plpgsql is an internal language under
our full control and installed by default in all new releases.  Is it really
unreasonable to expect us to design in some level of coordination between it
and SQL?

Cross-compatibility is a valid reason, though I'm guessing that with all
the inherent differences between our standard PL and other databases' PLs,
making this change would not be a materially noticeable additional
incompatibility.

I'll even accept language consistency and not worth the effort of
special-casing, but mostly because the error is immediate and obvious, and
the solution is simple and readily learned.

A side observation: why does DECLARE not require a block-end keyword but
instead BEGIN acts as effectively both start and end?  BEGIN, IF, FOR,
etc... all come in pairs but DECLARE does not.

David J.






Re: [HACKERS] why semicolon after begin is not allowed in postgresql?

2013-11-25 Thread David Johnston
Andrew Dunstan wrote
 On 11/25/2013 06:13 PM, David Johnston wrote:

 A side observation: why does DECLARE not require a block-end keyword
 but
 instead BEGIN acts as effectively both start and end?  BEGIN, IF, FOR,
 etc... all come in pairs but DECLARE does not.


 A complete block is:
 
  [ DECLARE declarations ]
  BEGIN statements
  [ EXCEPTIONS handlers ]
  END
 
 The declare and exceptions parts are optional, as indicated. Does that 
 make it clearer?

Doh!

IF / THEN / ELSE / ENDIF  (concept, not syntax)

That also does help to reinforce the point being made here...

David J.





