Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-06 Thread Bruce Momjian
On Mon, Jan  2, 2017 at 01:25:35PM -0500, Robert Haas wrote:
> > But, I can easily imagine a good number of people deciding they want
> > mixed case on the server, and so quoting their identifiers. And, then
> > deciding PostgreSQL is defective, rather than deciding their favorite
> > administration or query tool is defective. Almost all of the tools I
> > tried worked fine when I had all lower case symbols on the server. Based
> > on observing the generated SQL, most of the tools that failed for me
> > when I had mixed case symbols on the server would work against a case
> > preserving mode in PostgreSQL. The tools generally pass through the
> > catalog reported symbols without manipulation.
> 
> Right.  Tom's argument makes a lot of sense when you think about this
> from the point of view of someone writing extensions or tools that are
> designed to work with anybody's PostgreSQL instance.  When you think
> about it from the point of view of a user wanting to write an
> application that can work with any of a number of databases, then your
> argument has a lot of merit to it.  I'm not sure there's any way to
> split the baby here: tool authors will obviously prefer that
> PostgreSQL's behavior in this area be invariable, while people trying
> to develop portable database applications will prefer configurability.
> As far as I can see, this is a zero sum game that is bound to have one
> winner and one loser.

Please let me restate the above.  For those working only in the Postgres
ecosystem, the rules are pretty clear --- quote nothing and use only
lowercase, or quote everything.  The reason "quote nothing" is
insufficient is that tools like pgAdmin will quote mixed-case
identifiers during object creation, so technically it is difficult to
have pure "quote nothing" behavior in Postgres unless you control all
the tooling.

Now, clearly, we have users coming from non-Postgres databases where the
behavior is different, i.e. Oracle might be "quote nothing and use only
uppercase".  It seems SQLAnywhere is "quote nothing and case is
preserved".

The problem with opening Postgres up to user-modifiable case folding is
that Postgres ecosystem queries will have to adjust to the fact that
case folding is no longer predictable.  For database applications the
fix might be as easy as changing the session state, but for extensions
and libraries, they would need to quote _everything_ to handle
uncontrollable case folding behavior.  So, in a way, the crazy quoting
we are trying to avoid for migrated queries now happens in the Postgres
ecosystem, and we might even have more of it.

This basically pushes the quoting overhead from users moving to Postgres
from other databases to Postgres ecosystem tooling.  Whether that is
better or worse overall is a judgement call, as Robert stated.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-04 Thread Lewis, Ian (Microstar Laboratories)
Robert Haas [mailto:robertmh...@gmail.com] wrote:
>> Where you get into trouble there is that you might run CREATE
EXTENSION 
>> from that session

Yes. I can see this problem. And, while I can imagine resolving it with
context belonging to the extension, separate from the current session's
context, any resolution gets to be pretty complex. Probably complex
enough that the resolution is worse than living with the problem as part
of the cost of the feature. Which means, it remains one of the arguments
against it.

>> Again, I'm not trying to rain down fire and brimstone
>> on your idea here and I clearly see the utility of it.

I do not feel this at all (though, since this is e-mail, it is helpful
that you state it explicitly). And, I have not felt like this was the
case at any point in these discussions. I have consistently received
thoughtful and remarkably good responses containing solid points. 

While I did not recognize all the impacts, and I probably still do not,
I never thought this was a trivial issue with no arguments against it
(not least, that what I was asking for is not standard compliant). Every
modal behavior of any sort in any software adds pain. At the very least
it increases the regression testing burden. And, this is a mode with
pretty fundamental impact. It has to be worth a lot to somebody to be
worth having. And, of course, it also has to work.

Ian Lewis (www.mstarlabs.com)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-04 Thread Robert Haas
On Tue, Jan 3, 2017 at 6:45 PM, Lewis, Ian (Microstar Laboratories)
 wrote:
> One idea, which would likely be harder to implement on the server, but
> that would have less impact on third party tools and libraries, would be
> to configure case folding on a session basis. There would have to be
> some means to configure a session for the case folding your application
> wants to see. And, the general default would have to be the current
> PostgreSQL behavior so that an application that was designed for current
> behavior would never see a change.

Where you get into trouble there is that you might run CREATE
EXTENSION from that session, or call an SQL function defined in a
session with different settings (perhaps a function created by an
extension).  This is not unlike various problems we've had over the
years with search_path, which really ought to be lexically scoped but
is in fact dynamically scoped.

Again, I'm not trying to rain down fire and brimstone on your idea
here and I clearly see the utility of it.  I also think it's great
that you've engaged in the discussion in the way that you have.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Lewis, Ian (Microstar Laboratories)
Tom Lane [mailto:t...@sss.pgh.pa.us] wrote:
>> 2. If the folding mode is chosen through a GUC variable, which
>> is certainly what people would expect, then it turns out that
>> it breaks client libraries/applications *anyway*, because an
>> installation-wide setting could impose itself on a client that
>> hadn't asked for it.

I know that some variables can only be configured at a wide scope, and
not a narrow one. Is there no way to restrict a GUC variable's
configuration scope to session and finer, but force a fixed value at
global scope? 

If it is possible to restrict global configuration, that at least
protects the general purpose administrative tools to a significant
degree.

>> And for libraries, that isn't a great solution because then they're
incompatible with applications that wanted another setting.

Good point. Libraries continue to have problems even with session level
configuration if they are to operate in the context of an application
that reconfigures its session case folding for its own purposes. 

But, that seems like a problem that is much more likely to affect
developers of new systems rather than general users or administrators of
existing database systems. 

If so, it is more of a forward looking problem than a legacy problem in
the sense that the person who encounters it is likely to be in a
position to do something about it. This makes it much less critical to
get every library in the world updated to support all case folding modes
than would be the case for general administrative tools like pgAdmin.

Depending on the nature of the library, a developer would have the
option of using multiple sessions or, perhaps, if it were possible,
modifying the folding configuration when using the library. 

Anyhow, as you say, libraries clearly continue to have issues even with
restricted scope on case folding configuration.

And the session level idea really helps nothing unless the global
default session configuration is fixed.

Ian Lewis (www.mstarlabs.com)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Tom Lane
"Lewis, Ian \(Microstar Laboratories\)"  writes:
> One idea, which would likely be harder to implement on the server, but
> that would have less impact on third party tools and libraries, would be
> to configure case folding on a session basis.

There are a couple of problems even with that:

1. All the sessions have to share the same catalog state, which greatly
restricts what you could do.

2. If the folding mode is chosen through a GUC variable, which is
certainly what people would expect, then it turns out that it breaks
client libraries/applications *anyway*, because an installation-wide
setting could impose itself on a client that hadn't asked for it.
So you have to update everything at least to the extent of teaching it
to turn off setting X when talking to server versions >= Y.  And for
libraries, that isn't a great solution because then they're incompatible
with applications that wanted another setting.  The notion that the
client side is a monolithic chunk doesn't withstand scrutiny; really
there's usually a stack of code over there, and it all has to cope
with the SQL semantics we expose.

Point #2 was really the lesson that we learned the hard way with the
autocommit fiasco.  We'd thought going into it that client-side code
could be updated only when somebody wanted to use the new behavior,
and it took awhile to absorb the fact that much code would be forced
to deal with the behavior whether it wanted to or not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Lewis, Ian (Microstar Laboratories)
Robert Haas [mailto:robertmh...@gmail.com] wrote:
> The issue is, rather, that every extension written for
> PostgreSQL, whether in or out of core, needs to handle this issue and
> every general-purpose client tool (pgAdmin, etc.) needs to be aware of
> it. 

I can see the accuracy of all of the points you make here. And, I
definitely had not thought through the side effects on support tools and
third party libraries of implementing such modal behavior on the server
when I originally asked my question. I did not even understand the
ramifications of upper case folding on the server until Tom pointed out
the earlier conversations on the subject (in my defense, I was not
confused enough to think I had thought through all the effects of a
fundamental change to language recognition based on writing one e-mail
message). 

A fully case sensitive mode, leaving the server catalogs all in lower
case, which is what we would really like to have for our use, still
looks pretty easy to implement on the server. And, it would at least
behave consistently with the lower case folding mode if one quoted all
identifiers, unlike a case preserving, case insensitive mode.

One idea, which would likely be harder to implement on the server, but
that would have less impact on third party tools and libraries, would be
to configure case folding on a session basis. There would have to be
some means to configure a session for the case folding your application
wants to see. And, the general default would have to be the current
PostgreSQL behavior so that an application that was designed for current
behavior would never see a change. 

While not quite obvious to me how one would implement this for all
client environments, it would make such a feature more useful if it
included a means to make the configuration outside of the scope of an
application itself so that one could give an application over which one
has no control the behavior it expects. That is, provide a means to
configure a specific application's session default behavior on the
client. But, provide no means to configure the server's general default
behavior so that the server itself is never modal with respect to case
folding. Only the client session is modal.

It is pretty easy to see the pain of adding symbol case folding modes.
On the other hand, there is no way to know exactly the gain (or loss) in
adoption to providing alternate case folding. So, you have one fact (the
pain) and one speculation (the gain). I can see that makes deciding
whether this is a good or bad idea for the project not at all easy. 

Anyhow, I appreciate the time you, and others, have taken to explain
your thinking and the impacts of adding modal case folding to the
server. 

Ian Lewis (www.mstarlabs.com)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Jim Nasby

On 1/2/17 2:01 PM, Greg Stark wrote:

The PostGIS extensions might not work on
your system with different case rules if they haven't been 100%
consistent with their camelCasing


FWIW I've already run into a similar problem with inter-extension 
dependencies and relocatability. I've found hacks to work around this 
during extension installation (ie: query pg_extension.extnamespace in 
the dependent extension build script), but if the other extension gets 
relocated you're hosed.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Alvaro Herrera
Lewis, Ian (Microstar Laboratories) wrote:

> PS. To anyone who might know the answer: My Reply All to this group does
> not seem to join to the original thread. All I am doing is Reply All
> from Outlook. Is there something else I need to do to allow my responses
> to join the original thread?

That's a known deficiency with Outlook, which fails to include the
required headers (References and/or In-Reply-To).  We have a few threads
like that in the archives.  As far as I know there's no workaround; the
only solution is to change to another mail program.  Perhaps there are
config options that can be changed, but I've asked a few people that
have had this problem and nobody has found anything -- but I don't know
how hard they've tried, if at all.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-03 Thread Robert Haas
On Mon, Jan 2, 2017 at 8:03 PM, Lewis, Ian (Microstar Laboratories)
 wrote:
> Personally, I believe such an option would increase, not decrease the
> number of people who could relatively easily use PostgreSQL. If that is
> right it is a strong argument for such a modal behavior in spite of the
> obvious real pain.

That is definitely true.  "X will let more people easily use
PostgreSQL" is an argument with a lot of merit, and I don't see how
anyone could argue otherwise (unless they want fewer people to use
PostgreSQL).  I think the issue isn't so much whether we'd increase or
decrease the number of people who could easily use PostgreSQL; I'm
confident that, as you say, many potential users would find it
convenient.  The issue is, rather, that every extension written for
PostgreSQL, whether in or out of core, needs to handle this issue and
every general-purpose client tool (pgAdmin, etc.) needs to be aware of
it.  Many drivers probably need to know about it.  Any application
built on PostgreSQL must either now require a specific server
configuration or be prepared to work with any of them.  I think this
is the sort of thing where the server changes would take a month and
tools, connectors, and extensions would still be getting bug reports a
decade later.  Now, I am not as convinced as Tom is that this is a
categorically terrible idea.  But I do think that it would be easy to
underestimate the amount of collateral damage that such a change would
cause.  It would be very large.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-02 Thread Lewis, Ian (Microstar Laboratories)
From: Robert Haas [mailto:robertmh...@gmail.com] wrote:

> I'm not sure there's any way to split the baby here: tool authors will
obviously prefer that PostgreSQL's behavior in this area be invariable,
while people trying to develop portable database applications will
prefer configurability.
> As far as I can see, this is a zero sum game that is bound to have one
winner and one loser.

Tom is clearly right that such modes make life harder in a fundamental
way for anyone writing only against PostgreSQL. And, excepting the upper
case folding option, which is of no interest at all to me personally - I
do not care which case folding messes up my symbol declarations, it
would move PostgreSQL away from the standard rather than closer to it
(against that, however, PostgreSQL has many features that are not part
of the standard, including its existing lower case folding).

If he is also right that addition of such an option would deteriorate
into a situation where more people think PostgreSQL is broken, rather
than fewer people thinking that, as I think would be the case, I have no
strong argument for why PostgreSQL - as a project - should support such
modal behavior. 

Personally, I believe such an option would increase, not decrease the
number of people who could relatively easily use PostgreSQL. If that is
right it is a strong argument for such a modal behavior in spite of the
obvious real pain. 

And, from what I can see, many, maybe most, general purpose tool authors
target many backends. So, they already have to deal with some
signficiant degree of variation in case folding behavior.

So, I do not really see this as a zero sum game. It is a question of
whether such an option would grow the user base. If not, it is clearly a
bad idea for the project. But, if it would grow the user base
sufficiently, then, yes, there is pain for those who write general
purpose tools aimed only at PostgreSQL. But, such tools gain from a
wider adoption of PostgreSQL.

Ian Lewis (www.mstarlabs.com)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-02 Thread Lewis, Ian (Microstar Laboratories)
gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark
wrote:

> But the problem with configurable quoting rules is a bit different.
> Imagine your application later decides to depend on PostGIS. So you
load the PostGIS extension and perhaps also some useful functions you
found on Stack Overflow for solving some GIS problem you have. Those
extensions will create objects and then work with those objects and may
use CamelCase for clarity -- in > fact I think PostGIS functions are
documented as CamelCase.  The PostGIS extensions might not work on your
system with different case rules if they haven't been 100% consistent
with their camelCasing, and the functions from StackOverflow would be
even less likely to work.

Well, in the case of StackOverflow suggestions, I cannot remember a time
when I did not have to rewrite whatever suggestions I have found and
used. That is not to say that StackOverflow is not useful. It is
incredibly useful at times. But, the suggestions are usually fragments
showing how to do something, not solutions. And, most such suggestions
are small. And, so, relatively easy to understand and patch as needed.
Many such suggestions are not very useful verbatim anyhow. They are
useful exactly because they allow you to understand something that you
were unable to glean from the documentation. Certainly, making symbol
usage consistent is not a hard patch on a small fragment of code that
probably needs help anyhow to bring it to production grade. I would not
consider this a strong argument against having modal symbol recognition.

Your point about PostGIS, and other full or partial solutions for a
complex problem, is a more serious issue. I do not have a strong answer
to this point. However, at the least a CamelCase case defect in a tool
is a pretty easy problem to locate and submit as a patch. (I understand
that your point is not just about PostGIS, but for PostGIS itself I have
read in a few places that they quote everything already. I do not know
whether that is true or not as I have never even looked at the tool.
However, if it is true they quote everything, then they already have
their CamelCase exactly right everywhere. If they did not the symbol
lookup would fail against current PostgreSQL. Any tool that quotes
everything should work the same way against any mode as long as all
modes are case sensitive. It might be ugly, but at least it should
always work no matter what the back end case translation.)

In our own code, I actually would prefer that we were forced to always
use the same case everywhere we refer to a symbol. And a case sensitive
behavior would enforce that at testing. I do not want this because I
want to be able to define symbols that differ only in case. I want it so
that every symbol reference is exactly visually like every other symbol
reference to the same object. Even though the effect is small, I think
such consistency makes it easier to read code. Even in C we almost never
use the ability to overload on case alone except in a few rare - and
localized - cases where the code is actually clearer with such a
notation. For example, in a mathematical implementation, using a
notation where something like t acts as an index and T defines the range
of t the difference in case is very clear. Perhaps more importantly,
this use of overload on case is consistent with conventional
mathematical notation (which, in my opinion is very good where it
belongs). This is not true when dealing with TheLongSymbolWithMixedCase
vs. TheLongSymbolWithMixedcase. The human brain cannot see that
difference easily, while it can see the difference between t and T very
easily, and it can see the relationship between the two symbols more
easily than it can see the relationship between t and tmax, say. Still,
we almost never have such code running on a database server.

Anyhow, you have a good point about third party libraries and tools that
integrate with PostgreSQL. However, I for one would be willing to live
with and address that kind of issue as needed. If the behavior were
controlled at database create time, which, from the articles Tom linked,
seems to be the general consensus as the right time for such a choice
given the current implementation, then one would at least have the
option of having databases with different case rules within a cluster.
Since each session can only connect to one database, this is not a
solution to every such situation, but it would address at least some
such cases.

Ian Lewis (www.mstarlabs.com)

PS. To anyone who might know the answer: My Reply All to this group does
not seem to join to the original thread. All I am doing is Reply All
from Outlook. Is there something else I need to do to allow my responses
to join the original thread?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-02 Thread Greg Stark
On 25 December 2016 at 09:40, Lewis, Ian (Microstar Laboratories)
 wrote:
> So, the current behavior already breaks many tools unless one accepts
> that all symbols on the server are lower case. At root, based on reading
> the threads you provided, this probably indicates defects in the tools,
> rather than a problem with PostgreSQL. My reading of the standard text
> quoted in various places is that any mixed case identifier returned from
> the catalog has to be quoted to match in a query (whether you fold to
> lower or upper case).

Well tools that work with user-defined columns and make assumptions
that they don't require quoting are just buggy.

But the problem with configurable quoting rules is a bit different.
Imagine your application later decides to depend on PostGIS. So you
load the PostGIS extension and perhaps also some useful functions you
found on Stack Overflow for solving some GIS problem you have. Those
extensions will create objects and then work with those objects and
may use CamelCase for clarity -- in fact I think PostGIS functions are
documented as CamelCase.  The PostGIS extensions might not work on
your system with different case rules if they haven't been 100%
consistent with their camelCasing, and the functions from
StackOverflow would be even less likely to work.

If there was some way to scope this setting lexically so it only
affected code that's defined in specific place that might be safer.
But I don't think things are currently organized that way. If you're
only concerned with server-side functions it wouldn't be hard to have
a specific pl language that was case sensitive though it might be
tricky to do to pl/pgsql due to the way pl/pgsql depends on the sql
parser.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-02 Thread Jim Nasby

On 1/2/17 12:25 PM, Robert Haas wrote:

But, I can easily imagine a good number of people deciding they want
mixed case on the server, and so quoting their identifiers. And, then
deciding PostgreSQL is defective, rather than deciding their favorite
administration or query tool is defective. Almost all of the tools I
tried worked fine when I had all lower case symbols on the server. Based
on observing the generated SQL, most of the tools that failed for me
when I had mixed case symbols on the server would work against a case
preserving mode in PostgreSQL. The tools generally pass through the
catalog reported symbols without manipulation.

Right.  Tom's argument makes a lot of sense when you think about this
from the point of view of someone writing extensions or tools that are
designed to work with anybody's PostgreSQL instance.  When you think
about it from the point of view of a user wanting to write an
application that can work with any of a number of databases, then your
argument has a lot of merit to it.  I'm not sure there's any way to
split the baby here: tool authors will obviously prefer that
PostgreSQL's behavior in this area be invariable, while people trying
to develop portable database applications will prefer configurability.
As far as I can see, this is a zero sum game that is bound to have one
winner and one loser.


I do wonder what the authors of those tools are using to test them. My 
guess is they're just hitting the default postgres database, which has 
no quotable identifiers.


I'd find it useful to have a contrib module/tool that would create a 
full-blown test database that contains objects of every possible type, 
including using identifiers that need quotes. It'd also be useful to 
test leading and trailing whitespace.


Having this would be useful for seeing what some of the more unusual 
objects look like in the catalog, and would simplify tests that I create 
for my extensions somewhat. If tool authors knew it existed they could 
use it for testing. It'd certainly be useful for testing things like 
pg_dump and event triggers.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2017-01-02 Thread Robert Haas
On Sun, Dec 25, 2016 at 4:40 AM, Lewis, Ian (Microstar Laboratories)
 wrote:
> I assume you are talking about general purpose tools that attempt to
> interact with any database in any configuration. Obviously, a purpose
> built tool, such as our own internal database applications, would be
> designed only for the behavior of the databases it is intended to work
> against.

Right.

> So, the current behavior already breaks many tools unless one accepts
> that all symbols on the server are lower case. At root, based on reading
> the threads you provided, this probably indicates defects in the tools,
> rather than a problem with PostgreSQL. My reading of the standard text
> quoted in various places is that any mixed case identifier returned from
> the catalog has to be quoted to match in a query (whether you fold to
> lower or upper case).
>
> But, I can easily imagine a good number of people deciding they want
> mixed case on the server, and so quoting their identifiers. And, then
> deciding PostgreSQL is defective, rather than deciding their favorite
> administration or query tool is defective. Almost all of the tools I
> tried worked fine when I had all lower case symbols on the server. Based
> on observing the generated SQL, most of the tools that failed for me
> when I had mixed case symbols on the server would work against a case
> preserving mode in PostgreSQL. The tools generally pass through the
> catalog reported symbols without manipulation.

Right.  Tom's argument makes a lot of sense when you think about this
from the point of view of someone writing extensions or tools that are
designed to work with anybody's PostgreSQL instance.  When you think
about it from the point of view of a user wanting to write an
application that can work with any of a number of databases, then your
argument has a lot of merit to it.  I'm not sure there's any way to
split the baby here: tool authors will obviously prefer that
PostgreSQL's behavior in this area be invariable, while people trying
to develop portable database applications will prefer configurability.
As far as I can see, this is a zero sum game that is bound to have one
winner and one loser.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-25 Thread Craig Ringer
On 25 Dec. 2016 14:49, "Tom Lane"  wrote:



No.  This has been looked into repeatedly in the past, and we simply
don't want to deal with it.  Quite aside from the impact on the server
(which would be extensive), it would break every nontrivial application,
and force them all to try to deal with each possible folding behavior.
The more behaviors there are, the worse that gets.


Good point. That's been a source of pain for standard_conforming_strings
and bytea_output. Many apps are also - sadly - hopelessly broken with
regards to text encoding. (Including, unfortunately, PostgreSQL it's self,
but more subtly than the obvious brain-dead behaviour of many apps).

Tom makes a good point. Apps that need to care are better off consistently
double quoting.

Frankly if we were going to add any option at all I'd want one to force all
unquoted identifiers to ERROR so apps could be easily validated not to
depend on case folding behaviour at all. But even that has issues and is
probably better done in tooling and static analysis or via a plugin.


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-25 Thread Lewis, Ian (Microstar Laboratories)
Tom Lane [mailto:t...@sss.pgh.pa.us] wrote:
>  Quite aside from the impact on the server (which would be extensive),
it would break every nontrivial application, and force them all to try
to deal with each possible folding behavior.
I have read through the various threads related to this issue that you
supplied. And, it looks quite clear that a change is unlikely. Maybe it
is even a bad idea, though personally I think it could prove worth the
pain that you obviously anticipate.

However, before fully dropping this issue, I do have one comment I would
like to make (just for the record) on your statement that a change in
the current behavior would break most non-trivial applications.

I assume you are talking about general purpose tools that attempt to
interact with any database in any configuration. Obviously, a purpose
built tool, such as our own internal database applications, would be
designed only for the behavior of the databases it is intended to work
against.

I have, over the past few months, tried quite a large number (10, maybe
a few more than that) of general purpose tools against PostgreSQL
looking for replacements for some of the tools we use against our old
database server. And, almost none of them work well if I quote
identifiers on the server. Almost all work perfectly well if I accept
that all my symbols will be converted to lower case and do not quote the
identifiers. Most fail - often not even in corner cases - when the
catalog (properly) returns a mixed case symbol like WeeklySales. Very
few of the tools I have tried seem to know to quote such a symbol to
preserve the case when generating a query to send back to the server. 

Most of the tools I have tested work through ODBC, though a few connect
directly to PostgreSQL. Both types of tools have exhibited similar
issues. Maybe the ODBC connection is relevant, since it appears that
Microsoft SQL Server does what our current server does and preserves
case, at least in some modes. A good fraction of the tools I have tried,
push their compatibility with SQL Server.

So, the current behavior already breaks many tools unless one accepts
that all symbols on the server are lower case. At root, based on reading
the threads you provided, this probably indicates defects in the tools,
rather than a problem with PostgreSQL. My reading of the standard text
quoted in various places is that any mixed case identifier returned from
the catalog has to be quoted to match in a query (whether you fold to
lower or upper case). 

But, I can easily imagine a good number of people deciding they want
mixed case on the server, and so quoting their identifiers. And, then
deciding PostgreSQL is defective, rather than deciding their favorite
administration or query tool is defective. Almost all of the tools I
tried worked fine when I had all lower case symbols on the server. Based
on observing the generated SQL, most of the tools that failed for me
when I had mixed case symbols on the server would work against a case
preserving mode in PostgreSQL. The tools generally pass through the
catalog reported symbols without manipulation. 

I fully understand your concern to keep the number of modal behaviors to
a minimum. And, the upper case folding looks to have a lot of side
effects, and so is a whole lot harder to implement than I expected.
Preserving case, which is what we actually want, by contrast still looks
pretty easy, though it is clearly not standard compliant. 

I do not have any more to add on the subject. I just wanted to make this
note that a good fraction of third party tools already fail unless one
never quotes identifiers on PostgreSQL. 

Thank you for taking the time to reply to my original inquiry. 

Ian Lewis
www.mstarlabs.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-24 Thread Lewis, Ian (Microstar Laboratories)
On Saturday, December 24, 2016 10:49 PM Tom Lane
[mailto:t...@sss.pgh.pa.us] wrote:
> No.  This has been looked into repeatedly in the past, and we simply
don't want to deal with it.  

Fair enough. We will not pursue the issue then. That is why I asked.

Ian Lewis (www.mstarlabs.com)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-24 Thread Lewis, Ian (Microstar Laboratories)
On  December 24, 2016 9:52 PM Craig Ringer
[mailto:craig.rin...@2ndquadrant.com] wrote:
> Personally I can see such an option being ok as an initdb-time setting
or at CREATE DATABASE time. Case folding can know the current db from
global context.
>
> It'd have to be fast though. Very fast.
 
That seems reasonable. In fact, it is nicer to configure at the database
level, rather than at the cluster level. I did not know it was possible
to tell the database from global context. Since a connection only allows
access to a single database, it makes sense that could be possible.
 
A check of a single global configuration variable to switch between one
of three handler paths would be very small compared with the
character-by-character checks currently performed by
downcase_identifier(), though, of course, the extra check would not be
free.
 
However, it would likely be faster to setup up the processing to call
through a global pointer to one of three handler functions. On most
Intel processors, at least, that extra pointer indirection costs little
to nothing. The pointer could be set up during database connect (I do
not know what I am talking about here, but there must be such a process
somewhere).
 
Presumably, the handler pointer would have to go into the global
database descriptor whatever that is. Or, if you allow use of global
objects for storing information about database scope run-time
configuration, it could just be a function pointer stored with the
handlers. The database initialization processing could call a setup
function when it runs to select the correct handling for its
configuration. The default would be the current downcase_identifier()
handling.
 
Does this seem like an approach that would meet your "Very fast"
requirement?
 
Ian Lewis (www.mstarlabs.com)


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-24 Thread Tom Lane
"Lewis, Ian \(Microstar Laboratories\)"  writes:
> Is there any chance that the PostgreSQL developers would accept a new
> cluster wide configuration option to control how the system handles
> symbol case folding? 

No.  This has been looked into repeatedly in the past, and we simply
don't want to deal with it.  Quite aside from the impact on the server
(which would be extensive), it would break every nontrivial application,
and force them all to try to deal with each possible folding behavior.
The more behaviors there are, the worse that gets.  Pretty soon, every
application is double-quoting every identifier out of sheer paranoia.
Which is exactly the behavior you say you'd rather avoid --- but not
only have you not avoided it, you've forced the entire Postgres
ecosystem into it.

The closest precedent that ever actually got into the server was the
transaction-autocommit option that existed for awhile circa PG 7.3.
Once we realized just how much complexity we were forcing on every
application, we took that out again.  If somehow we forgot that
episode and agreed to take a case-folding behavioral change, I'm
pretty sure the same dynamics would play out again.

There's a lot of material on this in the archives.  The latest
substantive discussion I can find, which includes links to several
previous investigations, is here:
https://www.postgresql.org/message-id/flat/200807081925.40467.peter_e%40gmx.net

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster wide option to control symbol case folding

2016-12-24 Thread Craig Ringer
On 25 Dec. 2016 10:30 am, "Lewis, Ian (Microstar Laboratories)" <
ile...@mstarlabs.com> wrote:

Is there any chance that the PostgreSQL developers would accept a new
cluster wide configuration option to control how the system handles
symbol case folding?


Probably not  as a GUC (configuration option, like in PostgreSQL.conf).
There's a somewhat established principle that GUCs should not change query
semantics.

Personally I can see such an option being ok as an initdb-time setting or
at CREATE DATABASE time. Case folding can know the current db from global
context.

It'd have to be fast though. Very fast.


[HACKERS] Cluster wide option to control symbol case folding

2016-12-24 Thread Lewis, Ian (Microstar Laboratories)
Is there any chance that the PostgreSQL developers would accept a new
cluster wide configuration option to control how the system handles
symbol case folding? 

Currently PostgreSQL folds all un-quoted symbols to lower case. 

We would like to add a global configuration option with a name like
  symbol_casefold
 
with settings to allow folding symbols to
  lc - lower case (default)
  uc   - upper case  
  none - no case folding


USE CASE 1
The option we actually want for our own use is "none". Currently we have
several large bodies of code that work with an SQL Anywhere backend.
This server preserves case and - by a configuration option - performs
case insensitive lookup on all symbols. We are moving to PostgreSQL for
internal applications. We are also considering using PostgreSQL for the
storage backend in a product.

We use camel case for symbol names in all of our application and backend
code (largely written in C++, Object Pascal, and Python). In a language
like Pascal that ignores case, we still maintain consistent use of case
in symbol names for the sake of human readers of the code.

Where we make a connection from application code to a storage backend we
would like to use exactly the same symbol name - including case - for
the symbol in the backend as we use in the corresponding symbol in
application code. For example, if we intend to read a value into a
variable called FirstName we would like the corresponding field in the
database to also be FirstName. The main reason we want this exact match
is so that a human reader sees exactly the same thing in the two places.
This makes it easier to see the connection between the two bodies of
code. However, in places we also automate the connection between client
symbols and server symbols, and in such a case it is useful, though
certainly not necessary, to have an exact match.

While we do not do so at present, where we use a database purely through
code, we can likely add double quotes around all symbols in our
generated SQL, which means we can have an exact match as we want under
PostgreSQL as it currently behaves.

However, in many cases we also access the same database backend through
third party tools and ad hoc queries. 

If we double quote all of our symbols, which is what we have tried to do
in our initial tests, then all such tools and manually written queries
must also double quote all symbols. 

We use several tools and libraries that do not appear to have any way to
properly quote symbols when they are obtained automatically from the
schema. So, for example, if our reporting tool retrieves a field called
FirstName from the database schema, it happily uses that name in its
internally generated SQL. And, on PostgreSQL, of course this fails
because FirstName in script is firstname at the server, which is not
defined if we have double quoted the field name in the table definition.
Since we do not have the code, we cannot work around this in any easy
way.

For ad hoc queries we can double quote all symbol references. But, this
makes the queries noticeably harder for a person to read. And, I do not
believe this is just a matter of "getting used to it". The quotes
clutter the script, and that clutter makes the script fundamentally
harder to understand. Maybe this is small, but anything that makes
comprehension harder is a bad thing. Queries can be hard enough to
understand without extra syntactic clutter.

So, these considerations leave us the option of never quoting symbols
when using PostgreSQL. This works everywhere we have tried it. But, it
is pretty unattractive from the point of view of looking at the symbol
names in the backend if they are to match exactly the symbol names we
use in code. Using a different naming convention than we use everywhere
else in our code (underscore separated all lower case symbol names, say)
is not appealing either. We are very consistent in our symbol name
handling in our code, and breaking our conventions in some relatively
large section of our code is very unattractive. 

In addition, in many places our tools and code use schema supplied field
names to form column titles in a table or in a caption on an edit box
(etc.). The mixed case names are much nicer for this purpose than the
folded names.

For our purposes, these are our arguments for wanting control of how the
server folds case. 


USE CASE 2
Even though we have no use for it, I have included the option "uc"
because, in trying to determine whether PostgreSQL could support our
desired behavior, I found a fairly large number of people who are coming
from a different backend, such as Oracle (from what I see on the
internet - no personal experience), that case fold similarly to
PostgreSQL. However, instead of folding to lower case, it appears a
number of other database servers fold to upper case. 

This leaves people who are moving from these other database systems with
problems in their own code that they have to patch up to be able to make
the port