Re: Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)

2021-10-08 Thread Magnus Holmgren
onsdag 6 oktober 2021 kl. 16:01:49 CEST skrev du:
> =?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= 
 writes:
> > On Wed, Oct 6, 2021 at 11:09 AM Magnus Holmgren
> > > 
> > wrote:
> >> There's just this tiny but seemingly obvious issue that I can't believe I
> >> haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in
> >> an
> >> English locale (en_US.UTF-8 at least). Names of months and weekdays are
> >> proper
> >> nouns and as such *always* capitalized in English, so that seems wrong to
> >> me.
> > 
> > IMHO, the patterns of TO_CHAR() do as promised in the documentation [1]:
> > MONTH full upper case month name (blank-padded to 9 chars)
> > Month full capitalized month name (blank-padded to 9 chars)
> > month full lower case month name (blank-padded to 9 chars)
> > 
> > What you are proposing looks more like a new feature than a bug.
> 
> Yeah, this is operating as designed and documented.  The idea that
> there should be a way to get "month name as it'd be spelled mid-sentence"
> is an interesting one, but I really doubt that anyone would thank us for
> changing TMmonth to act that way.  (Perhaps a new format code or modifier
> would be easier to swallow?)

Yes, I see that it's working as designed and documented, but I contend that 
the design is flawed for the reason I gave. I mean, you can't deny that names 
of months and weekdays are always capitalized in English and certain other 
languages, whereas in another set of languages they are not, can you? Perhaps 
this is a conscious design choice with some reason behind it, but if so, 
neither the PostgreSQL nor the Oracle documentation (https://docs.oracle.com/
cd/B12037_01/server.101/b10759/sql_elements004.htm#i34510) reveal it. What is 
the use case for linguistically incorrectly lowercased localized month and day 
names? What would such a change break?

I still suspect that whoever designed this didn't consider locale switching. 
(Interestingly, "month", "mon", "day", and "dy" are locale-specific by 
themselves; there is no "TM" prefix needed.

> I also wonder exactly how the code would figure out what to do ---
> language-specific conventions for this are not information available
> from the libc locale APIs, AFAIR.

I checked the code, and it looks like cache_locale_time() in src/backend/
utils/adt/pg_locale.c uses strftime(3) to produce the correctly capitalized 
day and month names and abbreviations (format codes %A, %B, %a, and %b). All 
that would be needed is not to force them to lowercase in DCH_to_char() in 
src/backend/utils/adt/formatting.c.

What could a new, separate format code that doesn't do this look like?

-- 
Magnus Holmgren, developer
MILLNET AB



-- 
Vid e-postkontakt med Millnet är det normalt att åtminstone vissa 
personuppgifter sparas om dig. Du kan läsa mer om vilka uppgifter som 
sparas och hur vi hanterar dem på https://www.millnet.se/integritetspolicy/ 
<https://www.millnet.se/integritetspolicy/>.




Capitalization of localized month and day names (to_char() with 'TMmonth', 'TMday', etc.)

2021-10-06 Thread Magnus Holmgren
Hello and thank you very much for the best open source database engine!

There's just this tiny but seemingly obvious issue that I can't believe I 
haven't noticed until now: to_date(now(), 'TMmonth') returns 'october' in an 
English locale (en_US.UTF-8 at least). Names of months and weekdays are proper 
nouns and as such *always* capitalized in English, so that seems wrong to me. 

If you want to build an internationalized application, you want 'TMmonth' to 
return a month name that can be used in the middle of a sentence, capitalized 
or not depending on locale, and 'TMMonth' to return an always capitalized 
month name that can be used at the start of a sentence.

This was discussed back in 2008:

https://www.postgresql.org/message-id/flat/
47C34A98.7050102%40timbira.com#9593d90487976d28e2b612cff576545d

There is talk about how PostgreSQL has to do what Oracle does, but does it 
really have to replicate bugs at this level of detail? Localized date and 
number formats are only for presentation, and not meant to be machine-
readable.

I imagine that the reason it works the way it does is that the unlocalized 
formats exist, and there wouldn't be any difference between 'month' and 
'Month' if 'month' also capitalized month names due to English language rules, 
and as long as you're not building an internationalized application you can 
always use 'Month' to get it right.

(This is the situation in PostgreSQL 13, at least. I haven't tried PostgreSQL 
14, but there are no mentions of to_char() or localization in the release 
notes, nothing in the documentation of to_char() suggesting any change, and I 
also haven't found any more recent discussions.)

Thoughts?

-- 
Magnus Holmgren
MILLNET AB, Teknikringen 6, 583 30 Linköping




-- 
Vid e-postkontakt med Millnet är det normalt att åtminstone vissa 
personuppgifter sparas om dig. Du kan läsa mer om vilka uppgifter som 
sparas och hur vi hanterar dem på https://www.millnet.se/integritetspolicy/ 
<https://www.millnet.se/integritetspolicy/>.