Re: str.title() fails with words containing apostrophes
On 2017-03-06 06:33 PM, Steve D'Aprano wrote: If you read "title case" as *literally* as being only for titles (of books, I believe there is only one conclusion to be drawn from this thread - There is still a place for human proofreaders. I'm taking that as good news. -- D'Arcy J.M. Cain Vybe Networks Inc. http://www.VybeNetworks.com/ IM:da...@vex.net VoIP: sip:da...@vybenetworks.com -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
Marko Rauhamaa writes: > Steve D'Aprano wrote: >> I came across this book title: >> >> Täällä Pohjantähden alla (‘Here beneath the North Star’) >> >> http://www.booksfromfinland.fi/1980/12/the-strike/ >> >> which is partly title case, but I'm not sure what rule is being >> applied there. My guess is that "Täällä Pohjantähden" means "North >> Star" and it counts as a proper noun, like countries and people's >> names, and so takes initial caps for each word. Am I close? > > Correct. Not quite. "Täällä" is "here", "Pohjantähden" is "North Star". So it's just a first word and a name. ("Pohja" and "tähti" correspond to "North" and "Star".) -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
Steve D'Aprano writes: > On Tue, 7 Mar 2017 03:28 am, Grant Edwards wrote: >> >> Besides locale-aware, it'll need to be style-guide-aware so that it >> knows whether you want MLA, Chicago, Strunk & White, NYT, Gregg, >> Mrs. Johnson from 9th grade English class, or any of a dozen or two >> others. And that's just for US English. [For all I know, most of >> the ones I listed agree completely on "title case", but I doubt it.] > > As far as I am aware, there are only two conventions for title case in > English: > > Initial Capitals For All The Words In A Sentence. > > Initial Capitals For All the Significant Words in a Sentence. > > For some unstated, subjective rule for "significant" which usually > means "three or more letters, excluding the definite article ('the')". That's where the variation is hidden. I browsed three sites to see what they do. One doesn't title-capitalize anything. One capitalizes everything. One was more interesting. I think it has human editors who pay attention to these matters. They do not capitalize these short words: 'a', 'an', 'at', 'the', 'in', 'of', 'on', 'for', 'to', 'and', 'vs.'; they capitalize longer prepositions: 'From', 'Into', 'With', 'Through'. Also auxiliary verbs and copulas even when short. A 'Nor' was capitalized in the middle of a title, but there was a sentence boundary just before the 'Nor'. I'd classify 'nor' with 'and' otherwise, but they might base the non-capitalization on frequency for all I know. Some two-letter words: 'Is', 'Am', 'Do', 'So', 'No', 'He', 'We', 'It', 'My', 'Up'; also 'Au Revoir', 'Oi Oi Oi', 'Ay Ay Ay'. Then there is 'Grown-Ups' and 'Contrary-to-Fact' but 'X-ing'. Sometimes a hyphen makes a word boundary, sometimes not. > But of course there are exceptions: words which are necessarily in > all-caps should stay in all-caps (e.g. NASA) and names. There may be lots of these if you are handling something like a tech news site that talks about people and companies and institutions from all over the world. 
Names are tricky.
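To make the "short words stay lower case" convention concrete, here is a minimal sketch, not any style guide's actual algorithm. It uses only the short-word list observed on that one site, plus the rule raised elsewhere in this thread that such words are still capitalized at the start of a title or after a colon. The names SMALL_WORDS and headline_case are invented for this illustration.

```python
# Short words left in lower case; this set is just the list observed above,
# not an authoritative one.
SMALL_WORDS = {"a", "an", "at", "the", "in", "of", "on", "for", "to", "and", "vs."}


def headline_case(text):
    """Capitalize each whitespace-delimited word except listed short words."""
    result = []
    force_capital = True  # the first word is always capitalized
    for word in text.split():
        if word.lower() in SMALL_WORDS and not force_capital:
            result.append(word.lower())
        else:
            # Only raise the first character, so NASA and D'Arcy survive.
            result.append(word[:1].upper() + word[1:])
        # A word ending in a colon starts a new "sentence" within the title.
        force_capital = word.endswith(":")
    return " ".join(result)


print(headline_case("the price of chicken: a study in NASA budgets"))
```

It still knows nothing about names, hyphenated words, or short words followed by punctuation, which is where the real difficulty lies.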
Re: str.title() fails with words containing apostrophes
Chris Angelico:

> On Tue, Mar 7, 2017 at 12:03 PM, Marko Rauhamaa wrote:
>>
>> As for the UK:
>>
>>    Yhdistynyt kuningaskunta
>
> About the only part of that that I understand is "kuning" ==
> king/queen/kingdom. I swear, you like the letter 'y' more than the
> Welsh do...

Proto-Finnic borrowed the word "kuningas" from Proto-Germanic, where it
was "kuningaz". The Germanic descendant languages have mangled the
original quite a bit:

   English: king
   German:  König
   Swedish: kung

See also: <http://www.etymonline.com/index.php?term=king>.

The word "yhdistynyt" ultimately comes from the Proto-Uralic word
"*ükte" ("one"). The word "kunta" ("society") is in its Proto-Uralic
form.

Marko
Re: str.title() fails with words containing apostrophes
On Tue, Mar 7, 2017 at 4:18 PM, Steven D'Aprano wrote:
> On Tue, 07 Mar 2017 12:18:13 +1100, Chris Angelico wrote:
>
>> On Tue, Mar 7, 2017 at 12:03 PM, Marko Rauhamaa wrote:
>>>
>>> As for the UK:
>>>
>>>    Yhdistynyt kuningaskunta
>>
>> About the only part of that that I understand is "kuning" ==
>> king/queen/kingdom. I swear, you like the letter 'y' more than the Welsh
>> do...
>
> But do Finns like 'y' more than the English like 'e'?

Perhaps not. And certainly not as much as Calculus likes 'e'.

ChrisA
Re: str.title() fails with words containing apostrophes
On Tue, 07 Mar 2017 12:18:13 +1100, Chris Angelico wrote: > On Tue, Mar 7, 2017 at 12:03 PM, Marko Rauhamaa> wrote: >> >> As for the UK: >> >>Yhdistynyt kuningaskunta > > About the only part of that that I understand is "kuning" == > king/queen/kingdom. I swear, you like the letter 'y' more than the Welsh > do... But do Finns like 'y' more than the English like 'e'? -- Steve -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On Tue, Mar 7, 2017 at 12:03 PM, Marko Rauhamaa wrote:
>
> As for the UK:
>
>    Yhdistynyt kuningaskunta

About the only part of that that I understand is "kuning" ==
king/queen/kingdom. I swear, you like the letter 'y' more than the
Welsh do...

ChrisA
Re: str.title() fails with words containing apostrophes
Steve D'Aprano: > On Tue, 7 Mar 2017 01:03 am, Marko Rauhamaa wrote: > If you read "title case" as *literally* as being only for titles (of > books, for instance) then of course you are right. Finnish book titles > are normally written in sentence case (initial capital, followed by > all lowercase). Yes. > But if you consider title case more widely, Finnish includes it too. > Names are written in title case ("Marko Rauhamaa" rather than "Marko > rauhamaa"). I imagine countries get the same treatment when needed. > How do you write Saudi Arabia, United Kingdom, and North Korea? The rules are a bit complicated. Sentence case is the basic rule. However, if the latter part of a compound name is a proper noun, both parts are capitalized and connected with a hyphen: Saudi-Arabia Pohjois-Korea Iso-Britannia As for the UK: Yhdistynyt kuningaskunta or: Ison-Britannian ja Pohjois-Irlannin yhdistynyt kuningaskunta Similarly, the University of Helsinki is: Helsingin yliopisto > I came across this book title: > > Täällä Pohjantähden alla (‘Here beneath the North Star’) > > http://www.booksfromfinland.fi/1980/12/the-strike/ > > which is partly title case, but I'm not sure what rule is being > applied there. My guess is that "Täällä Pohjantähden" means "North > Star" and it counts as a proper noun, like countries and people's > names, and so takes initial caps for each word. Am I close? Correct. The sentence case rule is sometimes violated for historical or marketing reasons: Helsingin Sanomat (a newspaper) Suomen Kuvalehti (a magazine) Sutelan Kello ja Kulta (a jeweler) Helsingin Hautaustoimisto (an undertaker) Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On Tue, 7 Mar 2017 01:03 am, Marko Rauhamaa wrote: > Chris Angelico: > >> Right. If you want true title casing, it has to be *extremely* >> linguistically-aware. > > For instance, title case has no meaning in the context of Finnish. In > other words, your internationalized program shouldn't even think of > title case when localized in Finnish. If you read "title case" as *literally* as being only for titles (of books, for instance) then of course you are right. Finnish book titles are normally written in sentence case (initial capital, followed by all lowercase). But if you consider title case more widely, Finnish includes it too. Names are written in title case ("Marko Rauhamaa" rather than "Marko rauhamaa"). I imagine countries get the same treatment when needed. How do you write Saudi Arabia, United Kingdom, and North Korea? I came across this book title: Täällä Pohjantähden alla (‘Here beneath the North Star’) http://www.booksfromfinland.fi/1980/12/the-strike/ which is partly title case, but I'm not sure what rule is being applied there. My guess is that "Täällä Pohjantähden" means "North Star" and it counts as a proper noun, like countries and people's names, and so takes initial caps for each word. Am I close? -- Steve “Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse. -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On 2017-03-06, Steve D'Aprano wrote:
> On Tue, 7 Mar 2017 03:28 am, Grant Edwards wrote:
>
>> On 2017-03-06, Chris Angelico wrote:
>>
>>> Still, it's fun to discuss, if only to show why that kind of
>>> locale-aware transformation is important.
>>
>> Besides locale-aware, it'll need to be style-guide-aware so that it
>> knows whether you want MLA, Chicago, Strunk & White, NYT, Gregg,
>> Mrs. Johnson from 9th grade English class, or any of a dozen or two
>> others. And that's just for US English. [For all I know, most of the
>> ones I listed agree completely on "title case", but I doubt it.]
>
> As far as I am aware, there are only two conventions for title case in
> English:
>
>     Initial Capitals For All The Words In A Sentence.
>
>     Initial Capitals For All the Significant Words in a Sentence.
>
> For some unstated, subjective rule for "significant" which usually
> means "three or more letters, excluding the definite article ('the')".
>
> But of course there are exceptions: words which are necessarily in
> all-caps should stay in all-caps (e.g. NASA) and names.

And you capitalize "insignificant" words at the beginning of the title
or following a colon. And then there are special cases for hyphenated
words, prepositions that belong to a phrasal verb, and so on and so
forth. Plus a bunch of exceptions that have been imported from other
languages (this is mostly covered by the "name" exception).

The "name" one is probably the only one that many people will notice
if it's wrong. Unfortunately, writing an algorithm that can decide
what constitutes a "name" borders on the impossible.

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! I'm totally DESPONDENT over the LIBYAN situation and the price of CHICKEN ...
Re: str.title() fails with words containing apostrophes
On Tue, 7 Mar 2017 03:28 am, Grant Edwards wrote:

> On 2017-03-06, Chris Angelico wrote:
>
>> Still, it's fun to discuss, if only to show why that kind of
>> locale-aware transformation is important.
>
> Besides locale-aware, it'll need to be style-guide-aware so that it
> knows whether you want MLA, Chicago, Strunk & White, NYT, Gregg,
> Mrs. Johnson from 9th grade English class, or any of a dozen or two
> others. And that's just for US English. [For all I know, most of the
> ones I listed agree completely on "title case", but I doubt it.]

As far as I am aware, there are only two conventions for title case in
English:

    Initial Capitals For All The Words In A Sentence.

    Initial Capitals For All the Significant Words in a Sentence.

For some unstated, subjective rule for "significant" which usually
means "three or more letters, excluding the definite article ('the')".

But of course there are exceptions: words which are necessarily in
all-caps should stay in all-caps (e.g. NASA) and names.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
Re: str.title() fails with words containing apostrophes
On 2017-03-06, Chris Angelico wrote:

> Still, it's fun to discuss, if only to show why that kind of
> locale-aware transformation is important.

Besides locale-aware, it'll need to be style-guide-aware so that it
knows whether you want MLA, Chicago, Strunk & White, NYT, Gregg,
Mrs. Johnson from 9th grade English class, or any of a dozen or two
others. And that's just for US English. [For all I know, most of the
ones I listed agree completely on "title case", but I doubt it.]

-- 
Grant Edwards   grant.b.edwards at gmail.com
Yow! FOOLED you! Absorb EGO SHATTERING impulse rays, polyester poltroon!!
Re: str.title() fails with words containing apostrophes
On 2017-03-06 05:04 AM, Peter Otten wrote:
>> Won't Steve D'aprano And D'arcy Cain Be Happy Now :)
>
> Perhaps one could limit the conversion to go from lower to upper only, as
> names tend to be in the desired case in the original text.

That would help with acronyms as well.

-- 
D'Arcy J.M. Cain
Vybe Networks Inc.
http://www.VybeNetworks.com/
IM:da...@vex.net VoIP: sip:da...@vybenetworks.com
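The acronym point can be checked with the first_up()/title() pair posted by Peter Otten elsewhere in this thread. The sketch below copies those two functions and adds the import they need. Because only the first character of each match is ever upper-cased and nothing is lower-cased, text that is already in all caps passes through untouched. The sample strings are made up for the demonstration.

```python
import re


def first_up(s):
    # Upper-case the first character only; leave the rest exactly as given.
    return s[:1].upper() + s[1:]


def title(s):
    # A match is either a word, or an apostrophe plus the word part after it.
    # For the second kind, first_up() "upper-cases" the apostrophe, a no-op.
    return re.compile(r"(?:\b'?)\w+").sub(lambda m: first_up(m.group()), s)


builtin = "how to teach CSS".title()   # str.title() lower-cases the acronym
upper_only = title("how to teach CSS")  # the acronym survives
print(builtin)
print(upper_only)
```

The trade-off is that input typed in all caps, or with stray capitals, is never normalized.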
Re: str.title() fails with words containing apostrophes
Chris Angelico: > Right. If you want true title casing, it has to be *extremely* > linguistically-aware. For instance, title case has no meaning in the context of Finnish. In other words, your internationalized program shouldn't even think of title case when localized in Finnish. This localization problem runs even deeper. My Windows Phone displays the time and date: 15:49 maanantai 6. maaliskuuta It gets many things right. However, in Finland, nobody ever spells out the month name in dates. "March 6" should be localized to "6.3.", preferably to "6.3.2017". > Still, it's fun to discuss, if only to show why that kind of > locale-aware transformation is important. Finland is a bilingual country. You often see street ads for the same product in Finnish and Swedish. What is notable is that the Finnish and Swedish variants can have completely different punchlines because a translation just wouldn't sound good either way. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On Mon, Mar 6, 2017 at 9:04 PM, Peter Otten <__pete...@web.de> wrote:
> Perhaps one could limit the conversion to go from lower to upper only, as
> names tend to be in the desired case in the original text.

No, that just tends to make things confusing to use.

> Unfortunately this won't help with
> title("admiral von schneider")
> 'Admiral Von Schneider' # von should be lower case

On Mon, Mar 6, 2017 at 8:52 PM, Jussi Piitulainen wrote:
> It also will capitalize all the little words in the string that are
> usually not capitalized in titles, even in the usual headlinese English
> variants. And all the acronyms and such that are usually written in all
> caps, or in even odder patterns.

Right. If you want true title casing, it has to be *extremely*
linguistically-aware. Each of these highlights the fact that "title
case" does not truly equate to "capitalize each whitespace-delimited
word", so it's going to need some sort of intelligence. There's probably
a linguistic library out there that does all of this, but it doesn't
need to be in the stdlib.

I am a little surprised by the "Don'T" from the OP, but I'm not at all
surprised at "Admiral Von Schneider", nor of "How To Teach Css" and
other "anomalies".

Still, it's fun to discuss, if only to show why that kind of
locale-aware transformation is important.

ChrisA
Re: str.title() fails with words containing apostrophes
Jussi Piitulainen wrote:

> gvm...@gmail.com writes:
>
>> On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote:
>>> I'm trying to convert strings to Title Case, but getting ugly results
>>> if the words contain an apostrophe:
>>>
>>>
>>> py> 'hello world'.title() # okay
>>> 'Hello World'
>>> py> "i can't be having with this".title() # not okay
>>> "I Can'T Be Having With This"
>>>
>>>
>>> Anyone have any suggestions for working around this?
>
> [snip sig]
>
>> import string
>>
>> txt = "i can't be having with this"
>> string.capwords(txt)
>>
>> That gives you "I Can't Be Having With This"
>>
>> Hope that helps.
>
> Won't Steve D'aprano And D'arcy Cain Be Happy Now :)

Perhaps one could limit the conversion to go from lower to upper only, as
names tend to be in the desired case in the original text.

>>> import re
>>> def first_up(s):
...     return s[:1].upper() + s[1:]
...
>>> def title(s):
...     return re.compile(r"(?:\b'?)\w+").sub(lambda m: first_up(m.group()), s)
...
>>> title("won't steve D'Aprano and d'arcy cain be 'happy' now?")
"Won't Steve D'Aprano And D'arcy Cain Be 'Happy' Now?"

Unfortunately this won't help with

>>> title("admiral von schneider")
'Admiral Von Schneider' # von should be lower case
Re: str.title() fails with words containing apostrophes
gvm...@gmail.com writes: > On Monday, March 6, 2017 at 2:37:11 PM UTC+5:30, Jussi Piitulainen wrote: >> gvm...@gmail.com writes: >> >> > On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote: >> >> I'm trying to convert strings to Title Case, but getting ugly results >> >> if the words contain an apostrophe: >> >> >> >> >> >> py> 'hello world'.title() # okay >> >> 'Hello World' >> >> py> "i can't be having with this".title() # not okay >> >> "I Can'T Be Having With This" >> >> >> >> >> >> Anyone have any suggestions for working around this? >> >> [snip sig] >> >> > import string >> > >> > txt = "i can't be having with this" >> > string.capwords(txt) >> > >> > That gives you "I Can't Be Having With This" >> > >> > Hope that helps. >> >> Won't Steve D'aprano And D'arcy Cain Be Happy Now :) > > > I found it at https://docs.python.org/3/library/string.html#string.capwords :) Sure, it's there, and that's a good point. It still mangles their names. It also mangles any whitespace in the string. That is probably mostly harmless. It also will capitalize all the little words in the string that are usually not capitalized in titles, even in the usual headlinese English variants. And all the acronyms and such that are usually written in all caps, or in even odder patterns. I guess it's a somewhat practical approximation to an AI-hard problem. (Mumble mumble str.swapcase, er, never mind me :) -- https://mail.python.org/mailman/listinfo/python-list
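The three side effects listed above (mangled names, collapsed whitespace, lower-cased acronyms) can be seen in a short session. This is only an illustration with made-up input strings. It relies on the documented behaviour of string.capwords() when no sep argument is given: split on runs of whitespace, str.capitalize() each word, and join with single spaces.

```python
import string

# The apostrophe case from the original post now works...
plain = string.capwords("i can't be having with this")

# ...but capitalize() lower-cases everything after the first character,
# so mixed-case names and acronyms are flattened.
names = string.capwords("steve d'aprano and D'Arcy Cain")

# Runs of whitespace (including tabs) collapse to single spaces.
spaced = string.capwords("how  to\tteach CSS")

print(plain)
print(names)
print(spaced)
```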
Re: str.title() fails with words containing apostrophes
On Monday, March 6, 2017 at 2:37:11 PM UTC+5:30, Jussi Piitulainen wrote: > gvm...@gmail.com writes: > > > On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote: > >> I'm trying to convert strings to Title Case, but getting ugly results > >> if the words contain an apostrophe: > >> > >> > >> py> 'hello world'.title() # okay > >> 'Hello World' > >> py> "i can't be having with this".title() # not okay > >> "I Can'T Be Having With This" > >> > >> > >> Anyone have any suggestions for working around this? > > [snip sig] > > > import string > > > > txt = "i can't be having with this" > > string.capwords(txt) > > > > That gives you "I Can't Be Having With This" > > > > Hope that helps. > > Won't Steve D'aprano And D'arcy Cain Be Happy Now :) I found it at https://docs.python.org/3/library/string.html#string.capwords :) -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote:
> I'm trying to convert strings to Title Case, but getting ugly results if the
> words contain an apostrophe:
>
>
> py> 'hello world'.title() # okay
> 'Hello World'
> py> "i can't be having with this".title() # not okay
> "I Can'T Be Having With This"
>
>
> Anyone have any suggestions for working around this?
>
>
>
> --
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.

import string

txt = "i can't be having with this"
string.capwords(txt)

That gives you "I Can't Be Having With This"

Alternatively

txt = "i can't be having with this"
' '.join([word.capitalize() for word in txt.split()])

will result in:

"I Can't Be Having With This"
Re: str.title() fails with words containing apostrophes
gvm...@gmail.com writes: > On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote: >> I'm trying to convert strings to Title Case, but getting ugly results >> if the words contain an apostrophe: >> >> >> py> 'hello world'.title() # okay >> 'Hello World' >> py> "i can't be having with this".title() # not okay >> "I Can'T Be Having With This" >> >> >> Anyone have any suggestions for working around this? [snip sig] > import string > > txt = "i can't be having with this" > string.capwords(txt) > > That gives you "I Can't Be Having With This" > > Hope that helps. Won't Steve D'aprano And D'arcy Cain Be Happy Now :) -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On Sunday, March 5, 2017 at 11:25:04 PM UTC+5:30, Steve D'Aprano wrote: > I'm trying to convert strings to Title Case, but getting ugly results if the > words contain an apostrophe: > > > py> 'hello world'.title() # okay > 'Hello World' > py> "i can't be having with this".title() # not okay > "I Can'T Be Having With This" > > > Anyone have any suggestions for working around this? > > > > -- > Steve > “Cheer up,” they said, “things could be worse.” So I cheered up, and sure > enough, things got worse. import string txt = "i can't be having with this" string.capwords(txt) That gives you "I Can't Be Having With This" Hope that helps. -- https://mail.python.org/mailman/listinfo/python-list
Re: str.title() fails with words containing apostrophes
On 2017-03-05 03:40 PM, Terry Reedy wrote:
>> import re
>>
>> def title(string):
>>     return re.sub(r"\b'\w", lambda m: m.group().lower(), string.title())
>
> Nice. It lowercases a word char that follows an "'" that follows a word
> without an intervening non-word char. It passes this test:
>
> print(title("'time' isn't 'timeless'!"))
> 'Time' Isn't 'Timeless'!
>
> I guess the reason not to bake this exception into str.title is that it
> is language specific and could even be wrong if someone used "'" to
> separate words (perhaps in a different alphabet).

Or, it doesn't handle exceptions.

print(title("My name is D'Arcy"))

Oops.

-- 
D'Arcy J.M. Cain
Vybe Networks Inc.
http://www.VybeNetworks.com/
IM:da...@vex.net VoIP: sip:da...@vybenetworks.com
Re: str.title() fails with words containing apostrophes
On 3/5/2017 2:38 PM, MRAB wrote:
> On 2017-03-05 17:54, Steve D'Aprano wrote:
>> I'm trying to convert strings to Title Case, but getting ugly results
>> if the words contain an apostrophe:
>>
>> py> 'hello world'.title() # okay
>> 'Hello World'
>> py> "i can't be having with this".title() # not okay
>> "I Can'T Be Having With This"
>>
>> Anyone have any suggestions for working around this?
>
> A bit of regex?
>
> import re
>
> def title(string):
>     return re.sub(r"\b'\w", lambda m: m.group().lower(), string.title())

Nice. It lowercases a word char that follows an "'" that follows a word
without an intervening non-word char. It passes this test:

print(title("'time' isn't 'timeless'!"))
'Time' Isn't 'Timeless'!

I guess the reason not to bake this exception into str.title is that it
is language specific and could even be wrong if someone used "'" to
separate words (perhaps in a different alphabet).

-- 
Terry Jan Reedy
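For readers who want to see the behaviour described above in one place, here is MRAB's function run on three inputs. The first two are the cases it was written for. The third (a made-up sentence) shows the limit of the approach: an apostrophe inside a name is treated like one inside a contraction, so the capital after it is lost.

```python
import re


def title(string):
    # Run str.title() first, then undo the capital it puts after an
    # apostrophe that directly follows a word character (the \b).
    return re.sub(r"\b'\w", lambda m: m.group().lower(), string.title())


contraction = title("i can't be having with this")
quoted = title("'time' isn't 'timeless'!")
name = title("my name is D'Arcy")

print(contraction)
print(quoted)   # opening quotes are not preceded by a word char, so untouched
print(name)     # the name's inner capital is lower-cased
```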
Re: str.title() fails with words containing apostrophes
On 2017-03-05 17:54, Steve D'Aprano wrote:
> I'm trying to convert strings to Title Case, but getting ugly results
> if the words contain an apostrophe:
>
> py> 'hello world'.title() # okay
> 'Hello World'
> py> "i can't be having with this".title() # not okay
> "I Can'T Be Having With This"
>
> Anyone have any suggestions for working around this?

A bit of regex?

import re

def title(string):
    return re.sub(r"\b'\w", lambda m: m.group().lower(), string.title())
str.title() fails with words containing apostrophes
I'm trying to convert strings to Title Case, but getting ugly results if the
words contain an apostrophe:

py> 'hello world'.title() # okay
'Hello World'
py> "i can't be having with this".title() # not okay
"I Can'T Be Having With This"

Anyone have any suggestions for working around this?

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
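For background: the result is documented behaviour, not a bug. The Python documentation for str.title() says it treats a word as a group of consecutive letters, so an apostrophe counts as a word boundary. The same documentation entry suggests a regular-expression workaround along the following lines. The sketch is adapted from that entry; the sample strings are made up, and the pattern covers ASCII letters only.

```python
import re


def titlecase(s):
    # Treat letters, optionally followed by an apostrophe and more letters,
    # as one word, and capitalize the word as a whole.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


fixed = titlecase("i can't be having with this")
print(fixed)
print(titlecase("they're bill's friends."))
```

Since it uses str.capitalize(), it lower-cases the rest of each word, so names such as D'Aprano and acronyms such as NASA are flattened.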