Re: [docbook-apps] Apostrophe in docbook document
Hi Ron

On 26/01/10 20:42, Ron Catterall wrote:
> Hi Dave
>
> Not sure why I got into this, but I'll push it along a bit.
> XML was designed to allow the storage of formatted text in a human- and machine-readable state. When a human does the reading (of the XML text) he can see the apos or rsquo character in context and guess pretty accurately whether it is an indication of a missing character, a genitive marker or a closing quote.

So far I am with you all the way - it doesn't matter in English. And when the formatted output is presented to the human, which Unicode code point is used is rarely material.

> Now look at machine reading: imagine a linguist wanting to search some text to count
> 1. The use of contractions (e.g. isn't versus is not). He wants to find, list and count all contractions. His text editor or little Perl script (he doesn't know regex) looks for rsquo and finds what he wants corrupted by lots of extraneous closing quotes and genitive markers. The three logically different functions are represented by the same code.
> 2. ditto, except that this time he wants to find quoted strings
> 3. ditto, but this time his interest is in the grammar and he is searching for genitives
> 4. why he might want to distinguish between singular and plural genitives is beyond me. But he might.

My initial reaction is: who the heck is going to mark this up, accurately and with the knowledge of English and Unicode to do a good job of it? Someone in Edinburgh perhaps? http://www.ling.ed.ac.uk/

> I guess I just don't like one symbol with three meanings. Imagine this in your code: you don't need =, == and EQ, one symbol will handle all.

Yep. I'm doing that to please the compiler writer I guess.

> The problem of course is not a Docbook problem, it is in the UTF tables (and the linguist would probably be using TEI anyway, but it's not a TEI problem either)

Your proposal is a solution, not a problem Ron :-)

> In my case all my quotes in XML tags are done on the keyboard #x27, all my text quotes are quote, all my apostrophe marks and genitives are apos, so a simple global edit puts all to rights for me - now that I know to use rsquo

Suggestion, if you're using Linux: look into keyboard mappings and use... perhaps your numeric keypad to generate this 'suite' for you using a single keypress? Just a thought.

> regards

regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org
Re: [docbook-apps] Apostrophe in docbook document
Hi,

Ron Catterall wrote:
> Could somebody tell me which is the correct symbol to use for apostrophe.
> The HTML special characters entity set has apos #x27 where apos presumably stands for apostrophe - this is the straight up single quote mark which XML defines and uses for special purposes.

It’s advised not to use it for apostrophe. It should be used only in programming languages, XML, etc. So, as you said, for special purposes.

> The Shorter Oxford English Dictionary on-line tells me:
> The omission of one or more letters in a word. A sign (') used to indicate the omission of one or more letters or numerals (as in can't, o'er, 'cello; spirit of '76 (i.e. 1776)), or in marking the possessive case (man's, boys').
> So SOED appears to use apos. Do they know, or just confused? Or could it be a UK/US difference? (I've always used the keyboard ' for XML use and apos when I want the apostrophe in text.)

You must be one of the very rare ones!

> Wikipedia tells me:
> The apostrophe is different from the closing single quotation mark (usually rendered identically but serving a quite different purpose), and from the similar-looking prime (which is used to indicate measurement in feet or arcminutes, and for various mathematical purposes).
> So Wiki tells me use rsquo #x2019 (#8217) which looks like an apostrophe, and not to use apos or prime (#x2032) but also:
> The prime symbol should not be confused with the apostrophe, single quotation mark, acute accent or grave accent.

Not sure you will be pleased by this, but to my knowledge Unicode defines only one code point for both the apostrophe and the right single quotation mark (U+2019). There are different code points for prime (U+2032) and acute accent (U+00B4), though.

Strictly speaking, I suppose that there should be two different code points for apostrophe and right single quotation mark, which would be rendered using the same glyph by most fonts. I don’t know if this is an oversight on the part of the Unicode Standard committee or if it was done deliberately because separating them would be splitting hairs too much.

Quote from the Unicode Standard, version 5.2:

‘Punctuation Apostrophe. U+2019 right single quotation mark is preferred where the character is to represent a punctuation mark, as for contractions: “We’ve been here before.” In this latter case, U+2019 is also referred to as a punctuation apostrophe.’

And also:

‘When text is set, U+2019 right single quotation mark is preferred as apostrophe, but only U+0027 is present on keyboards. Word processors commonly offer a facility for automatically converting the U+0027 apostrophe to a contextually selected curly quotation glyph.’

Vincent
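For reference, the code points named in this message can be checked against the Unicode character database. The following is only a minimal Python sketch using the standard library's unicodedata and html.entities modules; the CANDIDATES table and its labels are names made up here for illustration, not anything from the thread.

```python
import unicodedata
from html.entities import name2codepoint

# Hypothetical table for illustration: the characters discussed in the thread.
CANDIDATES = {
    "keyboard quote": "\u0027",
    "curly apostrophe / closing single quote": "\u2019",
    "prime": "\u2032",
    "acute accent": "\u00b4",
}

# Official Unicode character names, looked up from the character database.
NAMES = {label: unicodedata.name(ch) for label, ch in CANDIDATES.items()}

for label, ch in CANDIDATES.items():
    print(f"U+{ord(ch):04X}  {NAMES[label]:<28} {label}")

# The HTML entity name rsquo resolves to the same code point as U+2019.
RSQUO_CODEPOINT = name2codepoint["rsquo"]
```

Note that the formal character name of U+0027 is APOSTROPHE even though the passage quoted above prefers U+2019 for the punctuation mark, and that U+2019 has a single name covering both the apostrophe and the closing-quote use.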
Re: [docbook-apps] Apostrophe in docbook document
On Tue, 26 Jan 2010 14:42:34 -0600, Ron Catterall r...@catterall.net wrote:
> Imagine a linguist wanting to search some text to count
> ...
> The problem of course is not a Docbook problem, it is in the UTF tables

The problem is with neither, it is with the linguist :-). (I can say that, because I'm a linguist.)

All seriousness aside, using corpora for linguistics requires more than looking for certain Unicode characters, which may not be used consistently anyway (and especially in a case like this, where the characters--if they were distinct Unicode characters--would doubtless be confused). Distinguishing between quotes and apostrophes requires some fairly complex methods. There are rules of thumb that often work, but they will break on certain cases. Corpora linguists become familiar with where these things break, and construct work-arounds accordingly, or hand-tag recalcitrant cases.

If you really want an interesting problem, go for distinguishing among the uses of the ASCII period!

Mike Maxwell
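To make the "rules of thumb that often work, but break on certain cases" concrete, here is a small Python sketch. The rule itself (count U+2019 as an apostrophe only when it has a letter on both sides) is an assumption chosen for illustration, not a method described in the message, and the sample sentence is invented.

```python
import re

# Illustrative rule of thumb (an assumption, not the poster's method):
# U+2019 is treated as an apostrophe only when a letter sits on both sides.
INTERNAL_APOSTROPHE = re.compile(r"(?<=[^\W\d_])\u2019(?=[^\W\d_])")


def count_internal_apostrophes(text):
    """Count U+2019 characters that the rule of thumb accepts as apostrophes."""
    return len(INTERNAL_APOSTROPHE.findall(text))


# Invented sample: one contraction, one plural possessive, one closing quote,
# and one elided-digits apostrophe, all written with U+2019.
SAMPLE = "\u2018It isn\u2019t the boys\u2019 fault,\u2019 she said in \u201999."

TOTAL_U2019 = SAMPLE.count("\u2019")
ACCEPTED = count_internal_apostrophes(SAMPLE)
print(f"{ACCEPTED} of {TOTAL_U2019} U+2019 characters accepted as apostrophes")
```

On this sample the rule correctly ignores the closing quote after "fault," but it also misses two real apostrophes, the plural possessive in "boys’" and the elision in "’99", which is the kind of breakage that ends up needing work-arounds or hand-tagging.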
[docbook-apps] Apostrophe in docbook document
Hi there,

I would like to know what people are using for the apostrophe in their DocBook documents. There are 4 contestants:

1. ’ (curly, on a UTF-8 system)
2. &rsquo;
3. '
4. &#8217;

#3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml file in a text editor. How about solution #1?

Thanks
--
Mathieu
Re: [docbook-apps] Apostrophe in docbook document
On 25 Jan. 2010, at 17:30, Mathieu Malaterre wrote:

Hello,

> #3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml file in a text editor. How about solution #1

In my source document I use the single quote (#3). In my custom stylesheet I replace it with the curly quote (#1):

    <xsl:param name="singlequote"><xsl:text>'</xsl:text></xsl:param>
    <xsl:param name="curlyquote"><xsl:text>’</xsl:text></xsl:param>

    <xsl:template match="d:para/text() | d:title/text()">
      <xsl:value-of select="translate(., $singlequote, $curlyquote)"/>
    </xsl:template>

I do not change them in computeroutput or literal tags.

Jacques
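For readers who want to try the same idea outside XSLT, here is a rough Python analogue using the standard library's xml.etree.ElementTree. It is a sketch of the technique, not the stylesheet itself. It assumes the d: prefix is bound to the DocBook 5 namespace, and the function name curl_apostrophes and the sample document are invented for illustration. Like the match pattern above, it only touches text that is a direct child of para or title, so text inside nested elements such as literal is left alone.

```python
import xml.etree.ElementTree as ET

# Assumption: the d: prefix in the stylesheet is bound to the DocBook 5 namespace.
DOCBOOK_NS = "http://docbook.org/ns/docbook"
TARGET_TAGS = {f"{{{DOCBOOK_NS}}}para", f"{{{DOCBOOK_NS}}}title"}

# Same mapping as translate(., $singlequote, $curlyquote).
STRAIGHT_TO_CURLY = str.maketrans("'", "\u2019")


def curl_apostrophes(root):
    """Replace ' with U+2019 in text nodes that are direct children of para/title."""
    for elem in root.iter():
        if elem.tag not in TARGET_TAGS:
            continue
        # Text before the first child element.
        if elem.text:
            elem.text = elem.text.translate(STRAIGHT_TO_CURLY)
        # Text following each child element still belongs to the para/title.
        for child in elem:
            if child.tail:
                child.tail = child.tail.translate(STRAIGHT_TO_CURLY)
    return root


# Invented sample document for illustration.
SAMPLE = (
    '<section xmlns="http://docbook.org/ns/docbook">'
    "<title>Ron's book</title>"
    "<para>Don't run <literal>echo 'hi'</literal>, it's unsafe.</para>"
    "</section>"
)

ROOT = curl_apostrophes(ET.fromstring(SAMPLE))
TITLE = ROOT.find(f"{{{DOCBOOK_NS}}}title")
PARA = ROOT.find(f"{{{DOCBOOK_NS}}}para")
LITERAL = PARA.find(f"{{{DOCBOOK_NS}}}literal")
print(ET.tostring(ROOT, encoding="unicode"))
```

The straight quotes inside literal survive because they live in the text of the nested element, while the text before and after it is converted. As in the stylesheet, every straight quote in running text is converted blindly, so single quotes used as opening quotation marks would come out wrong.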
Re: [docbook-apps] Apostrophe in docbook document
Mathieu Malaterre wrote:
> Hi there,
>
> I would like to know what people are using for the apostrophe in their DocBook documents. There are 4 contestants:
> 1. ’ (curly, on a UTF-8 system)
> 2. &rsquo;
> 3. '
> 4. &#8217;
> #3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml file in a text editor. How about solution #1?

№s 1, 2, and 4 are exactly the same. The XML parser will treat them identically (assuming the rsquo entity is defined correctly).

While №s 2 and 4 are ugly to read, they are easy to type; further, № 2 is much easier to remember than № 4.

№ 1 is very hard to type for most people; I have the kind of brain that remembers character codes, and I don’t mind typing Ctrl-Shift-U 2 0 1 9 SPACE in order to enter it (in a GNOME system), or ' 9 (in an RFC 1345 environment like Emacs), or Alt-0146 in Windows, but I really don’t expect anyone else to do that.

Someday, someone will create an affordable, usable keyboard with proper punctuation on it... Until then, I recommend № 2.

~Chris
--
Chris Maden, text nerd
URL: http://crism.maden.org/
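The claim that №s 1, 2 and 4 are identical to an XML parser can be checked with a few lines of Python. This is a sketch only: it uses the standard library's xml.etree.ElementTree, and it declares rsquo in an internal DTD subset to stand in for "assuming the rsquo entity is defined correctly". In a real DocBook document that definition would normally come from the DocBook DTD or entity sets instead. The final check on Alt-0146 assumes that Windows Alt codes with a leading zero are interpreted in code page 1252.

```python
import xml.etree.ElementTree as ET

# The same word written three ways: literal character, named entity,
# and numeric character reference. The internal DTD subset defines rsquo.
DOC = (
    '<!DOCTYPE article [<!ENTITY rsquo "&#8217;">]>'
    "<article>"
    "<para>don\u2019t</para>"
    "<para>don&rsquo;t</para>"
    "<para>don&#8217;t</para>"
    "</article>"
)

# After parsing, all three paragraphs carry the same text.
TEXTS = [para.text for para in ET.fromstring(DOC).iter("para")]
print(TEXTS)

# Byte 146 (0x92) in Windows code page 1252 is the same character,
# which is why Alt-0146 produces it (assuming a cp1252 system).
ALT_0146 = bytes([146]).decode("cp1252")
print(f"U+{ord(ALT_0146):04X}")
```

Once parsed, an application can no longer tell which of the three spellings the author typed, so the choice between them is purely a matter of source readability and typing effort.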
Re: [docbook-apps] Apostrophe in docbook document
Could somebody tell me which is the correct symbol to use for apostrophe.

The HTML special characters entity set has apos #x27 where apos presumably stands for apostrophe - this is the straight up single quote mark which XML defines and uses for special purposes.

The Shorter Oxford English Dictionary on-line tells me:

  The omission of one or more letters in a word. A sign (') used to indicate the omission of one or more letters or numerals (as in can't, o'er, 'cello; spirit of '76 (i.e. 1776)), or in marking the possessive case (man's, boys').

So SOED appears to use apos. Do they know, or just confused? Or could it be a UK/US difference? (I've always used the keyboard ' for XML use and apos when I want the apostrophe in text.)

Wikipedia tells me:

  The apostrophe is different from the closing single quotation mark (usually rendered identically but serving a quite different purpose), and from the similar-looking prime (which is used to indicate measurement in feet or arcminutes, and for various mathematical purposes).

So Wiki tells me use rsquo #x2019 (#8217) which looks like an apostrophe, and not to use apos or prime (#x2032) but also:

  The prime symbol should not be confused with the apostrophe, single quotation mark, acute accent or grave accent.

--
Ron Catterall Ph.D. D.Sc.
r...@catterall.net
http://catterall.net
Re: [docbook-apps] Apostrophe in docbook document
Hi all

Having consulted the Lady of the House over a cocktail or two, I think we understand the problem and have a solution (given a decision on what an apostrophe should look like on paper).

We have (at least) three logical symbols:
1. a singular possessive - this is Ron's book
2. a plural possessive - these are the men's books
3. a missing word ain't (or old English an't)

In principle we need three different symbols, **BUT** these different symbols are only needed for computerized searches, not for visual scanning by humans, of text.

The principle is already implemented in XML: different U-codes can point to the same symbol. So we want four different Ucode points, possibly !amiss;, !asing;, !aplur; and !apos;, the first three of which point to the symbol that is currently rsquo; and the fourth of which points to apos; (I use ! instead of & to avoid any mis-interpretation of symbols by my email software (Thunderbird)).

Given these pointers in Ucode any software can unambiguously parse any XML code. Any human will (or should be intelligent enough to) read the visual display and interpret the meaning of the symbol correctly.

But what should the (single) symbol look like on paper? The Lady of the House is quite clear on this - at school I was taught by Mr. Webster that an apostrophe was a 'small filled in 9' raised above the line, rather 'like a comma, which is a small filled in 9 on the line and projecting below'.

I defer totally to Mr. Webster (whoever he was) and to the Lady of the House - Jacqui Holland-Bradley, as she was then known, and the lady who introduced IP-networking to the British and European community when the Telcos of the UK and Europe were all saying we will never do this Internet thing over here - OSI is the way to go. Anybody remember OSI?

As a matter of total irrelevance to the apostrophe question, Jacqui hosted and organized the first occasion, at her IPNetworking conference in 1991, when the USSR (as it was then) connected to the Internet. As such she speaks with the authority of 'she who must be obeyed' (and if you haven't met Rumpole of the Bailey, your education is sadly incomplete - Jacqui worked out of Gray's Inn)!

Ron

--
Ron Catterall Ph.D. D.Sc.
r...@catterall.net
http://catterall.net
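One way to read the proposal in today's XML is as three entity names that all expand to U+2019. The Python sketch below is only an illustration under that reading: the entity names amiss, asing and aplur are the hypothetical ones suggested in the message, declared here in an internal DTD subset, and the sample sentence is invented. It shows where the distinction is and is not available to software.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entities from the proposal, all mapped to U+2019.
SOURCE = (
    "<!DOCTYPE para [\n"
    '  <!ENTITY amiss "&#x2019;">\n'
    '  <!ENTITY asing "&#x2019;">\n'
    '  <!ENTITY aplur "&#x2019;">\n'
    "]>\n"
    "<para>It isn&amiss;t Ron&asing;s; it is the boys&aplur; book.</para>"
)

# In the unparsed source, each use can be found and counted by name.
SOURCE_COUNTS = {
    name: len(re.findall(f"&{name};", SOURCE))
    for name in ("amiss", "asing", "aplur")
}

# After parsing, the entity references have been expanded to one character.
PARSED_TEXT = ET.fromstring(SOURCE).text

print(SOURCE_COUNTS)
print(PARSED_TEXT)
```

A search over the raw source can tell the three uses apart, but a tool working on the parsed document sees three identical U+2019 characters. Keeping the distinction through parsing would need either separate code points, which Unicode does not provide, or explicit markup around each use.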
Re: [docbook-apps] Apostrophe in docbook document
On Mon, Jan 25, 2010 at 4:53 PM, Ron Catterall r...@catterall.net wrote:
> We have (at least) three logical symbols:
> 1. a singular possessive - this is Ron's book
> 2. a plural possessive - these are the men's books
> 3. a missing word ain't (or old English an't)

Missing some:
* Slang: What ya mean, 'unting rabbits?
* Quotes in quotes: I can't believe you'd quote her saying, "this is totally 'bogus'".
* Numbers: '99

...also make sure you don't do anything in programlisting, code, etc...
Re: [docbook-apps] Apostrophe in docbook document
Keith Fahlgren wrote, at 1/25/2010 4:58 PM:
> On Mon, Jan 25, 2010 at 4:53 PM, Ron Catterall r...@catterall.net wrote:
>> We have (at least) three logical symbols:
>> 1. a singular possessive - this is Ron's book
>> 2. a plural possessive - these are the men's books
>> 3. a missing word ain't (or old English an't)

I assume you mean "a missing letter or letters" for 3, not a missing word.

Also, historically speaking these all come from a common root, or at least that's been claimed. (I.e., "Ron; his book" -> "Ron's book".) I would guess that this account of the origin of the practice is probably at least somewhat a myth, but typographically it might as well be true, and these have always been represented by the same symbol.

Thus, I'm not at all sure of the benefit of trying to split things out in this way, which as far as I know is completely unprecedented. At the very least, splitting singular and plural possessives out seems a hair that ought to remain unsplit.

> Missing some:
> * Slang: What ya mean, 'unting rabbits?

This is a case of the revised item 3, above.

> * Quotes in quotes: I can't believe you'd quote her saying, "this is totally 'bogus'".

These are single quotes, not apostrophes. They're a different thing altogether -- and typographically so, as well as in meaning.

> * Numbers: '99

This is a case of the revised item 3, above, if we further revise it to say "missing letters or numbers". Logically, I believe it's the same thing.

Of course, I also believe all of Ron's list are the same thing -- and that, in fact, you have one logical symbol that's used for two closely related uses -- so the fact that I think missing numbers and missing letters are equivalent may be something people want to argue with.

- Brooks
Re: [docbook-apps] Apostrophe in docbook document
On 26/01/10 00:53, Ron Catterall wrote:
> Hi all
>
> Having consulted the Lady of the House over a cocktail or two, I think we understand the problem and have a solution (given a decision on what an apostrophe should look like on paper).
>
> We have (at least) three logical symbols:
> 1. a singular possessive - this is Ron's book
> 2. a plural possessive - these are the men's books
> 3. a missing word ain't (or old English an't)
>
> In principle we need three different symbols, **BUT** these different symbols are only needed for computerized searches, not for visual scanning by humans,

Beg to differ Ron, English appears not to require more than one? Is it simply for your search needs?

The only different one in your previous list is the prime symbol, U+2032. The remainder should be the same.

regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk