Re: [docbook-apps] Apostrophe in docbook document

2010-01-27 Thread Dave Pawson

Hi Ron


On 26/01/10 20:42, Ron Catterall wrote:


> Hi Dave
>
> Not sure why I got into this, but I'll push it along a bit.
>
> XML was designed to allow the storage of formatted text in a human and
> machine readable state.
>
> When a human does the reading (of the XML text) he can see the &apos; or
> &rsquo; character in context and guess pretty accurately whether it is an
> indication of a missing character, a genitive marker or a closing quote.
> So far I am with you all the way - it doesn't matter in English.


And when the formatted output is presented to the human,
which Unicode code point is used is rarely material.




> Now look at machine reading:
> Imagine a linguist wanting to search some text to count
> 1. The use of contractions (e.g. isn't versus is not). He wants to find,
> list and count all contractions. His text editor or little Perl script
> (he doesn't know regex) looks for &rsquo; and finds what he wants
> corrupted by lots of extraneous closing quotes and genitive markers.
> The three logically different functions are represented by the same code.
> 2. Ditto, except that this time he wants to find quoted strings.
> 3. Ditto, but this time his interest is in the grammar and he is
> searching for genitives.
> 4. Why he might want to distinguish between singular and plural
> genitives is beyond me. But he might.
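
To make the linguist's problem concrete, here is a minimal Python sketch (not from the thread) of the naive search described in item 1. The sample sentence and the helper name find_rsquo_contexts are invented for illustration; the only assumption is that every apostrophe-like mark in the text is U+2019.

```python
RSQUO = "\u2019"  # RIGHT SINGLE QUOTATION MARK, used here for every mark
LSQUO = "\u2018"  # LEFT SINGLE QUOTATION MARK

# Hypothetical sample: a contraction, a singular genitive,
# a closing quote and a plural genitive.
sample = (
    f"It isn{RSQUO}t Ron{RSQUO}s book. "
    f"She called it {LSQUO}bogus{RSQUO}. "
    f"The boys{RSQUO} books are here."
)


def find_rsquo_contexts(text, width=4):
    """Return the text around every U+2019, with no attempt to classify it."""
    hits = []
    for i, ch in enumerate(text):
        if ch == RSQUO:
            hits.append(text[max(0, i - width): i + width + 1])
    return hits


hits = find_rsquo_contexts(sample)
for hit in hits:
    print(repr(hit))
```

All four uses come back from the same search, so the contraction count is only right if someone sorts the hits by hand or by some further rule.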


My initial reaction is: who the heck is going to mark this up
accurately, and with the knowledge of English and Unicode to do
a good job of it? Someone in Edinburgh perhaps? http://www.ling.ed.ac.uk/




> I guess I just don't like one symbol with three meanings. Imagine this
> in your code: you don't need =, == and EQ, one symbol will handle all.


Yep. I'm doing that to please the compiler writer I guess.



> The problem of course is not a DocBook problem, it is in the Unicode tables
> (and the linguist would probably be using TEI anyway, but it's not a TEI
> problem either)


Your proposal is a solution, not a problem Ron :-)




> In my case all my quotes in XML tags are done on the keyboard (&#x27;), all
> my text quotes are &quot;, all my apostrophe marks and genitives are
> &apos;, so a simple global edit puts all to rights for me - now that I
> know to use &rsquo;



Suggestion: if you're using Linux, look into keyboard mappings
and use, perhaps, your numeric keypad to generate this 'suite' for you
with a single keypress? Just a thought.

regards




regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



Re: [docbook-apps] Apostrophe in docbook document

2010-01-26 Thread Vincent Hennebert
Hi,

Ron Catterall wrote:
> Could somebody tell me which is the correct symbol to use for apostrophe.
>
> The HTML special characters entity set has &apos; (&#x27;), where apos
> presumably stands for apostrophe - this is the straight up single quote
> mark which XML defines and uses for special purposes.

It’s advised not to use it for apostrophe. It should be used only in
programming languages, XML, etc. So, as you said, for special purposes.


> The Shorter Oxford English Dictionary on-line tells me:
> The omission of one or more letters in a word.
> A sign (') used to indicate the omission of one or more letters or
> numerals (as in can't, o'er, 'cello; spirit of '76 (i.e. 1776)), or in
> marking the possessive case (man's, boys').
> So SOED appears to use &apos;.  Do they know, or are they just confused?  Or could
> it be a UK/US difference?
> (I've always used the keyboard ' for XML use and &apos; when I want the
> apostrophe in text.)

You must be one of the very rare ones!


> Wikipedia tells me:
> The apostrophe is different from the closing single quotation mark
> (usually rendered identically but serving a quite different purpose),
> and from the similar-looking prime (which is used to indicate
> measurement in feet or arcminutes, and for various mathematical purposes).
> So Wiki tells me to use &rsquo; (&#x2019;, &#8217;), which looks like an
> apostrophe, and not to use &apos; or prime (&#x2032;),
> but also:
> The prime symbol should not be confused with the apostrophe, single
> quotation mark, acute accent or grave accent.

Not sure you will be pleased by this, but to my knowledge Unicode
defines only one code point for both the apostrophe and the right single
quotation mark (U+2019). There are different code points for prime
(U+2032) and acute accent (U+00B4), though. Strictly speaking, I suppose
that there should be two different code points for apostrophe and right
single quotation mark, which most fonts would render using the same
glyph. I don’t know if this is an oversight by the Unicode Consortium
or if it was done deliberately because distinguishing the two would be
splitting hairs.

Quote from the Unicode Standard, version 5.2:
‘Punctuation Apostrophe. U+2019 right single quotation mark is
preferred where the character is to represent a punctuation mark, as
for contractions: “We’ve been here before.” In this latter case,
U+2019 is also referred to as a punctuation apostrophe.’
And also:
‘When text is set, U+2019 right single quotation mark is preferred
as apostrophe, but only U+0027 is present on keyboards. Word
processors commonly offer a facility for automatically converting
the U+0027 apostrophe to a contextually selected curly quotation
glyph.’
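
For anyone who wants to check the code points mentioned in this message, a short Python sketch (not part of the original mail) prints the official character names from the Unicode database shipped with Python. The helper name describe is just for illustration.

```python
import unicodedata

# The four look-alike characters discussed in this thread.
CANDIDATES = ["\u0027", "\u2019", "\u2032", "\u00b4"]


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


descriptions = [describe(ch) for ch in CANDIDATES]
for line in descriptions:
    print(line)
```

Note that the name of U+0027 is itself APOSTROPHE, which is probably part of why the question keeps coming up.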


Vincent





Re: [docbook-apps] Apostrophe in docbook document

2010-01-26 Thread maxwell
On Tue, 26 Jan 2010 14:42:34 -0600, Ron Catterall <r...@catterall.net>
wrote:
> Imagine a linguist wanting to search some text to count
> ...
> The problem of course is not a DocBook problem, it is in the Unicode tables

The problem is with neither, it is with the linguist :-).  (I can say
that, because I'm a linguist.)

All seriousness aside, using corpora for linguistics requires more than
looking for certain Unicode characters, which may not be used consistently
anyway (and especially in a case like this, where the characters--if they
were distinct Unicode characters--would doubtless be confused).  

Distinguishing between quotes and apostrophes requires some fairly complex
methods.  There are rules of thumb that often work, but they will break on
certain cases.  Corpus linguists become familiar with where these things
break, and construct work-arounds accordingly, or hand-tag recalcitrant
cases.
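
As an illustration of what such a rule of thumb might look like (a sketch only, not a method anyone in this thread proposed), the Python below labels each U+2019 by looking at its immediate neighbours. The function name classify and the three labels are invented for the example.

```python
import re

RSQUO = "\u2019"  # the one code point shared by apostrophe and closing quote


def classify(text):
    """Label each U+2019 with a rough guess based on adjacent characters."""
    labels = []
    for match in re.finditer(RSQUO, text):
        i = match.start()
        before = text[i - 1] if i > 0 else " "
        after = text[i + 1] if i + 1 < len(text) else " "
        if before.isalpha() and after.isalpha():
            # Letters on both sides: a contraction or singular genitive.
            labels.append("apostrophe")
        elif before.isalpha() or before in ".,!?":
            # End of a word: a plural genitive or a closing quote.
            labels.append("apostrophe-or-closing-quote")
        else:
            # Start of a word or number: dropped letters or digits.
            labels.append("unknown")
    return labels


print(classify(f"isn{RSQUO}t"))
print(classify(f"the boys{RSQUO} books"))
print(classify(f"{RSQUO}unting rabbits"))
```

The middle label is exactly where such a rule breaks: nothing in the neighbouring characters separates a plural genitive from a closing quote, so those cases still need context or hand-tagging.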

If you really want an interesting problem, go for distinguishing among the
uses of the ASCII period!

   Mike Maxwell




[docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Mathieu Malaterre
Hi there,

 I would like to know what people are using for their apostrophes in their
DocBook documents. There are 4 contestants:
1 ’ (curly, on a UTF-8 system)
2 &rsquo;
3 '
4 &#8217;

#3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml
file in a text editor. How about solution #1?

Thanks
-- 
Mathieu


Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Jacques Foucry
On 25 Jan 2010, at 17:30, Mathieu Malaterre wrote:
Hello,

> #3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml
> file in a text editor. How about solution #1?

In my source document I use the single quote (#3). In my custom stylesheet I
replace it with the curly quote (#1).

<xsl:param name="singlequote">
  <xsl:text>'</xsl:text>
</xsl:param>
<xsl:param name="curlyquote">
  <xsl:text>’</xsl:text>
</xsl:param>

<xsl:template match="d:para/text() | d:title/text()">
  <xsl:value-of select="translate(.,$singlequote,$curlyquote)"/>
</xsl:template>

I do not change them inside computeroutput or literal elements.
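
For readers who do not use XSLT, roughly the same idea can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under assumptions, not the stylesheet's actual behaviour in every case: the function name curl_apostrophes and the sample document are invented, and the d: prefix is assumed to be bound to the DocBook 5 namespace. Like the template, it touches only text that sits directly inside para and title, so the content of inline children such as literal is left alone.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"  # assumed binding of the d: prefix


def curl_apostrophes(root):
    """Replace U+0027 with U+2019 in text directly inside para and title."""
    for tag in ("para", "title"):
        for elem in root.iter(f"{{{DB}}}{tag}"):
            if elem.text:
                elem.text = elem.text.replace("'", "\u2019")
            for child in elem:
                # child.tail is the text after an inline child; it still
                # belongs to the para/title. Text inside the child is skipped.
                if child.tail:
                    child.tail = child.tail.replace("'", "\u2019")
    return root


# Hypothetical sample document.
doc = ET.fromstring(
    f'<article xmlns="{DB}"><para>It\'s here: '
    "<literal>print('x')</literal> isn't it?</para></article>"
)
curl_apostrophes(doc)
para = doc.find(f"{{{DB}}}para")
literal = para.find(f"{{{DB}}}literal")
print(para.text, literal.text, literal.tail)
```

The blanket replacement has the same limitation as translate(): it cannot tell an apostrophe from a straight single quote used as a quotation mark, so opening quotes come out curled the wrong way.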

Jacques



Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Christopher R. Maden

Mathieu Malaterre wrote:
> Hi there,
>
>  I would like to know what people are using for their apostrophes in their
> DocBook documents. There are 4 contestants:
> 1 ’ (curly, on a UTF-8 system)
> 2 &rsquo;
> 3 '
> 4 &#8217;
>
> #3 is the fastest to type. #2 and #4 are ugly to read when editing the .xml
> file in a text editor. How about solution #1?

№s 1, 2, and 4 are exactly the same.  The XML parser will treat them
identically (assuming the rsquo entity is defined correctly).
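
A quick way to see this for yourself is the Python sketch below (an illustration, not part of the original message). It parses the three spellings with the standard xml.etree.ElementTree module and compares the resulting text. The internal DTD subset that declares rsquo is an assumption made so the example is self-contained; in a real document the entity would come from whatever DTD or entity set you use.

```python
import xml.etree.ElementTree as ET

# No. 1: the literal character.
literal_form = ET.fromstring("<para>it\u2019s</para>")

# No. 4: the numeric character reference.
numeric_form = ET.fromstring("<para>it&#8217;s</para>")

# No. 2: the named entity, declared here in an internal subset (assumption).
entity_form = ET.fromstring(
    '<!DOCTYPE para [<!ENTITY rsquo "&#8217;">]><para>it&rsquo;s</para>'
)

texts = [literal_form.text, numeric_form.text, entity_form.text]
print(texts[0] == texts[1] == texts[2])
```

Once parsed, nothing downstream can tell which spelling was in the source file, so the choice is purely about what is comfortable to type and read.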

While №s 2 and 4 are ugly to read, they are easy to type; further, № 2
is much easier to remember than № 4.  № 1 is very hard to type for most
people; I have the kind of brain that remembers character codes, and I
don’t mind typing Ctrl-Shift-U 2 0 1 9 SPACE in order to enter it (in a
GNOME system), or & ' 9 (in an RFC 1345 environment like Emacs), or
Alt-0146 in Windows, but I really don’t expect anyone else to do that.

Someday, someone will create an affordable, usable keyboard with proper
punctuation on it...  Until then, I recommend № 2.

~Chris
--
Chris Maden, text nerd  URL: http://crism.maden.org/ 
“The most merciful thing in the world, I think, is the inability of
 the human mind to correlate all its contents.” — H.P. Lovecraft
GnuPG Fingerprint: C6E4 E2A9 C9F8 71AC 9724 CAA3 19F8 6677 0077 C319



Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Ron Catterall

Could somebody tell me which is the correct symbol to use for apostrophe.

The HTML special characters entity set has &apos; (&#x27;), where apos
presumably stands for apostrophe - this is the straight up single quote
mark which XML defines and uses for special purposes.


The Shorter Oxford English Dictionary on-line tells me:
The omission of one or more letters in a word.
A sign (') used to indicate the omission of one or more letters or 
numerals (as in can't, o'er, 'cello; spirit of '76 (i.e. 1776)), or in 
marking the possessive case (man's, boys').
So SOED appears to use &apos;.  Do they know, or are they just confused?  Or could
it be a UK/US difference?
(I've always used the keyboard ' for XML use and &apos; when I want the
apostrophe in text.)


Wikipedia tells me:
The apostrophe is different from the closing single quotation mark 
(usually rendered identically but serving a quite different purpose), 
and from the similar-looking prime (which is used to indicate 
measurement in feet or arcminutes, and for various mathematical purposes).
So Wiki tells me to use &rsquo; (&#x2019;, &#8217;), which looks like an
apostrophe, and not to use &apos; or prime (&#x2032;)

but also:
The prime symbol should not be confused with the apostrophe, single 
quotation mark, acute accent or grave accent.







--
Ron Catterall Ph.D. D.Sc.
r...@catterall.net
http://catterall.net





Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Ron Catterall

Hi all

Having consulted the Lady of the House over a cocktail or two, I think 
we understand the problem and have a solution (given a decision what an 
apostrophe should look like on paper.)


We have (at least) three logical symbols:
1. a singular possessive - this is Ron's book
2. a plural possessive - these are the mens' books
3. a missing word ain't (or old English an't)

In principle we need three different symbols, **BUT** these different
symbols are only needed for computerized searches, not for visual
scanning of text by humans.  The principle is already implemented in
XML: different code points can point to the same symbol.

So we want four different Unicode code points, possibly !amiss;, !asing;,
!aplur; and !apos;, the first three of which point to the symbol
currently &rsquo; and the fourth of which points to &apos; (I use ! instead
of & to avoid any mis-interpretation of symbols by my email software
(Thunderbird)).


Given these code points, any software can unambiguously parse any
XML text.  Any human will (or should be intelligent enough to) read the
visual display and interpret the meaning of the symbol correctly.


But what should the (single) symbol look like on paper?  The Lady of the
House is quite clear on this - at school she was taught by Mr. Webster
that an apostrophe was a 'small filled in 9' raised above the line,
rather 'like a comma, which is a small filled in 9 on the line and
projecting below'.


I defer totally to Mr. Webster (whoever he was) and to the Lady of the 
House - Jacqui Holland-Bradley, as she was then known, and the lady who 
introduced IP-networking to the British and European community when the 
Telcos of the UK and Europe were all saying we will never do this 
Internet thing over here - OSI is the way to go.  Anybody remember OSI?


As a matter of total irrelevance to the apostrophe question, Jacqui 
hosted and organized the first occasion at her IPNetworking conference 
in 1991 when the USSR (as it was then) connected to the Internet.  As 
such she speaks with the authority of, 'she who must be obeyed' (and if 
you haven't met Rumpole of the Bailey, your education is sadly 
incomplete - Jacqui worked out of Gray's Inn)!


Ron







--
Ron Catterall Ph.D. D.Sc.
r...@catterall.net
http://catterall.net





Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Keith Fahlgren
On Mon, Jan 25, 2010 at 4:53 PM, Ron Catterall <r...@catterall.net> wrote:
> We have (at least) three logical symbols:
> 1. a singular possessive - this is Ron's book
> 2. a plural possessive - these are the mens' books
> 3. a missing word ain't (or old English an't)

Missing some:
* Slang: What ya mean, 'unting rabbits?
* Quotes in quotes: I can't believe you'd quote her saying, this is
totally 'bogus'.
* Numbers: '99

...also make sure you don't do anything in programlisting, code, etc...




Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Brooks Moses
Keith Fahlgren wrote, at 1/25/2010 4:58 PM:
> On Mon, Jan 25, 2010 at 4:53 PM, Ron Catterall <r...@catterall.net> wrote:
>> We have (at least) three logical symbols:
>> 1. a singular possessive - this is Ron's book
>> 2. a plural possessive - these are the mens' books
>> 3. a missing word ain't (or old English an't)

I assume you mean "a missing letter or letters" for 3, not "a missing word".

Also, historically speaking these all come from a common root, or at
least that's been claimed.  (I.e., "Ron; his book" -> "Ron's book".)  I
would guess that this account of the origin is probably at least
somewhat a myth, but typographically it might as well be true, and these
have always been represented by the same symbol.  Thus, I'm not at all
sure of the benefit of trying to split things out in this way, which as
far as I know is completely unprecedented.  At the very least, splitting
singular and plural possessives out seems a hair that ought to remain unsplit.

> Missing some:
> * Slang: What ya mean, 'unting rabbits?

This is a case of the revised item 3, above.

> * Quotes in quotes: I can't believe you'd quote her saying, this is
> totally 'bogus'.

These are single quotes, not apostrophes.  They're a different thing
altogether -- and typographically so, as well as in meaning.

> * Numbers: '99

This is a case of the revised item 3, above, if we further revise it to
say "missing letters or numbers".  Logically, I believe it's the same thing.

Of course, I also believe all of Ron's list are the same thing -- and
that, in fact, you have one logical symbol that's used for two closely
related uses -- so the fact that I think missing numbers and missing
letters are equivalent may be something people want to argue with.

- Brooks




Re: [docbook-apps] Apostrophe in docbook document

2010-01-25 Thread Dave Pawson

On 26/01/10 00:53, Ron Catterall wrote:

> Hi all
>
> Having consulted the Lady of the House over a cocktail or two, I think
> we understand the problem and have a solution (given a decision what an
> apostrophe should look like on paper.)
>
> We have (at least) three logical symbols:
> 1. a singular possessive - this is Ron's book
> 2. a plural possessive - these are the mens' books
> 3. a missing word ain't (or old English an't)
>
> In principle we need three different symbols, **BUT** these different
> symbols are only needed for computerized searches, not for visual
> scanning by humans,



Beg to differ Ron, English appears not to require more than one?
Is it simply for your search needs?

The only different one in your previous list is the prime symbol, U+2032.
The remainder should be the same.




regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
