Before inviting feedback on a number of questions, here’s my reasoning
again: according to the Chicago Manual of Style (CMS), 16th ed., sections 8.10 and 16.71, “Pieter
van den Keere” needs to appear in the text (leaving capitalisation issues
aside) as “van den Keere” and in the bibliography as “Keere, Pieter van
den”. The same applies for “Tawfiq al-Hakim”: “al-Hakim” and “Hakim, Tawfiq
al-” (CMS 8.14, 16.76). This requires “van den” and “al-” to be entered or
parsed as non-dropping particles, and “demote-non-dropping-particle” to
be set to “display-and-sort”. That, in turn, requires names such as “La
Fontaine” to be entered/parsed as one multi-part family name rather than,
as the CSL specs used to suggest, as a non-dropping particle “La” plus a
family name “Fontaine”; otherwise we’d end up with the incorrect
“Fontaine, Jean de La”. (Parsing “La Fontaine” as one multi-part family
name seems appropriate anyway, since, to the best of my knowledge, the two
elements of “La Fontaine” are never separated under any circumstances.)
And that means adjusting citeproc-js’s (and hopefully soon Zotero’s)
name-parsing algorithm.
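
To make the reasoning above concrete, here is a minimal Python sketch of how the three CSL values of “demote-non-dropping-particle” (“never”, “sort-only”, “display-and-sort”) would affect these names in the text and in the bibliography. It is not citeproc-js code: the `Name` class and all helper functions are hypothetical, the sort key is a deliberate simplification rather than the CSL sorting algorithm, and attaching particles that end in a hyphen (“al-”) without a space is an assumption. It relies on the option affecting only inverted names, so the in-text short form always keeps the non-dropping particle.

```python
from dataclasses import dataclass

# The three values CSL defines for "demote-non-dropping-particle".
MODES = ("never", "sort-only", "display-and-sort")


@dataclass
class Name:
    given: str
    family: str              # root family name, e.g. "Keere" or "La Fontaine"
    dropping: str = ""       # e.g. "de" in "Jean de La Fontaine"
    non_dropping: str = ""   # e.g. "van den" in "Pieter van den Keere"


def attach(particle: str, family: str) -> str:
    """Prefix a particle; one ending in a hyphen or apostrophe ("al-") attaches without a space."""
    if not particle:
        return family
    if particle.endswith(("-", "'", "\u2019")):
        return particle + family
    return f"{particle} {family}"


def join(*parts: str) -> str:
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def in_text(name: Name) -> str:
    """Short in-text form: the non-dropping particle stays, the dropping particle is omitted."""
    return attach(name.non_dropping, name.family)


def inverted(name: Name, mode: str) -> str:
    """Bibliography (inverted) form under the given demote-non-dropping-particle value."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "display-and-sort":
        # Both particles are demoted behind the given name.
        return f"{name.family}, {join(name.given, name.dropping, name.non_dropping)}"
    # "never" and "sort-only" display the non-dropping particle with the family name.
    return f"{attach(name.non_dropping, name.family)}, {join(name.given, name.dropping)}"


def sort_key(name: Name, mode: str) -> str:
    """Simplified sort key: only under "never" does the non-dropping particle sort with the family name."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "never":
        return join(attach(name.non_dropping, name.family), name.given, name.dropping).casefold()
    return join(name.family, name.given, name.dropping, name.non_dropping).casefold()


if __name__ == "__main__":
    names = [
        Name("Pieter", "Keere", non_dropping="van den"),
        Name("Tawfiq", "Hakim", non_dropping="al-"),
        Name("Jean", "La Fontaine", dropping="de"),
    ]
    for mode in MODES:
        ordered = sorted(names, key=lambda n: sort_key(n, mode))
        print(mode, "|", "; ".join(inverted(n, mode) for n in ordered))
```

Because “La Fontaine” is stored as one family name with “de” as its dropping particle, no mode can produce “Fontaine, Jean de La”.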

So my proposal is (1) to set “demote-non-dropping-particle” to
“display-and-sort” in all Chicago styles (and, most likely, other styles,
too), (2) to remove “La” and other strings that aren’t genuine non-dropping
particles from the CSL specs and the list citeproc-js uses for parsing, and
(3) to make citeproc-js’s name-parsing algorithm not only field- but also
case-specific. Field-specific means parsing ambiguous strings according to
whether they are found at the front of the family field (-> non-dropping)
or at the end of the given (first-name) field (-> dropping); citeproc-js can already do this.
Case-specific means distinguishing, e.g., “Van” and “van”, and parsing,
e.g., “Van Rompuy” as one multi-part family name, but splitting “van Gogh”
into a non-dropping-particle “van” and a (root) family name “Gogh”. Since I
haven’t been able to find _any_ upper-case elements that would still count
as dropping or non-dropping particles in this scheme, we might even be able
to simplify the parsing algorithm to “lower-case strings at the front of
the family field are parsed as non-dropping particles, lower-case strings
at the end of the given field are parsed as dropping particles”.
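
The simplified rule can be sketched in a few lines of Python. This is a hypothetical illustration, not citeproc-js’s actual parser: `parse_name` and `ParsedName` are made-up names; splitting a lower-case prefix glued to the root by a hyphen or apostrophe (“al-Hakim”) is an assumption about how “al-” would be handled; and the `len(...) > 1` guards, which stop a one-word field from being consumed entirely, are likewise an assumption.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    given: str
    dropping: str        # trailing lower-case tokens of the given field
    non_dropping: str    # leading lower-case tokens (or glued prefix) of the family field
    family: str


# Hypothetical: a prefix glued to the root by a hyphen or apostrophe,
# as in "al-Hakim" (the apostrophe covers both ' and U+2019).
_GLUED = re.compile(r"^(\w+[-'\u2019])(\w.*)$")


def parse_name(family_field: str, given_field: str) -> ParsedName:
    fam = family_field.split()
    giv = given_field.split()

    # Front of the family field: lower-case tokens are non-dropping particles.
    # The len > 1 guard always leaves at least one token as the family name.
    particles = []
    while len(fam) > 1 and fam[0].islower():
        particles.append(fam.pop(0))

    # A glued lower-case prefix before a capitalised root ("al-Hakim") is split off too.
    m = _GLUED.match(fam[0]) if fam else None
    if m and m.group(1).islower() and m.group(2)[:1].isupper():
        particles.append(m.group(1))
        fam[0] = m.group(2)

    # End of the given field: lower-case tokens are dropping particles.
    dropping = []
    while len(giv) > 1 and giv[-1].islower():
        dropping.insert(0, giv.pop())

    return ParsedName(" ".join(giv), " ".join(dropping), " ".join(particles), " ".join(fam))
```

With this rule, `parse_name("Van Rompuy", "Herman")` keeps “Van Rompuy” as one family name, while `parse_name("van Gogh", "Vincent")` yields the non-dropping particle “van” and the family name “Gogh”. Since `str.islower()` ignores uncased characters, a Dutch “’t” also counts as lower-case.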

Note that even with field- and case-specific particle identification, a few
strings remain ambiguous, so in some cases a name in the family field still
needs to be protected to be parsed correctly (i.e., wrapped in quotes; this
is an existing citeproc-js feature). In the examples below, the first pair
of brackets holds the family field and the second the given field:

- A French “Paul de Man” (“de” = dropping particle) is entered as [Man]
[Paul de];
- a Dutch one (“de” = non-dropping particle) as [de Man] [Paul];
- but for an American(ised) “Paul de Man” (CMS 8.5, “de” = part of family
name), the family name will still have to be wrapped in quotes, ["de Man"]
[Paul], in order to be parsed correctly as one multi-part family name.
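
A sketch of how quote protection could combine with the case rule for the three “Paul de Man” entries. Again, this is hypothetical Python, not citeproc-js; it assumes that straight or curly double quotes around the whole family field make the parser take that field verbatim, and that the given field is still checked for dropping particles.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    given: str
    dropping: str
    non_dropping: str
    family: str


# Assumption: straight or curly double quotes around the whole family field protect it.
_QUOTES = ('"', "\u201c", "\u201d")


def is_protected(field: str) -> bool:
    return len(field) >= 2 and field[0] in _QUOTES and field[-1] in _QUOTES


def parse_name(family_field: str, given_field: str) -> ParsedName:
    # Given field: trailing lower-case tokens are dropping particles.
    giv = given_field.split()
    dropping = []
    while len(giv) > 1 and giv[-1].islower():
        dropping.insert(0, giv.pop())
    given = " ".join(giv)

    if is_protected(family_field):
        # Protected: strip the quotes and keep the field as one multi-part family name.
        return ParsedName(given, " ".join(dropping), "", family_field[1:-1].strip())

    # Unprotected: leading lower-case tokens are non-dropping particles.
    fam = family_field.split()
    particles = []
    while len(fam) > 1 and fam[0].islower():
        particles.append(fam.pop(0))
    return ParsedName(given, " ".join(dropping), " ".join(particles), " ".join(fam))


if __name__ == "__main__":
    print(parse_name("Man", "Paul de"))      # French: "de" as dropping particle
    print(parse_name("de Man", "Paul"))      # Dutch: "de" as non-dropping particle
    print(parse_name('"de Man"', "Paul"))    # American(ised): "de Man" as family name
```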

Now, the questions:

- Is there anything wrong with this reasoning?
- Is there anything problematic about these proposals?

And, more specifically:

- Is anyone aware of style guides or other authoritative sources that would
call for treating particles, especially non-dropping ones, differently from
what CMS recommends? (In particular, anything that could _not_ be solved by
setting “demote-non-dropping-particle” to “sort-only” or “never”? – Would a
Dutch publication prefer “sort-only”?)
- Is anyone aware of upper-case name elements that are genuine
_non-dropping_ particles, i.e., would have to appear as “Bla Doe” in the
text but as “Doe, Paul Bla” in the bibliography? (All non-dropping
particles I’ve come across so far are lower-case.)
    - Regarding Arabic names, would anyone ever want to display “Tawfiq
Al-Hakim” as “Al-Hakim” and “Hakim, Tawfiq Al-”? Or would the use of upper
case typically indicate that “Al-”/“El-” should be seen as part of the
family name rather than as a particle, and thus sorted under “A” or “E”?
- Is anyone aware of upper-case name elements that are genuine _dropping_
particles? (All dropping particles I’ve come across so far are lower-case.)
- Thus, is the rule “Unless it’s part of a family name (and thus wrapped in
quotes), any lower-case string must be a particle” sound?
- Is anyone aware of style guides or other authoritative sources that would
ever call for separating the elements of multi-part family names such as
“La Fontaine” or “Van Rompuy” for sorting or display? (If there were, I
fear we’d have to discuss reviewing the CSL specs …)
- Can anyone provide an example of a real name with both dropping and
non-dropping particles? (“Jean de La Fontaine” no longer qualifies; “Jean
de van Gogh”, if it existed, might.)
- As far as I see, all names with _genuine_ non-dropping particles are of
Dutch or Arabic origin. Is anyone aware of others?
- What are your views on allowing the use of non-breaking spaces, like
[de·Man] [Paul] (where “·” stands for a non-breaking space), for protecting
multi-part family names from being parsed? (Prettier than quotes, but less
obvious, and we’d still need the quotes for “d’Alembert” or “al-Hakim”, if
these were ever found to need protection.)
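
If non-breaking spaces were allowed for protection, the tokenizer would have to split on ordinary spaces only. One pitfall worth noting: Python’s `str.split()` with no argument, like a JavaScript split on `/\s+/`, treats U+00A0 as whitespace and would silently undo the protection. A minimal sketch; `family_tokens` and `split_non_dropping` are hypothetical helpers:

```python
from typing import List, Tuple

NBSP = "\u00a0"


def family_tokens(family_field: str) -> List[str]:
    # Split on ordinary spaces only; str.split() with no argument would also
    # split on U+00A0 and so undo the protection.
    return [t for t in family_field.split(" ") if t]


def split_non_dropping(family_field: str) -> Tuple[str, str]:
    """Return (non_dropping_particle, family) using the lower-case rule."""
    toks = family_tokens(family_field)
    particles = []
    while len(toks) > 1 and toks[0].islower():
        particles.append(toks.pop(0))
    return " ".join(particles), " ".join(toks)


if __name__ == "__main__":
    print(split_non_dropping("de Man"))             # unprotected
    print(split_non_dropping("de" + NBSP + "Man"))  # protected by a non-breaking space
```

Whether the non-breaking space should survive into the formatted output (where it would also prevent a line break inside “de Man”) is a separate question this sketch leaves open.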

Finally, even though we need a good parsing solution now, none of this
should keep us from working on a better UI that could eliminate the need
for this awkward parsing of name fields altogether. The algorithm might
still be useful for parsing data on import in the future.
------------------------------------------------------------------------------
_______________________________________________
xbiblio-devel mailing list
xbiblio-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xbiblio-devel