Before inviting feedback on a number of questions, here’s my reasoning again: according to the Chicago Manual of Style, 16th ed., 8.10 and 16.71, “Pieter van den Keere” needs to appear in the text (leaving capitalisation issues aside) as “van den Keere” and in the bibliography as “Keere, Pieter van den”. The same applies to “Tawfiq al-Hakim”: “al-Hakim” and “Hakim, Tawfiq al-” (CMS 8.14, 16.76). This requires “van den” and “al-” to be entered or parsed as non-dropping particles, and “demote-non-dropping-particle” to be set to “display-and-sort”. This in turn requires names such as “La Fontaine” to be entered or parsed as one multi-part family name, rather than, as the CSL specs used to suggest, with “La” as the non-dropping particle and “Fontaine” as the family name; otherwise we’d end up with the incorrect “Fontaine, Jean de La”. (Parsing “La Fontaine” as one multi-part family name seems appropriate anyway, since, to the best of my knowledge, the two elements of “La Fontaine” are never separated under any circumstances.) And that, finally, requires adjusting citeproc-js’s (and hopefully soon Zotero’s) name-parsing algorithm.
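To make the display consequences concrete, here is a minimal Python sketch of how the three “demote-non-dropping-particle” values would render the names above in inverted (bibliography) form. It is not citeproc-js code: the `Name` fields only loosely mirror the CSL name variables (family, given, dropping-particle, non-dropping-particle), and the helper functions and the simplified primary sort key are hypothetical, written for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Name:
    # Fields loosely mirror the CSL name variables (illustrative only).
    family: str
    given: str = ""
    dropping_particle: str = ""
    non_dropping_particle: str = ""

def join(*parts):
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)

def attach(particle, family):
    """Prefix a particle; particles ending in a hyphen or apostrophe ("al-", "d’") take no space."""
    if particle.endswith(("-", "'", "\u2019")):
        return particle + family
    return join(particle, family)

def short_form(n):
    """In-text (non-inverted) short form: the non-dropping particle stays in front."""
    return attach(n.non_dropping_particle, n.family)

def inverted_form(n, demote):
    """Bibliography form "Family, Given", depending on demote-non-dropping-particle."""
    if demote == "display-and-sort":
        # The non-dropping particle moves behind the given name, like the dropping one.
        return f"{n.family}, {join(n.given, n.dropping_particle, n.non_dropping_particle)}"
    # "sort-only" and "never" keep the non-dropping particle in front for display.
    return f"{attach(n.non_dropping_particle, n.family)}, {join(n.given, n.dropping_particle)}"

def primary_sort_key(n, demote):
    """Hypothetical, simplified primary sort key: only "never" sorts on the particle."""
    if demote == "never":
        return attach(n.non_dropping_particle, n.family).lower()
    return n.family.lower()

keere = Name("Keere", "Pieter", non_dropping_particle="van den")
hakim = Name("Hakim", "Tawfiq", non_dropping_particle="al-")
print(short_form(keere), "|", inverted_form(keere, "display-and-sort"))  # van den Keere | Keere, Pieter van den
print(short_form(hakim), "|", inverted_form(hakim, "display-and-sort"))  # al-Hakim | Hakim, Tawfiq al-
```

Under these assumptions, the old parse of “Jean de La Fontaine” (non-dropping “La”, family “Fontaine”, dropping “de”) comes out as “Fontaine, Jean de La” with “display-and-sort”, whereas the proposed parse (family “La Fontaine”, dropping “de”) gives “La Fontaine, Jean de”.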
So my proposal is:
(1) to set “demote-non-dropping-particle” to “display-and-sort” in all Chicago styles (and, most likely, in other styles, too);
(2) to remove “La” and other strings that aren’t genuine non-dropping particles from the CSL specs and from the list citeproc-js uses for parsing; and
(3) to make citeproc-js’s name-parsing algorithm not only field-specific but also case-specific.

Field-specific means parsing ambiguous strings according to whether they are found at the front of the family field (-> non-dropping particle) or at the end of the given field (-> dropping particle); citeproc-js can already do this. Case-specific means distinguishing, e.g., “Van” from “van”, and parsing, e.g., “Van Rompuy” as one multi-part family name but splitting “van Gogh” into the non-dropping particle “van” and the (root) family name “Gogh”. Since I haven’t been able to find _any_ upper-case elements that would still count as dropping or non-dropping particles in this scheme, we might even be able to simplify the parsing algorithm to: “lower-case strings at the front of the family field are parsed as non-dropping particles; lower-case strings at the end of the given field are parsed as dropping particles”.

Note that even with field- and case-sensitive particle identification, a few strings remain ambiguous, so in some cases a name in the family field still needs to be protected to be parsed correctly (i.e., wrapped in quotes, an existing citeproc-js feature). In the examples below, names are shown as [family field] [given field]:
- A French “Paul de Man” (“de” = dropping particle) is entered as [Man] [Paul de];
- a Dutch one (“de” = non-dropping particle) as [de Man] [Paul];
- but for an American(ised) “Paul de Man” (CMS 8.5, “de” = part of the family name), the family name still has to be wrapped in quotes, ["de Man"] [Paul], to be parsed correctly as one multi-part family name.

Now, the questions:
- Is there anything wrong with this reasoning?
- Is there anything problematic about these proposals?
And, more specifically:
- Is anyone aware of style guides or other authoritative sources that would call for treating particles, especially non-dropping ones, differently from what CMS recommends? (In particular, anything that could _not_ be handled by setting “demote-non-dropping-particle” to “sort-only” or “never”? Would a Dutch publication prefer “sort-only”?)
- Is anyone aware of upper-case name elements that are genuine _non-dropping_ particles, i.e., that would have to appear as “Bla Doe” in the text but as “Doe, Paul Bla” in the bibliography? (All non-dropping particles I’ve come across so far are lower-case.)
- Regarding Arabic names, would anyone ever want to display “Tawfiq Al-Hakim” as “Al-Hakim” and “Hakim, Tawfiq Al-”? Or would the use of upper case typically indicate that “Al-”/“El-” should be treated as part of the family name rather than as a particle, and thus sorted under “A” or “E”?
- Is anyone aware of upper-case name elements that are genuine _dropping_ particles? (All dropping particles I’ve come across so far are lower-case.)
- Thus, is the rule “Unless it’s part of a family name (and thus wrapped in quotes), any lower-case string must be a particle” sound?
- Is anyone aware of style guides or other authoritative sources that would ever call for separating the elements of multi-part family names such as “La Fontaine” or “Van Rompuy” for sorting or display? (If there were, I fear we’d have to discuss reviewing the CSL specs …)
- Can anyone provide an example of a real name with both dropping and non-dropping particles? (“Jean de La Fontaine” no longer qualifies; “Jean de van Gogh”, if it existed, might.)
- As far as I can see, all names with _genuine_ non-dropping particles are of Dutch or Arabic origin. Is anyone aware of others?
- What are your views on allowing non-breaking spaces, as in [de·Man] [Paul], to protect multi-part family names from being parsed? (Prettier than quotes, but less obvious, and we’d still need quotes for “d’Alembert” or “al-Hakim”, if these were ever found to need protection.)

Finally, though we need a good parsing solution now, none of this should of course keep us from working on a better UI that could eliminate the need for this awkward parsing of name fields altogether, though the algorithm might still be useful for parsing data on import in the future.
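The simplified rule proposed in (3), together with quote protection and the non-breaking-space idea from the last question, could be sketched roughly as follows. This is a hypothetical Python illustration, not citeproc-js’s actual algorithm: `parse_name` and its helpers are made-up names, the inputs are the [family] and [given] fields as in the examples above, quotes are assumed to be plain ASCII double quotes, and splitting glued prefixes such as “al-” and “d’” off the family name is an assumption about how those would be handled.

```python
import re
from dataclasses import dataclass

# Hypothetical: a prefix glued to the family name by a hyphen or apostrophe,
# as in "al-Hakim" or "d’Alembert"; it only counts as a particle if lower-case.
GLUED_PREFIX = re.compile(r"^(\w+['\u2019-])(\w.*)$")

@dataclass
class ParsedName:
    family: str
    given: str = ""
    dropping_particle: str = ""
    non_dropping_particle: str = ""

def tokens(field):
    """Split on ordinary spaces only, so words joined by a non-breaking space stay one token."""
    field = field.strip()
    return re.split(r" +", field) if field else []

def split_family(field):
    """Return (non_dropping_particle, family) for the family field."""
    field = field.strip()
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return "", field[1:-1]  # quoted (ASCII quotes assumed): the whole field is the family name
    words = tokens(field)
    particles = []
    # Leading lower-case words are non-dropping particles (always keep one word as family).
    while len(words) > 1 and words[0].islower():
        particles.append(words.pop(0))
    glued = GLUED_PREFIX.match(words[0]) if words else None
    if glued and glued.group(1).islower():
        particles.append(glued.group(1))
        words[0] = glued.group(2)
    return " ".join(particles), " ".join(words)

def split_given(field):
    """Return (given, dropping_particle): trailing lower-case words are dropping particles."""
    words = tokens(field)
    particles = []
    while len(words) > 1 and words[-1].islower():
        particles.insert(0, words.pop())
    return " ".join(words), " ".join(particles)

def parse_name(family_field, given_field=""):
    """Hypothetical field- and case-specific parser for a [family] [given] pair."""
    non_dropping, family = split_family(family_field)
    given, dropping = split_given(given_field)
    return ParsedName(family, given, dropping, non_dropping)
```

With this rule, [Van Rompuy] [Herman] and [La Fontaine] [Jean de] keep their multi-part family names, [van Gogh] [Vincent] and [al-Hakim] [Tawfiq] are split into particle and family name, and an upper-case [Al-Hakim] stays whole, which is exactly the case the Arabic-names question asks about. The three “Paul de Man” entries come out as dropping “de”, non-dropping “de”, and family “de Man”, respectively.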
------------------------------------------------------------------------------
_______________________________________________ xbiblio-devel mailing list xbiblio-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xbiblio-devel