Re: [xbiblio-devel] CSL spec and test cases

Rintze Zelle Thu, 08 Aug 2013 12:29:11 -0700

On Thu, Aug 8, 2013 at 1:12 PM, Brecht Machiels <[email protected]> wrote:
> * the CSL spec is contradictory about number detection
>>>> Tests whether the given variables **contain numeric content**.
> versus
>>>> Content is considered numeric if it **solely consists of numbers**.
>>>> For example, "2nd" tests "true" whereas "second" and "2nd edition"
>>>> test "false".
> does not seem to agree with condition_IsNumeric


The behavior of "is-numeric" changed in CSL 1.0.1. See
http://citationstyles.org/downloads/release-notes-csl101.html#numbers

I can see how the current description in the specification might be
somewhat confusing, but it is meant to agree with
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/condition_IsNumeric.txt.
In "Tests whether the given variables contain numeric content."
(http://citationstyles.org/downloads/specification.html#choose), I
mean to say that the test is against the entire string contents of
each variable. In a string like "2nd edition", the "edition" substring
means that the entire string is non-numeric.
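
A rough sketch of that whole-string test, in Python. It assumes the CSL 1.0.1 rules as I read them: a number may carry a letter prefix or suffix, and several numbers may be joined by a comma, hyphen, or ampersand. The name is_numeric and the regular expression are illustrative only and are not taken from any processor.

```python
import re

# One "number": optional letter prefix, digits, optional letter suffix
# (e.g. "D2", "2b", "2nd"). Assumption: letters are limited to ASCII here.
_NUMBER = r"[A-Za-z]*\d+[A-Za-z]*"

# Numbers may be joined by a comma, hyphen, or ampersand, with optional spaces.
_NUMERIC = re.compile(rf"{_NUMBER}(?:\s*[,&-]\s*{_NUMBER})*")


def is_numeric(value: str) -> bool:
    """Return True only if the ENTIRE string is numeric content."""
    return _NUMERIC.fullmatch(value.strip()) is not None
```

With this sketch, "2nd" and "2-4" test true, while "second" and "2nd edition" test false, because fullmatch rejects any trailing non-numeric substring.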

> * Chicago page range format: what to do with five or more digits?

The specification currently links to
http://www.aahn.org/guidelines.html, but it seems like the content we
relied on moved to http://www.aahn.org/stylesheet.html . The latter
page shows an excerpt from CMoS that we almost copied verbatim.
Sebastian, could you check if CMoS 16th edition gives any guidance on
number ranges of 5 or more digits?

> * Which values are allowed for the "page" input field? I see multiple
> ranges can also be specified. I think the CSL spec should, in general,
> also define the format of the input fields. Personally, I would opt for a
> structured format (like the date fields) as opposed to a string-format
> (the page field). Individual CSL processors can still convert a
> string-formatted field to the structured data. This would require changes
> to the tests.

This would presumably involve describing the JSON format used by
citeproc-js in more detail. See
http://blog.martinfenner.org/2013/08/08/csl-is-more-than-citation-styles/
for a relevant discussion on this topic.

> * Shouldn't "page-first" be a number variable? It is used with number in
> page_NumberPageFirst

See https://github.com/citation-style-language/schema/issues/9. I
think Frank prefers to render "page" and "page-first" with cs:number,
but that's currently not kosher CSL.

> * The spec doesn't say anything about the nested groups special case.
> variables_TitleShortOnShortTitleNoTitleCondition seems to disagree with
> the CSL spec:
>>>> cs:group and its child elements are suppressed if a) at least one
>>>> rendering element in cs:group calls a variable (either directly or via
>>>> a macro), and b) all variables that are called are empty.
> In the group in the else section only the title variable is called. For
> ITEM-3, this variable is empty, so the group should be suppressed, but it
> isn't.
> Should a nested group always act as if it's (successfully) calling a
> variable? If so, the spec should mention this.

I think Frank already has an opinion on this, but I can't find the
discussion. I think the test
(https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/variables_TitleShortOnShortTitleNoTitleCondition.txt)
describes the desired behavior, in which case the specification should
indeed be amended. This is somewhat related to the open issue
https://github.com/citation-style-language/schema/issues/104
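
For reference, the rule as currently worded in the specification can be sketched in Python as below. This is only the quoted rule, not the behavior the test expects: nested groups are deliberately not modelled, since how they should count is exactly the open question. The function name and the list-of-values input are hypothetical.

```python
def group_is_suppressed(called_values):
    """Apply the suppression rule as worded in the spec.

    called_values: the values of every variable called (directly or via a
    macro) by rendering elements inside the group; empty string or None
    means the variable is empty.
    """
    values = list(called_values)
    calls_a_variable = len(values) > 0          # condition a)
    all_empty = all(not v for v in values)      # condition b)
    return calls_a_variable and all_empty
```

A group that calls only an empty "title" is suppressed under this reading (group_is_suppressed([""]) is true), whereas a group that calls no variables at all is kept.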

> * I seem to remember citeproc-js postprocesses its output to remove
> duplicate affixes. The CSL spec doesn't say anything about this, AFAIK.
> What's the official stance on this? I would personally avoid doing this,
> unless the spec includes an unambiguous definition on how this should work.

I'm convinced that CSL processors need to do some suppression of
duplicated punctuation. Frank just prepared some tests that describe
the current behavior in citeproc-js, and I hope to write up some
requirements for the specification in the next few weeks based on
those. See

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyPlain.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesIn.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyQuotesOut.txt
https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/punctuation_FullMontyField.txt

> * locale_TitleCaseGarbageLangEnglishLocale: is "en" a valid locale? If so,
> and default-locale="en", which locale should we use?

http://citationstyles.org/downloads/specification.html#locale-fallback
discusses this: "If the chosen output locale is a language (e.g.
"de"), the (primary) dialect is used in step 1 (e.g. "de-DE")."

The table above that line mentions that "en-US" is the primary dialect for "en".
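
A minimal Python sketch of that lookup, assuming a fallback order of chosen dialect, then primary dialect of the language, then "en-US". Only the two primary dialects mentioned here are listed; the dictionary and the function name fallback_chain are illustrative, and the table in the specification is the authoritative list.

```python
# Illustrative subset of the primary-dialect table: only the two entries
# mentioned in this thread.
PRIMARY_DIALECT = {"en": "en-US", "de": "de-DE"}


def fallback_chain(locale: str) -> list[str]:
    """Return the locales to try, in order, for a chosen output locale."""
    language = locale.split("-")[0]
    primary = PRIMARY_DIALECT.get(language)
    # A bare language such as "de" is replaced by its primary dialect in step 1.
    first = locale if "-" in locale else primary
    chain = []
    for candidate in (first, primary, "en-US"):
        if candidate and candidate not in chain:
            chain.append(candidate)
    return chain
```

So default-locale="en" resolves to ["en-US"], and "de-AT" resolves to ["de-AT", "de-DE", "en-US"].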

> * textcase_SkipNameParticlesInTitleCase (1): I believe this behavior is
> not part of the CSL spec, is it?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/textcase_SkipNameParticlesInTitleCase.txt

Correct.

> * textcase_SkipNameParticlesInTitleCase (2): the result doesn't seem to
> follow the CSL spec. The 'a' after the colon should be capitalized:
>>>> In both cases, stop words are lowercased, unless they are the first or
>>>> last word in the string, or follow a colon.

It seems like it should
(http://citationstyles.org/downloads/specification.html#title-case-conversion).
Frank?
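
To make the quoted rule concrete, here is a small Python sketch. It covers only the stop-word part of title-case conversion, uses an illustrative subset of stop words rather than the list in the specification, and splits naively on whitespace; the names STOP_WORDS and title_case are hypothetical.

```python
# Illustrative subset only; the specification defines the full stop-word list.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def title_case(text: str) -> str:
    """Capitalize words, lowercasing stop words unless they are the first
    or last word in the string or follow a colon."""
    words = text.split()
    result = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        after_colon = i > 0 and words[i - 1].endswith(":")
        if word.lower() in STOP_WORDS and not (first_or_last or after_colon):
            result.append(word.lower())
        else:
            # Capitalize the first character, leave the rest untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

Under this reading, title_case("the history of science: a short introduction") gives "The History of Science: A Short Introduction", with the "a" after the colon capitalized.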

> * date_VariousInvalidDates: why is 'Spring' in the output?

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/date_VariousInvalidDates.txt

Don't know. I think you can ignore this unit test. Frank?

> * page_Chicago: is the example S input data correct? It strikes me as a
> confusing way of representing a page range (in addition to saving only a
> single digit).

https://bitbucket.org/bdarcus/citeproc-test/src/tip/processor-tests/humans/page_Chicago.txt

Looks unambiguous to me.

> * A large number of tests test functionality that is not in the CSL spec,
> but is provided by citeproc-js (raw dates, static ordering, literal names,
> ...). I think these should be indicated as such, or perhaps moved to a
> separate directory. This would make it easier to check other CSL
> processors' compatibility.

Sylvester Keil proposed using a Cucumber format for unit tests, which
would allow tests to be tagged:
https://github.com/inukshuk/citeproc-ruby/blob/1c420de0f7a86b7c35782dee86ce62cbebb47ab9/features/condition/is_numeric.feature

If somebody else helps with the technical infrastructure, I'd be happy
to help reclassify the existing unit tests.

Rintze
