Thanks, that also makes sense. When you say 'input', do you mean both user 
input and input from repositories, e.g. the data returned by PubMed or 
Google Scholar?



On Apr 6, 2012, at 4:11 PM, Frank Bennett wrote:

> On Fri, Apr 6, 2012 at 8:56 PM, Charles Parnot <[email protected]> 
> wrote:
>> Hi all,
>> 
>> It's not a subject we get many complaints about, but it does come up on 
>> occasion. In Papers2, we have hard-coded a number of name particles and 
>> tried to decide which rule to apply to each (dropping or non-dropping) 
>> based on usage. I realize the rule can change for the same particle, since 
>> some particles are shared between languages and, even worse, the rules can 
>> differ between countries. In any case, I was curious to hear your feedback 
>> on the topic. Please let me know if it has already been beaten to death in 
>> a previous thread. Searching the mailing list, I found a few threads but no 
>> extensive discussion.
> 
> citeproc-js relies on its input for the semantic dropping/non-dropping
> distinction. With two-field input, a particle that precedes the
> "family" name element is non-dropping, and one that is attached to the
> "given" name with a comma is dropping. Some parsing clutter covers
> special cases: name suffixes (Jr, III), particles that form a fixed
> part of the family name, and a few cases that have come up where a
> particle is capitalized in the input. Apart from those bits, which are
> essentially workarounds, we don't try to interpret what a given
> fragment means in its own right.
> 
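
To make the two-field convention described above concrete, here is a minimal
Python sketch. It is one reading of that description, not citeproc-js's actual
parser: the function name `split_two_field_name` is hypothetical, the lowercase
test used to tell a particle from a suffix is an assumption, and the workarounds
for fixed and capitalized particles are left out. The output keys follow the
CSL-JSON name fields.

```python
def split_two_field_name(family, given):
    """Split particles out of two-field name input (illustrative only)."""
    result = {}

    # Leading lowercase words in the family field are treated as a
    # non-dropping particle; at least one word is kept as the family name.
    words = family.split()
    leading = []
    while len(words) > 1 and words[0].islower():
        leading.append(words.pop(0))
    if leading:
        result["non-dropping-particle"] = " ".join(leading)
    result["family"] = " ".join(words)

    # Text after a comma in the given field is a dropping particle when it
    # is all lowercase, and a suffix (Jr, III) otherwise. This lowercase
    # heuristic is an assumption of the sketch.
    base, _, tail = given.partition(",")
    result["given"] = base.strip()
    tail = tail.strip()
    if tail:
        key = "dropping-particle" if tail.islower() else "suffix"
        result[key] = tail
    return result
```

For example, `split_two_field_name("van Gogh", "Vincent")` yields a
non-dropping "van", while `split_two_field_name("Humboldt", "Alexander, von")`
yields a dropping "von". Nothing here interprets what "van" or "von" means; the
position in the input alone decides.
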
>> 
>> I am listing here all the particles Papers2 detects. Each particle is 
>> decomposed into a dropping part and a non-dropping part (either can be 
>> empty, of course). Note that we also correct the capitalization.
>> 
>> I think we have 'al' and 'el' wrong.
>> 
>> // particle          dropping        non-dropping
>> 
>> // spain(??) / arabic
>> al                  al
>> dos                 dos
>> el                  el
>> de las              de              Las
>> lo                  lo
>> les                 les
>> 
>> // italy(??)
>> il                  il
>> del                                 del
>> dela                dela
>> della               della
>> dello               dello
>> di                                  Di
>> da                                  Da
>> do                                  Do
>> des                                 Des
>> lou                                 Lou
>> pietro                              Pietro
>> 
>> // france
>> de                                  de
>> de la               de              La
>> du                  du
>> d'                  d'
>> le                                  Le
>> la                                  La
>> l'                                  L'
>> saint                               Saint
>> sainte                              Sainte
>> st.                                 Saint
>> ste.                                Sainte
>> 
>> // holland
>> van                                 van
>> van de                              vande
>> van der                             vander
>> van den                             vanden
>> vander                              vander
>> v.d.                                vander
>> vd                                  vander
>> van het                             van het
>> ver                                 ver
>> ten                 ten
>> ter                 ter
>> te                  te
>> op de               op de
>> in de               in de
>> in 't               in 't
>> in het              in het
>> uit de              uit de
>> uit den             uit den
>> 
>> // germany / austria
>> von                 von
>> von der             von der
>> von dem             von dem
>> von zu              von zu
>> v.                  von
>> v                   von
>> vom                 vom
>> das                 das
>> zum                 zum
>> zur                 zur
>> den                 den
>> der                 der
>> des                 des
>> auf den             auf den
>> 
>> // scotland(?)
>> mac                                 Mac
>> 
>> 
>> // arabic
>> ben                                 Ben
>> bin                                 Bin
>> sen                 sen
>> 
>> // what to do with these??
>> // mc                               Mc
>> // o'                               O'
>> // au
>> // af
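
A hard-coded table like the one above could be applied roughly as follows. This
is a hedged Python sketch, not the Papers2 implementation: the names `PARTICLES`
and `split_particle` are hypothetical, only a handful of rows are copied from
the list, and matching the longest prefix case-insensitively is an assumption.

```python
# A few rows copied from the list above: input -> (dropping, non-dropping).
PARTICLES = {
    "de la": ("de", "La"),
    "de": ("", "de"),
    "du": ("du", ""),
    "van der": ("", "vander"),
    "van": ("", "van"),
    "v.d.": ("", "vander"),
    "von": ("von", ""),
}


def split_particle(surname):
    """Return (dropping, non_dropping, family) for a surname string."""
    words = surname.split()
    # Try the longest candidate prefix first so that "de la" wins over "de",
    # and always leave at least one word for the family name.
    for n in range(len(words) - 1, 0, -1):
        key = " ".join(words[:n]).lower()
        if key in PARTICLES:
            dropping, non_dropping = PARTICLES[key]
            return dropping, non_dropping, " ".join(words[n:])
    # No known particle: the whole string is the family name.
    return "", "", surname
```

Because the lookup key is lowercased and the stored values carry the preferred
spelling, the same step also corrects capitalization: "Van Der Waals" comes out
with the non-dropping part "vander". Particles written without a space, such as
"d'", would need separate handling that this sketch does not attempt.
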
>> 
>> 
>> 
>> ------------------------------------------------------------------------------
>> For Developers, A Lot Can Happen In A Second.
>> Boundary is the first to Know...and Tell You.
>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>> http://p.sf.net/sfu/Boundary-d2dvs2
>> _______________________________________________
>> xbiblio-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

--
Charles Parnot
[email protected]
twitter: @cparnot
http://mekentosj.com


