On Fri, Apr 6, 2012 at 11:16 PM, Charles Parnot
<[email protected]> wrote:
> Thanks, that also makes sense. When you talk about 'input', do you mean both 
> user input, and input from repositories? E.g. also based on the data that is 
> returned by PubMed or Google Scholar?

I meant just the input that the processor sees in the incoming JSON.
With fully structured input, each of these elements is broken out into
its own JSON key, but two-field name systems are common, so the
processor has a layer that converts the two-field name format
delivered straight from a calling application into the internal form
used for processing.
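
As a rough illustration of the two-field convention described in the
quoted message below (a particle preceding the family name is
non-dropping; one attached to the given name after a comma is
dropping), here is a minimal Python sketch. It is not the citeproc-js
code: the function name split_two_field_name and the "all lowercase
means particle" heuristic are assumptions made for illustration, and
the special cases mentioned in the thread (suffixes, capitalized
particles, particles that are a fixed part of the family name) are
deliberately left out. The output keys follow the CSL JSON name
variables.

```python
def split_two_field_name(name):
    """Split particles out of a two-field {"family", "given"} name.

    Illustrative assumption: leading all-lowercase words in "family" form
    a non-dropping particle, and an all-lowercase fragment after a comma
    in "given" forms a dropping particle.
    """
    result = {}

    # Leading lowercase words of the family field -> non-dropping particle.
    # Always keep at least the last word as the family name itself.
    words = name.get("family", "").split()
    split_at = 0
    while split_at < len(words) - 1 and words[split_at].islower():
        split_at += 1
    if split_at:
        result["non-dropping-particle"] = " ".join(words[:split_at])
    result["family"] = " ".join(words[split_at:])

    # Lowercase fragment after a comma in the given field -> dropping particle.
    given, _, tail = name.get("given", "").partition(",")
    tail = tail.strip()
    if tail and tail.islower():
        result["dropping-particle"] = tail
        result["given"] = given.strip()
    else:
        # Anything else after the comma (e.g. a suffix such as "Jr") is
        # left untouched by this sketch.
        result["given"] = name.get("given", "").strip()
    return result


print(split_two_field_name({"family": "van der Waals", "given": "Johannes"}))
print(split_two_field_name({"family": "Humboldt", "given": "Alexander, von"}))
```

Nothing here looks a particle up in a list: the sketch only reads the
position of a fragment in the two fields, which is the point of
relying on input rather than on a hard-coded table.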

I'm not closely involved in the translator layer, but judging from
work on the CiNii site (to get ready for my own return to the world of
actual research, and because it's one of the few sites with
multilingual metadata to feed MLZ), names can get pretty messy on the
server side. In the best case translators will be able to remangle
names into the form expected by the processor, but where that fails
the user will have to touch things up in their database.

>
>
>
> On Apr 6, 2012, at 4:11 PM, Frank Bennett wrote:
>
>> On Fri, Apr 6, 2012 at 8:56 PM, Charles Parnot <[email protected]> 
>> wrote:
>>> Hi all,
>>>
>>> It's not a subject where we get many complaints, but it does come up on
>>> occasion. In Papers2, we have hard-coded a number of name particles and
>>> have tried to decide what rule to apply to each (dropping or
>>> non-dropping) based on
>>> usage. I realize the rule can change for the same particle, as some 
>>> particles are the same in different languages, and even worse, the rules 
>>> can differ when used in different countries. In any case, I was curious to 
>>> hear your feedback on that topic. Please let me know if it's been beaten to 
>>> death in a previous thread. I have seen a few threads in searching the 
>>> mailing list, but no extensive discussion.
>>
>> citeproc-js relies on the input for the semantic dropping/non-dropping
>> distinction. With two-field input, a particle that precedes the
>> "family" name element is non-dropping, and one that is attached to the
>> "given" name with a comma is dropping. Some parsing clutter is used to
>> cover special cases, such as name suffixes (Jr, III), and particles
>> that form a fixed part of the family name, and a few cases that have
>> come up where a particle is capitalized in the input. Apart from those
>> bits, which are essentially workarounds, we don't try to interpret
>> what a given fragment means in its own right.
>>
>>>
>>> I am listing here all the particles Papers2 detects. Each particle is
>>> decomposed into a dropping part + a non-dropping part (either can be empty,
>>> of course). Note we also correct the capitalization.
>>>
>>> I think we have 'al' and 'el' wrong.
>>>
>>>
>>> // columns: particle as detected, dropping part, non-dropping part
>>> // spain(??) / arabic
>>> al                  al
>>> dos                 dos
>>> el                  el
>>> de las              de              Las
>>> lo                  lo
>>> les                 les
>>>
>>> // italy(??)
>>> il                  il
>>> del                                 del
>>> dela                dela
>>> della               della
>>> dello               dello
>>> di                                  Di
>>> da                                  Da
>>> do                                  Do
>>> des                                 Des
>>> lou                                 Lou
>>> pietro                              Pietro
>>>
>>> // france
>>> de                                  de
>>> de la               de              La
>>> du                  du
>>> d'                  d'
>>> le                                  Le
>>> la                                  La
>>> l'                                  L'
>>> saint                               Saint
>>> sainte                              Sainte
>>> st.                                 Saint
>>> ste.                                Sainte
>>>
>>> // holland
>>> van                                 van
>>> van de                              vande
>>> van der                             vander
>>> van den                             vanden
>>> vander                              vander
>>> v.d.                                vander
>>> vd                                  vander
>>> van het                             van het
>>> ver                                 ver
>>> ten                 ten
>>> ter                 ter
>>> te                  te
>>> op de               op de
>>> in de               in de
>>> in 't               in 't
>>> in het              in het
>>> uit de              uit de
>>> uit den             uit den
>>>
>>> // germany / austria
>>> von                 von
>>> von der             von der
>>> von dem             von dem
>>> von zu              von zu
>>> v.                  von
>>> v                   von
>>> vom                 vom
>>> das                 das
>>> zum                 zum
>>> zur                 zur
>>> den                 den
>>> der                 der
>>> des                 des
>>> auf den             auf den
>>>
>>> // scotland(?)
>>> mac                                 Mac
>>>
>>>
>>> // arabic
>>> ben                                 Ben
>>> bin                                 Bin
>>> sen                 sen
>>>
>>> // what to do with these??
>>> // mc                               Mc
>>> // o'                               O'
>>> // au
>>> // af
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> For Developers, A Lot Can Happen In A Second.
>>> Boundary is the first to Know...and Tell You.
>>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>>> http://p.sf.net/sfu/Boundary-d2dvs2
>>> _______________________________________________
>>> xbiblio-devel mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel
>
> --
> Charles Parnot
> [email protected]
> twitter: @cparnot
> http://mekentosj.com
>
>
>
