Ah, OK, that makes sense. Papers already splits names into all the fields that the processor needs. Indeed, particle detection belongs in the client, not the processor. Interesting that you still handle the 2-part names as well; that makes sense too.

charles

On Apr 6, 2012, at 4:42 PM, Frank Bennett wrote:

> On Fri, Apr 6, 2012 at 11:16 PM, Charles Parnot
> <[email protected]> wrote:
>> Thanks, that also makes sense. When you talk about 'input', do you mean both
>> user input, and input from repositories? E.g. also based on the data that is
>> returned by PubMed or Google Scholar?
>
> I meant just the input that the processor sees in the incoming JSON
> (with really proper input all of these elements will be broken out
> into separate JSON keys, but two-field name systems are common, so the
> processor has a layer to convert from the two-field name format
> delivered straight from a calling application into the internal form
> used for processing).
>
> I'm not closely involved in the translator layer, but judging from
> work on the CiNii site (to get ready for my own return to the world of
> actual research, and because it's one of the few sites with
> multilingual metadata to feed MLZ), names can get pretty messy on the
> server side. In the best case translators will be able to remangle
> names into the form expected by the processor, but where that fails
> the user will have to touch things up in their database.
>
>>
>> On Apr 6, 2012, at 4:11 PM, Frank Bennett wrote:
>>
>>> On Fri, Apr 6, 2012 at 8:56 PM, Charles Parnot <[email protected]>
>>> wrote:
>>>> Hi all,
>>>>
>>>> It's not a subject where we get too many complaints, but on occasion. In
>>>> Papers2, we have hard-coded a number of name particles and have tried to
>>>> decide what rule to apply to each (dropping or non-dropping) based on
>>>> usage. I realize the rule can change for the same particle, as some
>>>> particles are the same in different languages, and even worse, the rules
>>>> can differ when used in different countries. In any case, I was curious to
>>>> hear your feedback on that topic. Please let me know if it's been beaten
>>>> to death in a previous thread. I have seen a few threads in searching the
>>>> mailing list, but no extensive discussion.
>>>
>>> The citeproc-js relies on input for the semantic dropping/non-dropping
>>> distinction. With two-field input, a particle that precedes the
>>> "family" name element is non-dropping, and one that is attached to the
>>> "given" name with a comma is dropping. Some parsing clutter is used to
>>> cover special cases, such as name suffixes (Jr, III), and particles
>>> that form a fixed part of the family name, and a few cases that have
>>> come up where a particle is capitalized in the input. Apart from those
>>> bits, which are essentially workarounds, we don't try to interpret
>>> what a given fragment means in its own right.
>>>
>>>>
>>>> I am listing here all the particles Papers2 detect. The particles are
>>>> decomposed in the dropping part + non-dropping part (either can be empty
>>>> of course). Note we also correct the capitalization.
>>>>
>>>> I think we have the 'al', 'el', wrong.
>>>>
>>>>
>>>> // spain(??) / arabic
>>>> al          al
>>>> dos         dos
>>>> el          el
>>>> de las      de Las
>>>> lo          lo
>>>> les         les
>>>>
>>>> // italy(??)
>>>> il          il
>>>> del         del
>>>> dela        dela
>>>> della       della
>>>> dello       dello
>>>> di          Di
>>>> da          Da
>>>> do          Do
>>>> des         Des
>>>> lou         Lou
>>>> pietro      Pietro
>>>>
>>>> // france
>>>> de          de
>>>> de la       de La
>>>> du          du
>>>> d'          d'
>>>> le          Le
>>>> la          La
>>>> l'          L'
>>>> saint       Saint
>>>> sainte      Sainte
>>>> st.         Saint
>>>> ste.        Sainte
>>>>
>>>> // holland
>>>> van         van
>>>> van de      vande
>>>> van der     vander
>>>> van den     vanden
>>>> vander      vander
>>>> v.d.        vander
>>>> vd          vander
>>>> van het     van het
>>>> ver         ver
>>>> ten         ten
>>>> ter         ter
>>>> te          te
>>>> op de       op de
>>>> in de       in de
>>>> in 't       in 't
>>>> in het      in het
>>>> uit de      uit de
>>>> uit den     uit den
>>>>
>>>> // germany / austria
>>>> von         von
>>>> von der     von der
>>>> von dem     von dem
>>>> von zu      von zu
>>>> v.          von
>>>> v           von
>>>> vom         vom
>>>> das         das
>>>> zum         zum
>>>> zur         zur
>>>> den         den
>>>> der         der
>>>> des         des
>>>> auf den     auf den
>>>>
>>>> // scotland(?)
>>>> mac         Mac
>>>>
>>>>
>>>> // arabic
>>>> ben         Ben
>>>> bin         Bin
>>>> sen         sen
>>>>
>>>> // what to do with these??
>>>> // mc       Mc
>>>> // o'       O'
>>>> // au
>>>> // af
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> For Developers, A Lot Can Happen In A Second.
>>>> Boundary is the first to Know...and Tell You.
>>>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>>>> http://p.sf.net/sfu/Boundary-d2dvs2
>>>> _______________________________________________
>>>> xbiblio-devel mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel
>>
>> --
>> Charles Parnot
>> [email protected]
>> twitter: @cparnot
>> http://mekentosj.com
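
To make the two-field convention Frank describes above concrete, here is a minimal Python sketch. It is not the citeproc-js code. The function name split_two_field_name is hypothetical, and the rule "leading all-lowercase words in the family field are the particle" is an assumption made for illustration. It deliberately ignores the special cases mentioned in the thread (suffixes such as Jr or III, particles that are a fixed part of the family name, capitalized particles). The output keys follow the CSL JSON name fields.

```python
def split_two_field_name(family, given):
    """Split a two-field name into CSL-style name parts (illustrative only)."""
    parts = {}

    # Assumption: leading all-lowercase words in the family field form the
    # non-dropping particle. The last word is always kept as the family name.
    words = family.split()
    n = 0
    while n < len(words) - 1 and words[n].islower():
        n += 1
    if n:
        parts["non-dropping-particle"] = " ".join(words[:n])
    parts["family"] = " ".join(words[n:])

    # A particle attached to the given name after a comma is treated as
    # dropping. Suffixes (Jr, III) are not distinguished in this sketch.
    head, sep, tail = given.partition(",")
    parts["given"] = head.strip()
    if sep and tail.strip():
        parts["dropping-particle"] = tail.strip()

    return parts


print(split_two_field_name("van der Waals", "Johannes"))
print(split_two_field_name("Beethoven", "Ludwig, van"))
```

A client like Papers2 that already stores the particle separately would skip this step entirely and pass the four fields straight through.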
