Dear Avram,

I'm returning to this thread to shamelessly plug the citation parser I wrote 
over the last couple of weeks:

https://github.com/inukshuk/anystyle-parser

I had to parse about 8000 references and was not satisfied with the results I 
got using ParsCit and FreeCite. My parser follows the same general approach, but 
I've extended and improved (I hope) much of the feature elicitation. I'm also 
using wapiti instead of libcrf++ because, IMO, wapiti has a much cleaner 
codebase and I personally preferred a C implementation over a C++ one. In any 
case, wapiti is extremely fast, and my models produced very encouraging results 
for my data once I had trained them on about 30 of my own references (in 
addition to the CORA dataset).

Picking up on your idea, it would be extremely easy to adapt CSL styles to 
generate tagged output. Thus, we could automate the process of producing valid 
training data, as you suggest. 
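
A minimal sketch of that idea, in Python rather than CSL: render a known 
reference once as plain text and once with every field wrapped in a tag. The 
pattern, the field names and the sample reference are all made up for 
illustration, and keeping the delimiting punctuation inside each tag is an 
assumption about what the trainer expects, not a description of any real style 
or parser.

```python
# Hypothetical sketch: format known reference data with a fixed pattern and
# emit both the plain citation and a field-tagged version of the same string.
from xml.sax.saxutils import escape

# Each entry is (field name, text emitted right after the field).
# This pattern is an invented stand-in for a real CSL style.
PATTERN = [
    ("author", ". "),
    ("date", ". "),
    ("title", ". "),
    ("journal", " "),
    ("volume", ": "),
    ("pages", "."),
]


def render_plain(ref, pattern=PATTERN):
    """Return the citation as it would appear in a bibliography."""
    pieces = [ref[field] + suffix for field, suffix in pattern if field in ref]
    return "".join(pieces).strip()


def render_tagged(ref, pattern=PATTERN):
    """Return the same citation with each field wrapped in <field>...</field>."""
    parts = []
    for field, suffix in pattern:
        if field not in ref:
            continue  # fields missing from the input are simply skipped
        # Assumption: trailing punctuation stays inside the tag.
        text = escape((ref[field] + suffix).strip())
        parts.append(f"<{field}>{text}</{field}>")
    return " ".join(parts)


# Invented sample reference, not taken from any real dataset.
SAMPLE = {
    "author": "Doe, J",
    "date": "2011",
    "title": "An example title",
    "journal": "Journal of Examples",
    "volume": "12",
    "pages": "34-56",
}

if __name__ == "__main__":
    print(render_plain(SAMPLE))
    print(render_tagged(SAMPLE))
```

Running the same known inputs through many such patterns would give one tagged 
line per reference and style, with no hand tagging involved.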

Anyway, I thought I'd let you (and anyone interested in parsing citation 
references) know about the project. If you want to try out the parser but 
encounter any problems, don't hesitate to contact me for help. A word of 
caution: if your results are not accurate right away, try tagging one or two 
references and retraining the parser – I tried to make training it with new 
references very easy.

/end shameless plug

Best,
Sylvester

On Jul 26, 2011, at 11:51 PM, Avram Lyon wrote:

> On Tue, Jul 26, 2011 at 10:36 PM, Simon Kornblith <[email protected]> wrote:
>> So, I have a crazy idea of how to shift as much of the complexity of
>> generating CSL away from the user as possible. Essentially, I want to be
>> able to copy and paste bibliography entries from a journal's reference list
>> into a box and end up with a formatted style.
>> As far as the implementation goes, we would need to:
>> 1) Convert the bibliography entries to a series of labeled fields using a
>> parser such as FreeCite.
> 
> I just spent some time getting FreeCite running locally. The project
> has been largely dormant for two years or so, but there's someone
> who's been committing to a fork on Github lately, and I was able to
> get it to work on my machine pretty quickly, once I remembered my
> Rails mambo. It works somewhat better than the current hosted version
> at Brown-- it at least recognizes post-1999 dates. If we could build
> some capability for the user to override the tags, an interactive
> review, then I think it'd make a reasonable platform.
> 
> I think one of the issues that FreeCite struggles with is limited
> training data-- we should be able to provide strong data on things
> like author names, place names, publishers and the like (from the data
> stores of Zotero and perhaps Mendeley), that might make the tagging
> more accurate. We can also produce tagged training data using
> citeproc-js and known inputs to give good, comprehensive descriptions
> of major patterns in citation formatting.
> 
> Avram
> 
> _______________________________________________
> xbiblio-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel

