On Thu, Sep 8, 2011 at 8:08 AM, Sylvester Keil <[email protected]> wrote:
> Dear Avram,
>
> I'm returning to this thread to shamelessly plug the citation parser I wrote 
> in the last couple of weeks:
>
> https://github.com/inukshuk/anystyle-parser

Cool!

> I had to parse about 8000 references and was not satisfied by the results I 
> got using ParsCit and FreeCite. The parser follows the same general approach, 
> but I've extended and improved (I hope) much of the feature elicitation. 
> I'm also using wapiti instead of libcrf++ because, IMO, it has a much cleaner 
> codebase and because I personally prefer a C implementation over a C++ one. 
> In any case, wapiti is extremely fast, and my models produced very encouraging 
> results for my data once I had trained them on about 30 references (in 
> addition to the CORA dataset).
>
> Picking up on your idea, it would be extremely easy to adapt CSL styles to 
> generate tagged output. Thus, we could automate the process of producing 
> valid training data, as you suggest.

So just to understand, are you volunteering to work up a
proof-of-concept of Simon's idea with your new tool? :-)

Bruce

> Anyway, I thought I'd let you (and anyone interested in parsing citation 
> references) know about the project. If you want to try out the parser but 
> encounter any problems, don't hesitate to contact me for help. A word of 
> caution: if your results are not accurate right away, try to tag one or two 
> references and train the parser – I tried to make training the parser with 
> new references very easy.
>
> /end shameless plug
>
> Best,
> Sylvester
>
> On Jul 26, 2011, at 11:51 PM, Avram Lyon wrote:
>
>> On Tue, Jul 26, 2011 at 10:36 PM, Simon Kornblith <[email protected]> 
>> wrote:
>>> So, I have a crazy idea of how to shift as much of the complexity of
>>> generating CSL away from the user as possible. Essentially, I want to be
>>> able to copy and paste bibliography entries from a journal's reference list
>>> into a box and end up with a formatted style.
>>> As far as the implementation goes, we would need to:
>>> 1) Convert the bibliography entries to a series of labeled fields using a
>>> parser such as FreeCite.
>>
>> I just spent some time getting FreeCite running locally. The project
>> has been largely dormant for two years or so, but there's someone
>> who's been committing to a fork on Github lately, and I was able to
>> get it to work on my machine pretty quickly, once I remembered my
>> Rails mambo. It works somewhat better than the current hosted version
>> at Brown-- it at least recognizes post-1999 dates. If we could build
>> some capability for the user to override the tags, an interactive
>> review, then I think it'd make a reasonable platform.
>>
>> I think one of the issues that FreeCite struggles with is limited
>> training data-- we should be able to provide strong data on things
>> like author names, place names, publishers and the like (from the data
>> stores of Zotero and perhaps Mendeley), that might make the tagging
>> more accurate. We can also produce tagged training data using
>> citeproc-js and known inputs to give good, comprehensive descriptions
>> of major patterns in citation formatting.
>>
>> Avram
>>
>> ------------------------------------------------------------------------------
>> Got Input?   Slashdot Needs You.
>> Take our quick survey online.  Come on, we don't ask for help often.
>> Plus, you'll get a chance to win $100 to spend on ThinkGeek.
>> http://p.sf.net/sfu/slashdot-survey
>> _______________________________________________
>> xbiblio-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel
>
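
The idea raised in the thread (formatting known inputs through a style and emitting the same string with field tags, so that training data comes for free) can be sketched roughly as follows. This is only an illustration under loud assumptions: the STYLE list is a hypothetical, drastically simplified stand-in for a real CSL style, the render() helper and the sample item are invented for this example, and the tag names are merely modelled on the kind of labeled fields that citation parsers are trained on. It does not use citeproc-js, anystyle-parser, or wapiti.

```python
import re

# Hypothetical toy "style": ordered (field, prefix, suffix) triples.
# A real CSL style is far richer; this is an assumption for illustration.
STYLE = [
    ("author", "", "."),
    ("date", "(", ")."),
    ("title", "", "."),
    ("journal", "", ","),
    ("volume", "", ","),
    ("pages", "", "."),
]


def render(item, style, tagged=False):
    """Format one item; if tagged, wrap each rendered field in its label."""
    parts = []
    for field, prefix, suffix in style:
        value = item.get(field)
        if not value:
            continue  # skip fields the item does not have
        text = f"{prefix}{value}{suffix}"
        parts.append(f"<{field}>{text}</{field}>" if tagged else text)
    return " ".join(parts)


# Invented sample input standing in for a "known input" record.
item = {
    "author": "Doe, J",
    "date": "2011",
    "title": "Parsing references",
    "journal": "Journal of Examples",
    "volume": "12",
    "pages": "34-56",
}

plain = render(item, STYLE)                # what a reader would see
tagged = render(item, STYLE, tagged=True)  # same string, with field labels

# Removing the tags recovers the plain citation, so the pair can serve
# as one (input, labeling) training example.
assert re.sub(r"</?\w+>", "", tagged) == plain

print(plain)
print(tagged)
```

Because both strings come from the same formatting pass, the labels cannot drift out of sync with the text, which is the property that would make generated training data trustworthy.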
