On Thu, Sep 8, 2011 at 8:08 AM, Sylvester Keil <[email protected]> wrote:
> Dear Avram,
>
> I'm returning to this thread to shamelessly plug the citation parser I wrote
> in the last couple of weeks:
>
> https://github.com/inukshuk/anystyle-parser
Cool!

> I had to parse about 8000 references and was not satisfied by the results I
> got using ParsCit and FreeCite. The Parser follows the same general approach,
> but I've extended and improved (I hope) much of the feature elicitation;
> also, I'm using wapiti instead of libcrf++ which, IMO, has a much cleaner
> codebase and because I personally preferred a C over C++ implementation. In
> any case, wapiti is extremely fast and my models produced very encouraging
> results for my data once I trained about 30 references (in addition to the
> CORA dataset).
>
> Picking up on your idea, it would be extremely easy to adapt CSL styles to
> generate tagged output. Thus, we could automate the process of producing
> valid training data, as you suggest.

So just to understand, are you volunteering to work up a proof-of-concept of
Simon's idea with your new tool? :-)

Bruce

> Anyway, I thought I'd let you (and anyone interested in parsing citation
> references) know about the project. If you want to try out the parser but
> encounter any problems, don't hesitate to contact me for help. A word of
> caution: if your results are not accurate right away, try to tag one or two
> references and train the parser – I tried to make training the parser with
> new references very easy.
>
> /end shameless plug
>
> Best,
> Sylvester
>
> On Jul 26, 2011, at 11:51 PM, Avram Lyon wrote:
>
>> On Tue, Jul 26, 2011 at 10:36 PM, Simon Kornblith <[email protected]>
>> wrote:
>>> So, I have a crazy idea of how to shift as much of the complexity of
>>> generating CSL away from the user as possible. Essentially, I want to be
>>> able to copy and paste bibliography entries from a journal's reference list
>>> into a box and end up with a formatted style.
>>> As far as the implementation goes, we would need to:
>>> 1) Convert the bibliography entries to a series of labeled fields using a
>>> parser such as FreeCite.
>>
>> I just spent some time getting FreeCite running locally.
>> The project has been largely dormant for two years or so, but there's
>> someone who's been committing to a fork on Github lately, and I was able to
>> get it to work on my machine pretty quickly, once I remembered my
>> Rails mambo. It works somewhat better than the current hosted version
>> at Brown-- it at least recognizes post-1999 dates. If we could build
>> some capability for the user to override the tags, an interactive
>> review, then I think it'd make a reasonable platform.
>>
>> I think one of the issues that FreeCite struggles with is limited
>> training data-- we should be able to provide strong data on things
>> like author names, place names, publishers and the like (from the data
>> stores of Zotero and perhaps Mendeley), that might make the tagging
>> more accurate. We can also produce tagged training data using
>> citeproc-js and known inputs to give good, comprehensive descriptions
>> of major patterns in citation formatting.
>>
>> Avram
>>
>> _______________________________________________
>> xbiblio-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/xbiblio-devel
