Re: [CODE4LIB] New Open Source Citation Parser
A suggestion: you might want to also add Biblio-Citation-Parser by Mike Jewell (http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/)

Steve

On Tue, Sep 16, 2008 at 11:11 AM, Miriam Goldberg [EMAIL PROTECTED] wrote:

> Thanks for pointing out these other parsing tools. I've added them to the list on our website (see under heading Other Citation Tools at http://freecite.library.brown.edu/).
>
> Citation metadata extraction is a difficult open problem whose potential solutions are based on continually developing technologies. So I think it's important that we approach this task from many diverse angles. If our project makes a little headway here, ParsCit makes some headway there, and five other groups make their own advancements, hopefully we'll be able to pool our findings into a viable application.
>
>> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>
> Agreed. I'd love to see this. Another idea might be to write an application that takes the output of multiple parsers and assembles the best answer.
>
> On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:
>
>> This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. Also curious if anyone has looked at all three and can compare and contrast and make a recommendation.
>>
>> The other two I know about are:
>>
>> ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
>>
>> A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/
>>
>> I've been keeping track because I have a use for this, although haven't had time to make use of any of them yet.
>>
>> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>>
>> Jonathan
>>
>> jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM
>>
>>> Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.
>>>
>>> FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.
>>>
>>> FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).
>>>
>>> The API and code are available at: http://freecite.library.brown.edu.
>>>
>>> Jean Rainwater
>>> Co-Leader, Integrated Technology Services
>>> Brown University Library
>>> Providence, RI 02912
>>> 401.863.9031
>>> [EMAIL PROTECTED]
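The idea quoted above, an application that takes the output of multiple parsers and assembles the best answer, could be sketched roughly as follows. This is only an illustration, not how FreeCite, ParsCit, or any other tool in this thread works. It assumes each parser's output has already been converted to a plain dictionary of field name to string, and it uses a simple per-field majority vote with ties going to the parser listed first. The function names, the field names, and the sample data are all made up for the example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical shape for one parser's result: field name -> extracted string.
Citation = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(value.lower().split())


def merge_citations(outputs: List[Citation]) -> Citation:
    """Combine several parsers' outputs by per-field majority vote.

    Ties are broken in favor of the parser that appears first in `outputs`.
    """
    # Collect field names in first-seen order across all parsers.
    fields: List[str] = []
    for out in outputs:
        for name in out:
            if name not in fields:
                fields.append(name)

    merged: Citation = {}
    for name in fields:
        # Non-empty answers for this field, in parser order.
        candidates = [out[name] for out in outputs if out.get(name)]
        if not candidates:
            continue
        votes = Counter(normalize(c) for c in candidates)
        best_count = max(votes.values())
        # Keep the original spelling of the first answer with the top vote count.
        for candidate in candidates:
            if votes[normalize(candidate)] == best_count:
                merged[name] = candidate
                break
    return merged


if __name__ == "__main__":
    # Invented sample outputs from three imaginary parsers.
    sample = [
        {"author": "J. Smith", "title": "Parsing Citations", "year": "2008"},
        {"author": "J. Smith", "title": "Parsing citations", "year": "2003"},
        {"author": "Smith, J.", "year": "2008", "journal": "Example Journal"},
    ]
    print(merge_citations(sample))
```

A real combiner would probably want per-parser confidence scores or field-specific comparison (author name order, for instance) rather than plain string voting, but the voting version shows the basic shape.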
Re: [CODE4LIB] New Open Source Citation Parser
> A suggestion: you might want to also add Biblio-Citation-Parser

Added. Keep 'em coming!

On Tue, Sep 16, 2008 at 12:21 PM, Steve Oberg [EMAIL PROTECTED] wrote:

> A suggestion: you might want to also add Biblio-Citation-Parser by Mike Jewell (http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/)
>
> Steve
>
> On Tue, Sep 16, 2008 at 11:11 AM, Miriam Goldberg [EMAIL PROTECTED] wrote:
>
>> Thanks for pointing out these other parsing tools. I've added them to the list on our website (see under heading Other Citation Tools at http://freecite.library.brown.edu/).
>>
>> Citation metadata extraction is a difficult open problem whose potential solutions are based on continually developing technologies. So I think it's important that we approach this task from many diverse angles. If our project makes a little headway here, ParsCit makes some headway there, and five other groups make their own advancements, hopefully we'll be able to pool our findings into a viable application.
>>
>>> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>>
>> Agreed. I'd love to see this. Another idea might be to write an application that takes the output of multiple parsers and assembles the best answer.
>>
>> On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:
>>
>>> This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. Also curious if anyone has looked at all three and can compare and contrast and make a recommendation.
>>>
>>> The other two I know about are:
>>>
>>> ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
>>>
>>> A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/
>>>
>>> I've been keeping track because I have a use for this, although haven't had time to make use of any of them yet.
>>>
>>> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>>>
>>> Jonathan
>>>
>>> jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM
>>>
>>>> Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.
>>>>
>>>> FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.
>>>>
>>>> FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).
>>>>
>>>> The API and code are available at: http://freecite.library.brown.edu.
>>>>
>>>> Jean Rainwater
>>>> Co-Leader, Integrated Technology Services
>>>> Brown University Library
>>>> Providence, RI 02912
>>>> 401.863.9031
>>>> [EMAIL PROTECTED]
Re: [CODE4LIB] New Open Source Citation Parser
I might add CrossRef's Simple Text Query for generating article DOIs from citations. Not open source though.

http://www.crossref.org/SimpleTextQuery/

On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:

> This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. Also curious if anyone has looked at all three and can compare and contrast and make a recommendation.
>
> The other two I know about are:
>
> ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
>
> A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/
>
> I've been keeping track because I have a use for this, although haven't had time to make use of any of them yet.
>
> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.
>
> Jonathan
>
> jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM
>
>> Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.
>>
>> FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.
>>
>> FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).
>>
>> The API and code are available at: http://freecite.library.brown.edu.
>>
>> Jean Rainwater
>> Co-Leader, Integrated Technology Services
>> Brown University Library
>> Providence, RI 02912
>> 401.863.9031
>> [EMAIL PROTECTED]
[CODE4LIB] New Open Source Citation Parser
Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.

FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.

FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).

The API and code are available at: http://freecite.library.brown.edu.

Jean Rainwater
Co-Leader, Integrated Technology Services
Brown University Library
Providence, RI 02912
401.863.9031
[EMAIL PROTECTED]
Re: [CODE4LIB] New Open Source Citation Parser
Are you aware that there is an existing, yet embryonic, FLOSS project called FreeCite?

http://www.freecite.org/

Mark Matienzo
Applications Developer, NYPL Labs
The New York Public Library

On Fri, Sep 12, 2008 at 2:25 PM, jean rainwater [EMAIL PROTECTED] wrote:

> Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.
>
> FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.
>
> FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).
>
> The API and code are available at: http://freecite.library.brown.edu.
>
> Jean Rainwater
> Co-Leader, Integrated Technology Services
> Brown University Library
> Providence, RI 02912
> 401.863.9031
> [EMAIL PROTECTED]
Re: [CODE4LIB] New Open Source Citation Parser
This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. Also curious if anyone has looked at all three and can compare and contrast and make a recommendation.

The other two I know about are:

ParsCit -- http://wing.comp.nus.edu.sg/parsCit/

A CDL project I don't have a good home page for, but code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/

I've been keeping track because I have a use for this, although haven't had time to make use of any of them yet.

Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.

Jonathan

jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM

> Please help us beta test FreeCite, a new citation parser for non-structured bibliographic data.
>
> FreeCite is the result of collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.
>
> FreeCite is implemented in Ruby on Rails and uses the CRF++ library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B).
>
> The API and code are available at: http://freecite.library.brown.edu.
>
> Jean Rainwater
> Co-Leader, Integrated Technology Services
> Brown University Library
> Providence, RI 02912
> 401.863.9031
> [EMAIL PROTECTED]