Re: [CODE4LIB] New Open Source Citation Parser

2008-09-16 Thread Steve Oberg
A suggestion: you might want to also add Biblio-Citation-Parser by Mike
Jewell (http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/).
Steve

On Tue, Sep 16, 2008 at 11:11 AM, Miriam Goldberg [EMAIL PROTECTED] wrote:

 Thanks for pointing out these other parsing tools. I've added them to
 the list on our website (see under the heading "Other Citation Tools" at
 http://freecite.library.brown.edu/).

 Citation metadata extraction is a difficult open problem whose
 potential solutions are based on continually developing technologies.
 So I think it's important that we approach this task from many diverse
 angles. If our project makes a little headway here, ParsCit makes some
 headway there, and five other groups make their own advancements,
 hopefully we'll be able to pool our findings into a viable
 application.

  Anyone want to compare and contrast these three projects?  Might make a
  good very short article/review for the Code4Lib Journal if you wanted to.

 Agreed. I'd love to see this. Another idea might be to write an
 application that takes the output of multiple parsers and assembles
 the best answer.
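
The idea of combining several parsers could be sketched roughly as below. This is only an illustration in Python, assuming each parser's output has already been converted to a dict of field name to string value. The function name `merge_parses`, the field names, and the sample values are all hypothetical and do not come from FreeCite, ParsCit, or any other tool mentioned in this thread. The "best answer" here is simply a per-field majority vote, which is one possible strategy among many.

```python
from collections import Counter


def merge_parses(parses):
    """Combine field dicts from several parsers by per-field majority vote.

    `parses` is a list of dicts mapping a field name to a string value.
    Ties go to the value that was seen first.
    """
    merged = {}
    field_names = {name for parse in parses for name in parse}
    for name in sorted(field_names):
        votes = Counter()
        first_seen = {}
        for parse in parses:
            value = parse.get(name)
            if not value:
                continue  # this parser found nothing for the field
            # Normalize lightly so whitespace and case differences still agree.
            key = " ".join(value.split()).lower()
            votes[key] += 1
            first_seen.setdefault(key, value.strip())
        if votes:
            winner, _count = votes.most_common(1)[0]
            merged[name] = first_seen[winner]
    return merged


# Hypothetical outputs from three different parsers for one citation.
parses = [
    {"author": "Smith, J.", "title": "Parsing citations", "year": "2008"},
    {"author": "Smith, J.", "title": "Parsing Citations ", "year": "2006"},
    {"author": "J. Smith", "title": "Parsing citations", "year": "2008",
     "journal": "Code4Lib Journal"},
]
print(merge_parses(parses))
```

A real combiner would probably want to weight parsers by how reliable each one is for a given field rather than counting every vote equally.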




Re: [CODE4LIB] New Open Source Citation Parser

2008-09-16 Thread Miriam Goldberg
 A suggestion: you might want to also add Biblio-Citation-Parser

Added. Keep 'em coming!





Re: [CODE4LIB] New Open Source Citation Parser

2008-09-15 Thread Tom Keays
I might add CrossRef's Simple Text Query for generating article DOIs
from citations. Not open source though.

http://www.crossref.org/SimpleTextQuery/



[CODE4LIB] New Open Source Citation Parser

2008-09-12 Thread jean rainwater
Please help us beta test FreeCite, a new citation parser for
non-structured bibliographic data. FreeCite is the result of
collaboration between the Brown University Library and Public Display,
a Providence-based software company founded by and employing many
Brown grads.  Public Display's core business is information
extraction. Partial funding for this project was provided by the
Andrew W. Mellon Foundation.

FreeCite is implemented in Ruby on Rails and uses the CRF++ library
implementation of conditional random fields. The model is trained on
the CORA dataset with lexical augmentation from the Directory of
Research and Researchers at Brown (DRR-B). The API and code are
available at: http://freecite.library.brown.edu.

Jean Rainwater
Co-Leader, Integrated Technology Services
Brown University Library
Providence, RI 02912
401.863.9031
[EMAIL PROTECTED]
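
For readers unfamiliar with how a citation is handed to a CRF tagger such as CRF++, here is a rough Python sketch of turning a labeled citation into CRF++-style training rows: one token per line, whitespace-separated feature columns, the label in the last column, and a blank line ending the sequence. The features, the label names, and the helper names `token_features` and `to_crfpp_rows` are assumptions made up for illustration. They are not FreeCite's actual feature set or code.

```python
import re


def token_features(token):
    """Return a fixed-length list of simple surface features for one token."""
    stripped = token.strip(".,;:()\"'")
    return [
        token,                                   # raw token (no whitespace)
        stripped.lower() or "none",              # normalized form
        "cap" if stripped[:1].isupper() else "nocap",
        "year" if re.fullmatch(r"(1[6-9]|20)\d\d", stripped) else "noyear",
        "digits" if stripped.isdigit() else "nodigits",
        "punct" if token[-1:] in ".,;:" else "nopunct",
    ]


def to_crfpp_rows(labeled_tokens):
    """Format (token, label) pairs as one CRF++-style training sequence."""
    rows = []
    for token, label in labeled_tokens:
        rows.append(" ".join(token_features(token) + [label]))
    # A blank line marks the end of the sequence.
    return "\n".join(rows) + "\n\n"


# Hypothetical hand-labeled citation fragment.
example = [
    ("Smith,", "author"),
    ("J.", "author"),
    ("Parsing", "title"),
    ("citations.", "title"),
    ("2008.", "date"),
]
print(to_crfpp_rows(example), end="")
```

Every row has the same number of columns, which CRF++ requires. A separate feature template file then tells the trainer which columns and neighboring rows to combine into features.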


Re: [CODE4LIB] New Open Source Citation Parser

2008-09-12 Thread Mark A. Matienzo
Are you aware that there is an existing, yet embryonic, FLOSS project
called FreeCite?

http://www.freecite.org/

Mark Matienzo
Applications Developer, NYPL Labs
The New York Public Library



Re: [CODE4LIB] New Open Source Citation Parser

2008-09-12 Thread Jonathan Rochkind
This is the third open source citation parser I know of now. A welcome change 
from a year ago when I needed one and didn't know of any! But I can't help but 
think maybe people should be cooperating more instead of engineering their own 
wheels. Also curious if anyone has looked at all three and can compare and 
contrast and make a recommendation.

The other two I know about are:

ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
A CDL project I don't have a good home page for, but code is here: 
http://gales.cdlib.org/~egh/hmm-citation-extractor/

I've been keeping track because I have a use for this, although I haven't had
time to make use of any of them yet. 

Anyone want to compare and contrast these three projects?  Might make a good 
very short article/review for the Code4Lib Journal if you wanted to. 

Jonathan  

