Re: [CODE4LIB] New Open Source Citation Parser
A suggestion: you might want to also add Biblio-Citation-Parser by Mike Jewell (http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/).

Steve

On Tue, Sep 16, 2008 at 11:11 AM, Miriam Goldberg [EMAIL PROTECTED] wrote:

Thanks for pointing out these other parsing tools. I've added them to the list on our website (see under the heading Other Citation Tools at http://freecite.library.brown.edu/). Citation metadata extraction is a difficult open problem whose potential solutions are based on continually developing technologies, so I think it's important that we approach this task from many diverse angles. If our project makes a little headway here, ParsCit makes some headway there, and five other groups make their own advancements, hopefully we'll be able to pool our findings into a viable application.

> Anyone want to compare and contrast these three projects? Might make a good very short article/review for the Code4Lib Journal if you wanted to.

Agreed. I'd love to see this. Another idea might be to write an application that takes the output of multiple parsers and assembles the best answer.

On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:

This is the third open source citation parser I know of now. A welcome change from a year ago when I needed one and didn't know of any! But I can't help but think maybe people should be cooperating more instead of engineering their own wheels. I'm also curious whether anyone has looked at all three and can compare and contrast them and make a recommendation. The other two I know about are:

ParsCit -- http://wing.comp.nus.edu.sg/parsCit/

A CDL project I don't have a good home page for, but the code is here: http://gales.cdlib.org/~egh/hmm-citation-extractor/

I've been keeping track because I have a use for this, although I haven't had time to make use of any of them yet. Anyone want to compare and contrast these three projects?
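The "assemble the best answer" idea could be sketched as a simple field-level vote across parsers. This is only a rough sketch: the parser names and field values below are hypothetical, and a real version would also have to normalize the differing field names each tool emits.

```python
from collections import Counter

def merge_parses(parses):
    """Combine field-level output from several citation parsers.

    parses: list of dicts, each mapping a field name to an extracted value.
    For each field, the value proposed by the most parsers wins; ties go
    to the value that was proposed first.
    """
    merged = {}
    fields = {f for p in parses for f in p}
    for field in fields:
        votes = Counter(p[field] for p in parses if field in p)
        merged[field] = votes.most_common(1)[0][0]
    return merged

# Hypothetical output from three parsers for the same citation string:
freecite = {"title": "On Parsing", "year": "2008", "author": "Smith, J."}
parscit  = {"title": "On Parsing", "year": "2003", "author": "Smith, J."}
cdl_hmm  = {"title": "On Parsin",  "year": "2008"}

print(merge_parses([freecite, parscit, cdl_hmm]))
```

A fancier merge might weight each parser by its per-field accuracy on a test set rather than counting every vote equally.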
Might make a good very short article/review for the Code4Lib Journal if you wanted to.

Jonathan

jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM

Please help us beta test FreeCite, a new citation parser for unstructured bibliographic data. FreeCite is the result of a collaboration between the Brown University Library and Public Display, a Providence-based software company founded by and employing many Brown grads. Public Display's core business is information extraction. Partial funding for this project was provided by the Andrew W. Mellon Foundation.

FreeCite is implemented in Ruby on Rails and uses CRF++, a library implementation of conditional random fields. The model is trained on the CORA dataset with lexical augmentation from the Directory of Research and Researchers at Brown (DRR-B). The API and code are available at: http://freecite.library.brown.edu.

Jean Rainwater
Co-Leader, Integrated Technology Services
Brown University Library
Providence, RI 02912
401.863.9031
[EMAIL PROTECTED]
[CODE4LIB] anyone know about Inera?
I recently became aware of a company that provides what it terms reference correction software: Inera. This is the company that powers the CrossRef Simple Text Query box (http://www.crossref.org/freeTextQuery). See http://www.inera.com/refcorrection.shtml for more details.

Does anyone on this list have any knowledge of this company? I'm just wondering if it would be better to use what they have rather than continue to possibly reinvent the wheel for citation parsing.

Steve
Re: [CODE4LIB] anyone know about Inera?
Jason,

Thanks, yes, I knew of this effort and have actually spent a lot of time working with this same software (or rather the same underlying software). But I'm not sure it does enough, or does it well enough, for me at this point. I'd like to take a list of anywhere from one or two up to hundreds of citations, dump it into a web form, and get SFX URLs as output.

Steve

On Fri, Jul 11, 2008 at 1:51 PM, Jason Ronallo [EMAIL PROTECTED] wrote:

Steve,

If you need citation parsing, rather than reference correction, maybe this will work for you: http://aye.comp.nus.edu.sg/parsCit/ I haven't had a chance to try it yet, though.

Jason

On Fri, Jul 11, 2008 at 11:51 AM, Steve Oberg [EMAIL PROTECTED] wrote:

I recently became aware of a company that provides what it terms reference correction software: Inera. This is the company that powers the CrossRef Simple Text Query box (http://www.crossref.org/freeTextQuery). See http://www.inera.com/refcorrection.shtml for more details. Does anyone on this list have any knowledge of this company? I'm just wondering if it would be better to use what they have rather than continue to possibly reinvent the wheel for citation parsing.

Steve
Re: [CODE4LIB] anyone know about Inera?
Ross,

> Actually, SFX is probably not going to care what the title is. It's much more likely to care about the ISSN, volume and issue.

Yes, true. But linking to full text is only part of the issue when it comes to using SFX in this way. I also want to ensure that articles we don't already have available in full text are routed directly to our internal document delivery form (in SFX speak, by including svc.ill=yes in the OpenURL). That would of course mean that however the citation got parsed is how that form is filled out, so incorrect title information is a problem in this case.

> Now, if the matching targets are EBSCO or Proquest, you might have a problem (since they accept inbound OpenURLs from SFX), but I'm not sure, exactly. How many of these things do you have?

Literally, possibly thousands. I can't divulge a great amount of detail of the exact use (there we go again with that restriction on info.), but let's just say there are many very large documents (in PDF or Word), each of which contains between 100 and 400 article citations, that I am working with.

> Why on earth try to provide article-level OpenURLs?

For many reasons. I fully realize how much of a risk that is in terms of reliability and maintenance, but right now I just want a way to do this in bulk with a high level of accuracy.

Steve
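Generating the kind of SFX OpenURLs Steve describes from parsed citations might look something like the sketch below. The resolver base URL and citation values are hypothetical; svc.ill is the SFX-specific service parameter mentioned in the thread, not part of the core OpenURL standard.

```python
from urllib.parse import urlencode

SFX_BASE = "http://sfx.example.edu/sfx_local"  # hypothetical resolver base URL

def citation_to_openurl(cit, request_ill=False):
    """Build an OpenURL 1.0 (KEV) query string for a parsed article citation.

    cit: dict with keys such as atitle, jtitle, issn, volume, issue,
    spage, date. request_ill: when True, add svc.ill=yes so citations
    without full-text matches route to the document delivery form
    (an SFX-specific service parameter).
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if cit.get(key):
            params["rft." + key] = cit[key]
    if request_ill:
        params["svc.ill"] = "yes"
    return SFX_BASE + "?" + urlencode(params)

url = citation_to_openurl(
    {"atitle": "Example article", "issn": "1234-5678",
     "volume": "12", "issue": "3", "spage": "45", "date": "2008"},
    request_ill=True)
print(url)
```

Run over a few hundred parsed citations, this is the bulk step: parse each citation, build the URL, and emit the list.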
Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]
Renata and others,

After posting my original reply I realized how dumb it was to respond but say, sorry, can't tell you more. As an aside, this is one of the things that irritates me the most about working in a for-profit environment: the control exerted by MPOW over just about anything. But hey, this is the job situation I've consciously chosen, so I guess I shouldn't complain.

Although I can't name names or go into detail about our implementation, I have anonymized screenshots of various aspects of it and have posted details about it at http://familymanlibrarian.com/2007/01/21/more-on-turning-the-catalog-inside-out/ Keep in mind that my involvement has been focused on the catalog side.

A lot of the behind-the-scenes work also dealt with matching subject terms in catalog records to the much simpler taxonomy chosen for our website. You can imagine that it can be quite complicated to set up a good rule set for matching LCSH or MeSH terms effectively to a more generic set of taxonomy terms and have those be meaningful to end users. We are continually evaluating and tweaking this setup.

As for other general details, this implementation involved a lot of people, in fact a team of about 15, some more directly and exclusively and others peripherally. Day-to-day maintenance is handled by about three FTE. Our library catalog data is refreshed once a day, as are the citation database to which I referred in the previous email and content from our web content management environment. A few other repositories are updated weekly because their content isn't as volatile.

The whole planning and implementation process took a year, and we are still really working through implementation issues. For example, we recently upgraded our enterprise search tool to a newer version; this was a major change requiring a lot of resources, and it took a lot more time than expected.

I hope this additional information is helpful.
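The subject-term matching described above could be sketched, very roughly, as an ordered rule set. The patterns and taxonomy terms below are hypothetical illustrations, not the actual rules used at Steve's library; the point is that rule order matters, with more specific rules placed first.

```python
import re

# Hypothetical rules mapping catalog subject headings (LCSH/MeSH style)
# to a simpler website taxonomy. Rules are tried in order; the first
# matching pattern wins.
RULES = [
    (re.compile(r"neoplasms|cancer|oncology", re.I), "Cancer"),
    (re.compile(r"heart|cardiovascular|cardiology", re.I), "Heart Health"),
    (re.compile(r"diet|nutrition", re.I), "Nutrition"),
]

def map_heading(heading, default="General"):
    """Map one subject heading string to a website taxonomy term."""
    for pattern, term in RULES:
        if pattern.search(heading):
            return term
    return default

# "Heart Neoplasms" hits the Cancer rule before the Heart Health rule,
# which is exactly the kind of ordering decision that needs ongoing tweaking.
print(map_heading("Heart Neoplasms -- Drug Therapy"))
```

In practice each rule set like this needs continual review against real headings, which matches the "continually evaluating and tweaking" Steve describes.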
Steve

On Tue, Jul 8, 2008 at 1:11 AM, Dyer, Renata [EMAIL PROTECTED] wrote:

Our organisation is looking into getting an enterprise search, and I was wondering how many libraries out there have incorporated the library collection into a 'federated' search that would retrieve a whole lot: library collection items, external sources (websites, databases), internal documents (available on share drives and/or records systems), maybe even records from other internal applications, etc. I would like to hear about your experience and what is good or bad about it. Please reply on or off list, whichever is more convenient. I'll collate answers.

Thanks,
Renata Dyer
Systems Librarian
Information Services
The Treasury
Langton Crescent, Parkes ACT 2600 Australia
(p) 02 6263 2736 (f) 02 6263 2738
(e) [EMAIL PROTECTED]
https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49

** Please Note: The information contained in this e-mail message and any attached files may be confidential information and may also be the subject of legal professional privilege. If you are not the intended recipient, any use, disclosure or copying of this e-mail is unauthorised. If you have received this e-mail by error please notify the sender immediately by reply e-mail and delete all copies of this transmission together with any attachments. **
Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]
Renata,

My library has done exactly this; we are in the second year of our implementation. We are using an enterprise search product and incorporating several disparate data repositories, including a very large citation database, our library catalog, a fileshare, and some other things (basically, everything you name). It works, but it is quite complicated and therefore, at times, fragile. In addition to incorporating several different repositories, we also use this enterprise search tool to power dynamic content throughout the site (e.g. the list of journals for subject x, browse by subject, etc.). I'm purposely not giving specific details of the tool(s) we use because I'm not allowed to share this information outside of the company where I work.

Steve

On Tue, Jul 8, 2008 at 9:14 AM, Jason Stirnaman [EMAIL PROTECTED] wrote:

Renata,

We haven't implemented anything yet, but we did recently issue an RFI for exactly this and evaluated the vendors who responded. Our biggest challenge is still getting sufficient institutional buy-in, so we will likely be conducting small, focused pilots with two different vendors. I would be happy to share our RFI and some of our evaluation results with you off list.

Jason
--
Jason Stirnaman
Digital Projects Librarian/School of Medicine Support
A.R. Dykes Library, University of Kansas Medical Center
[EMAIL PROTECTED]
913-588-7319

On 7/8/2008 at 1:11 AM, in message [EMAIL PROTECTED], Dyer, Renata [EMAIL PROTECTED] wrote:

Our organisation is looking into getting an enterprise search, and I was wondering how many libraries out there have incorporated the library collection into a 'federated' search that would retrieve a whole lot: library collection items, external sources (websites, databases), internal documents (available on share drives and/or records systems), maybe even records from other internal applications, etc. I would like to hear about your experience and what is good or bad about it.
Please reply on or off list, whichever is more convenient. I'll collate answers.

Thanks,
Renata Dyer
Systems Librarian
Information Services
The Treasury
Langton Crescent, Parkes ACT 2600 Australia
(p) 02 6263 2736 (f) 02 6263 2738
(e) [EMAIL PROTECTED]
https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49
Re: [CODE4LIB] ssh tunneling through a mysql dsn
I'm not sure I'm understanding Eric's original scenario correctly, but this setup, needing to support SSH tunneling through to an Oracle database, is exactly what we have set up in my library using SecureCRT (http://www.vandyke.com/products/securecrt/). I think this software is quite useful and supports keys and all the rest (set up SSH keys such that building the tunnel doesn't prompt for a password; run the local end of the tunnel on a free port; configure your local client to talk to the local end of the tunnel). This is an essential piece of our infrastructure, and we have SecureCRT set up on servers as well as individual PCs to ensure secure transmission of information to sources outside our firewall.

Steve

On Wed, Jun 25, 2008 at 11:56 AM, Birkin James Diana [EMAIL PROTECTED] wrote:

On Jun 25, 2008, at 8:59 AM, Eric Lease Morgan wrote:

> Is there any way to support SSH tunneling through a MySQL DSN?

Not sure if this is exactly relevant, but I used to need to access a remote mysql database not open to internet access, and came to love ssh tunneling. Some notes: http://bspace.us/notes/entries/ssh-tunneling-notes/

Also, for a different reason, I needed to handle passwordless logins. This might be of some use: http://bspace.us/notes/entries/passwordless-logins/

---
Birkin James Diana
Programmer, Integrated Technology Services
Brown University Library
[EMAIL PROTECTED]
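The tunnel recipe in parentheses above (local end on a free port, client pointed at it) can be sketched without SecureCRT using plain OpenSSH's -L local port forwarding. Hostnames and ports here are hypothetical; this sketch only builds the command, since actually opening the tunnel requires a reachable gateway and key-based auth already in place.

```python
import subprocess  # would be used to actually launch the tunnel

def tunnel_argv(local_port, db_host, db_port, gateway):
    """Build the ssh command for a local port forward: connections to
    127.0.0.1:local_port are carried over SSH to db_host:db_port as seen
    from the gateway machine. -N means run no remote command (forwarding
    only). Assumes passwordless key-based auth is already set up."""
    return ["ssh", "-N",
            "-L", "{}:{}:{}".format(local_port, db_host, db_port),
            gateway]

argv = tunnel_argv(3307, "dbserver.example.edu", 3306, "user@gateway.example.edu")
print(" ".join(argv))
# To actually open the tunnel:  subprocess.Popen(argv)
# The MySQL DSN then points at the local end of the tunnel, e.g.:
#   DBI:mysql:database=mydb;host=127.0.0.1;port=3307
```

The key point for Eric's DSN question: the DSN itself never mentions SSH; it simply targets 127.0.0.1 on the forwarded port.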
Re: [CODE4LIB] alpha characters used for field names
Eric,

This is definitely not a feature of MARC but rather a feature of your local ILS (Aleph 500). Those are local fields for which you'd need to make a translation to a standard MARC field if you wanted to move that information to another system based on MARC.

Steve

On Wed, Jun 25, 2008 at 2:20 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

Are alpha characters valid in field names in MARC records? When we do dumps of MARC records, our ILS often dumps them with FMT and CAT field names. So not only do I have glorious 246 fields and 100 fields, but I also have CAT fields and FMT fields. Are these features of my ILS -- extensions of the standard -- or really a part of MARC? Moreover, does something like Marc4J or MARC::Batch and friends deal with these alpha field names correctly?

--
Eric Lease Morgan
Re: [CODE4LIB] alpha characters used for field names
OK, so it's what's allowable/possible vs. what is actually defined as part of variable MARC data fields in, say, MARC21. I'm amused by the hairsplitting. The bottom line is that these particular fields are Aleph-specific and are not part of MARC21. I agree with others that accounting for them in whatever parsing program you use should not be a big deal.

Steve

On 6/25/08, Jonathan Rochkind [EMAIL PROTECTED] wrote:

I believe that alpha characters for field names ARE legal according to (most of the various) MARC standard(s), but they are not generally used in library MARC data.

Jonathan

Eric Lease Morgan wrote:

Are alpha characters valid in field names in MARC records? When we do dumps of MARC records, our ILS often dumps them with FMT and CAT field names. So not only do I have glorious 246 fields and 100 fields, but I also have CAT fields and FMT fields. Are these features of my ILS -- extensions of the standard -- or really a part of MARC? Moreover, does something like Marc4J or MARC::Batch and friends deal with these alpha field names correctly?

--
Eric Lease Morgan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
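For what it's worth, stripping or remapping these ILS-specific alpha tags before handing records to a downstream system could be sketched like this. It uses a simplified (tag, value) representation rather than full MARC structures, and the CAT-to-998 mapping is purely hypothetical; real code would more likely do this through MARC::Batch, Marc4J, or a similar library, which generally tolerate alpha tags on read.

```python
import re

NUMERIC_TAG = re.compile(r"^\d{3}$")  # standard MARC tags are three digits

# Hypothetical translation table for Aleph's local alpha-tagged fields;
# anything not listed here and not a standard numeric tag gets dropped.
LOCAL_TAG_MAP = {"CAT": "998"}  # e.g. preserve cataloging history in a local 9xx

def normalize_fields(fields):
    """fields: list of (tag, value) pairs from a record dump.
    Returns only standard numeric MARC tags, remapping or dropping
    ILS-specific alpha tags like FMT and CAT."""
    out = []
    for tag, value in fields:
        if NUMERIC_TAG.match(tag):
            out.append((tag, value))
        elif tag in LOCAL_TAG_MAP:
            out.append((LOCAL_TAG_MAP[tag], value))
    return out

record = [("FMT", "BK"), ("245", "Example title"), ("CAT", "20080625 batch")]
print(normalize_fields(record))
```

Here FMT is dropped, 245 passes through unchanged, and CAT is remapped to the hypothetical local 998.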
Re: [CODE4LIB] planet.code4lib.org -- 3 suggestions
Just wanted to mention that Mark (Lindner) *does* know his blog is linked from Planet Code4Lib. He didn't ask for it to be; it was just linked, and quite a while ago. I've asked about having my blog linked there too, but I definitely don't intend to change the content (much of which is purposefully personal or non-library related) to accommodate others' reading interests. I think I had a conversation with Ed Summers quite a while ago and he encouraged me to go ahead and request it be added, but I never followed through.

I'd be OK with Ranti's suggestion, but I personally feel that placing some sort of onus on blog authors isn't a great solution. I sort through hundreds of threads every day and am quite happy ignoring ones on topics that aren't on point or aren't what I am looking for, and I think this would be OK for Planet Code4Lib as well. $.02 from a lurker.

Steve

On Wed, May 21, 2008 at 4:53 PM, Jonathan Gorman [EMAIL PROTECTED] wrote:

Catching up on some of Mark's posts, I can see why some might want him off. Perhaps someone who's more emotionally attached to the issue of removal might just want to contact him and see if he knows he's on the list or if he wants to remain on? I realized I don't honestly care enough about the planet one way or the other. I'd be sad to see it go, but I wouldn't wail in misery.

Jon

Original message
Date: Wed, 21 May 2008 17:31:03 -0400
From: Jonathan Rochkind [EMAIL PROTECTED]
Subject: Re: [CODE4LIB] planet.code4lib.org -- 3 suggestions
To: CODE4LIB@LISTSERV.ND.EDU

No one other than me is managing it at present. Pretty much the only 'management' I do is adding blogs whenever someone asks me to. (I also did just a bit of fine-tuning of the CSS for the HTML version.) I think it may be the planet software that decides what order to display last name and first name, but feel free to email me ones that are displaying oddly, and I'll see if I can fix them.
I'm not going to get into serious hacking of the planet software, though, or replacing it with other software (I _maybe_ could be convinced to upgrade it if there's an upgrade available). (If anyone else wants to do any of that stuff, raise your hand on the list, and we can probably get you access.)

An unanswered question is when or if the community ever expects me to _remove_ blogs from the planet. It's not clear. I don't want to remove them if people are going to see it as an abuse of power or something, as some have indicated they would. (Most could probably care less either way.) Other blogs people have suggested I remove from the code4lib aggregator, as consisting of mainly nontopical content for code4lib, are Mark Lindner's and Meredith Farkas's. I guess say so if you'd like to LEAVE those on the aggregator, and if nobody says so, I'll leave them. If someone does say so... then I have no idea. :)

Jonathan

Jodi Schneider wrote:

I'm a big fan of the planet aggregator. Normally I make suggestions on #code4lib. However, Jonathan Rochkind asked me to bring them up on-list this time. (Who besides Jonathan is managing the planet at present?)

(1) Bjorn Tipling suggested removing him, since he's going to focus on politics: "Some of the places where my blog is being tracked, such as code4lib and netlamers, might want to look at whether or not they want to continue to follow me." http://bjorn.tipling.com/2008/05/17/blog-pundits/ Can we remove his blog please?

(2) I'd really like a changelog -- which might further justify adding/dropping blogs without discussion.

(3) Could we please label blogs consistently? For individuals, we have mostly lastname, firstname with a few firstname lastname. Either way works. But the mixture rankles (sad, I know!).

Thanks!
-Jodi

Jodi Schneider
Science Library Specialist
Amherst College
413-542-2076

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
Re: [CODE4LIB] free movie cover images?
All,

This has been an interesting discussion, and frankly it is not uncommon in my experience for these kinds of questions to arise. I'm not sure I have anything to add in terms of answers, but see my response below to one part of Peter Keane's recent message.

> Looked at another way: a thumbnail is just a bit of visual metadata, and you cannot copyright metadata.

Well, it's been tried before. In the mid-80s there was a big hullabaloo over what was seen or interpreted as OCLC copyrighting all MARC records in its WorldCat database. See the following informative page, which goes into the context and history of the incident: Guidelines for the Use and Transfer of OCLC-Derived Records [OCLC] http://www.oclc.org/support/documentation/worldcat/records/guidelines/

Steve
Re: [CODE4LIB] HubMed defunct?
Ed,

Sorry for the extremely late response. I, too, contacted Alf and was relieved that it was a simple matter for him to get the site accessible again.

Steve

On Fri, Apr 25, 2008 at 8:43 AM, Ed Summers [EMAIL PROTECTED] wrote:

I just pinged Alf Eaton in IM and he said that he's got it back up now... there was some snafu with the domain registration that's now been corrected. He was pleased to see that you cared enough to write to code4lib :-)

//Ed

On Thu, Apr 24, 2008 at 2:25 PM, Steve Oberg [EMAIL PROTECTED] wrote:

I don't know if anyone else on this discussion list knows about or has ever used HubMed (www.hubmed.org), an alternate interface to PubMed. But if you have, did you know the site appears to be defunct now? If this is temporary, what a relief. If not, well, it'll be upsetting. This is a very handy and feature-rich site that I've used quite a bit lately, especially the Citation Finder part.

Steve
[CODE4LIB] HubMed defunct?
I don't know if anyone else on this discussion list knows about or has ever used HubMed (www.hubmed.org), an alternate interface to PubMed. But if you have, did you know the site appears to be defunct now? If this is temporary, what a relief. If not, well, it'll be upsetting. This is a very handy and feature-rich site that I've used quite a bit lately, especially the Citation Finder part.

Steve