Re: [CODE4LIB] New Open Source Citation Parser

2008-09-16 Thread Steve Oberg
A suggestion: you might want to also add Biblio-Citation-Parser by Mike
Jewell (http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/).
Steve

On Tue, Sep 16, 2008 at 11:11 AM, Miriam Goldberg [EMAIL PROTECTED] wrote:

 Thanks for pointing out these other parsing tools. I've added them to
 the list on our website (see under heading Other Citation Tools at
 http://freecite.library.brown.edu/).

 Citation metadata extraction is a difficult open problem whose
 potential solutions are based on continually-developing technologies.
 So I think it's important that we approach this task from many diverse
 angles. If our project makes a little headway here, ParsCit makes some
 headway there, and five other groups make their own advancements,
 hopefully we'll be able to pool our findings into a viable
 application.

  Anyone want to compare and contrast these three projects?  Might make a
  good very short article/review for the Code4Lib Journal if you wanted to.

 Agreed. I'd love to see this. Another idea might be to write an
 application that takes the output of multiple parsers and assembles
 the best answer.
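
 A minimal sketch of that idea, assuming each parser's output has
 already been converted to a plain dict of field name to string. The
 function name and the field names are hypothetical and are not taken
 from FreeCite, ParsCit, or any other tool mentioned in this thread.
 It takes a simple majority vote per field; a real combiner would
 probably weight parsers by their measured accuracy on each field.

```python
from collections import Counter


def merge_parses(parses):
    """Combine field dicts from several parsers by majority vote per field."""
    merged = {}
    field_names = {name for parse in parses for name in parse}
    for name in sorted(field_names):
        # Keep only non-empty values; a parser may omit a field entirely.
        values = [(p.get(name) or "").strip() for p in parses]
        values = [v for v in values if v]
        if not values:
            continue
        # Compare case-insensitively so "A Study" and "a study" count together.
        counts = Counter(v.lower() for v in values)
        winner, _ = counts.most_common(1)[0]
        # Return the first original spelling that matches the winning value.
        merged[name] = next(v for v in values if v.lower() == winner)
    return merged


# Three hypothetical parser outputs for the same citation string.
outputs = [
    {"title": "A Study", "year": "2008"},
    {"title": "a study", "year": "2007"},
    {"title": "Study", "year": "2008", "volume": "3"},
]
print(merge_parses(outputs))
```

 On a tie the value from the earliest parser in the list wins, so the
 most trusted parser should be listed first.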

 On Fri, Sep 12, 2008 at 3:50 PM, Jonathan Rochkind [EMAIL PROTECTED]
 wrote:
  This is the third open source citation parser I know of now. A welcome
 change from a year ago when I needed one and didn't know of any! But I can't
 help but think maybe people should be cooperating more instead of
 engineering their own wheels. Also curious if anyone has looked at all three
 and can compare and contrast and make a recommendation.
 
  The other two I know about are:
 
  ParsCit -- http://wing.comp.nus.edu.sg/parsCit/
  A CDL project I don't have a good home page for, but code is here:
 http://gales.cdlib.org/~egh/hmm-citation-extractor/
 
  I've been keeping track because I have a use for this, although haven't
 had time to make use of any of them yet.
 
  Anyone want to compare and contrast these three projects?  Might make a
 good very short article/review for the Code4Lib Journal if you wanted to.
 
  Jonathan
 
 
  jean rainwater [EMAIL PROTECTED] 09/12/08 2:25 PM 
  Please help us beta test FreeCite, a new citation parser for
  non-structured bibliographic data. FreeCite is the result of
  collaboration between the Brown University Library and Public Display,
  a Providence-based software company founded by and employing many
  Brown grads.  Public Display's core business is information
  extraction. Partial funding for this project was provided by the
  Andrew W. Mellon Foundation.
 
  FreeCite is implemented in Ruby on Rails and uses the CRF++ library
  implementation of conditional random fields. The model is trained on
  the CORA dataset  with lexical augmentation from the Directory of
  Research and Researchers at Brown (DRR-B). The API and code are
  available at: http://freecite.library.brown.edu.
 
  Jean Rainwater
  Co-Leader, Integrated Technology Services
  Brown University Library
  Providence, RI 02912
  401.863.9031
  [EMAIL PROTECTED]
 



[CODE4LIB] anyone know about Inera?

2008-07-11 Thread Steve Oberg
I recently became aware of a company that provides what it terms "reference
correction" software:  Inera.  This is the company that powers the CrossRef
Simple Text Query box (http://www.crossref.org/freeTextQuery).

See http://www.inera.com/refcorrection.shtml for more details

Does anyone on this list have any knowledge of this company? I'm just
wondering if it would be better to use what they have rather than continue
to possibly reinvent the wheel for citation parsing.

Steve


Re: [CODE4LIB] anyone know about Inera?

2008-07-11 Thread Steve Oberg
Jason,

Thanks, yes, I knew of this effort and have actually spent a lot of time
working with this same software (or rather the same underlying software).
But I'm not sure it does enough or does it well enough for me at this point.
I'd like to take a list of anywhere from one or two up to hundreds of
citations, dump it into a web form, and get SFX URLs back as output.

Steve

On Fri, Jul 11, 2008 at 1:51 PM, Jason Ronallo [EMAIL PROTECTED] wrote:

 Steve,
 If you need citation parsing, rather than reference correction, maybe
 this will work for you:
 http://aye.comp.nus.edu.sg/parsCit/

 I haven't had a chance to try it yet, though.

 Jason

 On Fri, Jul 11, 2008 at 11:51 AM, Steve Oberg [EMAIL PROTECTED] wrote:
  I recently became aware of a company that provides what it terms
  reference correction software:  Inera.  This is the company that powers
  the CrossRef Simple Text Query box (http://www.crossref.org/freeTextQuery).
 
  See http://www.inera.com/refcorrection.shtml for more details
 
  Does anyone on this list have any knowledge of this company? I'm just
  wondering if it would be better to use what they have rather than
  continue to possibly reinvent the wheel for citation parsing.
 
  Steve
 



Re: [CODE4LIB] anyone know about Inera?

2008-07-11 Thread Steve Oberg
Ross,


Actually, SFX is probably not going to care what the title is.

 It's much more likely to care about the ISSN, volume and issue.


Yes, true. But linking to full text is only partly the issue when it comes
to using SFX in this way.  I also want to ensure that those articles that we
don't already have available in full text are directly routed to our
internal doc. delivery form (in SFX speak, using an svc.ill=yes in the
OpenURL).  This would of course mean that however the citation got parsed is
how that form is filled out.  Incorrect title information is a problem in
this case.
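
A rough sketch of the step from parsed citation to link, assuming the
parser has already produced a dict of fields. The resolver base URL and
the input field names are hypothetical. The rft.* keys are the standard
OpenURL 1.0 (Z39.88-2004) key/encoded-value names for journal articles,
and svc.ill=yes is the parameter described above for routing to the
document delivery form.

```python
from urllib.parse import urlencode

# Hypothetical SFX base URL; substitute your own resolver address.
RESOLVER_BASE = "https://sfx.example.org/sfx_local"

# Hypothetical parser field name -> OpenURL 1.0 KEV journal key.
KEV_KEYS = {
    "atitle": "rft.atitle",
    "jtitle": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "spage": "rft.spage",
    "year": "rft.date",
}


def citation_to_openurl(citation, request_ill=False):
    """Build an article-level OpenURL from whatever fields were parsed."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in KEV_KEYS.items():
        value = citation.get(field)
        if value:  # skip fields the parser could not find
            params.append((key, str(value)))
    if request_ill:
        params.append(("svc.ill", "yes"))
    return RESOLVER_BASE + "?" + urlencode(params)


parsed = {"atitle": "Parsing citations", "issn": "1234-5678",
          "volume": "12", "issue": "3", "spage": "45", "year": "2008"}
print(citation_to_openurl(parsed, request_ill=True))
```

Running this over a list of parsed citations gives the bulk behaviour.
Whatever the parser got wrong, including the title, is passed straight
through to the form, which is the accuracy concern raised above.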

Now, if the matching targets are EBSCO or Proquest, you might have a
 problem (since they accept inbound OpenURLs from SFX), but I'm not
 sure, exactly.

 How many of these things do you have?


Literally, possibly thousands. I can't divulge much detail about the exact
use (there we go again with that restriction on info). But let's just say I
am working with many very large documents (in PDF or Word), each of which
contains between 100 and 400 article citations.
Why on earth try to provide article-level OpenURLs? Well, for many reasons.
I fully realize how much of a risk that is in terms of reliability and
maintenance.  But right now I just want a way to do this in bulk with a high
level of accuracy.

Steve


Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-10 Thread Steve Oberg
Renata and others,

After posting my original reply I realized how dumb it was to respond but
say, sorry, can't tell you more.  As an aside, this is one of the things
that irritates me the most about working in a for profit environment: the
control exerted by MPOW over just about anything. But hey, this is the job
situation I've consciously chosen so, I guess I shouldn't complain.

Although I can't name names and go into detail about our implementation, I
have anonymized screenshots of various aspects of it and posted details
about it at
http://familymanlibrarian.com/2007/01/21/more-on-turning-the-catalog-inside-out/
Keep in mind that my involvement has been focused on the catalog side.  A
lot of the behind-the-scenes work also dealt with matching subject terms in
catalog records to the much simpler taxonomy chosen for our website.  You
can imagine that it can be quite complicated to set up a good rule set for
matching LCSH or MeSH terms effectively to a more generic set of taxonomy
terms and have those be meaningful to end users. We are continually
evaluating and tweaking this setup.
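
To illustrate the kind of rule set involved, here is a minimal sketch.
The patterns and the taxonomy terms are invented for illustration and
are not the rules from the implementation described above. It assumes
each rule is a regular expression tested against the full subject
heading string.

```python
import re

# Hypothetical rules: (pattern on the catalog heading, website taxonomy term).
RULES = [
    (re.compile(r"\bneoplasms?\b|\bcancer\b", re.I), "Oncology"),
    (re.compile(r"\bheart\b|\bcardio", re.I), "Cardiology"),
    (re.compile(r"\bdrug\b|\bpharmac", re.I), "Pharmacology"),
]


def map_subjects(headings):
    """Map LCSH/MeSH-style headings to simpler taxonomy terms, without duplicates."""
    terms = []
    for heading in headings:
        for pattern, term in RULES:
            if pattern.search(heading) and term not in terms:
                terms.append(term)
    return terms


print(map_subjects(["Breast Neoplasms -- drug therapy", "Heart Diseases"]))
```

Headings that match no rule are dropped silently in this sketch. Logging
them would be one way to find gaps when evaluating and tweaking the rules.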

As far as other general details, this implementation involved a lot of
people, in fact a team of about 15, some more directly and exclusively and
others peripherally.  In terms of maintenance, day to day maintenance is
handled by about three FTE.  Our library catalog data is refreshed once a
day, as is the citation database to which I referred in the previous email,
and content from our web content management environment.  A few other
repositories are updated weekly because their content isn't as volatile.
The whole planning and implementation process took a year, and we are still
working through implementation issues. For example, we recently upgraded our
enterprise search tool to a newer version; this was a major change that
required a lot of resources and took much more time than expected.

I hope this additional information is helpful.

Steve

On Tue, Jul 8, 2008 at 1:11 AM, Dyer, Renata [EMAIL PROTECTED]
wrote:

 Our organisation is looking into getting an enterprise search and I was
 wondering how many libraries out there have incorporated library
 collection into a 'federated' search that would retrieve a whole lot:
 library collection items, external sources (websites, databases),
 internal documents (available on share drives and/or records systems),
 maybe even records from other internal applications, etc.?


 I would like to hear about your experience and what is good or bad about
 it.

 Please reply on or offline whichever more convenient.

 I'll collate answers.

 Thanks,

 Renata Dyer
 Systems Librarian
 Information Services
 The Treasury
 Langton Crescent, Parkes ACT 2600 Australia
 (p) 02 6263 2736
 (f) 02 6263 2738
 (e) [EMAIL PROTECTED]

 https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49


 **
 Please Note: The information contained in this e-mail message
 and any attached files may be confidential information and
 may also be the subject of legal professional privilege.  If you are
 not the intended recipient, any use, disclosure or copying of this
 e-mail is unauthorised.  If you have received this e-mail by error
 please notify the sender immediately by reply e-mail and delete all
 copies of this transmission together with any attachments.
 **



Re: [CODE4LIB] Enterprise Search and library collection [SEC=UNCLASSIFIED]

2008-07-08 Thread Steve Oberg
Renata,

My library has done exactly this and we are in the second year of our
implementation.  We are using an enterprise search product and incorporating
several disparate data repositories including a very large citation
database, our library catalog, a fileshare, and some other things
(basically, everything you name).  It works but it is quite complicated and
therefore, at times, fragile.  In addition to incorporating several
different repositories, we also use this enterprise search tool to power
dynamic content throughout the site (e.g. list of journals for subject x,
browse by subject, etc.).  I'm purposely not giving specific details of the
tool(s) we use because I'm not allowed to share this information outside of
the company where I work.

Steve

On Tue, Jul 8, 2008 at 9:14 AM, Jason Stirnaman [EMAIL PROTECTED] wrote:

 Renata,

 We haven't implemented anything yet, but we did recently issue an RFI
 for exactly this and evaluated the vendors who responded.  Our biggest
 challenge is still getting sufficient institutional buy-in.  So, we will
 likely be conducting small, focused pilots with two different vendors.
 I would be happy to share our RFI and some of our evaluation results
 with you off list.

 Jason
 --

 Jason Stirnaman
 Digital Projects Librarian/School of Medicine Support
 A.R. Dykes Library, University of Kansas Medical Center
 [EMAIL PROTECTED]
 913-588-7319


  On 7/8/2008 at 1:11 AM, in message [EMAIL PROTECTED], Dyer, Renata
  [EMAIL PROTECTED] wrote:
  Our organisation is looking into getting an enterprise search and I was
  wondering how many libraries out there have incorporated library
  collection into a 'federated' search that would retrieve a whole lot:
  library collection items, external sources (websites, databases),
  internal documents (available on share drives and/or records systems),
  maybe even records from other internal applications, etc.?
 
 
  I would like to hear about your experience and what is good or bad about
  it.
 
  Please reply on or offline whichever more convenient.
 
  I'll collate answers.
 
  Thanks,
 
  Renata Dyer
  Systems Librarian
  Information Services
  The Treasury
  Langton Crescent, Parkes ACT 2600 Australia
  (p) 02 6263 2736
  (f) 02 6263 2738
  (e) [EMAIL PROTECTED]
 
  https://adot.sirsidynix.net.au/uhtbin/cgisirsi/ruzseo2h7g/0/0/49
 
 
 



Re: [CODE4LIB] ssh tunneling through a mysql dsn

2008-06-25 Thread Steve Oberg
Not sure if I'm understanding Eric's original scenario correctly, but this
setup of needing to support SSH tunneling through to an Oracle database is
exactly what we have set up in my library using SecureCRT
(http://www.vandyke.com/products/securecrt/).  I think this software is quite
useful and supports keys and all the rest:

* Set up SSH keys such that building the tunnel doesn't prompt for a password
* Run the local end of the tunnel on a free port
* Configure your local client to talk to the local end of the tunnel

This is an essential piece of our infrastructure and we have SecureCRT set
up on servers as well as individual PCs to ensure secure transmission of
information to sources outside our firewall.
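
For anyone using the stock OpenSSH client instead of SecureCRT, the
same three steps can be sketched as below. The host names are
hypothetical, and the sketch assumes key-based login is already
configured. It only builds the ssh command line; it does not start the
tunnel.

```python
import socket


def free_local_port():
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def tunnel_command(gateway, db_host, db_port, local_port):
    """Return the ssh argument list for a local port forward."""
    return [
        "ssh",
        "-N",                    # no remote command, forwarding only
        "-o", "BatchMode=yes",   # fail rather than prompt for a password
        "-L", f"{local_port}:{db_host}:{db_port}",
        gateway,
    ]


port = free_local_port()
# Hypothetical gateway and database hosts; 3306 is the default MySQL port.
print(" ".join(tunnel_command("user@gateway.example.org", "db.internal", 3306, port)))
```

The resulting list can be passed to subprocess.Popen to open the tunnel.
The database client or DSN is then pointed at 127.0.0.1 and the chosen
local port instead of the remote host.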

Steve

On Wed, Jun 25, 2008 at 11:56 AM, Birkin James Diana [EMAIL PROTECTED]
wrote:

 On Jun 25, 2008, at 8:59 AM, Eric Lease Morgan wrote:

  Is there any way to support SSH tunneling through a MySQL DSN?


 Not sure if this is exactly relevant, but I used to need to access a remote
 mysql database not open to internet access, and came to love ssh tunneling.
 Some notes:

 http://bspace.us/notes/entries/ssh-tunneling-notes/

 Also, for a different reason, I needed to handle passwordless logins. This
 might be of some use:

 http://bspace.us/notes/entries/passwordless-logins/

 ---
 Birkin James Diana
 Programmer, Integrated Technology Services
 Brown University Library
 [EMAIL PROTECTED]



Re: [CODE4LIB] alpha characters used for field names

2008-06-25 Thread Steve Oberg
Eric,

This is definitely not a feature of MARC but rather a feature of your local
ILS (Aleph 500).  Those are local fields for which you'd need to make a
translation to a standard MARC field if you wanted to move that information
to another system that is based on MARC.

Steve

On Wed, Jun 25, 2008 at 2:20 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

 Are alpha characters used for field names valid in MARC records?

 When we do dumps of MARC records our ILS often dumps them with FMT and CAT
 field names. So not only do I have glorious 246 fields and 100 fields but I
 also have CAT fields and FMT fields. Are these features of my ILS --
 extensions of the standard -- or really a part of MARC? Moreover, does
 something like Marc4J or MARC::Batch and friends deal with these alpha field
 names correctly?

 --
 Eric Lease Morgan



Re: [CODE4LIB] alpha characters used for field names

2008-06-25 Thread Steve Oberg
Ok. What's allowable/possible vs. what is actually defined as part of
variable MARC data fields in say MARC21.  I'm amused by the
hairsplitting. The bottom line is these particular fields are ALEPH
specific and are not part of MARC21.  I agree with others that
accounting for these in whatever parsing program you use should not be
a big deal.
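
As a minimal sketch of what "accounting for these" could look like,
assume the records have already been read into (tag, value) pairs by
whatever MARC library is in use. The sample values are made up. The
test is purely on the shape of the tag: three digits counts as a
standard-looking tag, and anything else, such as FMT or CAT, is set
aside as local.

```python
def split_fields(fields):
    """Separate three-digit tags from alphabetic, system-specific ones."""
    numeric, local = [], []
    for tag, value in fields:
        if len(tag) == 3 and tag.isdigit():
            numeric.append((tag, value))
        else:
            local.append((tag, value))
    return numeric, local


# Made-up record content for illustration only.
record = [
    ("FMT", "BK"),
    ("100", "Morgan, Eric Lease"),
    ("246", "Alternate title"),
    ("CAT", "batch update"),
]
numeric, local = split_fields(record)
print(numeric)
print(local)
```

Note that this does not separate locally defined numeric tags (for
example in the 9XX range) from MARC21-defined ones; it only isolates
the alphabetic tags discussed here.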

Steve



On 6/25/08, Jonathan Rochkind [EMAIL PROTECTED] wrote:
 I believe that alpha characters for field names ARE legal according to
 (most of the various) MARC standard(s). But they are not generally used
 in library MARC data.

 Jonathan

 Eric Lease Morgan wrote:
 Are alpha characters used for field names valid in MARC records?

 When we do dumps of MARC records our ILS often dumps them with FMT and
 CAT field names. So not only do I have glorious 246 fields and 100
 fields but I also have CAT fields and FMT fields. Are these features
 of my ILS -- extensions of the standard -- or really a part of MARC?
 Moreover, does something like Marc4J or MARC::Batch and friends deal
 with these alpha field names correctly?

 --Eric Lease Morgan


 --
 Jonathan Rochkind
 Digital Services Software Engineer
 The Sheridan Libraries
 Johns Hopkins University
 410.516.8886
 rochkind (at) jhu.edu



Re: [CODE4LIB] planet.code4lib.org -- 3 suggestions

2008-05-21 Thread Steve Oberg
Just wanted to mention that Mark (Lindner) *does* know his blog is linked
from planet code4lib.  He didn't ask for it to be, it was just linked, and
quite a while ago.

I've asked about having my blog linked there too but I definitely don't
intend to change the content (much of which is purposefully personal or
non-library related) to accommodate others' reading interests.  I think I had
a conversation with Ed Summers quite a while ago and he encouraged me to go
ahead and request it be added but I never followed through.

I'd be ok with Ranti's suggestion but I personally feel that placing some
sort of onus on blog authors isn't a great solution.  I sort through 100s of
threads every day and am quite happy ignoring ones on topics that aren't on
point or what I am looking for and I think this would be ok for planet
code4lib as well.

$.02 from a lurker.

Steve

On Wed, May 21, 2008 at 4:53 PM, Jonathan Gorman [EMAIL PROTECTED] wrote:

 Catching up on some of Mark's posts I can see why some might want him off.
  Perhaps someone who's more emotionally attached to the issue of removal
 might just want to contact him and see if he knows he's on the list or if he
 wants to remain on?

 I realized I don't honestly care enough about the planet one way or the
 other.  I'd be sad to see it go, but I wouldn't wail in misery.

 Jon

  Original message 
 Date: Wed, 21 May 2008 17:31:03 -0400
 From: Jonathan Rochkind [EMAIL PROTECTED]
 Subject: Re: [CODE4LIB] planet.code4lib.org -- 3 suggestions
 To: CODE4LIB@LISTSERV.ND.EDU
 
 No one other than me is managing it at present. Pretty much the only
 'management' I do is adding blogs whenever someone asks me to. (I also
 did just a bit of fine-tuning of the CSS for the html version).  I think
 it may be the planet software that decides what order to display
 lastname and firstname, but feel free to email me ones that are
 displaying oddly, and I'll see if I can fix them. I'm not going to get
 into serious hacking of the planet software though, or replacing it with
 other software (I _maybe_ could be convinced to upgrade it if there's an
 upgrade available).  (if anyone else wants to do any of that stuff,
 raise your hand on the list, and we can probably get you access).
 
 An unanswered question is when or if the community ever expects me to
 _remove_ blogs from the planet.  It's not clear. I don't want to remove
 them if people are going to see it as an abuse of power or something, as
 some have indicated they would. (Most probably couldn't care less either
 way).
 
 Other blogs people have suggested I remove from the code4lib aggregator,
 as consisting of mainly nontopical content for code4lib, are Mark
 Lindner and Meredith Farkas.  I guess say so if you'd like to LEAVE
 those on the aggregator, and if nobody says so, I'll leave them. If
 someone does say so... then I have no idea. :)
 
 Jonathan
 
 Jodi Schneider wrote:
  I'm a big fan of the planet aggregator. Normally I make suggestions on
  #code4lib. However, Jonathan Rochkind asked me to bring them up onlist
  this time. (Who besides Jonathan is managing the planet at present?)
 
  (1) Bjorn Tipling suggested removing him, since he's going to focus on
  politics:
  Some of the places where my blog is being tracked, such as code4lib and
  netlamers, might want to look at whether or not they want to continue to
  follow me.
  http://bjorn.tipling.com/2008/05/17/blog-pundits/
  Can we remove his blog please?
 
  (2) I'd really like a changelog--which might further justify
  adding/dropping blogs without discussion.
 
  (3) Could we please label blogs consistently? For individuals, we have
  mostly lastname, firstname with a few firstname lastname. Either way
  works. But the mixture rankles (sad, I know!).
 
  Thanks!
 
  -Jodi
 
  Jodi Schneider
  Science Library Specialist
  Amherst College
  413-542-2076
 
 
 
 --
 Jonathan Rochkind
 Digital Services Software Engineer
 The Sheridan Libraries
 Johns Hopkins University
 410.516.8886
 rochkind (at) jhu.edu



Re: [CODE4LIB] free movie cover images?

2008-05-19 Thread Steve Oberg
All,

This has been an interesting discussion and frankly it is not uncommon in my
experience for these kinds of questions to arise. Not sure I have anything
to add in terms of answers, but see my response below to one part of Peter
Keane's recent message.

Looked at another way: a thumbnail is just a bit of visual metadata,
 and you cannot copyright metadata.


Well, it's been tried before.  In the mid-80s there was a big hullabaloo
over what was seen or interpreted as OCLC copyrighting all MARC records in
its WorldCat database.  See the following informative website that goes into
the context and history of the incident:

Guidelines for the Use and Transfer of OCLC-Derived Records [OCLC]
http://www.oclc.org/support/documentation/worldcat/records/guidelines/

Steve


Re: [CODE4LIB] HubMed defunct?

2008-05-01 Thread Steve Oberg
Ed,

Sorry for the extremely late response.  I, too, contacted Alf and was
relieved that it was a simple matter for him to get the site accessible
again.

Steve

On Fri, Apr 25, 2008 at 8:43 AM, Ed Summers [EMAIL PROTECTED] wrote:

 I just pinged Alf Eaton in IM and he said that he's got it back up
 now...there was some snafu with the domain registration that's now
 been corrected. He was pleased to see that you cared enough to write
 to code4lib :-)

 //Ed

 On Thu, Apr 24, 2008 at 2:25 PM, Steve Oberg [EMAIL PROTECTED] wrote:
  I don't know if anyone else on this discussion list knows about or has
  ever used HubMed (www.hubmed.org), an alternate interface to PubMed.  But
  if you have, did you know the site appears to be defunct now?  If this is
  temporary, relief.  If not, well, it'll be upsetting.  This is a very
  handy and feature rich site that I've used quite a bit lately, especially
  the Citation Finder part.
 
   Steve
 



[CODE4LIB] HubMed defunct?

2008-04-24 Thread Steve Oberg
I don't know if anyone else on this discussion list knows about or has ever
used HubMed (www.hubmed.org), an alternate interface to PubMed.  But if you
have, did you know the site appears to be defunct now?  If this is
temporary, relief.  If not, well, it'll be upsetting.  This is a very handy
and feature rich site that I've used quite a bit lately, especially the
Citation Finder part.

Steve