Re: [CODE4LIB] anti-harassment policy for code4lib?
bess++ giarlo++ matienzo++ tennant++ all who have agreed to volunteer++ I think there are plenty of volunteers, so I'll gladly defer to others. (If you do need more, you know where to find me.) I trust you guys to make it sensible, not too formal, blah blah. As for signing personal names -- I hate that we have such a litigious society, but we do. I would certainly sign my support for a motion, but I would not want any of us to be individually responsible in a legal sense for someone else's behavior. So please be careful! I'm pondering whether a code of conduct (the positive things we want) would be a nice counterpart to explicitly stating what we don't condone (anti-harassment policy). It should be low barrier and low risk for individuals to tell us/someone when they feel uncomfortable. Hopefully with enough detail to allow for remediation/change. Lastly, I'd like to hang on to the sense that an individual who has been called out in a transgression has an opportunity to make amends, to avoid future incidents and to remain in the community. I commit so many social blunders that it scares me to think I could be excluded from this great community as an unintentional consequence of a poorly filtered action. - Naomi who is understanding why legal code gets so frickin' complicated! On Nov 26, 2012, at 4:47 PM, Michael J. Giarlo wrote: Hi Kyle, IMO, this is less an instrument to keep people playing nice and more an instrument to point to in the event that we have to take action against an offender. -Mike On Mon, Nov 26, 2012 at 7:42 PM, Kyle Banerjee kyle.baner...@gmail.com wrote: On Mon, Nov 26, 2012 at 4:15 PM, Jon Stroop jstr...@princeton.edu wrote: It's sad that we have to address this formally (as formal as c4l gets anyway), but that's reality, so yes, bess++ indeed, and mjgiarlo++, anarchivist++ for the quick assist. This.
To that end, and as a show of (positive) force--not to mention how cool our community is--I think it might be neat if we could find a way to make whatever winds up being drafted something we can sign; i.e., attach our personal names. Diversity and inclusiveness is a state of mind, and our individual and collective actions exert that force more than any policy or pledge ever could. I'm hoping that things can be handled with the minimum formality necessary and that if something needs to be fixed, people can just talk about it so things can be made right. If we need a policy, I'm all for it. But it's truly a sad day if policy, rather than just being motivated to do the right thing, is what's keeping people playing nice. kyle
Re: [CODE4LIB] regexp for LCC?
You could also try to use the code I put in SolrMarc utilities classes ha ha ha. - Naomi On Mar 31, 2011, at 10:25 AM, Keith Jenkins wrote: The Google Code regex looks like it will accept any 1-3 letters at the start of the call number. But LCC has no I, O, W, X, or Y classifications. So you might want to use something more like ^[A-HJ-NP-VZ] at the start of the regex. Also, there are only a few major classifications that use three letters, like DJK, and several in the Ks. I'm not sure, but there might be others. Keith On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Except now I wonder if those annoying MLCS call numbers might actually be properly MATCHED by this regex, when I need 'em excluded. They are annoyingly _similar_ to a classified call number. Well, one way to find out. And the reason this matters is to try to use an LCC to map to a 'discipline' or other broad category, either directly from the LCC schedule labels, or using a mapping like umich's: http://www.lib.umich.edu/browse/categories/ But if it's not really an LCC at all, and you try to map it, you'll get bad postings. On 3/31/2011 1:03 PM, Jonathan Rochkind wrote: Thanks, that looks good! It's hosted on Google Code, but I don't think that code is anything Google uses; it looks like it's from our very own Bill Dueber. On 3/31/2011 12:38 PM, Tod Olson wrote: Check the regexp that Google uses in their call number normalization: http://code.google.com/p/library-callnumber-lc/wiki/Home You may want to remove the prefix part, and allow for a fourth cutter. The folks at UNC pointed me to this a few months ago. -Tod On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote: Does anyone have a good regular expression that will match all legal LC Call Numbers from the LC Classified Schedule, but will generally not match things that could not possibly be an LC Call Number from the LC Classified Schedule?
In particular, I need it to NOT match an MLC call number, which is an LC assigned call number that shows up in an 050 with no way to distinguish based on indicators, but isn't actually from the LC Schedules. Here's an example of an MLC call number: MLCS 83/5180 (P) Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can exclude them just like that. But it looks like there are also OTHER things that can show up in the 050 but aren't actually from the classified schedule, the OCLC documentation even contains an example of Microfilm 19072 E. What a mess, huh? So, yeah, regex anyone? [You can probably guess why I care if it's from the LC Classified Schedule or not]. Tod Olsont...@uchicago.edu Systems Librarian University of Chicago Library
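A minimal Python sketch of the tightened pattern Keith suggests, with Jonathan's "begins with MLC" exclusion folded in. This is illustrative only: real LCC validation needs the full schedules, and this happily accepts letter combinations LC never actually assigned.

```python
import re

# First letter drawn from the valid LCC classes (no I, O, W, X, or Y),
# up to two more capitals, then the class number. Deliberately loose
# about cutters, dates, and everything after the digits.
LCC_RE = re.compile(r'^[A-HJ-NP-VZ][A-Z]{0,2}\s*\d+')

def looks_like_lcc(callnum):
    """Rough screen: True for plausible LC Classified Schedule numbers,
    False for MLC shelf numbers, 'Microfilm ...', and similar 050 noise."""
    return bool(LCC_RE.match(callnum)) and not callnum.startswith('MLC')

print(looks_like_lcc('QA76.73 .R83 2010'))   # True
print(looks_like_lcc('MLCS 83/5180 (P)'))    # False
print(looks_like_lcc('Microfilm 19072 E'))   # False
```

The explicit `startswith('MLC')` guard is belt-and-suspenders: `MLCS` already fails the regex (the fourth letter breaks it), but a bare `MLC 83/...` would otherwise slip through.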
Re: [CODE4LIB] A to Z lists
if you put the info in a Solr index, you could use Blacklight on top. On Feb 16, 2011, at 1:18 PM, Michele DeSilva wrote: Hi Code4Lib-ers, I want to chime in and say that I, too, enjoyed the streaming archive from the conference. I also have a question: my library has a horribly antiquated A to Z list of databases and online resources (it's based in Access). We'd like to do something that looks more modern and is far more user friendly. I found a great article in the Code4Lib Journal (issue 12, by Danielle Rosenthal and Mario Bernado) about building a searchable A to Z list using Drupal. I'm also wondering what other institutions have done as far as in-house solutions. I know there're products we could buy, but, like everyone else, we don't have much money at the moment. Thanks for any info or advice! Michele DeSilva Central Oregon Community College Library Emerging Technologies Librarian 541-383-7565 mdesi...@cocc.edu
[CODE4LIB] links for relevancy testing talk
What I should have said at my talk: this approach to relevancy testing leaves a lot of room for improvement. What else is out there?

My slides, as a pdf: http://www.stanford.edu/~ndushay/code4lib2011/code4lib2011-dushay-relevancy-testing.pdf
Additional documents: http://www.stanford.edu/~ndushay/code4lib2011/
My blog: http://discovery-grindstone.blogspot.com/
- instruction to a lay-person on how and why to write cucumber scenarios for search feedback
- the four different types of indexing / search result testing
- more on those four approaches
- how I put our call number searching requirements into cucumber tests and was able to tweak the field analysis to meet the requirements
- Naomi
[CODE4LIB] a Solr search recall problem you probably don't even know you're having
(sorry for cross postings - I think this is important information to disseminate) Executive Summary: you probably need to increase your query slop. A lot.

We recently had a feedback ticket that a title search with a hyphen wasn't working properly. This is especially curious because we solved a bunch of problems with hyphen searching AND WROTE TESTS in the process, and all the existing hyphen tests pass. Tests like hyphens with no spaces before or after, 3 significant terms, 2 stopwords pass.

Our metadata contains:
record A with title: Red-rose chain.
record B with title: Prisoner in a red-rose chain.

A title search: prisoner in a red-rose chain returns no results

Further exploration (the following are all title searches):
red-rose chain == record A only
red rose chain == record A only
red rose chain == record A only
red-rose chain == record A only
red rose chain == records A and B
red rose chain == records A and B (!!)

For more details and more about the solution, see http://discovery-grindstone.blogspot.com/2010/11/solr-and-hyphenated-words.html - Naomi Dushay Senior Developer Stanford University Libraries
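For intuition about why bumping the query slop helps, here is a toy sketch (not Solr's actual matcher): when the index analyzer leaves position gaps (stopwords removed, word-delimiter splitting) that an exact phrase query can't reproduce, the phrase slop has to absorb the spread. The positions below are assumed for illustration, mirroring the stopword gap in "Prisoner in a red-rose chain".

```python
def min_phrase_slop(query_terms, indexed_positions):
    """Smallest phrase slop (e.g. dismax 'qs') at which an in-order
    match over these terms succeeds, given the token positions the
    index analyzer produced for one document field."""
    pos = [indexed_positions[t] for t in query_terms]
    # A perfectly adjacent run of n terms spans n-1 position steps;
    # anything beyond that must be absorbed by slop.
    return (pos[-1] - pos[0]) - (len(pos) - 1)

# 'in' and 'a' are stopped out, leaving a gap after 'prisoner':
print(min_phrase_slop(['prisoner', 'red', 'rose', 'chain'],
                      {'prisoner': 1, 'red': 4, 'rose': 5, 'chain': 6}))  # 2
```

With no gaps at all the function returns 0, i.e. an exact phrase match; every stopword hole or extra word-part token pushes the required slop up.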
Re: [CODE4LIB] a Solr search recall problem you probably don't even know you're having
Robert, Thanks! I've been using Solr 1.5 from trunk since back in March - time to upgrade! I also like the fix of putting the stopword filter after the WDF filter. - Naomi On Nov 5, 2010, at 12:36 PM, Robert Muir wrote: On Fri, Nov 5, 2010 at 3:04 PM, Naomi Dushay ndus...@stanford.edu wrote: (sorry for cross postings - I think this is important information to disseminate) Executive Summary: you probably need to increase your query slop. A lot. I looked at your example, and it really looks a lot like https://issues.apache.org/jira/browse/SOLR-1852 This was fixed, and released in Solr 1.4.1... and of course from the upgrading notes: However, a reindex is needed for some of the analysis fixes to take effect. Your example Prisoner in a red-rose chain in Solr 1.4.1 no longer has the positions 1,4,7,8, but instead 1,4,5,6. I recommend upgrading to this bugfix release and re-indexing if you are having problems like this.
[CODE4LIB] (LC) call number searching in Solr
I recently set up a testing framework allowing me to twiddle Solr knobs until I met acceptance criteria for LC call number searching. I came up with two Solr field types that worked for my criteria. You can read all about it here: http://discovery-grindstone.blogspot.com/2010/10/lc-call-number-searching-in-solr.html - Naomi
[CODE4LIB] testing testing testing - Solr indexing software
I just finished a bunch of blog posts about the sorts of tests to write for Solr indexing software. Comments are welcome. Try not to drool when you fall asleep on your keyboard. Start with this one: http://discovery-grindstone.blogspot.com/2010/10/testing-solr-indexing-software.html - Naomi
[CODE4LIB] marc OSS coding efforts
Bess Sadler put together a wiki page on the marc OSS efforts: http://wiki.code4lib.org/index.php/Working_with_MaRC Please add other relevant projects! I am also organizing some conference calls for the committers of these efforts to promote community knowledge, participation and use of these coding nuggets. Please let me know if you work on Marc manipulation OSS and would like to be included in these calls. They are currently scheduled every 2 weeks, but it is possible the calls will morph into a solrmarc project call. Thanks, - Naomi
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
Marijane, It also makes sense to examine the available software for what you wish to accomplish. Available software goes beyond current features to:
- maintainability (one reason Stanford switched to Blacklight) -- I'll talk a little bit about this in our Code4Lib 2010 presentation about testing
- community
- active development
- potential applicability to additional projects (we like Blacklight for its ability to run on any solr index, regardless of what's in there)
- probably some other stuff I've left out
Our experience at Stanford Libraries is that the common conventions of Rails give us a lot more ease in reading each other's code. - Naomi On Jan 5, 2010, at 3:04 PM, marijane white wrote: Greetings Code4Lib, Long time lurker, first time poster here. I've been turning over this question in my mind for a few weeks now, and Joe Hourcle's postscript in the Online PHP Course thread has prompted me to finally try to ask it. =) I'm interested in hearing how the members of this list have gone about choosing development platforms for their library coding projects and/or existing open source projects (i.e., like VuFind vs Blacklight). For example, did you choose a language you already were familiar with? One you wanted to learn more about? Does your workplace have a standard enterprise architecture/platform that you are required to use? If you have chosen to implement an existing open source project, did you choose based on the development platform or project maturity and features or something else? Some background -- thanks to my undergraduate computer engineering studies, I have a pretty solid understanding of programming fundamentals, but most of my pre-LIS work experience was in software testing and did not require me to employ much of what I learned programming-wise, so I've mostly dabbled over the last decade or so. I've got a bit of experience with a bunch of languages and I'm not married to any of them. I also kind of like having excuses to learn new ones.
My situation is this: I would like to eventually implement a discovery tool at MPOW, but I am having a hell of a time choosing one. I'm a solo librarian on a content team at a software and information services company, so I'm not really tied to the platforms used by the software engineering teams here. I know a bit of Ruby, so I've played with Blacklight some, got it to install on Windows and managed to import a really rough Solr index. I'm more attracted to the features in VuFind, but I don't know much PHP yet and I haven't gotten it installed successfully yet. My collection's metadata is not in an ILS (yet) and not in MARC, so I've also considered trying out more generic approaches like ajax-solr (though I don't know a lot of javascript yet, either). I've also given a cursory look at SOPAC and Scriblio. My options are wide open, and I'm having a rough time deciding what direction to go in. I guess it's kind of similar to someone who is new to programming and attempting to choose their first language to learn. I will attempt to head off a programming language religious war =) by stating that I'm not really interested in the virtues of one platform over another, more so the abstract reasons one might have for selecting one. Have any of you ever been in a similar situation? How'd you get yourself unstuck? If you haven't, what do you think you might do in a situation like mine? -marijane
Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?
Marijane, Yes, I would encourage you to ask for help on the blacklight list, with specifics about the problems you're having. We've set up Blacklight on a bunch of non-Marc Solr indexes here. - Naomi On Jan 6, 2010, at 1:32 PM, marijane white wrote: I've read about Blacklight's ability to run on any Solr index, but I've struggled to make it work with mine. Honestly, I've been left with the impression that my data should be in MARC if I want to use it. Is there some documentation on this somewhere that I've overlooked? (Maybe I should ask this on the BL list) On Wed, Jan 6, 2010 at 12:24 PM, Naomi Dushay ndus...@stanford.edu wrote: Marijane, It also makes sense to examine the available software for what you wish to accomplish. Available software goes beyond current features to - maintainability (one reason Stanford switched to Blacklight) I'll talk a little bit about this in our Code4Lib 2010 presentation about testing. - community - active development - potential applicability to additional projects. (we like Blacklight for its ability to run on any solr index, regardless of what's in there) probably some other stuff I've left out. Our experience at Stanford Libraries is that the common conventions of Rails give us a lot more ease in reading each others' code. - Naomi On Jan 5, 2010, at 3:04 PM, marijane white wrote: Greetings Code4Lib, Long time lurker, first time poster here. I've been turning over this question in my mind for a few weeks now, and Joe Hourcle's postscript in the Online PHP Course thread has prompted me to finally try to ask it. =) I'm interested in hearing how the members of this list have gone about choosing development platforms for their library coding projects and/or existing open source projects (ie like VuFind vs Blacklight). For example, did you choose a language you already were familiar with? One you wanted to learn more about? Does your workplace have a standard enterprise architecture/platform that you are required to use? 
If you have chosen to implement an existing open source project, did you choose based on the development platform or project maturity and features or something else? Some background -- thanks to my undergraduate computer engineering studies, I have a pretty solid understanding of programming fundamentals, but most of my pre-LIS work experience was in software testing and did not require me to employ much of what I learned programming-wise, so I've mostly dabbled over the last decade or so. I've got a bit of experience with a bunch of languages and I'm not married to any of them. I also kind of like having excuses to learn new ones. My situation is this: I would like to eventually implement a discovery tool at MPOW, but I am having a hell of a time choosing one. I'm a solo librarian on a content team at a software and information services company, so I'm not really tied to the platforms used by the software engineering teams here. I know a bit of Ruby, so I've played with Blacklight some, got it to install on Windows and managed to import a really rough Solr index. I'm more attracted to the features in VuFind, but I don't know much PHP yet and I haven't gotten it installed successfully yet. My collection's metadata is not in an ILS (yet) and not in MARC, so I've also considered trying out more generic approaches like ajax-solr (though I don't know a lot of javascript yet, either). I've also given a cursory look at SOPAC and Scriblio. My options are wide open, and I'm having a rough time deciding what direction to go in. I guess it's kind of similar to someone who is new to programming and attempting to choose their first language to learn. I will attempt to head off a programming language religious war =) by stating that I'm not really interested in the virtues of one platform over another, moreso the abstract reasons one might have for selecting one. Have any of you ever been in a similar situation? How'd you get yourself unstuck? 
If you haven't, what do you think you might do in a situation like mine? -marijane
Re: [CODE4LIB] preconference proposals - solr
On Nov 13, 2009, at 8:47 AM, Erik Hatcher wrote: +1, Bess! I'm especially psyched for the kata demonstrations and sparring matches we'll have at the end of the session :) I'll tinker with the advanced session description a bit when I can, but let's run with that for the time being. I'm happy to have Naomi join me however she likes. I'll be the eye candy! Erik On Nov 13, 2009, at 11:25 AM, Bess Sadler wrote: Hey, how about this? I've been discussing this off list with Erik and Naomi and this is what we came up with (I also added it to the wiki): This is a proposal for several pre-conference sessions that would fit together nicely for people interested in implementing a next-gen catalog system.

1. Morning session - solr white belt
Instructor: Bess Sadler (anyone else want to join me?)
The journey of solr mastery begins with installation. We will then proceed to data types, indexing, querying, and inner harmony. You will leave this session with enough information to start running a solr service with your own data.

2. Morning session - solr black belt
Instructors: Erik Hatcher (and Naomi Dushay? she has offered to help, if that's of interest)
Amaze your friends with your ability to combine boolean and weighted searching. Confound your enemies with your mastery of the secrets of dismax. Leave slow queries in the dust as you performance-tune solr within an inch of its life. [We should probably add more specific advanced topics here... suggestions welcome]

3. Afternoon session - Blacklight
Instructors: Naomi Dushay, Jessie Keck, and Bess Sadler
Apply your solr skills to running Blacklight as a front end for your library catalog, institutional repository, or anything you can index into solr. We'll cover installation, source control with git, local modifications, test-driven development, and writing object-specific behaviors. You'll leave this workshop ready to revolutionize discovery at your library. Solr white belts or black belts are welcome.
And then anyone else who had a topic that built on solr (e.g., vufind?) could add it in the afternoon. Obviously I'm biased, but I really do think the topic of implementing a next gen catalog is meaty enough for a half day and I know people are asking me about it and eager to attend such a thing. What do you think, folks? Bess On 12-Nov-09, at 4:10 PM, Gabriel Farrell wrote: On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote: If you'd be up for it Erik, I'd envision a basic session in the morning. Some of us (like me) have never gotten Solr up and running. Then the afternoon could break off for an advanced session. Though I like Bess's idea, too! Would that be suitable for a conference breakout? Not sure I'd want to pit it against Solr advanced session! The preconfs should be as inclusive as possible, but I'm wondering if the Solr session might be more beneficial if we dive into the particulars right off the bat in the morning. There are only a few steps to get Solr up and running -- it's in the configuration for our custom needs that the advice of a certain Mr. Hatcher can really be helpful. You're right, though, that the NGC thing sounds more like a BOF session. I'd support that in order to attend a full preconf day of Solr. Gabriel Elizabeth (Bess) Sadler Chief Architect for the Online Library Environment Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 b...@virginia.edu (434) 243-2305
Re: [CODE4LIB] preconference proposals
yes, tuning! - Naomi On Nov 10, 2009, at 6:43 AM, Kevin S. Clarke wrote: On Tue, Nov 10, 2009 at 8:38 AM, Erik Hatcher erikhatc...@mac.com wrote: I could be game for a half day session. It could be either an introductory Solr class, get up and running with Solr (+ Blacklight, of course). Or maybe a more advanced session on topics like leveraging dismax, Solr performance and scalability tuning, and so on, or maybe a freer form Solr hackathon session where I'd be there to help with hurdles or answer questions. Thoughts? Suggestions? I think that'd be great. I'd be more interested in a more advanced session personally (dismax, tuning, etc.) Thanks! Kevin
Re: [CODE4LIB] preconference proposals
What do you think about the Solr part having some specific goodies like:
- lots on dismax magic
- how to do fielded searching (author/title/subject) with dismax
- how to do browsing (termsComponent query, then fielded query to get matching docs)
- how to do boolean (use lucene QP, or fake it with dismax)
- Naomi On Nov 10, 2009, at 5:38 AM, Erik Hatcher wrote: I'm interested in presenting something Solr+library related at c4l10. I'm soliciting ideas from the community on what angle makes the most sense. At first I was thinking a regular conference talk proposal, but perhaps a preconference session would be better. I could be game for a half day session. It could be either an introductory Solr class, get up and running with Solr (+ Blacklight, of course). Or maybe a more advanced session on topics like leveraging dismax, Solr performance and scalability tuning, and so on, or maybe a freer form Solr hackathon session where I'd be there to help with hurdles or answer questions. Thoughts? Suggestions? Anything I can do to help the library world with Solr is fair game - let me know. Thanks, Erik On Nov 9, 2009, at 9:55 PM, Kevin S. Clarke wrote: Hi all, It's time again to collect proposals for Code4Lib 2010 preconference sessions. We have space for six full day sessions (or 12 half day sessions (or some combination of the two)). If we get more than we can accommodate, we'll vote... but I don't think we will (take that as a challenge to propose lots of interesting preconference sessions). Like last year, attendees will pay $12.50 for a half day or $25 for the whole day. The preconference space will be in the hotel so we'll have wireless available. If you have a preconference idea, send it to this list, to me, or to the code4libcon planning list. We'll put them up on the wiki once we start receiving them. Some possible ideas? A Drupal in libraries session? LOD part two? An OCLC webservices hackathon? Send the proposals along... Thanks, Kevin
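The "fielded searching with dismax" goodie above boils down to a handful of request parameters. Here is a hedged sketch of what a title-scoped dismax request might look like: `defType`, `q`, `qf`, `pf`, `qs`, and `mm` are standard dismax parameters, but the field names (`title_unstem`, `title_t`) and boost values are hypothetical, not from any particular schema.

```python
from urllib.parse import urlencode

def title_search_params(user_query, slop=3):
    """Build the query string for a dismax title search.
    Field names and boosts below are illustrative assumptions."""
    return urlencode({
        'defType': 'dismax',
        'q': user_query,
        'qf': 'title_unstem^100 title_t^50',  # query fields + boosts
        'pf': 'title_t^200',                  # phrase-boost field
        'qs': slop,                           # query phrase slop
        'mm': '100%',                         # require every term to match
    })

print(title_search_params('red-rose chain'))
```

Swapping `qf`/`pf` to author or subject fields gives the other fielded searches; boolean support is where you'd fall back to the lucene query parser, as the list suggests.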
[CODE4LIB] Blacklight release 2.4 is here
Release 2.4 of Project Blacklight is now available in our new Git flavor! You can find the new improved flavor of Blacklight at http://github.com/projectblacklight/blacklight/tree/v2.4.0 In addition to our move to Git, we have listened to community feedback and have changed the installation process. Instructions for installation are at http://github.com/projectblacklight/blacklight/blob/v2.4.0/README.rdoc . In broad terms, Blacklight now uses a template to get required gems at installation time rather than bundling them in with the code. Besides our debut in Git and the move to a template, here are the changes for release 2.4:

Release Notes - Blacklight Plugin - Version 2.4

Bug
- [CODEBASE-54] - rake gems:install does not work (using template now)
- [CODEBASE-111] - Ae and Oe ligature characters are not normalized correctly
- [CODEBASE-131] - Getting error from rails on startup that VERSION is already defined
- [CODEBASE-134] - Authlogic error
- [CODEBASE-135] - Fall back on net_http when curb gem is not present when using RSolr
- [CODEBASE-138] - A copy of ApplicationController has been removed from the module tree but is still active
- [CODEBASE-160] - why isn't the email and SMS working on demo.projectblacklight.org
- [CODEBASE-170] - Blacklight logo cannot be over-ridden
- [CODEBASE-178] - 3 specs fail when run with rake solr:spec ... no idea why
- [CODEBASE-187] - bookmarking seems to be broken in the latest code

Improvement
- [CODEBASE-87] - Gracefully handle solr errors
- [CODEBASE-172] - demo - solr config - only build spell dictionaries on optimize, not on newSearcher / firstSearcher

New Feature
- [CODEBASE-3] - exporting to Zotero
- [CODEBASE-109] - sort by pub date in demo
- [CODEBASE-182] - Rails Template installer instead of ./script/plugin
- [CODEBASE-183] - Add cursor focus to the search box on the home page
- [CODEBASE-190] - Cursor focus in search form on home page

Task
- [CODEBASE-51] - Design a basic advanced search UI - see Stanford SearchWorks
- [CODEBASE-70] - Need a plugin release as well
- [CODEBASE-114] - demo index should have vernacular displayed
- [CODEBASE-146] - Change stylesheet link in the HTML to media=all
- [CODEBASE-151] - get some dublin core test data
- [CODEBASE-159] - get test data with call numbers
- [CODEBASE-173] - marc_mapper.rb - no longer in synch with solrmarc; its presence is confusing.
- [CODEBASE-176] - get continuous integration working again
- [CODEBASE-177] - update demo app and readme at projectblacklight.org
- [CODEBASE-186] - Implement Google Analytics on the main blacklightopac.org site
[CODE4LIB] de-dupping (was: marc4j 2.4 released)
I've wondered if standard number matching (ISBN, LCCN, OCLC, ISSN ...) would be a big piece. Isn't there such a service from OCLC, and another flavor of something-or-other from LibraryThing? - Naomi On Oct 20, 2008, at 12:21 PM, Jonathan Rochkind wrote: To me, de-duplication means throwing out some records as duplicates. Are we talking about that, or are we talking about what I call work set grouping and others (erroneously in my opinion) call FRBRization? If the latter, I don't think there is any mature open source software that addresses that yet. Or for that matter, any proprietary for-purchase software that you could use as a component in your own tools. Various proprietary software includes a work set grouping feature in its black box (AquaBrowser, Primo, I believe the VTLS ILS). But I don't know of anything available to do it for you in your own tool. I've been just starting to give some thought to how to accomplish this, and it's a bit of a tricky problem on several grounds, including computationally (doing it in a way that performs efficiently). One choice is whether you group records at the indexing stage, or on-demand at the retrieval stage. Both have performance implications--we really don't want to slow down retrieval OR indexing. Usually if you have the choice, you put the slowdown at indexing, since it only happens once in abstract theory. But in fact, with what we do, indexing that's already been optimized and does not have this feature can take hours or even days with some of our corpuses, and since we do in fact re-index from time to time (including 'incremental' addition to the index of new and changed records)--we really don't want to slow down indexing either. Jonathan Bess Sadler wrote: Hi, Mike. I don't know of any off-the-shelf software that does de-duplication of the kind you're describing, but it would be pretty useful. That would be awesome if someone wanted to build something like that into marc4j.
Has anyone published any good algorithms for de-duping? As I understand it, if you have two records that are 100% identical except for holdings information, that's pretty easy. It gets harder when one record is more complete than the other, and very hard when one record has even slightly different information than the other, to tell whether they are the same record and decide whose information to privilege. Are there any good de-duping guidelines out there? When a library contracts out the de-duping of their catalog, what kind of specific guidelines are they expected to provide? Anyone know? I remember the Open Library folks were very interested in this question. Any Open Library folks on this list? Did that effort to de-dupe all those contributed marc records ever go anywhere? Bess On Oct 20, 2008, at 1:12 PM, Michael Beccaria wrote: Very cool! I noticed that a feature, MarcDirStreamReader, is capable of iterating over all marc record files in a given directory. Does anyone know of any de-duplicating efforts done with marc4j? For example, libraries that have similar holdings would have their records merged into one record with a location tag somewhere. I know places do it (consortia etc.) but I haven't been able to find a good open program that handles stuff like that. Mike Beccaria Systems Librarian Head of Digital Initiatives Paul Smith's College 518.327.6376 [EMAIL PROTECTED] -- Jonathan Rochkind Digital Services Software Engineer The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu Naomi Dushay [EMAIL PROTECTED]
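The standard-number first pass Naomi suggests can be sketched as a blocking step: normalize the numbers, then cluster records that share one. This is an ISBN-only toy with a hypothetical record shape; real de-duping would also convert ISBN-10s to ISBN-13s, validate check digits, and compare titles/dates within each block before merging anything.

```python
from collections import defaultdict

def normalize_isbn(raw):
    """Strip hyphens, spaces, and trailing qualifiers like '(pbk.)',
    keeping digits and a possible check-digit X."""
    return ''.join(ch for ch in raw.upper() if ch.isdigit() or ch == 'X')

def candidate_duplicates(records):
    """records: dict of record id -> iterable of raw ISBN strings.
    Returns normalized ISBN -> sorted ids, for every ISBN shared by
    two or more records. A blocking step, not a final merge."""
    blocks = defaultdict(set)
    for rec_id, isbns in records.items():
        for raw in isbns:
            blocks[normalize_isbn(raw)].add(rec_id)
    return {isbn: sorted(ids) for isbn, ids in blocks.items() if len(ids) > 1}

print(candidate_duplicates({
    'rec1': ['0-306-40615-2'],
    'rec2': ['0306406152 (pbk.)'],
    'rec3': ['9780306406157'],   # ISBN-13 form; not matched by this toy
}))  # {'0306406152': ['rec1', 'rec2']}
```

That rec3 miss is exactly the "very hard" case Bess describes: the two forms identify the same book, but only domain knowledge (ISBN-10 to ISBN-13 conversion) reveals it.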
Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia
I couldn't find anything for Thurs night, but I did find some B&Bs for Wed night. http://www.bedandbreakfast.com/philadelphia-pennsylvania.html A friend told me he saw, on Travelocity: Comfort Inn Downtown. It is on the Delaware River (which unfortunately is the wrong river for your conference), but it doesn't look too far from the subway station, so you could commute to PALINET via subway. - Naomi On Oct 7, 2008, at 3:55 PM, Lovins, Daniel wrote: Wow. I just checked a bunch of hotels, and couldn't find anything available for Nov. 5th. I guess I'll try to catch an early morning train from New Haven. If anyone finds a hotel with vacancies, though, let me know. / Daniel -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Nagy Sent: Tuesday, October 07, 2008 1:26 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia I updated the wiki for the conference with a link of nearby hotels that are suggested by PALINET. Here is the link: http://www.palinet.org/ourorg_directions_hotels.aspx Andrew -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Eric Lease Morgan Sent: Tuesday, October 07, 2008 12:34 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia It looks as if the University of Pennsylvania is having an event on or around the same time as the VuFind event, and that is why things are filling/full up. FYI. I believe it is better to make reservations sooner rather than later. -- ELM Naomi Dushay [EMAIL PROTECTED]
Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia
Doing a quick Google search, what do folks think about the Sheraton? (I haven't checked for availability) http://www.philadelphiasheraton.com/ Or can someone more knowledgeable give us a steer? - Naomi On Oct 6, 2008, at 11:08 AM, Eric Lease Morgan wrote: On Oct 2, 2008, at 10:40 AM, Andrew Nagy wrote: Implementing or hacking an Open Source discovery system such as VuFind or Blacklight? Interested in learning more about Lucene/Solr applications?... http://opensourcediscovery.pbwiki.com Andrew, where do you suggest people stay over night when they come to the Portal Camp? What hotel? -- Eric Lease Morgan University of Notre Dame Naomi Dushay [EMAIL PROTECTED]
Re: [CODE4LIB] [VuFind-General] Open Source Discovery Portal Camp - November 6 - Philadelphia
More potential topics, some present on the VuFind roadmap (http://vufind.org/roadmap.php):

* identifying items new to the collection, for RSS feeds
* federated search
* virtual shelf list
* de-duping
* usage data

- Naomi

On Oct 2, 2008, at 7:40 AM, Andrew Nagy wrote: Implementing or hacking an Open Source discovery system such as VuFind or Blacklight? Interested in learning more about Lucene/Solr applications? Join the development teams from VuFind and Blacklight at PALINET in Philadelphia, November 6, 2008, for a day of discussion and sharing. We hope to examine difficult issues in developing discovery systems, such as:

* ILS Connectivity
* Authority Control
* Data Importing
* User Interface Issues

Date and time: November 6, 2008, 9:00am to 4:00pm. Registration Fee: $40 for PALINET members and $50 for PALINET non-members. For more information and how to register, visit our conference wiki: http://opensourcediscovery.pbwiki.com

Naomi Dushay [EMAIL PROTECTED]
[CODE4LIB] yet more possible topics for OpenSourceDiscovery
* Serials holdings
* Series issues?
* Pooling usage stats for better recommender services

Naomi Dushay [EMAIL PROTECTED]
Re: [CODE4LIB] creating call number browse
, that it allows a variety of sorting methods - although it is still limited. I think there are perhaps some other factors as well. Shelf-browsing allows users to wander into 'their' part of the library and look at stuff - but I don't think most OPACs have the equivalent. With a bookstore (physically and virtually) we might see genre sections we can browse. This might also work for public libraries? In research libraries we tend to just present the classification without further glossing, I think - perhaps this is something we ought to consider online? The other thing that occurs to me about browsing by class mark is that it presents a 'spectrum' view of a kind. This could be easily lost in the type of 'search and sort' system you suggest (although I still think this is a good idea, btw). At the same time I'm a bit reluctant to stop at providing a classification browse, as it seems inherently limited. I agree with the point that browsing the shelves and exploring the material in more depth are related - which suggests integration with other content-rich services (Google Books, e-books, other providers) is needed.

Owen Stephens, Assistant Director: eStrategy and Information Resources, Central Library, Imperial College London, South Kensington Campus, London SW7 2AZ, t: +44 (0)20 7594 8829, e: [EMAIL PROTECTED]

-Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Keith Jenkins Sent: 01 October 2008 13:22 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] creating call number browse

I think that one advantage of browsing a physical shelf is that the shelf is linear, so it's very easy to methodically browse from the left end of the shelf to the right, and have a sense that you haven't accidentally missed anything. (Ignore, for the moment, all the books that happen to be checked out and not on the shelf...) Online, linearity is no longer a constraint, which is a very good thing, but it does have some drawbacks as well.
There is usually no clear way to follow a series of 'more like this' links and get a sense that you have seen all the books that the library has on a given subject. Yes, you might get lucky and discover some great things, but it usually involves a lot of aimless wandering, coming back to the same highly related items again and again, while missing some slightly-more-distantly-related items. Ideally, the user should be able to run a query, retrieve a set of items, sort them however he wants (by author, date, call number, some kind of dynamic clustering algorithm, whatever), and be able to methodically browse from one end of that sort order to the other without any fear of missing something. Keith

On Tue, Sep 30, 2008 at 6:08 PM, Stephens, Owen [EMAIL PROTECTED] wrote: I think we need to understand the way people use browse to navigate resources if we are to successfully bring the concept of collection browsing to our navigation tools. David suggests that we should think of a shelf browse as a type of 'show me more like this' - which is definitely one reason to browse - but is it the only reason?

Naomi Dushay [EMAIL PROTECTED]
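[Ed.: Keith's ideal — sort the result set by call number and browse it end to end — depends on a sort key that reproduces shelf order, which a plain string sort does not (it files QA11 before QA9). A deliberately crude sketch of such a key for LC call numbers; the regex and tuple layout are my own illustration, not any catalog's normalization routine:]

```python
import re

def lc_sort_key(call_number):
    """Crude shelf-order key for LC call numbers: class letters, then the
    class number compared numerically (so QA9 files before QA11), then
    whatever remains (cutters, dates) compared as text."""
    m = re.match(r'([A-Z]+)\s*(\d+(?:\.\d+)?)?\s*(.*)', call_number.upper())
    letters, number, rest = m.groups()
    return (letters, float(number) if number else 0.0, rest)

shelf = sorted(['QA76.9 .D3', 'Z695 .C6', 'QA11 .B5', 'QA9 .A1'],
               key=lc_sort_key)
# shelf order: ['QA9 .A1', 'QA11 .B5', 'QA76.9 .D3', 'Z695 .C6']
```

Real shelf-listing has many more wrinkles (multiple cutters, volume and date suffixes, Dewey vs. LC), so this is only an illustration of why the "retrieve, sort, browse methodically" model needs a carefully built key.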
[CODE4LIB] a teeny bit of MARC history
MARC is a very annoying data format, no question. And it's true that when it was designed, catalog cards were still state of the art. From a teensy bit of searching on the 'net: the MARC pilot project final report was published in 1968 (http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true_ERICExtSearch_SearchValue_0=ED029663ERICExtSearch_SearchType_0=noaccno=ED029663). It was apparently designed to work well on tapes (as a backup medium, and for data transfer). It predates relational databases. It was at least timely in the sense that it was pretty much universally adopted, at least in the USA/Canada, as far as I know.

On Jun 26, 2008, at 5:46 AM, Eric Lease Morgan wrote: On Jun 25, 2008, at 7:27 PM, Hahn, Harvey wrote: I appreciate that MARC is really a data structure. Leader. Directory. Data. Thus using alpha characters for field names is legitimate. This demonstrates the flexibility of MARC as a data structure.

Considering the environment when it was designed, it is a marvelous beast. Sequential in nature, to accommodate tape. Complete with redundant error-checking devices: the leader, the directory, and the end-of-field, -subfield, and -record characters. It exploits the existing character set. It is nice that fields do not have to be in any particular order. It is nice that specific characters at specific positions have specific meanings. For the time, MARC exploited the existing environment to the fullest. Applause! A computer science historian, if there ever will be such a thing, would have a field day with MARC.

But nowadays, these things are just weird. A novelty. I'm getting tired of it. Worse, many of us in Library Land confuse MARC as a data structure with bibliographic description. We mix presentation and content and think we are doing MARC. Moreover, I don't appreciate ILS vendors who extend and enhance the standard, making it difficult to use standard tools against their data.
This just makes my work unnecessarily difficult. Why do we tolerate such things? I won't even get into the fact that MARC was designed to enable the printing of catalog cards and the profession has gone on to use it (poorly) in so many other ways. If we in Library Land really want to live and work in an Internet environment, then we have some serious evolution to go through! The way we encode and make available our data is just one example. I feel like a dinosaur. Whew! -- Eric Lease Morgan, University of Notre Dame

Naomi Dushay [EMAIL PROTECTED]
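[Ed.: the structural devices Eric lists — the 24-byte leader, the directory of (tag, length, start) entries, and the end-of-field/record characters — are concrete enough to sketch. A minimal round-trip through the MARC transmission format; this is an illustration of the layout only, not a real cataloging tool (use marc4j or pymarc for actual work):]

```python
# Minimal sketch of the MARC transmission format: a 24-byte leader whose
# positions 0-4 hold the record length and 12-16 the base address of data,
# a directory of 12-byte entries (3-byte tag, 4-byte length, 5-byte start),
# and the field/record terminator characters Eric mentions.

FT, RT, SF = b'\x1e', b'\x1d', b'\x1f'  # field term., record term., subfield delim.

def build_record(fields):
    """fields: list of (tag, data_bytes) -> one record in transmission format."""
    body, directory = b'', b''
    for tag, data in fields:
        chunk = data + FT
        directory += tag.encode() + b'%04d%05d' % (len(chunk), len(body))
        body += chunk
    directory += FT                      # the directory itself ends with a FT
    base = 24 + len(directory)           # leader is always 24 bytes
    total = base + len(body) + 1         # +1 for the record terminator
    leader = b'%05dnam a22%05d a 4500' % (total, base)  # exactly 24 bytes
    return leader + directory + body + RT

def parse(record):
    """Recover (tag, data) pairs using only the leader and directory."""
    base = int(record[12:17])            # base address of data, from the leader
    entries = record[24:base - 1]        # directory entries, sans terminator
    fields = []
    for i in range(0, len(entries), 12):
        tag = entries[i:i + 3].decode()
        length = int(entries[i + 3:i + 7])
        start = int(entries[i + 7:i + 12])
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields

rec = build_record([('008', b'080626s2008'), ('245', b'10\x1faMARC history.')])
# parse(rec) round-trips the fields; int(rec[:5]) == len(rec)
```

The redundancy Harvey praises is visible here: a reader can walk the record by directory offsets, by terminator characters, or by the lengths in the leader, and check each against the others - sensible engineering for 1968 tape hardware, and exactly the baggage the thread is tired of carrying.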