Re: [CODE4LIB] anti-harassment policy for code4lib?

2012-11-27 Thread Naomi Dushay
bess++
giarlo++
matienzo++
tennant++
all who have agreed to volunteer++

I think there are plenty of volunteers, so I'll gladly defer to others.  (If 
you do need more, you know where to find me.)   I trust you guys to make it 
sensible, not too formal, blah blah.   As for signing personal names -- I hate 
that we have such a litigious society, but we do.  I would certainly sign my 
support for a motion, but I would not want any of us to be individually 
responsible in a legal sense for someone else's behavior.   So please be careful!

I'm pondering whether a code of conduct (the positive things we want) would be a 
nice counterpart to explicitly stating what we don't condone (anti-harassment 
policy).  

It should be low barrier and low risk for individuals to tell us/someone 
when they feel uncomfortable.   Hopefully with enough detail to allow for 
remediation/change.

Lastly, I'd like to hang on to the sense that an individual who has been called 
out for a transgression has an opportunity to make amends, to avoid future 
incidents, and to remain in the community.  I commit so many social blunders 
that it scares me to think I could be excluded from this great community as an 
unintentional consequence of a poorly filtered action. 

- Naomi
who now understands why legal code gets so frickin' complicated!

On Nov 26, 2012, at 4:47 PM, Michael J. Giarlo wrote:

 Hi Kyle,
 
 IMO, this is less an instrument to keep people playing nice and more an
 instrument to point to in the event that we have to take action against an
 offender.
 
 -Mike
 
 
 
 On Mon, Nov 26, 2012 at 7:42 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:
 
 On Mon, Nov 26, 2012 at 4:15 PM, Jon Stroop jstr...@princeton.edu wrote:
 
 It's sad that we have to address this formally (as formal as c4l gets
 anyway), but that's reality, so yes, bess++ indeed, and mjgiarlo++,
 anarchivist++ for the quick assist.
 
 
 This.
 
 
 To that end, and as a show of (positive) force--not to mention how cool
 our community is--I think it might be neat if we could find a way to make
 whatever winds up being drafted something we can sign; i.e. attach our
 personal names.
 
 
 Diversity and inclusiveness is a state of mind, and our individual and
 collective actions exert that force more than any policy or pledge ever could.
 
 I'm hoping that things can be handled with the minimum formality necessary
 and that if something needs to be fixed, people can just talk about it so
 things can be made right. If we need a policy, I'm all for it. But it's
 truly a sad day if policy, rather than just being motivated to do the right
 thing, is what's keeping people playing nice.
 
 kyle
 


Re: [CODE4LIB] regexp for LCC?

2011-03-31 Thread Naomi Dushay
You could also try to use the code I put in SolrMarc utilities classes  
ha ha ha.


- Naomi

On Mar 31, 2011, at 10:25 AM, Keith Jenkins wrote:


The Google Code regex looks like it will accept any 1-3 letters at the
start of the call number.  But LCC has no I, O, W, X, or Y
classifications.

So you might want to use something more like ^[A-HJ-NP-VZ] at the
start of the regex.

Also, there are only a few major classifications that use three
letters.  Like DJK, and several in the Ks.  I'm not sure, but there
might be others.
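
Putting those two observations together, a rough Ruby sketch of a validity
check (illustrative only, not the regex from the Google Code project; class
numbers, cutters, and trailing dates are simplified):

LCC_PATTERN = /\A
  [A-HJ-NP-VZ][A-Z]{0,2}    # 1-3 class letters, no I O W X Y up front
  \s*\d{1,4}(\.\d+)?        # class number, possibly decimal
  (\s*\.?[A-Z]\d+){0,3}     # up to three cutters, e.g. .M3855
  .*                        # dates, volume designations, etc.
\z/x

def looks_like_lcc?(call_number)
  !!(call_number.to_s.strip =~ LCC_PATTERN)
end

looks_like_lcc?("PR9199.3 .M3855 R4 2011")  # => true
looks_like_lcc?("MLCS 83/5180 (P)")         # => false
looks_like_lcc?("Microfilm 19072 E")        # => false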

Keith


On Thu, Mar 31, 2011 at 1:11 PM, Jonathan Rochkind  
rochk...@jhu.edu wrote:
Except now I wonder if those annoying MLCS call numbers might actually be
properly MATCHED by this regex, when I need 'em excluded. They are annoyingly
_similar_ to a classified call number. Well, one way to find out.

And the reason this matters is to try and use an LCC to map to a
'discipline' or other broad category, either directly from the LCC  
schedule

labels, or using a mapping like umich's:
http://www.lib.umich.edu/browse/categories/

But if it's not really an LCC at all, and you try to map it, you'll  
get bad

postings.

On 3/31/2011 1:03 PM, Jonathan Rochkind wrote:


Thanks, that looks good!

It's hosted on Google Code, but I don't think that code is anything
Google uses, it looks like it's from our very own Bill Dueber.

On 3/31/2011 12:38 PM, Tod Olson wrote:


Check the regexp that Google uses in their call number  
normalization:


   http://code.google.com/p/library-callnumber-lc/wiki/Home

You may want to remove the prefix part, and allow for a fourth  
cutter.


The folks at UNC pointed me to this a few months ago.

-Tod

On Mar 31, 2011, at 11:29 AM, Jonathan Rochkind wrote:

Does anyone have a good regular expression that will match all  
legal LC
Call Numbers from the LC Classified Schedule, but will generally  
not
match things that could not possibly be an LC Call Number from  
the LC

Classified Schedule?

In particular, I need it to NOT match an MLC call number,  
which is an

LC assigned call number that shows up in an 050 with no way to
distinguish based on indicators, but isn't actually from the LC
Schedules.  Here's an example of an MLC call number:

MLCS 83/5180 (P)

Hmm, maybe all MLC call numbers begin with MLC, okay I guess I can
exclude them just like that. But it looks like there are also  
OTHER

things that can show up in the 050 but aren't actually from the
classified schedule, the OCLC documentation even contains an  
example of

"Microfilm 19072 E".

What a mess, huh?  So, yeah, regex anyone?

[You can probably guess why I care if it's from the LC Classified
Schedule or not].


Tod Olson   t...@uchicago.edu
Systems Librarian
University of Chicago Library





Re: [CODE4LIB] A to Z lists

2011-02-16 Thread Naomi Dushay

if you put the info in a Solr index, you could use Blacklight on top.

On Feb 16, 2011, at 1:18 PM, Michele DeSilva wrote:


Hi Code4Lib-ers,

I want to chime in and say that I, too, enjoyed the streaming  
archive from the conference.


I also have a question: my library has a horribly antiquated A to Z  
list of databases and online resources (it's based in Access). We'd  
like to do something that looks more modern and is far more user  
friendly. I found a great article in the Code4Lib journal (issue 12,  
by Danielle Rosenthal  Mario Bernado) about building a searchable A  
to Z list using Drupal. I'm also wondering what other institutions  
have done as far as in-house solutions. I know there're products we  
could buy, but, like everyone else, we don't have much money at the  
moment.


Thanks for any info or advice!

Michele DeSilva
Central Oregon Community College Library
Emerging Technologies Librarian
541-383-7565
mdesi...@cocc.edu


[CODE4LIB] links for relevancy testing talk

2011-02-10 Thread Naomi Dushay
What I should have said at my talk:  this approach to relevancy  
testing leaves a lot of room for improvement.   What else is out there?



My slides, as a pdf:
http://www.stanford.edu/~ndushay/code4lib2011/code4lib2011-dushay-relevancy-testing.pdf


Additional documents:   http://www.stanford.edu/~ndushay/code4lib2011/


My blog:   http://discovery-grindstone.blogspot.com/

- instruction to a lay-person on how and why to write cucumber  
scenarios for search feedback.

- the four different types of indexing / search result testing.
- more on those four approaches
- how I put our call number searching requirements into cucumber tests  
and was able to tweak the field analysis to meet the requirements
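
For anyone curious what those cucumber scenarios can look like on the
step-definition side, here is a minimal hypothetical sketch (assuming RSolr,
rspec-expectations, and field names like title_t; this is not the actual
Stanford code):

require 'rsolr'

SOLR = RSolr.connect(url: 'http://localhost:8983/solr')   # assumed URL

When(/^I search everything for "(.*)"$/) do |query|
  @response = SOLR.get('select', params: { q: query, defType: 'dismax',
                                           qf: 'title_t author_t subject_t' })
end

Then(/^the first result should be ckey (\S+)$/) do |ckey|
  docs = @response['response']['docs']
  expect(docs.first['id']).to eq(ckey)
end

Then(/^I should get at least (\d+) results$/) do |n|
  expect(@response['response']['numFound']).to be >= n.to_i
end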



- Naomi


[CODE4LIB] a Solr search recall problem you probably don't even know you're having

2010-11-05 Thread Naomi Dushay
(sorry for cross postings - I think this is important information to  
disseminate)


Executive Summary:  you probably need to increase your query slop.  A  
lot.



We recently had a feedback ticket that a title search with a hyphen  
wasn't working properly.  This is especially curious because we solved  
a bunch of problems with hyphen searching AND WROTE TESTS in the  
process, and all the existing hyphen tests pass.  Tests like "hyphens  
with no spaces before or after, 3 significant terms, 2 stopwords" pass.


Our metadata contains:
record A with title:   Red-rose chain.
record B with title:   Prisoner in a red-rose chain.

A title search:  prisoner in a red-rose chain  returns no results

Further exploration (the following are all title searches):
red-rose chain  ==  record A only
red rose chain ==  record A only
red rose chain == record A only
red-rose chain == record A only
red rose chain ==  records A and B
red rose chain ==  records A and B  (!!)

For more details and more about the solution, see  
http://discovery-grindstone.blogspot.com/2010/11/solr-and-hyphenated-words.html
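
For the impatient, the shape of the fix is a single request parameter.  A
minimal RSolr sketch, with an assumed field name and an illustrative slop
value (the post explains how to pick a real one):

require 'rsolr'

solr = RSolr.connect(url: 'http://localhost:8983/solr')   # assumed URL

response = solr.get('select', params: {
  q:       'prisoner in a red-rose chain',
  defType: 'dismax',
  qf:      'title_t',   # assumed field name
  qs:      5,           # query phrase slop: leaves room for the position gaps
                        # created by WordDelimiterFilter splits and stopwords
  pf:      'title_t',
  ps:      3            # slop for the phrase-boost (pf) query
})
puts response['response']['numFound']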

- Naomi Dushay
Senior Developer
Stanford University Libraries
 


Re: [CODE4LIB] a Solr search recall problem you probably don't even know you're having

2010-11-05 Thread Naomi Dushay

Robert,

Thanks!   I've been using Solr 1.5 from trunk since back in March - time to  
upgrade!  I also like the "put the stopword filter after the WDF  
filter" fix.


- Naomi

On Nov 5, 2010, at 12:36 PM, Robert Muir wrote:

On Fri, Nov 5, 2010 at 3:04 PM, Naomi Dushay ndus...@stanford.edu  
wrote:

(sorry for cross postings - I think this is important information to
disseminate)

Executive Summary:  you probably need to increase your query slop.   
A lot.




I looked at your example, and it really looks a lot like
https://issues.apache.org/jira/browse/SOLR-1852

This was fixed, and released in Solr 1.4.1... and of course from the
upgrading notes: "However, a reindex is needed for some of the analysis
fixes to take effect."


Your example "Prisoner in a red-rose chain" in Solr 1.4.1 no longer
has the positions 1,4,7,8, but instead 1,4,5,6.

I recommend upgrading to this bugfix release and re-indexing if you
are having problems like this.


[CODE4LIB] (LC) call number searching in Solr

2010-10-25 Thread Naomi Dushay
I recently set up a testing framework allowing me to twiddle Solr  
knobs until I met acceptance criteria for LC call number searching.  I  
came up with two Solr field types that worked for my criteria.


You can read all about it here:

http://discovery-grindstone.blogspot.com/2010/10/lc-call-number-searching-in-solr.html

- Naomi


[CODE4LIB] testing testing testing - Solr indexing software

2010-10-25 Thread Naomi Dushay
I just finished a bunch of blog posts about the sorts of tests to  
write for Solr indexing software.  Comments are welcome.  Try not to  
drool when you fall asleep on your keyboard.


Start with this one:

http://discovery-grindstone.blogspot.com/2010/10/testing-solr-indexing-software.html
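
As a taste of what those posts cover, here is a bare-bones sketch of one such
test (Minitest plus RSolr, with assumed field names; the add() call here stands
in for whatever indexing code you are really exercising):

require 'rsolr'
require 'minitest/autorun'

class IndexingTest < Minitest::Test
  SOLR = RSolr.connect(url: 'http://localhost:8983/solr')   # assumed URL

  def test_title_fields_for_a_known_record
    SOLR.add('id' => 'test-666', 'title_display' => 'Red-rose chain')
    SOLR.commit
    doc = SOLR.get('select', params: { q: 'id:test-666' })['response']['docs'].first
    refute_nil doc
    assert_equal 'Red-rose chain', doc['title_display']
  end
end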

- Naomi


[CODE4LIB] marc OSS coding efforts

2010-04-07 Thread Naomi Dushay

Bess Sadler put together a wiki page on the marc OSS efforts:
http://wiki.code4lib.org/index.php/Working_with_MaRC
Please add other relevant projects!

I am also organizing some conference calls for the committers of these  
efforts to promote community knowledge, participation and use of these  
coding nuggets.  Please let me know if you work on Marc manipulation  
OSS and would like to be included in these calls.   They are currently  
scheduled every 2 weeks, but it is possible the calls will morph into  
a solrmarc project call.


Thanks,
- Naomi


Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?

2010-01-06 Thread Naomi Dushay

Marijane,

It also makes sense to examine the available software for what you  
wish to accomplish.  Available software goes beyond current features to
- maintainability  (one reason Stanford switched to Blacklight)   I'll  
talk a little bit about this in our Code4Lib 2010 presentation about  
testing.

- community
- active development
- potential applicability to additional projects.   (we like  
Blacklight for its ability to run on any solr index, regardless of  
what's in there)


probably some other stuff I've left out.

Our experience at Stanford Libraries is that the common conventions of  
Rails give us a lot more ease in reading each other's code.


- Naomi

On Jan 5, 2010, at 3:04 PM, marijane white wrote:


Greetings Code4Lib,

Long time lurker, first time poster here.

I've been turning over this question in my mind for a few weeks now,  
and Joe
Hourcle's postscript in the Online PHP Course thread has prompted me  
to

finally try to ask it. =)

I'm interested in hearing how the members of this list have gone about
choosing development platforms for their library coding projects and/or
existing open source projects (ie like VuFind vs Blacklight).  For  
example,
did you choose a language you already were familiar with?  One you  
wanted to

learn more about?  Does your workplace have a standard enterprise
architecture/platform that you are required to use?  If you have  
chosen to

implement an existing open source project, did you choose based on the
development platform or project maturity and features or something  
else?


Some background -- thanks to my undergraduate computer engineering  
studies,
I have a pretty solid understanding of programming fundamentals, but  
most of
my pre-LIS work experience was in software testing and did not  
require me to
employ much of what I learned programming-wise, so I've mostly  
dabbled over

the last decade or so.  I've got a bit of experience with a bunch of
languages and I'm not married to any of them.   I also kind of like  
having

excuses to learn new ones.

My situation is this: I would like to eventually implement a  
discovery tool

at MPOW, but I am having a hell of a time choosing one.  I'm a solo
librarian on a content team at a software and information services  
company,
so I'm not really tied to the platforms used by the software  
engineering
teams here.  I know a bit of Ruby, so I've played with Blacklight  
some, got
it to install on Windows and managed to import a really rough Solr  
index.
I'm more attracted to the features in VuFind, but I don't know much  
PHP yet

and I haven't gotten it installed successfully yet.  My collection's
metadata is not in an ILS (yet) and not in MARC, so I've also  
considered
trying out more generic approaches like ajax-solr (though I don't  
know a lot
of javascript yet, either).  I've also given a cursory look at SOPAC  
and
Scriblio.  My options are wide open, and I'm having a rough time  
deciding
what direction to go in.  I guess it's kind of similar to someone  
who is new

to programming and attempting to choose their first language to learn.

I will attempt to head off a programming language religious war =) by
stating that I'm not really interested in the virtues of one  
platform over

another, more so the abstract reasons one might have for selecting one.
Have any of you ever been in a similar situation?  How'd you get  
yourself
unstuck?  If you haven't, what do you think you might do in a  
situation like

mine?


-marijane


Re: [CODE4LIB] Choosing development platforms and/or tools, how'd you do it?

2010-01-06 Thread Naomi Dushay

Marijane,

Yes, I would encourage you to ask for help on the blacklight list,  
with specifics about the problems you're having.  We've set up  
Blacklight on a bunch of non-Marc Solr indexes here.


- Naomi

On Jan 6, 2010, at 1:32 PM, marijane white wrote:

I've read about Blacklight's ability to run on any Solr index, but  
I've
struggled to make it work with mine.  Honestly, I've been left with  
the
impression that my data should be in MARC if I want to use it.  Is  
there
some documentation on this somewhere that I've overlooked?  (Maybe I  
should

ask this on the BL list)


On Wed, Jan 6, 2010 at 12:24 PM, Naomi Dushay ndus...@stanford.edu  
wrote:



Marijane,

It also makes sense to examine the available software for what you  
wish to

accomplish.  Available software goes beyond current features to
- maintainability  (one reason Stanford switched to Blacklight)
I'll talk
a little bit about this in our Code4Lib 2010 presentation about  
testing.

- community
- active development
- potential applicability to additional projects.   (we like  
Blacklight for

its ability to run on any solr index, regardless of what's in there)

probably some other stuff I've left out.

Our experience at Stanford Libraries is that the common conventions  
of

Rails give us a lot more ease in reading each other's code.

- Naomi


On Jan 5, 2010, at 3:04 PM, marijane white wrote:

Greetings Code4Lib,


Long time lurker, first time poster here.

I've been turning over this question in my mind for a few weeks  
now, and

Joe
Hourcle's postscript in the Online PHP Course thread has prompted  
me to

finally try to ask it. =)

I'm interested in hearing how the members of this list have gone  
about
choosing development platforms for their library coding projects  
and/or

existing open source projects (ie like VuFind vs Blacklight).  For
example,
did you choose a language you already were familiar with?  One you  
wanted

to
learn more about?  Does your workplace have a standard enterprise
architecture/platform that you are required to use?  If you have  
chosen to
implement an existing open source project, did you choose based on  
the
development platform or project maturity and features or something  
else?


Some background -- thanks to my undergraduate computer engineering
studies,
I have a pretty solid understanding of programming fundamentals,  
but most

of
my pre-LIS work experience was in software testing and did not  
require me

to
employ much of what I learned programming-wise, so I've mostly  
dabbled

over
the last decade or so.  I've got a bit of experience with a bunch of
languages and I'm not married to any of them.   I also kind of  
like having

excuses to learn new ones.

My situation is this: I would like to eventually implement a  
discovery

tool
at MPOW, but I am having a hell of a time choosing one.  I'm a solo
librarian on a content team at a software and information services
company,
so I'm not really tied to the platforms used by the software  
engineering
teams here.  I know a bit of Ruby, so I've played with Blacklight  
some,

got
it to install on Windows and managed to import a really rough Solr  
index.
I'm more attracted to the features in VuFind, but I don't know  
much PHP

yet
and I haven't gotten it installed successfully yet.  My collection's
metadata is not in an ILS (yet) and not in MARC, so I've also  
considered
trying out more generic approaches like ajax-solr (though I don't  
know a

lot
of javascript yet, either).  I've also given a cursory look at  
SOPAC and
Scriblio.  My options are wide open, and I'm having a rough time  
deciding
what direction to go in.  I guess it's kind of similar to someone  
who is

new
to programming and attempting to choose their first language to  
learn.


I will attempt to head off a programming language religious war =)  
by
stating that I'm not really interested in the virtues of one  
platform over
another, more so the abstract reasons one might have for selecting  
one.
Have any of you ever been in a similar situation?  How'd you get  
yourself
unstuck?  If you haven't, what do you think you might do in a  
situation

like
mine?


-marijane





Re: [CODE4LIB] preconference proposals - solr

2009-11-13 Thread Naomi Dushay

On Nov 13, 2009, at 8:47 AM, Erik Hatcher wrote:

+1, Bess!  I'm especially psyched for the kata demonstrations and  
sparring matches we'll have at the end of the session :)


I'll tinker with the advanced session description a bit when I can,  
but let's run with that for the time being.  I'm happy to have Naomi  
join me however she likes.


I'll be the eye candy!



Erik


On Nov 13, 2009, at 11:25 AM, Bess Sadler wrote:

Hey, how about this? I've been discussing this off list with Erik  
and Naomi and this is what we came up with (I also added it to the  
wiki):


This is a proposal for several pre-conference sessions that would  
fit together nicely for people interested in implementing a next-gen
catalog system.


1. Morning session - solr white belt
Instructor: Bess Sadler (anyone else want to join me?)
The journey of solr mastery begins with installation. We will then  
proceed to data types, indexing, querying, and inner harmony. You  
will leave this session with enough information to start running a  
solr service with your own data.


2. Morning session - solr black belt
Instructors: Erik Hatcher (and Naomi Dushay? she has offered to  
help, if that's of interest)
Amaze your friends with your ability to combine boolean and  
weighted searching. Confound your enemies with your mastery of the  
secrets of dismax. Leave slow queries in the dust as you  
performance tune solr within an inch of its life. [We should  
probably add more specific advanced topics here... suggestions  
welcome]


3. Afternoon session - Blacklight
Instructors: Naomi Dushay, Jessie Keck, and Bess Sadler
Apply your solr skills to running Blacklight as a front end for  
your library catalog, institutional repository, or anything you can  
index into solr. We'll cover installation, source control with git,  
local modifications, test driving development, and writing object- 
specific behaviors. You'll leave this workshop ready to  
revolutionize discovery at your library. Solr white belts or black  
belts are welcome.


And then anyone else who had a topic that built on solr (e.g.,  
vufind?) could add it in the afternoon. Obviously I'm biased, but I  
really do think the topic of implementing a next gen catalog is  
meaty enough for a half day and I know people are asking me about  
it and eager to attend such a thing.


What do you think, folks?

Bess

On 12-Nov-09, at 4:10 PM, Gabriel Farrell wrote:


On Tue, Nov 10, 2009 at 02:47:42PM +, Jodi Schneider wrote:
If you'd be up for it Erik, I'd envision a basic session in the  
morning.

Some of us (like me) have never gotten Solr up and running.

Then the afternoon could break off for an advanced session.

Though I like Bess's idea, too! Would that be suitable for a  
conference
breakout? Not sure I'd want to pit it against Solr advanced  
session!


The preconfs should be as inclusive as possible, but I'm wondering  
if

the Solr session might be more beneficial if we dive into the
particulars right off the bat in the morning.  There are only a few
steps to get Solr up and running -- it's in the configuration for  
our

custom needs that the advice of a certain Mr. Hatcher can really be
helpful.

You're right, though, that the NGC thing sounds more like a BOF  
session.

I'd support that in order to attend a full preconf day of Solr.


Gabriel


Elizabeth (Bess) Sadler
Chief Architect for the Online Library Environment
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

b...@virginia.edu
(434) 243-2305



Re: [CODE4LIB] preconference proposals

2009-11-11 Thread Naomi Dushay

yes, tuning!  - Naomi

On Nov 10, 2009, at 6:43 AM, Kevin S. Clarke wrote:

On Tue, Nov 10, 2009 at 8:38 AM, Erik Hatcher erikhatc...@mac.com  
wrote:

 I could be game for a half day
session.  It could be either an introductory Solr class, get up and  
running
with Solr (+ Blacklight, of course).  Or maybe a more advanced  
session on
topics like leveraging dismax, Solr performance and scalability  
tuning, and
so on, or maybe a freer form Solr hackathon session where I'd be  
there to

help with hurdles or answer questions.

Thoughts?  Suggestions?


I think that'd be great.  I'd be more interested in a more advanced
session personally (dismax, tuning, etc.)

Thanks!
Kevin


Re: [CODE4LIB] preconference proposals

2009-11-11 Thread Naomi Dushay

What do you think about the Solr part having some specific goodies like:


lots on dismax magic

how to do fielded searching (author/title/subject) with dismax

how to do browsing (termsComponent query, then fielded query to get  
matching docs)


how to do boolean  (use lucene QP, or fake it with dismax)
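
To make the middle two concrete, rough RSolr sketches (field names and the
/terms handler are assumptions, not from any existing config):

require 'rsolr'

solr = RSolr.connect(url: 'http://localhost:8983/solr')   # assumed URL

# fielded (author-only) searching with dismax: restrict qf/pf to author fields
author_hits = solr.get('select', params: {
  q: 'dushay', defType: 'dismax',
  qf: 'author_t author_addl_t', pf: 'author_t'
})
puts author_hits['response']['numFound']

# browsing with the TermsComponent: pull the next N indexed terms after a
# starting point, then issue a fielded query for the docs under one term
terms = solr.get('terms', params: {
  'terms.fl' => 'callnum_sort', 'terms.lower' => 'PR9199',
  'terms.limit' => 20, 'terms.sort' => 'index'
})
# terms['terms'] holds the matching terms and their counts
docs = solr.get('select', params: { q: 'callnum_sort:"PR9199.3 M3855"' })
puts docs['response']['numFound']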

- Naomi


On Nov 10, 2009, at 5:38 AM, Erik Hatcher wrote:

I'm interested presenting something Solr+library related at c4l10.   
I'm soliciting ideas from the community on what angle makes the most  
sense.  At first I was thinking a regular conference talk proposal,  
but perhaps a preconference session would be better.  I could be  
game for a half day session.  It could be either an introductory  
Solr class, get up and running with Solr (+ Blacklight, of course).   
Or maybe a more advanced session on topics like leveraging dismax,  
Solr performance and scalability tuning, and so on, or maybe a freer  
form Solr hackathon session where I'd be there to help with hurdles  
or answer questions.


Thoughts?  Suggestions?   Anything I can do to help the library  
world with Solr is fair game - let me know.


Thanks,
Erik

On Nov 9, 2009, at 9:55 PM, Kevin S. Clarke wrote:


Hi all,

It's time again to collect proposals for Code4Lib 2010 preconference
sessions.  We have space for six full day sessions (or 12 half day
sessions (or some combination of the two)).  If we get more than we
can accommodate, we'll vote... but I don't think we will (take that  
as

a challenge to propose lots of interesting preconference sessions).
Like last year, attendees will pay $12.50 for a half day or $25 for
the whole day.  The preconference space will be in the hotel so we'll
have wireless available.  If you have a preconference idea, send it  
to
this list, to me, or to the code4libcon planning list.  We'll put  
them

up on the wiki once we start receiving them.  Some possible ideas?  A
Drupal in libraries session? LOD part two?  An OCLC webservices
hackathon?  Send the proposals along...

Thanks,
Kevin


[CODE4LIB] Blacklight release 2.4 is here

2009-11-09 Thread Naomi Dushay
Release 2.4 of Project Blacklight is now available in our new Git  
flavor!  You can find the new improved flavor of  Blacklight at http://github.com/projectblacklight/blacklight/tree/v2.4.0


In addition to our move to Git, we have listened to community feedback  
and have changed the installation process. Instructions for  
installation are at http://github.com/projectblacklight/blacklight/blob/v2.4.0/README.rdoc 
.  In broad terms, Blacklight now uses a template to get required gems  
at installation time rather than bundling them in with the code.


Besides our debut in Git and the move to a template, here are the  
changes for release 2.4:


Release Notes - Blacklight Plugin - Version 2.4

Bug

[CODEBASE-54] - rake gems:install does not work (using template now)
[CODEBASE-111] - Ae and Oe ligature characters are not normalized  
correctly
[CODEBASE-131] - Getting error from rails on startup that VERSION is  
already defined

[CODEBASE-134] - Authlogic error
[CODEBASE-135] - Fall back on net_http when curb gem is not present  
when using RSolr
[CODEBASE-138] - A copy of ApplicationController has been removed from  
the module tree but is still active
[CODEBASE-160] - why isn't the email and SMS working on  
demo.projectblacklight.org

[CODEBASE-170] - Blacklight logo cannot be over-ridden
[CODEBASE-178] - 3 specs fail when run with rake solr:spec ... no idea  
why

[CODEBASE-187] - bookmarking seems to be broken in the latest code
Improvement

[CODEBASE-87] - Gracefully handle solr errors
[CODEBASE-172] - demo - solr config - only build spell dictionaries on  
optimize, not on newSearcher / firstSearcher

New Feature

[CODEBASE-3] - exporting to Zotero
[CODEBASE-109] - sort by pub date in demo
[CODEBASE-182] - Rails Template installer instead of ./script/plugin
[CODEBASE-183] - Add cursor focus to the search box on the home page
[CODEBASE-190] - Cursor focus in search form on home page
Task

[CODEBASE-51] - Design a basic advanced search UI - see Stanford  
SearchWorks

[CODEBASE-70] - Need a plugin release as well
[CODEBASE-114] - demo index should have vernacular displayed
[CODEBASE-146] - Change stylesheet link in the HTML to media=all
[CODEBASE-151] - get some dublin core test data
[CODEBASE-159] - get test data with call numbers
[CODEBASE-173] - marc_mapper.rb - no longer in synch with solrmarc;  
its presence is confusing.

[CODEBASE-176] - get continuous integration working again
[CODEBASE-177] - update demo app and readme at projectblacklight.org
[CODEBASE-186] - Implement Google Analytics on the main  
blacklightopac.org site


[CODE4LIB] de-dupping (was: marc4j 2.4 released)

2008-10-20 Thread Naomi Dushay
I've wondered if standard number matching  (ISBN, LCCN, OCLC,  
ISSN ...) would be a big piece.  Isn't there such a service from OCLC,  
and another flavor of something-or-other from LibraryThing?
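
By way of illustration, the simplest form of standard-number matching is
normalize-and-bucket; a minimal Ruby sketch with a hypothetical record
structure (a real pass would also use OCLC numbers, LCCNs, etc.):

def normalize_isbn(raw)
  digits = raw.to_s.upcase.gsub(/[^0-9X]/, '')
  return nil unless [10, 13].include?(digits.length)
  # drop the check digit so ISBN-10 and ISBN-13 forms of the same book agree
  digits.length == 10 ? '978' + digits[0, 9] : digits[0, 12]
end

def cluster_by_isbn(records)
  # first-pass grouping, simplified: keyed on the lowest normalized ISBN per record
  records.group_by { |rec| rec[:isbns].map { |i| normalize_isbn(i) }.compact.min }
end

records = [
  { id: 'a', isbns: ['0-13-468599-2'] },
  { id: 'b', isbns: ['9780134685991'] }    # same ISBN in 13-digit form
]
p cluster_by_isbn(records).values.map { |recs| recs.map { |r| r[:id] } }
# => [["a", "b"]]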


- Naomi

On Oct 20, 2008, at 12:21 PM, Jonathan Rochkind wrote:

To me, de-duplication means throwing out some records as  
duplicates. Are we talking about that, or are we talking about what  
I call "work set grouping" and others (erroneously in my opinion)  
call "FRBRization"?


If the latter, I don't think there is any mature open source  
software that addresses that yet. Or for that matter, any  
proprietary for-purchase software that you could use as a component  
in your own tools. Various proprietary software includes a work set  
grouping feature in its black box (AquaBrowser, Primo, I believe  
the VTLS ILS).  But I don't know of anything available to do it for  
you in your own tool.


I've been just starting to give some thought to how to accomplish  
this, and it's a bit of a tricky problem on several grounds,  
including computationally (doing it in a way that performs  
efficiently). One choice is whether you group records at the  
indexing stage, or on-demand at the retrieval stage. Both have  
performance implications--we really don't want to slow down  
retrieval OR indexing.  Usually if you have the choice, you put the  
slow down at indexing since it only happens once in abstract  
theory. But in fact, with what we do, when indexing that's already  
been optimized and does not have this feature can take hours or even  
days with some of our corpuses, and when in fact we do re-index from  
time to time (including 'incremental' addition to the index of new  
and changed records)---we really don't want to slow down indexing  
either.


Jonathan

Bess Sadler wrote:

Hi, Mike.

I don't know of any off-the-shelf software that does de-duplication  
of the kind you're describing, but it would be pretty useful. That  
would be awesome if someone wanted to build something like that  
into marc4j. Has anyone published any good algorithms for de-duping?  
As I understand it, if you have two records that are 100%  
identical except for holdings information, that's pretty easy. It  
gets harder when one record is more complete than the other, and  
very hard when one record has even slightly different information  
than the other, to tell whether they are the same record and decide  
whose information to privilege. Are there any good de-duping  
guidelines out there? When a library contracts out the de-duping of  
their catalog, what kind of specific guidelines are they expected  
to provide? Anyone know?


I remember the open library folks were very interested in this  
question. Any open library folks on this list? Did that effort to  
de-dupe all those contributed marc records ever go anywhere?


Bess

On Oct 20, 2008, at 1:12 PM, Michael Beccaria wrote:

Very cool! I noticed that a feature, MarcDirStreamReader, is  
capable of
iterating over all marc record files in a given directory. Does  
anyone

know of any de-duplicating efforts done with marc4j? For example,
libraries that have similar holdings would have their records merged
into one record with a location tag somewhere. I know places do it
(consortia etc.) but I haven't been able to find a good open program
that handles stuff like that.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376
[EMAIL PROTECTED]



--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886 rochkind (at) jhu.edu


Naomi Dushay
[EMAIL PROTECTED]


Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia

2008-10-08 Thread Naomi Dushay
I couldn't find anything for Thurs night, but I did find some B&Bs for  
Wed night.


http://www.bedandbreakfast.com/philadelphia-pennsylvania.html

A friend told me he saw, on travelocity:  Comfort Inn Downtown.  It is  
on the Delaware River (which unfortunately is the wrong river for your  
conference),  but it doesn't look too far from the subway station,  so  
you could commute to palinet via subway.


- Naomi

On Oct 7, 2008, at 3:55 PM, Lovins, Daniel wrote:

Wow. I just checked a bunch of hotels, and couldn't find anything  
available for Nov. 5th. I guess I'll try to catch an early morning  
train from New Haven. If anyone finds a hotel with vacancies,  
though, let me know.


/ Daniel

-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf  
Of Andrew Nagy

Sent: Tuesday, October 07, 2008 1:26 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Open Source Discovery Portal Camp - November  
6 - Philadelphia


I updated the wiki for the conference with a link of nearby hotels  
that are suggested by PALINET.


Here is the link:
http://www.palinet.org/ourorg_directions_hotels.aspx

Andrew


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On  
Behalf Of

Eric Lease Morgan
Sent: Tuesday, October 07, 2008 12:34 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Open Source Discovery Portal Camp -  
November 6

- Philadelphia

It looks as if the University of Pennsylvania is having an event on  
or

around the same time as the VUFind event, and that is why things are
filling/full up. FYI. I believe it is better to make reservations sooner
rather than later.

--
ELM


Naomi Dushay
[EMAIL PROTECTED]


Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia

2008-10-06 Thread Naomi Dushay
Doing a quick Google search, what do folks think about the Sheraton?   
(I haven't checked for availability)


http://www.philadelphiasheraton.com/

Or can someone more knowledgeable give us a steer?

- Naomi

On Oct 6, 2008, at 11:08 AM, Eric Lease Morgan wrote:


On Oct 2, 2008, at 10:40 AM, Andrew Nagy wrote:

Implementing or hacking an Open Source discovery system such as  
VuFind or Blacklight?

Interested in learning more about Lucene/Solr applications?...

 http://opensourcediscovery.pbwiki.com




Andrew, where do you suggest people stay over night when they come  
to the Portal Camp? What hotel?


--
Eric Lease Morgan
University of Notre Dame


Naomi Dushay
[EMAIL PROTECTED]


Re: [CODE4LIB] [VuFind-General] Open Source Discovery Portal Camp - November 6 - Philadelphia

2008-10-02 Thread Naomi Dushay
More potential topics, some present on the VuFind roadmap (http://vufind.org/roadmap.php 
) :


identifying items new to the collection for RSS feeds
federated search
virtual shelf list
De-dupping
usage data

- Naomi

On Oct 2, 2008, at 7:40 AM, Andrew Nagy wrote:

Implementing or hacking an Open Source discovery system such as  
VuFind or Blacklight?

Interested in learning more about Lucene/Solr applications?

Join the development teams from VuFind and Blacklight at PALINET in  
Philadelphia, November 6, 2008, for day of discussion and sharing.  
We hope to examine difficult issues in developing discovery systems,  
such as:


   * ILS Connectivity
   * Authority Control
   * Data Importing
   * User Interface Issues

Date and time: November 6, 2008, 9:00am to 4:00pm

Registration Fee: $40 for PALINET members and $50 for PALINET non-members.


For more information and how to register, visit our conference wiki:
http://opensourcediscovery.pbwiki.com

___
VuFind-General mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/vufind-general


Naomi Dushay
[EMAIL PROTECTED]


[CODE4LIB] yet more possible topics for OpenSourceDiscovery

2008-10-02 Thread Naomi Dushay

Serials holdings

Series issues?

pooling usage stats for better recommender services

Naomi Dushay
[EMAIL PROTECTED]


Re: [CODE4LIB] creating call number browse

2008-10-01 Thread Naomi Dushay
, that it allows a variety of sorting methods -  
although it is still limited.


I think there are perhaps some other factors as well. Shelf-browsing  
allows users to wander into 'their' part of the library and look at  
stuff - but I don't think most OPACs have the equivalent. With a  
bookstore (physically and virtually) we might see genre sections we  
can browse. This might also work for public libraries? In research  
libraries we tend to just present the classification without further  
glossing I think - perhaps this is something we ought to consider  
online?


The other thing that occurs to me about browsing by class mark is  
that it presents a 'spectrum' view of a kind. This could be easily  
lost in the type of 'search and sort' system you suggest (although I  
still think this is a good idea btw). At the same time I'm a bit  
reluctant to stop at providing a classification browse, as it seems  
inherently limited.


I agree with the point that browsing the shelves and exploring the  
material in more depth are related - which suggests integration with  
other content-rich services is needed (Google Books, e-books, other  
providers)


Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ

t: +44 (0)20 7594 8829
e: [EMAIL PROTECTED]


-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On  
Behalf Of

Keith Jenkins
Sent: 01 October 2008 13:22
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] creating call number browse

I think that one advantage of browsing a physical shelf is that the
shelf is linear, so it's very easy to methodically browse from the
left end of the shelf to the right, and have a sense that you haven't
accidentally missed anything.  (Ignore, for the moment, all the books
that happen to be checked out and not on the shelf...)

Online, linearity is no longer a constraint, which is a very good
thing, but it does have some drawbacks as well.  There is usually no
clear way to follow a series of more like this links and get a  
sense

that you have seen all the books that the library has on a given
subject.  Yes, you might get lucky and discover some great things,  
but

it usually involves a lot of aimless wandering, coming back to the
same highly-related items again and again, while missing some
slightly-more-distantly-related items.

Ideally, the user should be able to run a query, retrieve a set of
items, sort them however he wants (by author, date, call number, some
kind of dynamic clustering algorithm, whatever), and be able to
methodically browse from one end of that sort order to the other
without any fear of missing something.

Keith


On Tue, Sep 30, 2008 at 6:08 PM, Stephens, Owen
[EMAIL PROTECTED] wrote:

I think we need to understand the
way people use browse to navigate resources if we are to  
successfully

bring

the concept of collection browsing to our navigation tools. David

suggests

that we should think of a shelf browse as a type of 'show me more

like this'
which is definitely one reason to browse - but is it the only  
reason?


Naomi Dushay
[EMAIL PROTECTED]


[CODE4LIB] a teeny bit of MARC history

2008-06-29 Thread Naomi Dushay
MARC is a very annoying data format, no question.  And it's true that  
when it was designed, catalog cards were still state of the art.


From a teensy bit of searching on the 'net:  the MARC pilot project  
final report was published in 1968.
(http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_ERICExtSearch_SearchValue_0=ED029663&ERICExtSearch_SearchType_0=no&accno=ED029663).


It was apparently designed to work well on tapes (as a backup medium,  
and for data transfer).   It predates relational databases.  It was at  
least timely in the sense that it was pretty much universally adopted,  
at least in USA/Canada, as far as I know.



On Jun 26, 2008, at 5:46 AM, Eric Lease Morgan wrote:


On Jun 25, 2008, at 7:27 PM, Hahn, Harvey wrote:

I appreciate that MARC is really a data structure. Leader.  
Directory. Data. Thus using alpha characters for field names is  
legitimate. This demonstrates the flexibility of MARC as a data  
structure. Considering the environment when it was designed, it is a  
marvelous beast. Sequential in nature to accommodate tape.  
Complete with redundant error-checking devices with the leader, the  
directory, and end-of-field, -subfield, and -record characters.  
Exploits the existing character set. It is nice that fields do not  
have to be in any particular order. It is nice that specific  
characters at specific positions have specific meanings. For the  
time, MARC exploited the existing environment to the fullest.  
Applause! A computer science historian, if there ever will be such  
a thing, would have a field day with MARC.
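
For the curious, the record structure being praised here is small enough to
walk in a few lines of Ruby (an illustrative sketch; real code should just use
ruby-marc or marc4j):

FIELD_TERM  = "\x1e"   # end-of-field character
RECORD_TERM = "\x1d"   # end-of-record character

def dump_marc_structure(raw)
  raise 'not a MARC record' unless raw.end_with?(RECORD_TERM)
  leader = raw[0, 24]                      # fixed 24-byte leader
  base   = leader[12, 5].to_i              # base address of the data portion
  dir    = raw[24...(base - 1)]            # directory (its field terminator excluded)
  dir.scan(/(.{3})(.{4})(.{5})/) do |tag, length, start|
    field = raw[base + start.to_i, length.to_i].chomp(FIELD_TERM)
    puts "#{tag}: #{field.inspect}"
  end
end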


But now-a-days, these things are just weird. A novelty. I'm getting  
tired of it. Worse, many of us in Library Land confuse MARC as a  
data structure with bibliographic description. We mix presentation  
and content and think we are doing MARC. Moreover, I don't  
appreciate ILS vendors who "extend" and "enhance" the standard,  
making it difficult to use standard tools against their data. This  
just makes my work unnecessarily difficult. Why do we tolerate such  
things?


I won't even get into the fact that MARC was designed to enable the  
printing of catalog cards and the profession has gone on to use it  
(poorly) in so many other ways. If we in Library Land really want to  
live and work in an Internet environment, then we have some serious  
evolution to go through! The way we encode and make available our  
data is just one example. I feel like a dinosaur.


Whew!

--
Eric Lease Morgan
University of Notre Dame


Naomi Dushay
[EMAIL PROTECTED]