Re: [CODE4LIB] Accessible reCaptcha Was: Bookmarking web links - authoritativeness or focused searching

2009-10-02 Thread Casey Durfee
On Thu, Oct 1, 2009 at 8:39 AM, MJ Ray m...@phonecoop.coop wrote:

 Eric Hellman wrote:
  Are you arguing that reCaptcha cannot be accessible or that it is
  incorrectly implemented on this site?

 Primarily that it is incorrectly implemented.  However, I've yet to
 see an implementation of recaptcha that is accessible and does not
 needlessly insult users with impaired vision.  Even the one on
 recaptcha.net includes the fully-abled=human insults.


The space shuttle is not wheelchair-accessible.  Is that a reason not to go
to the moon?  Are non-astronauts less than human?  People in foreign
countries who don't speak English are not discriminating against you by not
speaking English.  Fancy restaurants don't have picture menus.  People who
don't have the internet can't query google via snail mail.  Do you consider
yourself more human than people who don't have internet access or don't know
how to read?

Captcha isn't meant as a judgment about whether you happen to have a soul or
something, so there's no need to take it personally.  It's meant to keep the
bots out, period.  It's easy not to understand the importance of that if
you've never had to deal with your site getting spammed.  No business owner
in their right mind wants to exclude potential customers if they don't have
to.

If the site itself is not accessible, maybe it's better they use ReCaptcha
and screen people they're unable to serve out before they even try to sign
up...


Re: [CODE4LIB] alpha characters used for field names

2008-06-25 Thread Casey Durfee
Why don't systems use the 900 fields for local stuff like this?  That's what
they're there for, right?

--Casey

On Wed, Jun 25, 2008 at 12:23 PM, Steve Oberg [EMAIL PROTECTED] wrote:

 Eric,

 This is definitely not a feature of MARC but rather a feature of your local
 ILS (Aleph 500).  Those are local fields for which you'd need to make a
 translation to a standard MARC field if you wanted to move that information
 to another system that is based on MARC.

 Steve

 On Wed, Jun 25, 2008 at 2:20 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

  Are alpha characters used for field names valid in MARC records?
 
  When we do dumps of MARC records our ILS often dumps them with FMT and
 CAT
  field names. So not only do I have glorious 246 fields and 100 fields but
 I
  also have CAT fields and FMT fields. Are these features of my ILS --
  extensions of the standard -- or really a part of MARC? Moreover, does
  something like Marc4J or MARC::Batch and friends deal with these alpha
 field
  names correctly?
 
  --
  Eric Lease Morgan
 



Re: [CODE4LIB] free movie cover images?

2008-05-19 Thread Casey Durfee
One could embed the actual cataloging record data in the thumbnails using
steganography...
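It would even work; here is a toy least-significant-bit sketch over raw pixel bytes (no image format handling, purely illustrative -- a real version would read and write actual image data):

```python
def embed(pixels, message):
    """Hide message bytes in the least-significant bit of each pixel byte."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)

def extract(pixels, length):
    """Recover `length` bytes back out of the low bits of the pixel data."""
    data = bytearray()
    for i in range(length):
        byte = 0
        for j in range(8):
            byte |= (pixels[i * 8 + j] & 1) << j
        data.append(byte)
    return bytes(data)

cover = bytes(range(200))          # stand-in for thumbnail pixel data
stego = embed(cover, b"245 $a Moby Dick")
assert extract(stego, 16) == b"245 $a Moby Dick"
```

The low-bit changes are invisible to the eye, so the thumbnail still looks like a thumbnail.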

On Mon, May 19, 2008 at 2:12 PM, Peter Keane [EMAIL PROTECTED] wrote:

 Looked at another way: a thumbnail is just a bit of visual metadata,
 and you cannot copyright metadata.

 --peter keane




Re: [CODE4LIB] Latest OpenLibrary.org release

2008-05-07 Thread Casey Durfee
SRU is crap, in my opinion -- overengineered and under-thought,
incomprehensible to non-librarians and burdened by the weight of history.
The notion that it was designed to be used by all kinds of clients on all
kinds of data is irrelevant in my book.  Nobody in the *library world* uses
it, much less non-libraries.  APIs are for use.  You don't get any points
for ideological correctness.  A non-librarian could look at that API
document, understand it all, and start working with it right away.  There is
no way you can say that about SRU.

Kudos to the OpenLibrary team, whatever the reason was, for coming up with
something better that people outside the library world might actually be
willing to use.


On Wed, May 7, 2008 at 12:55 PM, Dr R. Sanderson [EMAIL PROTECTED]
wrote:

 I'm the only non-techie on the team, so I don't know that much about
  SRU.  (Our head programmer lives in India, and is presumably asleep at
  the moment, otherwise I'd ask him!)  Is it an interface that is used
  primarily by libraries?  We are definitely hoping that our API will be
  used by all kinds, so perhaps that's the reasoning.
 

 It's designed to be used by all kinds of clients on all kinds of data,
 but is from the library world so perhaps the most well defined use cases
 are in this arena.  Have a look at:
  http://www.loc.gov/standards/sru/

  But this is an Open Source project, so if anyone would like to volunteer
  to build an SRU interface... you can!  Please do! :-)
 

 I feel a student project coming on. :)

 Rob



Re: [CODE4LIB] Serials Solutions API and NDA

2008-04-23 Thread Casey Durfee
My opinion is that this sounds like a very odd or poorly-designed API.  If
some of their APIs are for unreleased or experimental features, I understand
having NDA's for those.  But for the most part, the API should cover the
core functions of the product.  What those core functions are should be no
secret, and anything proprietary about how they work should be fully hidden
from the people using the API.  Otherwise, NDA or no, the API is worthless.

--Casey

On Wed, Apr 23, 2008 at 7:00 AM, Bill Dueber [EMAIL PROTECTED] wrote:

 Thanks -- this is great news! Is there anyone from Ex Libris (or, really,
 any other vendor) floating around that would like to comment in kind???

  -Bill-

 On Tue, Apr 22, 2008 at 1:45 PM, Kaplanian, Harry 
 [EMAIL PROTECTED] wrote:

  Hello everyone,
  There was a thread that started April 2nd about the Serials Solutions
  API and its NDA.
 
  We would like to clarify that the non-disclosure agreement which we ask
  libraries to sign before receiving the documentation for our APIs does
  not limit the library IN ANY WAY from contributing their own code to
  other institutions. The posting on code4lib from one of our support
  staff was incorrect.
 
  We ask libraries to sign a non-disclosure agreement before receiving the
  APIs and accompanying documentation because once signed, API users have
  access to proprietary information through communication with our
  development staff.
 
  Obviously, our software is our primary asset.  We ask for the
  non-disclosure so that the technical details of that asset are not
  shared with a potential competitor. However, the code that the library
  develops using the API belongs to the library.  The library is not
  limited from contributing that code to the community. In fact, we would
  encourage you to do so.
 
 
  Thanks!
  Harry Kaplanian
  Director of Product Management
  Serials Solutions
 



 --
 Bill Dueber
 Library Systems Programmer
 University of Michigan Library



Re: [CODE4LIB] KR

2008-04-03 Thread Casey Durfee
No, you could write them in J [1].  This is how you do quicksort in J:

quicksort=: (($:@(<#[) , (=#[) , $:@(>#[)) ({~ ?@#)) ^: (1<#)


--Casey

[1] http://en.wikipedia.org/wiki/J_programming_language
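For anyone who doesn't read J: the train above is (roughly) a three-way partition around a random pivot, applied only while more than one item remains. A Python transcription of that reading:

```python
import random

def quicksort(xs):
    # Mirrors the J train: recurse on items < pivot, keep items == pivot,
    # recurse on items > pivot; the pivot is chosen at random ({~ ?@#),
    # and the whole thing repeats only while len > 1 (^: (1 < #)).
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    return (quicksort([x for x in xs if x < pivot])
            + [x for x in xs if x == pivot]
            + quicksort([x for x in xs if x > pivot]))

assert quicksort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
```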


On Thu, Apr 3, 2008 at 12:41 PM, Tim Shearer [EMAIL PROTECTED] wrote:

 So now I have to compile my jokes?

 -t


 On Thu, 3 Apr 2008, Ryan Ordway wrote:

  #include <stdio.h>
  main(t,_,a)
  char *a;
  {
  return!0<t?t<3?main(-79,-13,a+main(-87,1-_,main(-86,0,a+1)+a)):
  1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13?
  main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t,
  "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l+,/n{n+,/+#n+,/#\
  ;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
  q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# \
  ){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
  iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
  ;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}{rl#'{n' ')# \
  }'+}##(!!/")
  :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
  :0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
  "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);
  }
 
 
 
  On Apr 3, 2008, at 8:54 AM, Jeremy Frumkin wrote:
 
   ..- .-.. .-..   .. .. --   --. --- .. -. --.   - ---   ... .-
   -.--   .-
   -... --- ..- -   -  .. ...   -  .-. . .- -..   .. ...
   -  .- -
   -. --- -. .   --- ..-.   -.--
   --- ..-   ... ..- ..-. ..-. . .-.   ..-. .-.
   --- --   .-. -- ..   -  .   .-- .- -.--   ..   -..
   ---   .--  . -.
   ..   ..- ... .   -- -.--   .--. .-. . ..-. . .-. .-. . -..   ..
   -. .--. ..-
   -   -.. . ...- .. -.-. . .-.-.- .-.-.- .-.-.-
  
   -- --   .--- .- ..-.
  
  
   On 4/3/08 6:51 AM, Walter Lewis [EMAIL PROTECTED] wrote:
  
Sebastian Hammer wrote:
   
 A true hacker has no need for these crude tools. He waits for
  cosmic
  radiation to pummel the magnetic patterns on his drive into a
  pleasing
  and functional sequence of bits.
 
 Alas, having been doing this (along with my partners, the four
Yorkshiremen) since the Stone Age ...
   
We used to arrange pebbles in the middle of road into the relevant
patterns (we *dreamed* of being able to afford the wire for an
abacus).
Passing carts would then help crunch the numbers.
   
Walter
 for whom graph paper, templates, pencils, 80 column punchcards and
IBM Assembler were formative experiences
   
   
  
  
   ===
   Jeremy Frumkin
   Head, Emerging Technologies and Services
   121 The Valley Library, Oregon State University
   Corvallis OR 97331-4501
  
   [EMAIL PROTECTED]
  
   541.602.4905
   541.737.3453 (Fax)
   ===
Without ambition one starts nothing. Without work one finishes
   nothing. 
   - Emerson
  
  
  --
  Ryan Ordway   E-mail: [EMAIL PROTECTED]
  Unix Systems Administrator   [EMAIL PROTECTED]
  OSU Libraries, Corvallis, OR 97331Office: Valley Library #4657
 



Re: [CODE4LIB] Reminder: Code4Lib 2008 Call for Proposals

2007-11-28 Thread Casey Durfee
Sorry, I only submit to conferences where the CFP is a Petrarchan sonnet.
None of that Shakespearean Sonnet 2.0 crap for me.

--Casey



On 11/28/07, D Chudnov [EMAIL PROTECTED] wrote:

 Hear ye, hear ye, the deadline comes anon,
 but we have yet to hear from most of you.
 What hacks, pray tell, in IDEs o'er yon
 might come forth with a demo, or two?
 Time waits, but not for you.  Who will anoint
 the keynote speakers?  Now, for their sake
 we must act fast, and soon, you see, my point
 is that the schedule blocks are open.  Point Break
 (and yes, I mean the movie) might could be
 a way to pass two hours.  But who will slake
 the thirst that would remain 'twould not we see
 another six propos'ls?  Script this for rake:
 Gather ideas, write one and send, or more,
 a break point's been herewith reset - step o'er!


 -- Forwarded message --
 From: Roy Tennant [EMAIL PROTECTED]
 Date: Oct 31, 2007 1:55 PM
 Subject: [CODE4LIB] Code4Lib 2008 Call for Proposals
 To: CODE4LIB@listserv.nd.edu


 Code4lib 2008 Call for Proposals

 We are now accepting proposals for prepared talks for Code4lib 2008.
 Code4lib 2008 is a loosely structured conference for library technologists
 to commune, gather/create/share ideas and software, be inspired, and forge
 collaborations. It is also an outgrowth of the Access HackFest, wrapped
 into
 a conference-like format. It is *the* event for technologists building
 digital libraries and digital information systems, tools, and software.

 Prepared talks are 20 minutes, and must focus on one or more of the
 following areas:
 - tools (some cool new software, software library or integration
 platform)
 - specs (how to get the most out of some protocols, or proposals for new
 ones)
 - challenges (one or more big problems we should collectively address).

 The community will vote on proposals using the criteria of:
 - usefulness
 - newness
 - geekiness
 - diversity of topics.

 We cannot accept every prepared talk proposal, but multiple lightning talk
 sessions will provide everyone who wishes to present with an opportunity
 to
 do so.

 Please send your name, email address, and proposal of no more than 75
 words
 to code4libcon at googlegroups.com. The proposal deadline is November 30,
 2007, and proposers will be notified by December 14, 2007.



[CODE4LIB] OCLC is us (was Re: [CODE4LIB] more metadata from xISBN)

2007-05-10 Thread Casey Durfee
 
I've said it before and I'll probably say it again: OSLC anyone?  OCLC is too 
large and too old to substantially change their business practices.  They have 
great people working there and do some excellent things (which is why the fact 
they won't share their goodies with the rest of us is so galling) but they're 
just not going to fundamentally change the way they do business until they have 
to, and since they're a monopoly, that may be never.
 
We need to recognize this.  Building an open content library data commons is 
far more likely to happen than OCLC changing the way they've done things 
forever.  No flies on OCLC but they are what they are.
 

 Jonathan Rochkind [EMAIL PROTECTED] 5/10/2007 7:59 AM 
PS: The more I think about this, the more burned up I actually get.
Which maybe means I shouldn't post about it, but hey, I've never been
one for circumspection.

If OCLC is us, then OCLC will gladly share with us (who are in fact
them, right?) their research on workset grouping algorithms, and
precisely what workset grouping algorithm they are using in current
implementations of xISBN and other services, right? After all, if OCLC
is not a vendor, but just us collectively, why would one part of us
need to keep trade secrets from another part of us?  Right?

While OCLC is at it, OCLC could throw in some more information on this
project, which has apparently been consigned to trade secret land since
its sole (apparently mistaken) public outing:
http://www.code4lib.org/2006/smith 

Our field needs publically shared research results and publically shared
solutions, to build a research community, to solve the vexing problems
we have in front of us in increasingly better ways, building off each
other. We need public domain solutions. We are not interested in
secret solutions. Vendors, however, need proprietary trade secrets, to
make sure they can solve the problems better than their competitors. If
OCLC is not a vendor but is instead us, then why does OCLC treat its
research findings as something that needs to be kept secret from the
actual _us_---everyone here who does not work for OCLC. That's us.

Jonathan

Eric Hellman wrote:
 Jonathan,

 It's worth noting that OCLC *is* the we you are talking about.

 OCLC member libraries contribute resources to do exactly what you
 suggest, and to do it in a way that is sustainable for the long term.
 Worldcat is created and maintained by libraries and by librarians.
 I'm the last to suggest that OCLC is the best possible instantiation
 of libraries-working-together, but we do try.


 Eric



 At 3:01 PM -0400 5/9/07, Jonathan Rochkind wrote:
 2) More interesting---OCLC's _initial_ work set grouping algorithm is
 public. However, we know they've done a lot of additional work to
 fine-tune the work set grouping algorithms.
 (http://www.frbr.org/2007/01/16/midwinter-implementers).  Some of these
 algorithms probably take advantage of all the cool data OCLC has that we
 don't, okay.

 But how about we start working to re-create this algorithm? Re-create
 isn't a good word, because we aren't going to violate any NDA's, we're
 going to develop/invent our own algorithm, but this one is going to be
 open source, not a trade secret like OCLC's.

 So we develop an algorithm on our own, and we run that algorithm on our
 own data. Our own local catalog. Union catalogs. Conglomerations of
 different catalogs that we do ourselves. Even reproductions of the OCLC
 corpus (or significant subsets thereof) that we manage to assemble in
 ways that don't violate copyright or license agreements.
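A naive first cut at such an open grouping key (emphatically not OCLC's algorithm; the normalization rules here are invented for illustration) might be:

```python
import re
import unicodedata

def work_key(author, title):
    """Naive work-set key: author + title with case, punctuation,
    diacritics, and leading English articles normalized away."""
    def norm(s):
        s = unicodedata.normalize("NFKD", s)
        s = "".join(c for c in s if not unicodedata.combining(c))
        s = re.sub(r"[^\w\s]", "", s.lower())   # drop punctuation
        s = re.sub(r"^(the|a|an)\s+", "", s)    # drop leading article
        return " ".join(s.split())
    return norm(author) + "/" + norm(title)

# Two hypothetical transcription variants collapse to one work:
records = [("Melville, Herman", "Moby Dick"),
           ("MELVILLE, HERMAN.", "Moby Dick /")]
groups = {}
for author, title in records:
    groups.setdefault(work_key(author, title), []).append((author, title))
assert len(groups) == 1
```

Getting from this to something that handles translations, uniform titles, and editions is exactly the research that ought to be public.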

 And then we've got our own workset grouping service. Which is really all
 xISBN is.  What is OCLC providing that is so special? Well, if what I've
 just outlined above is so much work that we _can't_ pull it off, then I
 guess we've got pay OCLC, and if we are willing to do so (rather than go
 without the service), then I guess OCLC has correctly pegged their
 market price.

 But our field is not a healthy field if all research is being done by
 OCLC and other vendors. We need research from other places, we need
 research that produces public domain results, not proprietary trade
 secrets.


 --

 Eric Hellman, Director                  OCLC Openly Informatics Division
 [EMAIL PROTECTED]                       2 Broad St., Suite 208
 tel 1-973-509-7800 fax 1-734-468-6216   Bloomfield, NJ 07003
 http://openly.oclc.org/1cate/           1 Click Access To Everything


--
Jonathan Rochkind
Sr. Programmer/Analyst
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


[CODE4LIB] Job Posting: ILS Administrator at The Seattle Public Library

2007-02-02 Thread Casey Durfee
 
This is my current gig; I am moving into a new position at SPL.  Given that 
Seattle is both the most literate [1] and the geekiest [2] city in the United 
States, this would seem to be the perfect position for a library geek.  Don't 
hesitate to email me if you have any questions.
 
--Casey
[EMAIL PROTECTED]
 
 
[1] http://www.ccsu.edu/amlc/ 
[2] http://www.wired.com/wired/archive/15.01/geekcities.html 
 

The Seattle Public Library is seeking an experienced Integrated Library Systems 
Administrator.
 
The Integrated Library Systems Administrator serves as departmental lead and 
authority for administering and maintaining the Library's Horizon integrated 
library system (ILS).
- Configures the ILS, provides training for staff, maintains administrative 
tables, manages user accounts and security, and performs other administrative 
tasks on the database systems as directed.
- Writes and programs custom reports and scripts for undertaking routine and 
non-routine database maintenance tasks using SQL or a similar reporting and 
database modification language.
- Serves as primary technical contact with ILS vendor and related user 
communities.
- Oversees the implementation of new ILS features, and assures connectivity and 
data transmission between ILS and ancillary applications.
- Serves as final internal tier for support of ILS, analyzes, resolves and 
responds to trouble tickets, service requests and information queries from 
staff and public.
- Serves as a departmental lead or project manager for assigned projects, 
including overseeing the implementation of new services and systems.
- Creates and maintains procedures and documentation, including software 
configurations, for ILS and related 3rd-party applications.
- Interacts with third-party technology vendors and internal technology staff.
 
Required qualifications include: 
- A bachelor's degree in Computer Science, MIS, or other related discipline, or 
commensurate experience.
- 3+ years of progressively responsible experience with library automation 
systems (ILS) or other major database systems (system administration, ability 
to create custom reports).
- Training/experience with library cataloging practices and library practices 
and procedures (collections and technical services, circulation, reference, 
in-depth understanding of MARC formats, AACR2, etc.).
- Extensive experience with SQL or a similar database reporting and 
modification language, and with RDBMS (pref. Sybase, MS SQL, or DB2) 
administration.
- Scripting experience in a Windows NT/2000/XP or UNIX environment, or other 
programming or SQL/report language experience.
 
Desired qualifications include:
- Experience administering the SirsiDynix Horizon system.
- Master's degree in Library and Information Science.
- Experience working with HTML, XML, metadata, XSL, Java or JavaScript.
- Experience administering applications or servers using Windows Active 
Directory in a large enterprise or organizational environment.
- Experience administering applications on Linux (Red Hat) or UNIX systems and 
hardware.
 
Salary: $66,788.80 - $81,161.60 annually, including excellent benefits.  This 
classification is part of a bargaining unit represented by AFSCME, Local 2083.
 
Application materials are due by 5:00pm Pacific Time, Monday, February 19, 
2007. For more information and instructions on how to apply, interested 
applicants should check out the full job posting at: 

http://www.spl.org/default.asp?pageID=about_jobsvolunteering_jobs_openings_detail&cid=1170193875060


[CODE4LIB] Solr indexing -- why XSL is the wrong choice

2007-01-19 Thread Casey Durfee
 
I think there are many good reasons why XSLT is absolutely the wrong tool for 
the job of indexing MARC records for Solr.
 
1) Performance/Speed: In my experience even just transforming from MARCXML to 
MODS takes a second or two (using the LoC stylesheet), due to the stylesheet's 
complexity and inefficiency of doing heavy-duty string manipulation in XSL.  
That means you're looking at an indexing speed of around 1 record/second.  If 
you've got 1,000,000 bib records, it'll take a couple of weeks just to index 
your data.  For comparison, the indexer of our commercial OPAC does about 50 
records per second (~6 hours for a million records) and the one I've written in 
Jython (by no means the fastest language out there) that doesn't use XSL can do 
about 150 records a second (about 2 hours for 1 million records).  
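The arithmetic behind those figures is easy to verify (the throughput rates are the ones quoted above):

```python
def index_time_hours(records, records_per_second):
    """Wall-clock hours to index `records` at a given throughput."""
    return records / records_per_second / 3600

# The post's three scenarios for a million bib records:
for rate, label in [(1, "XSLT pipeline"), (50, "commercial indexer"),
                    (150, "Jython indexer")]:
    hours = index_time_hours(1_000_000, rate)
    print(f"{label:18s} {hours:6.1f} hours  ({hours / 24:.1f} days)")
# 1 rec/s is ~11.6 days ("a couple of weeks"); 50 rec/s is ~5.6 hours;
# 150 rec/s is ~1.9 hours.
```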
 
2) Reusability:  What if you want to change how a field is indexed?  You would 
have to edit the XSLT directly (or have the XSL stylesheet automatically 
generated based on settings stored elsewhere).  
 
a) Users of the indexer shouldn't have to actually mess with programming logic 
to change how it indexes.  You shouldn't have to know a thing about programming 
to change the setup of an index.
 
b) It should be easy for an external application to know how your indexes have 
been built.  This would be very difficult with an XSL stylesheet.  Burying 
configuration inside of programming logic is a bad idea.  
 
c) The Solr schema should be automatically generated from your index setup so 
all your index configuration is in one place.  I guess you could write 
*another* XSL stylesheet that would transform your indexing stylesheet into the 
Solr schema file, but that seems ridiculous.
 
d) Automatic code generation is evil.  Blanchard's law: Systems that require 
code generation lack sufficient power to tackle the problem at hand.  If you 
find yourself considering automatic code generation, you should instead be 
considering a more dynamic programming language.
 
3) Ease of programming.  
 
a) Heavy-duty string manipulation is a pain in pure XSLT.  To index MARC 
records you have to do normalization on dates and names, and you probably want 
to do some translation between MARC codes and their meanings (for the audience 
& language codes, for instance).  Is it doable?  Yes, especially if you use XSL 
extension functions.  But if you're going to have huge chunks of your logic 
buried in extension functions, why not go whole hog and do it all outside of 
XSLT, instead of having half your programming logic in an extension function 
and half in the XSLT itself?
 
b) Using XSLT makes object-oriented programming with your data harder.  Your 
indexer should be able to give you a nice object representation of a record (so 
you can use that object representation within other code).  If you go the XSLT 
route, you'd have to parse the MARC record, transform it to your Solr record 
XML format, then parse that XML and map the XML to an object.  If you avoid 
XSLT, you just parse the MARC record and transform it to an object 
programmatically (with the object having a method to print itself out as a Solr 
XML record).
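The object route can be sketched in a few lines (the field names here are invented, and a real indexer would build the object from a parsed MARC record rather than literals):

```python
from xml.sax.saxutils import escape

class IndexedRecord:
    """A record as a plain object, usable from other code, that knows
    how to serialize itself as a Solr <add> document."""
    def __init__(self, title, author, language):
        self.title = title
        self.author = author
        self.language = language

    def to_solr_xml(self):
        fields = [("title", self.title), ("author", self.author),
                  ("language", self.language)]
        body = "".join(
            f'<field name="{name}">{escape(value)}</field>'
            for name, value in fields if value)   # skip empty fields
        return f"<add><doc>{body}</doc></add>"

rec = IndexedRecord("Moby Dick", "Melville, Herman", "English")
assert '<field name="title">Moby Dick</field>' in rec.to_solr_xml()
```

The same object can feed display code, an API, or tests; the Solr XML is just one serialization of it.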
 
Honestly, all this talk of using XSLT for indexing MARC records reminds me of 
that guy who rode across the United States on a riding lawnmower.  I am looking 
forward to there being a standard, well-tested MARC record indexer for Solr 
(and would be excited to contribute to such a project), but I don't think that 
XSL is the right tool to use.
 
 
--Casey
 

 


Re: [CODE4LIB] Solr indexing -- why XSL is the wrong choice

2007-01-19 Thread Casey Durfee
 
I'm perfectly willing to be persuaded to the 'light' side, and I'm looking 
forward to learning more about your project, which is much more 
mature than mine at this point ... I'm just interested in something that works 
and is easily tweakable.  I don't hold a lot of hope that a one-size-fits-all 
XSL transformation could ever be put together -- I think there are too many 
minor but significant variations in how people catalog stuff and how different 
ILSes stick things like item data in the MARC record.  I could be wrong about 
that though.
 
Maybe I've just been traumatized by having to deal with so many bad uses of XSL 
featuring multiple 250KB+ stylesheets with extension functions spaghetti'd 
throughout that I'm disinclined to use it even when it is the best and simplest 
tool for the job.  I'd love to see how you're getting around the somewhat 
convoluted back and forth between XSL and extension functions that has been my 
experience.
 
As far as performance goes, if you've got an ILS that allows you to dump all 
the MARC records in the system at once but not a way to incrementally get all 
the MARC records that have changed since you last updated the indexes, then 
indexing performance is very important -- if you can reindex all your records 
in an hour or two it makes it feasible to just rebuild your indexes from 
scratch every night from 3-4 AM where it wouldn't be if it takes 8 hours.  It 
also makes the cost of fine-tuning your indexes much lower.
 
Just for some clarification, in my system, you don't need to know a thing about 
programming or XML at all or ever look at a single line of code to change how 
an index is created.  There is just one configuration file (in the future 
this may all be stored in a database and accessible via Django's automatic web 
admin interface but for now it's just a text file) and the core indexing code 
is never modified at all.  The three lines in the config file that define the 
title index look something like this:
 
title.type = single
title.marcMap = 245$ab,246$ab,240$a
title.stripTrailingPunctuation = 1
 
(The .type argument says that it is not a repeated field in Solr, the .marcMap 
field dictates how the title data is extracted and .stripTrailingPunctuation 
does what it sounds like)
 
Now say you want to include the n subfields in there as well.  Well, you just 
change that one line in that one config file to:

title.marcMap = 245$abn,246$abn,240$an
 
Now say you want to introduce a new index in Solr.  Well, you just add a couple 
of new lines to the config file, run a little script that automatically 
generates the Solr schema (though I still have a ton of work to do on that 
piece of it), reindex, and you're done.   
 
Defining an index of the language of the material (English, Swahili, etc.) 
would look like:
 
language.type= singleTranslation
language.marcMap = 008/35:38
language.translationMap = LANGUAGE_CODING_MAP
 
(LANGUAGE_CODING_MAP is a hash map of the three letter LoC language codes, for 
example 'eng' => 'English' )
 
You can handle fields with processors (little bits of code) if you need 
something more sophisticated than a MARC map or a translation.  The processor I 
have for the common-sense format of the item (DVD, Book on CD, eMusic -- the 
kind of thing that is very annoying to get out of a MARC record but very 
important to patrons) is extremely complex and  would be unbelievably tedious 
to replicate in XSL.   Now, say somebody writes a better processor (which could 
theoretically be written in any JVM language - java, jruby, jython, javascript 
(rhino), etc.).  To use it would be as simple as changing one line in a 
configuration file and dropping the processor code in a particular spot.  
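A minimal interpreter for marcMap lines like the ones above might look like this (the dict is a simplified stand-in for a parsed MARC record; fixed-field maps like 008/35:38, translation maps, and processors would be additional cases):

```python
# tag -> list of {subfield code: value} dicts; a real indexer would
# get this from a MARC parser rather than a literal.
record = {
    "245": [{"a": "Moby Dick /", "b": "or, The whale.", "n": "Part 1"}],
    "240": [{"a": "Whale"}],
}

def apply_marc_map(record, marc_map, strip_trailing_punctuation=False):
    """Interpret a spec like '245$ab,246$ab,240$a' against a record."""
    values = []
    for part in marc_map.split(","):
        tag, codes = part.split("$")
        for field in record.get(tag, []):
            pieces = [field[c] for c in codes if c in field]
            if pieces:
                values.append(" ".join(pieces))
    if strip_trailing_punctuation:
        values = [v.rstrip(" /:;,.") for v in values]
    return values

assert apply_marc_map(record, "245$ab,246$ab,240$a",
                      strip_trailing_punctuation=True) == \
    ["Moby Dick / or, The whale", "Whale"]
```

Changing what goes into the title index is then exactly the one-line config edit described above; the interpreter never changes.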
 
 
--Casey

 [EMAIL PROTECTED] 1/19/2007 2:35 PM 
Casey, we have had great successes with XSL for MARCXML to SOLR, so I
can't agree to everything you are saying.  However I anxiously await
your presentation on your successes with SOLR so you can persuade me to
the dark side :)

Casey Durfee wrote:

I agree with your argument of abstracting your programming from your
data so that a non-tech-savvy librarian could modify the solr settings.
But if you modify the solr settings, you need to (at this point)
reimport all of your data which mean that you either have to change your
XSLT or your transformation application.  I personally feel that a
less-tech savvy individual can pickup XSLT easier than coding java.
Maybe I am understanding you incorrectly though.

 3) Ease of programming.

 a) Heavy-duty string manipulation is a pain in pure XSLT.  To index MARC 
 records you have to do normalization on dates and names, and you probably want 
 to do some translation between MARC codes and their meanings (for the audience 
 & language codes, for instance).  Is it doable?  Yes, especially if you use XSL 
 extension functions.  But if you're going to have huge chunks of your logic 
 buried in extension functions, why not go whole hog and do

Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-17 Thread Casey Durfee
Many ILSes give the ability to export item data in the MARC record in a 9xx 
tag (usually the 949, since BT and other book jobbers like to put holdings 
data for newly-acquired items there so the ILS can automatically create an item 
record when the MARC record is loaded).  That is how I've been getting 
location/collection code info into my Solr-based catalog.  So you might want to 
look into that.
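Pulling that embedded item data back out is then trivial. A sketch (the tuples are a simplified stand-in for parsed MARC fields; which 9xx tag and subfields your ILS uses will differ):

```python
# (tag, {subfield code: value}) pairs, as a MARC parser might yield them.
fields = [
    ("245", {"a": "Moby Dick"}),
    ("949", {"l": "cen", "c": "fic", "b": "0001001234567"}),
    ("949", {"l": "bra", "c": "fic", "b": "0001001234568"}),
]

def item_locations(fields, item_tag="949", location_code="l"):
    """Collect the location code from each embedded item field."""
    return [subs[location_code]
            for tag, subs in fields
            if tag == item_tag and location_code in subs]

assert item_locations(fields) == ["cen", "bra"]
```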
 
I think having separate XML files for holdings data (and hence, a second 
install of Solr just for holdings data) is less than optimal for a myriad of 
reasons.  Likewise I think XSLT is a pretty poor tool for generating Solr 
records.  It's really difficult in XSLT to do the kind of data manipulation I've 
been finding I need to do on our MARC records to get them nice and Solrized.  
The performance is also very, very poor.
 
--Casey

 [EMAIL PROTECTED] 1/17/2007 12:48 PM 
On Jan 17, 2007, at 2:26 PM, Andrew Nagy wrote:

 Nate, it's pretty easy.  Once you dump your records into a giant marc
 file, you can run marc2xml
 (http://search.cpan.org/~kados/MARC-XML-0.82/bin/marc2xml).  Then
 run an
 XSLT against the marcxml file to create your SOLR xml docs.

Unless I'm totally, hugely mistaken, MARC doesn't say anything about
holdings data, right? If I want to facet on that, would it make more
sense to add holdings data to the MARC XML data, or keep separate xml
files for holdings that reference the item data?

In a lot of cases, location data might not be a hugely important
facet; at Madison, we have something like 42 libraries spread thinly
across campus (gah!) -- each with different loan policies -- as well
as a few request-only storage facilities. So there's a lot of Stuff
I Can't Check Out and a lot of Stuff I'll Need To Wait For in our
collection.

Thanks!
-Nate


Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-17 Thread Casey Durfee
Any Real System Guy worth their salt would know how to set up an account
for you to use for SQL queries like these with read-only rights and low
processing priority/throttling so there would be little to no chance of
it affecting system performance.  Even if they don't know, they could
find out all they need to know with about 5 minutes of hax0ring the
G00G13... or going to the library and getting an Oracle systems
administration for dummies book if they're not into the whole internet
thing.

So it sounds to me like they're stonewalling you because they flat out
don't know what they're doing and don't care to find out.  In which
case, condolences.



On 1/17/07, Nathan Vack [EMAIL PROTECTED] wrote:
 On Jan 17, 2007, at 2:59 PM, Bess Sadler wrote:

  As long as we're on the subject, does anyone want to share
strategies
  for syncing circulation data? It sounds like we're all talking
about
  the parallel systems à la NCSU's Endeca system, which I think is a
  great idea. It's the circ data that keeps nagging at me, though.
Is
  there an elegant way to use your fancy new faceted browser to
search
  against circ data w/out re-dumping the whole thing every night?

 Sure isn't elegant, but as our Real Systems Guys don't want us to
 look at the production Oracle instance (performance worries), we've
 had pretty good luck screen-scraping holdings and status data, once
 we get a Bib ID. Ugly, but functional, and surprisingly fast.

 Of course, spamming the OPAC with HTTP calls certainly impacts
 performance more than just querying the database... but I digress.

 In a perfect world, we'd get a trigger / stored proc on the database
 server when circ status changed. In a slightly less perfect world,
 I'd just keep a connection open to the production datbase server for
 all of that.

 -n




Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Casey Durfee
 
Lucene has a pretty well-specified search syntax which is unlikely to change 
all that much, even though it's not a standard.  It's not perfect, but I think 
it's pretty good.  Overview here:
 
http://lucene.apache.org/java/docs/queryparsersyntax.html 
 
I believe Solr adds a bit to the standard Lucene syntax for sorting:
 
http://incubator.apache.org/solr/tutorial.html#Sorting 
 
I do have a layer of abstraction between the end-user search interface and 
Lucene -- you'd have to have such a layer no matter what search engine you were 
using.
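That abstraction layer can be as thin as a function that renders user terms into Lucene query syntax and appends Solr's sort parameter. A sketch (the parameter names assume a stock Solr select handler; this only builds the URL, it doesn't hit a server):

```python
from urllib.parse import urlencode

def build_solr_query(base_url, terms, sort_field=None):
    """Render {field: value} pairs as a Lucene query and assemble
    a Solr select URL."""
    q = " AND ".join(f'{field}:"{value}"' for field, value in terms.items())
    params = {"q": q, "wt": "json"}
    if sort_field:
        params["sort"] = f"{sort_field} asc"
    return base_url + "?" + urlencode(params)

url = build_solr_query("http://localhost:8983/solr/select",
                       {"title": "whale", "author": "melville"},
                       sort_field="pub_date")
assert 'title%3A%22whale%22' in url
```

Swapping search engines later would mean rewriting only this one translation function, not the user interface.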
 
 
 [EMAIL PROTECTED] 11/27/2006 2:49 PM 
Casey Durfee wrote:

Just using Solr has proven to be much faster than doing the search in Solr and 
then retrieving full data from another database.  This also has the advantage 
of making it so there's only one thing you gotta keep in sync with the ILS.  
The only data that my OPAC needs to talk to a SQL database for is item-level 
information, which changes too often to keep synced.

My only concern about lucene is the lack of a standard query language.
I went down the native XML database path because of XQuery and XSL, does
something like lucene and solr offer a strong query language?  Is it a
standard?  What if someone developed a kick ass text indexer in 2 years
that totally blows lucene out of the water, would you easily be able to
switch systems?

Andrew


Re: [CODE4LIB] java application on a cd

2006-10-13 Thread Casey Durfee
Jetty's [1] tiny and a breeze to embed.
[1] http://jetty.mortbay.org/ 
 

 [EMAIL PROTECTED] 10/13/2006 1:01 PM 
On Oct 13, 2006, at 2:18 PM, Susan Teague Rector wrote:

 I'm pretty sure this is not doable this way - You'll have to use
 either JSP or servlets to get from a web based form to a java app.


Hmm... If I am unable to call something like search.java directly,
then maybe I could include something like tomcat on the CD too, but
that is beginning to sound a bit ugly.

--
Eric Lease Morgan
University Libraries of Notre Dame