Re: [CODE4LIB] Accessible reCaptcha Was: Bookmarking web links - authoritativeness or focused searching
On Thu, Oct 1, 2009 at 8:39 AM, MJ Ray m...@phonecoop.coop wrote: Eric Hellman wrote: Are you arguing that reCaptcha cannot be accessible or that it is incorrectly implemented on this site? Primarily that it is incorrectly implemented. However, I've yet to see an implementation of recaptcha that is accessible and does not needlessly insult users with impaired vision. Even the one on recaptcha.net includes the fully-abled=human insults. The space shuttle is not wheelchair-accessible. Is that a reason not to go to the moon? Are non-astronauts less than human? People in foreign countries who don't speak English are not discriminating against you by not speaking English. Fancy restaurants don't have picture menus. People who don't have the internet can't query google via snail mail. Do you consider yourself more human than people who don't have internet access or don't know how to read? Captcha isn't meant as a judgment about whether you happen to have a soul or something, so there's no need to take it personally. It's meant to keep the bots out, period. It's easy to not understand the importance of that if you've never had to deal with your site getting spammed. No business owner in their right mind wants to exclude potential customers if they don't have to. If the site itself is not accessible, maybe it's better they use ReCaptcha and screen people they're unable to serve out before they even try to sign up...
Re: [CODE4LIB] alpha characters used for field names
Why don't systems use the 900 fields for local stuff like this? That's what they're there for, right? --Casey On Wed, Jun 25, 2008 at 12:23 PM, Steve Oberg [EMAIL PROTECTED] wrote: Eric, This is definitely not a feature of MARC but rather a feature of your local ILS (Aleph 500). Those are local fields for which you'd need to make a translation to a standard MARC field if you wanted to move that information to another system that is based on MARC. Steve On Wed, Jun 25, 2008 at 2:20 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote: Are alpha characters used for field names valid in MARC records? When we do dumps of MARC records our ILS often dumps them with FMT and CAT field names. So not only do I have glorious 246 fields and 100 fields but I also have CAT fields and FMT fields. Are these features of my ILS -- extensions of the standard -- or really a part of MARC? Moreover, does something like Marc4J or MARC::Batch and friends deal with these alpha field names correctly? -- Eric Lease Morgan
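[Editor's note: a minimal sketch of the cleanup Steve describes — translating or dropping ILS-specific alpha tags like FMT and CAT before moving records to another MARC-based system. The record structure here is a hypothetical list of (tag, value) pairs, not the output of a real MARC parser such as Marc4J or MARC::Batch.]

```python
# Strip ILS-specific alpha-tagged fields (e.g. Aleph's FMT and CAT) from a
# record before exporting it to a system that expects standard numeric tags.
# MARC proper defines only three-digit numeric tags, so anything else here is
# treated as a local extension.

def strip_local_tags(fields):
    """Keep only fields whose tags are three-digit numeric MARC tags."""
    return [(tag, value) for tag, value in fields
            if len(tag) == 3 and tag.isdigit()]

record = [
    ("FMT", "BK"),
    ("100", "1 $aMorgan, Eric Lease."),
    ("245", "10$aGlorious title."),
    ("CAT", "$alocal cataloging note"),
]

print(strip_local_tags(record))
# -> [('100', '1 $aMorgan, Eric Lease.'), ('245', '10$aGlorious title.')]
```

In practice you would map the local fields to a 9xx tag rather than discard them, per Casey's suggestion, but the filtering step is the same.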
Re: [CODE4LIB] free movie cover images?
One could embed the actual cataloging record data in the thumbnails using steganography... On Mon, May 19, 2008 at 2:12 PM, Peter Keane [EMAIL PROTECTED] wrote: Looked at another way: a thumbnail is just a bit of visual metadata, and you cannot copyright metadata. --peter keane
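[Editor's note: for the curious, the steganography quip is a real technique — hiding data in the least-significant bits of pixel values. A toy sketch, operating on a bare bytearray rather than decoded image data:]

```python
# Hide a short string in the least-significant bits of "pixel" bytes.
# The pixels here are just a bytearray stand-in; a real implementation
# would operate on decoded image data from a thumbnail.

def embed(pixels, message):
    bits = "".join(f"{b:08b}" for b in message.encode())
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | int(bit)   # overwrite the low bit
    return out

def extract(pixels, length):
    bits = "".join(str(p & 1) for p in pixels[:length * 8])
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode()

pixels = bytearray(range(200))        # stand-in for thumbnail pixel data
stego = embed(pixels, "245 $a")
print(extract(stego, 6))              # -> 245 $a
```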
Re: [CODE4LIB] Latest OpenLibrary.org release
SRU is crap, in my opinion -- overengineered and under-thought, incomprehensible to non-librarians and burdened by the weight of history. The notion that it was designed to be used by all kinds of clients on all kinds of data is irrelevant in my book. Nobody in the *library world* uses it, much less non-libraries. APIs are for use. You don't get any points for ideological correctness. A non-librarian could look at that API document, understand it all, and start working with it right away. There is no way you can say that about SRU. Kudos to the OpenLibrary team, whatever the reason was, for coming up with something better that people outside the library world might actually be willing to use. On Wed, May 7, 2008 at 12:55 PM, Dr R. Sanderson [EMAIL PROTECTED] wrote: I'm the only non-techie on the team, so I don't know that much about SRU. (Our head programmer lives in India, and is presumably asleep at the moment, otherwise I'd ask him!) Is it an interface that is used primarily by libraries? We are definitely hoping that our API will be used by all kinds, so perhaps that's the reasoning. It's designed to be used by all kinds of clients on all kinds of data, but is from the library world so perhaps the most well defined use cases are in this arena. Have a look at: http://www.loc.gov/standards/sru/ But this is an Open Source project, so if anyone would like to volunteer to build an SRU interface... you can! Please do! :-) I feel a student project coming on. :) Rob
Re: [CODE4LIB] Serials Solutions API and NDA
My opinion is that this sounds like a very odd or poorly-designed API. If some of their APIs are for unreleased or experimental features, I understand having NDAs for those. But for the most part, the API should cover the core functions of the product. What those core functions are should be no secret, and anything proprietary about how they work should be fully hidden from the people using the API. Otherwise, NDA or no, the API is worthless. --Casey On Wed, Apr 23, 2008 at 7:00 AM, Bill Dueber [EMAIL PROTECTED] wrote: Thanks -- this is great news! Is there anyone from Ex Libris (or, really, any other vendor) floating around that would like to comment in kind??? -Bill- On Tue, Apr 22, 2008 at 1:45 PM, Kaplanian, Harry [EMAIL PROTECTED] wrote: Hello everyone, There was a thread that started April 2nd about the Serials Solutions API and its NDA. We would like to clarify that the non-disclosure agreement which we ask libraries to sign before receiving the documentation for our APIs does not limit the library IN ANY WAY from contributing their own code to other institutions. The posting on code4lib from one of our support staff was incorrect. We ask libraries to sign a non-disclosure agreement before receiving the APIs and accompanying documentation because once signed, API users have access to proprietary information through communication with our development staff. Obviously, our software is our primary asset. We ask for the non-disclosure so that the technical details of that asset are not shared with a potential competitor. However, the code that the library develops using the API belongs to the library. The library is not limited from contributing that code to the community. In fact, we would encourage you to do so. Thanks! Harry Kaplanian Director of Product Management Serials Solutions -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] KR
No, you could write them in J [1]. This is how you do quicksort in J: quicksort=: (($:@(<#[) , (=#[) , $:@(>#[)) ({~ ?@#)) ^: (1<#) --Casey [1] http://en.wikipedia.org/wiki/J_programming_language On Thu, Apr 3, 2008 at 12:41 PM, Tim Shearer [EMAIL PROTECTED] wrote: So now I have to compile my jokes? -t On Thu, 3 Apr 2008, Ryan Ordway wrote: #include <stdio.h> main(t,_,a) char *a; { return!0<t?t<3?main(-79,-13,a+main(-87,1-_,main(-86,0,a+1)+a)): 1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13? main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t, "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l+,/n{n+,/+#n +,/#\ ;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \ q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# \ ){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \ iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \ ;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+} {rl#'{n' ')# \ }'+}##(!!/") :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1) :0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a, "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1); } On Apr 3, 2008, at 8:54 AM, Jeremy Frumkin wrote: ..- .-.. .-.. .. .. -- --. --- .. -. --. - --- ... .- -.-- .- -... --- ..- - - .. ... - .-. . .- -.. .. ... - .- - -. --- -. . --- ..-. -.-- --- ..- ... ..- ..-. ..-. . .-. ..-. .-. --- -- .-. -- .. - . .-- .- -.-- .. -.. --- .-- . -. .. ..- ... . -- -.-- .--. .-. . ..-. . .-. .-. . -.. .. -. .--. ..- - -.. . ...- .. -.-. . .-.-.- .-.-.- .-.-.- -- -- .--- .- ..-. On 4/3/08 6:51 AM, Walter Lewis [EMAIL PROTECTED] wrote: Sebastian Hammer wrote: A true hacker has no need for these crude tools. He waits for cosmic radiation to pummel the magnetic patterns on his drive into a pleasing and functional sequence of bits. Alas, having been doing this (along with my partners, the four Yorkshiremen) since the Stone Age ... 
We used to arrange pebbles in the middle of the road into the relevant patterns (we *dreamed* of being able to afford the wire for an abacus). Passing carts would then help crunch the numbers. Walter for whom graph paper, templates, pencils, 80 column punchcards and IBM Assembler were formative experiences === Jeremy Frumkin Head, Emerging Technologies and Services 121 The Valley Library, Oregon State University Corvallis OR 97331-4501 [EMAIL PROTECTED] 541.602.4905 541.737.3453 (Fax) === Without ambition one starts nothing. Without work one finishes nothing. - Emerson -- Ryan Ordway E-mail: [EMAIL PROTECTED] Unix Systems Administrator [EMAIL PROTECTED] OSU Libraries, Corvallis, OR 97331 Office: Valley Library #4657
Re: [CODE4LIB] Reminder: Code4Lib 2008 Call for Proposals
Sorry, I only submit to conferences where the CFP is a Petrarchan sonnet. None of that Shakespearean Sonnet 2.0 crap for me. --Casey On 11/28/07, D Chudnov [EMAIL PROTECTED] wrote: Hear ye, hear ye, the deadline comes anon, but we have yet to hear from most of you. What hacks, pray tell, in IDEs o'er yon might come forth with a demo, or two? Time waits, but not for you. Who will anoint the keynote speakers? Now, for their sake we must act fast, and soon, you see, my point is that the schedule blocks are open. Point Break (and yes, I mean the movie) might could be a way to pass two hours. But who will slake the thirst that would remain 'twould not we see another six propos'ls? Script this for rake: Gather ideas, write one and send, or more, a break point's been herewith reset - step o'er! -- Forwarded message -- From: Roy Tennant [EMAIL PROTECTED] Date: Oct 31, 2007 1:55 PM Subject: [CODE4LIB] Code4Lib 2008 Call for Proposals To: CODE4LIB@listserv.nd.edu Code4lib 2008 Call for Proposals We are now accepting proposals for prepared talks for Code4lib 2008. Code4lib 2008 is a loosely structured conference for library technologists to commune, gather/create/share ideas and software, be inspired, and forge collaborations. It is also an outgrowth of the Access HackFest, wrapped into a conference-like format. It is *the* event for technologists building digital libraries and digital information systems, tools, and software. Prepared talks are 20 minutes, and must focus on one or more of the following areas: - tools (some cool new software, software library or integration platform) - specs (how to get the most out of some protocols, or proposals for new ones) - challenges (one or more big problems we should collectively address). The community will vote on proposals using the criteria of: - usefulness - newness - geekiness - diversity of topics. 
We cannot accept every prepared talk proposal, but multiple lightning talk sessions will provide everyone who wishes to present with an opportunity to do so. Please send your name, email address, and proposal of no more than 75 words to code4libcon at googlegroups.com. The proposal deadline is November 30, 2007, and proposers will be notified by December 14, 2007.
[CODE4LIB] OCLC is us (was Re: [CODE4LIB] more metadata from xISBN)
I've said it before and I'll probably say it again: OSLC anyone? OCLC is too large and too old to substantially change their business practices. They have great people working there and do some excellent things (which is why the fact they won't share their goodies with the rest of us is so galling) but they're just not going to fundamentally change the way they do business until they have to, and since they're a monopoly, that may be never. We need to recognize this. Building an open content library data commons is far more likely to happen than OCLC changing the way they've done things forever. No flies on OCLC but they are what they are. Jonathan Rochkind [EMAIL PROTECTED] 5/10/2007 7:59 AM PS: The more I think about this, the more burned up I actually get. Which maybe means I shouldn't post about it, but hey, I've never been one for circumspection. If OCLC is us, then OCLC will gladly share with us (who are in fact them, right?) their research on workset grouping algorithms, and precisely what workset grouping algorithm they are using in current implementations of xISBN and other services, right? After all, if OCLC is not a vendor, but just us collectively, why would one part of us need to keep trade secrets from another part of us? Right? While OCLC is at it, OCLC could throw in some more information on this project, which has apparently been consigned to trade secret land since its sole (apparently mistaken) public outing: http://www.code4lib.org/2006/smith Our field needs publicly shared research results and publicly shared solutions, to build a research community, to solve the vexing problems we have in front of us in increasingly better ways, building off each other. We need public domain solutions. We are not interested in secret solutions. Vendors, however, need proprietary trade secrets, to make sure they can solve the problems better than their competitors. 
If OCLC is not a vendor but is instead us, then why does OCLC treat its research findings as something that needs to be kept secret from the actual _us_---everyone here who does not work for OCLC. That's us. Jonathan Eric Hellman wrote: Jonathan, It's worth noting that OCLC *is* the we you are talking about. OCLC member libraries contribute resources to do exactly what you suggest, and to do it in a way that is sustainable for the long term. Worldcat is created and maintained by libraries and by librarians. I'm the last to suggest that OCLC is the best possible instantiation of libraries-working-together, but we do try. Eric At 3:01 PM -0400 5/9/07, Jonathan Rochkind wrote: 2) More interesting---OCLC's _initial_ work set grouping algorithm is public. However, we know they've done a lot of additional work to fine-tune the work set grouping algorithms. (http://www.frbr.org/2007/01/16/midwinter-implementers). Some of these algorithms probably take advantage of all the cool data OCLC has that we don't, okay. But how about we start working to re-create this algorithm? Re-create isn't a good word, because we aren't going to violate any NDA's, we're going to develop/invent our own algorithm, but this one is going to be open source, not a trade secret like OCLC's. So we develop an algorithm on our own, and we run that algorithm on our own data. Our own local catalog. Union catalogs. Conglomerations of different catalogs that we do ourselves. Even reproductions of the OCLC corpus (or significant subsets thereof) that we manage to assemble in ways that don't violate copyright or license agreements. And then we've got our own workset grouping service. Which is really all xISBN is. What is OCLC providing that is so special? Well, if what I've just outlined above is so much work that we _can't_ pull it off, then I guess we've got to pay OCLC, and if we are willing to do so (rather than go without the service), then I guess OCLC has correctly pegged their market price. 
But our field is not a healthy field if all research is being done by OCLC and other vendors. We need research from other places, we need research that produces public domain results, not proprietary trade secrets. -- Eric Hellman, Director, OCLC Openly Informatics Division [EMAIL PROTECTED] 2 Broad St., Suite 208 tel 1-973-509-7800 fax 1-734-468-6216 Bloomfield, NJ 07003 http://openly.oclc.org/1cate/ 1 Click Access To Everything -- Jonathan Rochkind Sr. Programmer/Analyst The Sheridan Libraries Johns Hopkins University 410.516.8886 rochkind (at) jhu.edu
[CODE4LIB] Job Posting: ILS Administrator at The Seattle Public Library
This is my current gig; I am moving into a new position at SPL. Given that Seattle is both the most literate [1] and the geekiest [2] city in the United States, this would seem to be the perfect position for a library geek. Don't hesitate to email me if you have any questions. --Casey [EMAIL PROTECTED] [1] http://www.ccsu.edu/amlc/ [2] http://www.wired.com/wired/archive/15.01/geekcities.html The Seattle Public Library is seeking an experienced Integrated Library Systems Administrator. The Integrated Library Systems Administrator serves as departmental lead and authority for administering and maintaining the Library's Horizon integrated library system (ILS). - Configures ILS system, provides training for staff, maintains administrative tables, manages user accounts and security, and performs other administrative tasks on the database systems as directed. - Writes and programs custom reports and scripts for undertaking routine and non-routine maintenance database tasks using SQL or a similar reporting and database modification language. - Serves as primary technical contact with ILS vendor and related user communities. - Oversees the implementation of new ILS features, and assures connectivity and data transmission between ILS and ancillary applications. - Serves as final internal tier for support of ILS; analyzes, resolves and responds to trouble tickets, service requests and information queries from staff and public. - Serves as a departmental lead or project manager for assigned projects, including overseeing the implementation of new services and systems. - Creates and maintains procedures and documentation, including software configurations, for ILS and related 3rd-party applications. - Interacts with third-party technology vendors and internal technology staff. 
Required qualifications include: - A bachelor's degree in Computer Science, MIS, or other related discipline or commensurate experience. - 3+ years progressively responsible experience with library automation systems (ILS) or other major database systems (system administration, ability to create custom reports). - Training/experience with library cataloging practices and library practices and procedures (collections and technical services, circulation, reference, in-depth understanding of MARC formats, AACR2, etc.). - Extensive experience with SQL or a similar database reporting and modification language, and with RDBMS (pref. Sybase, MS SQL, or DB2) administration. - Scripting experience in a Windows NT/2000/XP or UNIX environment, or other programming or SQL/report languages experience. Desired qualifications include: - Experience administering the SirsiDynix Horizon system. - Master's degree in Library and Information Science. - Experience working with HTML, XML, metadata, XSL, Java or JavaScript. - Experience administering applications or servers using Windows Active Directory in a large enterprise or organizational environment. - Experience administering applications on Linux (Red Hat) or UNIX systems and hardware. Salary: $66,788.80 - $81,161.60 annually, including excellent benefits. This classification is part of a bargaining unit represented by AFSCME, Local 2083. Application materials are due by 5:00pm Pacific Time, Monday, February 19, 2007. For more information and instructions on how to apply, interested applicants should check out the full job posting at: http://www.spl.org/default.asp?pageID=about_jobsvolunteering_jobs_openings_detail&cid=1170193875060
[CODE4LIB] Solr indexing -- why XSL is the wrong choice
I think there are many good reasons why XSLT is absolutely the wrong tool for the job of indexing MARC records for Solr. 1) Performance/Speed: In my experience even just transforming from MARCXML to MODS takes a second or two (using the LoC stylesheet), due to the stylesheet's complexity and inefficiency of doing heavy-duty string manipulation in XSL. That means you're looking at an indexing speed of around 1 record/second. If you've got 1,000,000 bib records, it'll take a couple of weeks just to index your data. For comparison, the indexer of our commercial OPAC does about 50 records per second (~6 hours for a million records) and the one I've written in Jython (by no means the fastest language out there) that doesn't use XSL can do about 150 records a second (about 2 hours for 1 million records). 2) Reusability: What if you want to change how a field is indexed? You would have to edit the XSLT directly (or have the XSL stylesheet automatically generated based on settings stored elsewhere). a) Users of the indexer shouldn't have to actually mess with programming logic to change how it indexes. You shouldn't have to know a thing about programming to change the setup of an index. b) It should be easy for an external application to know how your indexes have been built. This would be very difficult with an XSL stylesheet. Burying configuration inside of programming logic is a bad idea. c) The Solr schema should be automatically generated from your index setup so all your index configuration is in one place. I guess you could write *another* XSL stylesheet that would transform your indexing stylesheet into the Solr schema file, but that seems ridiculous. d) Automatic code generation is evil. Blanchard's law: Systems that require code generation lack sufficient power to tackle the problem at hand. If you find yourself considering automatic code generation, you should instead be considering a more dynamic programming language. 3) Ease of programming. 
a) Heavy-duty string manipulation is a pain in pure XSLT. To index MARC records you have to do normalization on dates and names, and you probably want to do some translation between MARC codes and their meanings (for the audience language codes, for instance). Is it doable? Yes, especially if you use XSL extension functions. But if you're going to have huge chunks of your logic buried in extension functions, why not go whole hog and do it all outside of XSLT, instead of having half your programming logic in an extension function and half in the XSLT itself? b) Using XSLT makes object-oriented programming with your data harder. Your indexer should be able to give you a nice object representation of a record (so you can use that object representation within other code). If you go the XSLT route, you'd have to parse the MARC record, transform it to your Solr record XML format, then parse that XML and map the XML to an object. If you avoid XSLT, you just parse the MARC record and transform it to an object programmatically (with the object having a method to print itself out as a Solr XML record). Honestly, all this talk of using XSLT for indexing MARC records reminds me of that guy who rode across the United States on a riding lawnmower. I am looking forward to there being a standard, well-tested MARC record indexer for Solr (and would be excited to contribute to such a project), but I don't think that XSL is the right tool to use. --Casey
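[Editor's note: a minimal sketch of the object-oriented approach Casey describes in (b) -- parse once into an object, index from the object, and let the object serialize itself for Solr. The dict-based record structure and field names are hypothetical stand-ins for a real MARC parser's output, not his actual implementation.]

```python
# Parse-to-object indexing: the record object knows how to normalize its own
# fields and emit itself as a Solr-ready dict, with no XSLT in the pipeline.

import re

class IndexRecord:
    def __init__(self, fields):
        self.fields = fields                      # {tag: {subfield code: value}}

    def title(self):
        raw = self.fields.get("245", {}).get("a", "")
        return re.sub(r"[ /:;,.]+$", "", raw)     # strip trailing ISBD punctuation

    def to_solr_dict(self):
        return {"title": self.title()}

rec = IndexRecord({"245": {"a": "A history of libraries /"}})
print(rec.to_solr_dict())   # -> {'title': 'A history of libraries'}
```

Because the normalization lives in ordinary methods, other code (a display layer, an exporter) can reuse the same object instead of re-parsing intermediate XML.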
Re: [CODE4LIB] Solr indexing -- why XSL is the wrong choice
I'm perfectly willing to be persuaded to the 'light' side, and I'm looking forward to learning more about your project, which is much more mature than mine at this point ... I'm just interested in something that works and is easily tweakable. I don't hold a lot of hope that a one-size-fits-all XSL transformation could ever be put together -- I think there are too many minor but significant variations in how people catalog stuff and how different ILSes stick things like item data in the MARC record. I could be wrong about that though. Maybe I've just been traumatized by having to deal with so many bad uses of XSL featuring multiple 250KB+ stylesheets with extension functions spaghetti'd throughout that I'm disinclined to use it even when it is the best and simplest tool for the job. I'd love to see how you're getting around the somewhat convoluted back and forth between XSL and extension functions that has been my experience. As far as performance goes, if you've got an ILS that allows you to dump all the MARC records in the system at once but not a way to incrementally get all the MARC records that have changed since you last updated the indexes, then indexing performance is very important -- if you can reindex all your records in an hour or two, it's feasible to just rebuild your indexes from scratch every night from 3-4 AM, where it wouldn't be if it takes 8 hours. It also makes the cost of fine-tuning your indexes much lower. Just for some clarification: in my system, you don't need to know a thing about programming or XML at all, or ever look at a single line of code, to change how an index is created. There is just one configuration file (in the future this may all be stored in a database and accessible via Django's automatic web admin interface, but for now it's just a text file) and the core indexing code is never modified at all. 
The three lines in the config file that define the title index look something like this: title.type = single title.marcMap = 245$ab,246$ab,240$a title.stripTrailingPunctuation = 1 (The .type argument says that it is not a repeated field in Solr, the .marcMap field dictates how the title data is extracted and .stripTrailingPunctuation does what it sounds like) Now say you want to include the n subfields in there as well. Well, you just change that one line in that one config file to: title.marcMap = 245$abn,246$abn,240$an Now say you want to introduce a new index in Solr. Well, you just add a couple of new lines to the config file, run a little script that automatically generates the Solr schema (though I still have a ton of work to do on that piece of it), reindex, and you're done. Defining an index of the language of the material (English, Swahili, etc.) would look like: language.type = singleTranslation language.marcMap = 008/35:38 language.translationMap = LANGUAGE_CODING_MAP (LANGUAGE_CODING_MAP is a hash map of the three letter LoC language codes, for example 'eng' = 'English') You can handle fields with processors (little bits of code) if you need something more sophisticated than a MARC map or a translation. The processor I have for the common-sense format of the item (DVD, Book on CD, eMusic -- the kind of thing that is very annoying to get out of a MARC record but very important to patrons) is extremely complex and would be unbelievably tedious to replicate in XSL. Now, say somebody writes a better processor (which could theoretically be written in any JVM language - java, jruby, jython, javascript (rhino), etc.). To use it would be as simple as changing one line in a configuration file and dropping the processor code in a particular spot. --Casey [EMAIL PROTECTED] 1/19/2007 2:35 PM Casey, we have had great successes with XSL for MARCXML to SOLR, so I can't agree to everything you are saying. 
However, I anxiously await your presentation on your successes with SOLR so you can persuade me to the dark side :) Casey Durfee wrote: I agree with your argument of abstracting your programming from your data so that a non-tech-savvy librarian could modify the solr settings. But if you modify the solr settings, you need to (at this point) reimport all of your data, which means that you either have to change your XSLT or your transformation application. I personally feel that a less-tech savvy individual can pick up XSLT easier than coding java. Maybe I am understanding you incorrectly though. 3) Ease of programming. a) Heavy-duty string manipulation is a pain in pure XSLT. To index MARC records you have to do normalization on dates and names and you probably want to do some translation between MARC codes and their meanings (for the audience language codes, for instance). Is it doable? Yes, especially if you use XSL extension functions. But if you're going to have huge chunks of your logic buried in extension functions, why not go whole hog and do
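[Editor's note: a sketch of parsing the flat "name.option = value" index configuration shown in this thread into per-index settings. The file format is inferred from the examples in the posts, not taken from the actual implementation.]

```python
# Parse an index config of "indexname.option = value" lines into a dict of
# dicts, so the indexing code can be driven entirely by configuration.

def parse_index_config(text):
    indexes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue                              # skip blanks and comments
        key, value = (part.strip() for part in line.split("=", 1))
        name, option = key.split(".", 1)
        indexes.setdefault(name, {})[option] = value
    return indexes

config = """
title.type = single
title.marcMap = 245$ab,246$ab,240$a
title.stripTrailingPunctuation = 1
language.type = singleTranslation
language.marcMap = 008/35:38
"""
print(parse_index_config(config)["title"]["marcMap"])  # -> 245$ab,246$ab,240$a
```

With this shape, generating the Solr schema from the same structure is a straightforward walk over `indexes`, which is the "all your index configuration in one place" point from the earlier post.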
Re: [CODE4LIB] Getting data from Voyager into XML?
Many ILSes give the ability to export item data in the MARC record in a 9xx tag, (usually the 949 since BT and other book jobbers like to put holdings data for newly-acquired items there so the ILS can automatically create an item record when the MARC record is loaded). That is how I've been getting location/collection code info into my Solr-based catalog. So you might want to look into that. I think having separate XML files for holdings data (and hence, a second install of Solr just for holdings data) is less than optimal for a myriad of reasons. Likewise I think XSLT is a pretty poor tool for generating Solr records. XSLT is really difficult to do the kind of data manipulation I've been finding I need to do on our MARC records to get them nice and Solrized. Also, very very poor performance. --Casey [EMAIL PROTECTED] 1/17/2007 12:48 PM On Jan 17, 2007, at 2:26 PM, Andrew Nagy wrote: Nate, it's pretty easy. Once you dump your records into a giant marc file, you can run marc2xml (http://search.cpan.org/~kados/MARC-XML-0.82/bin/marc2xml). Then run an XSLT against the marcxml file to create your SOLR xml docs. Unless I'm totally, hugely mistaken, MARC doesn't say anything about holdings data, right? If I want to facet on that, would it make more sense to add holdings data to the MARC XML data, or keep separate xml files for holdings that reference the item data? In a lot of cases, location data might not be a hugely important facet; at Madison, we have something like 42 libraries spread thinly across campus (gah!) -- each with different loan policies -- as well as a few request-only storage facilities. So there's a lot of Stuff I Can't Check Out and a lot of Stuff I'll Need To Wait For in our collection. Thanks! -Nate
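[Editor's note: a stdlib-only sketch of pulling embedded item data out of a 9xx tag (949 here) in MARCXML, as described above. The subfield codes used for location and collection are hypothetical -- ILSes and book jobbers vary in how they lay out 949 fields.]

```python
# Extract item/holdings data embedded in 949 fields of a MARCXML record,
# using only xml.etree.ElementTree from the standard library.

import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

marcxml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="949" ind1=" " ind2=" ">
    <subfield code="l">MAIN</subfield>
    <subfield code="c">DVD</subfield>
  </datafield>
</record>"""

def items(record_xml):
    """Yield one dict of subfield code -> value per 949 field."""
    root = ET.fromstring(record_xml)
    for field in root.findall('marc:datafield[@tag="949"]', NS):
        yield {sf.get("code"): sf.text
               for sf in field.findall("marc:subfield", NS)}

print(list(items(marcxml)))  # -> [{'l': 'MAIN', 'c': 'DVD'}]
```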
Re: [CODE4LIB] Getting data from Voyager into XML?
Any Real System Guy worth their salt would know how to set up an account for you to use for SQL queries like these with read-only rights and low processing priority/throttling so there would be little to no chance of it affecting system performance. Even if they don't know, they could find out all they need to know with about 5 minutes of hax0ring the G00G13... or going to the library and getting an Oracle systems administration for dummies book if they're not into the whole internet thing. So it sounds to me like they're stonewalling you because they flat out don't know what they're doing and don't care to find out. In which case, condolences. On 1/17/07, Nathan Vack [EMAIL PROTECTED] wrote: On Jan 17, 2007, at 2:59 PM, Bess Sadler wrote: As long as we're on the subject, does anyone want to share strategies for syncing circulation data? It sounds like we're all talking about the parallel systems à la NCSU's Endeca system, which I think is a great idea. It's the circ data that keeps nagging at me, though. Is there an elegant way to use your fancy new faceted browser to search against circ data w/out re-dumping the whole thing every night? Sure isn't elegant, but as our Real Systems Guys don't want us to look at the production Oracle instance (performance worries), we've had pretty good luck screen-scraping holdings and status data, once we get a Bib ID. Ugly, but functional, and surprisingly fast. Of course, spamming the OPAC with HTTP calls certainly impacts performance more than just querying the database... but I digress. In a perfect world, we'd get a trigger / stored proc on the database server when circ status changed. In a slightly less perfect world, I'd just keep a connection open to the production database server for all of that. -n
Re: [CODE4LIB] code4lib lucene pre-conference
Lucene has a pretty well-specified search syntax which is unlikely to change all that much, even though it's not a standard. It's not perfect, but I think it's pretty good. Overview here: http://lucene.apache.org/java/docs/queryparsersyntax.html I believe Solr adds a bit to the standard Lucene syntax for sorting: http://incubator.apache.org/solr/tutorial.html#Sorting I do have a layer of abstraction between the end-user search interface and Lucene -- you'd have to have such a layer no matter what search engine you were using. [EMAIL PROTECTED] 11/27/2006 2:49 PM Casey Durfee wrote: Just using Solr has proven to be much faster than doing the search in Solr and then retrieving full data from another database. This also has the advantage of making it so there's only one thing you gotta keep in sync with the ILS. The only data that my OPAC needs to talk to a SQL database for is item-level information, which changes too often to keep synced. My only concern about lucene is the lack of a standard query language. I went down the native XML database path because of XQuery and XSL, does something like lucene and solr offer a strong query language? Is it a standard? What if someone developed a kick ass text indexer in 2 years that totally blows lucene out of the water, would you easily be able to switch systems? Andrew
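[Editor's note: one common job of the abstraction layer mentioned above is escaping Lucene's query-syntax metacharacters in raw end-user input before it reaches the query parser. A sketch; the character list follows the Lucene query parser syntax page linked in the post.]

```python
# Escape Lucene query parser special characters in user input so that a
# search for "foo:bar" is treated as literal text, not a field query.

import re

# + - ! ( ) { } [ ] ^ " ~ * ? : \ plus the two-character operators && and ||
LUCENE_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')

def escape_user_query(q):
    return LUCENE_SPECIALS.sub(r"\\\1", q)

print(escape_user_query("title:foo (bar)"))   # -> title\:foo \(bar\)
```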
Re: [CODE4LIB] java application on a cd
Jetty's [1] tiny and a breeze to embed. [1] http://jetty.mortbay.org/ [EMAIL PROTECTED] 10/13/2006 1:01 PM On Oct 13, 2006, at 2:18 PM, Susan Teague Rector wrote: I'm pretty sure this is not doable this way - You'll have to use either JSP or servlets to get from a web based form to a java app. Hmm... If I am unable to call something like search.java directly, then maybe I could include something like tomcat on the CD too, but that is beginning to sound a bit ugly. -- Eric Lease Morgan University Libraries of Notre Dame