Re: [CODE4LIB] code4lib lucene pre-conference

2006-12-13 Thread Andrew Nagy
Erik Hatcher wrote: At this point, I'm planning on winging it with the datasets. By late February I will have (high on my TODO list now!) built a light-weight Solr mechanism for bringing in MARC data, and perhaps more (iTunes data files would make a fun one) and doing simple skinnable front-

Re: [CODE4LIB] code4lib lucene pre-conference

2006-12-12 Thread Tom Keays
Hi Bess, Put me down for this too. I hadn't made up my mind until I heard you and Eric and Art on Library Geeks. Lucene and Solr sound really excellent. I don't have a specific dataset that I'm working with; will there be one provided or should I dredge something up? I'm not a systems librarian

Re: [CODE4LIB] code4lib lucene pre-conference

2006-12-12 Thread Elizabeth Sadler
Hi, Tom. I will put you down. I'm glad you'll be joining us. You don't have to bring your own data set, but it might be cool if you did. Many people are bringing Marc XML dumps of part of their library catalogue, for example, or else another data set with which they work regularly. It's for

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Eric Lease Morgan
Thom Hickey wrote: Ralph LeVan here at OCLC has worked on an SRU interface to Lucene. The combined indexer/SRU approach is the tack I've been taking regarding search too. I don't care a whole lot whether I use this indexer, that indexer, or the other indexer as long as I can make sure I have

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Art Rhyno
I don't care a whole lot whether I use this indexer, that indexer, or the other indexer as long as I can make sure I have an SRU, OpenURL, Z39.50, etc. interface to the index. This will always allow me to swap out an older indexer for a new one as they become available. I am so behind in
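
The swap-the-indexer idea quoted above can be illustrated with a small sketch. This is not code from the thread: the `Indexer` contract, the `InMemoryIndexer` toy engine, and `handle_query` are hypothetical names chosen for illustration, and a real deployment would sit behind SRU, OpenURL, or Z39.50 rather than a Python function call.

```python
from __future__ import annotations

from typing import Protocol


class Indexer(Protocol):
    """Hypothetical minimal contract that front-end code depends on."""

    def add(self, doc_id: str, text: str) -> None: ...

    def search(self, term: str) -> list[str]: ...


class InMemoryIndexer:
    """Toy stand-in for a real engine: maps lowercase words to document ids."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        # Naive whitespace tokenization; real engines use configurable analyzers.
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, term: str) -> list[str]:
        return sorted(self._postings.get(term.lower(), set()))


def handle_query(indexer: Indexer, term: str) -> list[str]:
    """Front-end code sees only the Indexer contract, never the engine."""
    return indexer.search(term)
```

Any other class with the same two methods could be passed to `handle_query` without changing the front end, which is the property the message is after.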

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Andrew Nagy
Clay Redding wrote: Hi Andrew (or anyone else that cares to answer), I've missed out on hearing about incompatibilities between MARCXML and NXDBs. Can you explain? Is this just eXist and Sleepycat, or are there others? I seem to recall putting a few records in X-Hive with no problems, but I

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Erik Hatcher
On Nov 29, 2006, at 10:27 AM, Art Rhyno wrote: I am so behind in e-mail that I might be treading on ground that is worn out on this, but I would add to Eric's list that I don't care about the indexer if: Here's how Lucene/Solr fares on these points: * the indexer has an open and configurable

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Kevin S. Clarke
I think this is a data structure problem... MARC is well structured for compact transmission (or was at one point) but not so much for data (re)use (in my opinion). One solution, as Erik has suggested, is to parse the data and build intelligible indices. Another, as Andrew suggests (and which I

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Art Rhyno
For all you Java savvy folks out there, how about standards like J2EE that make it easy to move an application from one vendor's app server to another? Works for the simplest of applications, but all vendors have their own specific custom deployment descriptors too. One thing I really like about

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Binkley, Peter
-----Original Message----- From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of Andrew Nagy Sent: Wednesday, November 29, 2006 8:14 AM To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] code4lib lucene pre-conference Clay Redding wrote: Hi Andrew (or anyone else that cares to answer

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Andrew Nagy
Kevin S. Clarke wrote: Fwiw Andrew, I'd suggest you are not seeing the true spirit of your NXDB. Try to put MARC into a RDBMS and you are going to run into the same problem. You have to index intelligently or reorganize the data (which is the default when you put XML into a RDBMS anyway).

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Kevin S. Clarke
On 11/27/06, Ross Singer [EMAIL PROTECTED] wrote: On 11/27/06, Kevin S. Clarke [EMAIL PROTECTED] wrote: Seriously, please don't get hung up on the 'proprietary'-ness of Lucene's query syntax. It's open, it's widely used, and has been ported to a handful of languages. I mean, why would you

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy
Erik Hatcher wrote: "What if" games are mostly just guessing games in the high tech world. Agility is the trait our projects need. Software is just that... soft. And malleable. Sure, we can code ourselves into a corner, but generally we can code ourselves right back out of it too. If

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy
Art Rhyno wrote: I made a big mistake along the way in trying to work with Voyager's call number setup in Oracle, and dragged Ross along in an attempt to get past Oracle's constant quibbles with rogue characters in call number ranges. The idea was to expose the library catalogue as a series of

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Kevin S. Clarke
On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote: it is just how you pass that along to Lucene that will be different (e.g., do you do it in Java, in Ruby, in XML via Solr, do you do it in XQuery, etc.) By the way, I see a very interesting intersection between Solr and XQuery because both

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Clay Redding
I'm sure most of you have seen this, but there is a lot of good work going on regarding XQuery full text searching by the W3C. LC is pushing a lot of the activity in this group, and using hefty document-centric EAD examples in the testing. http://www.w3.org/TR/xquery-full-text/ FWIW,

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy
Kevin S. Clarke wrote: By the way, I see a very interesting intersection between Solr and XQuery because both are speaking XML. You may have XQueries that generate the XML that makes Solr do its magic, for instance. This is an alternative to fulltext in XQuery, sure... it is something that is
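
As a rough illustration of "the XML that makes Solr do its magic", here is a minimal sketch that builds a Solr `<add>` update message. It uses Python in place of the XQuery the message has in mind; the function name `solr_add_xml` and the field names in the usage note are assumptions for illustration, not anything given in the thread.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(docs):
    """Serialize dicts of field name -> value into a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each field becomes <field name="...">value</field>;
            # ElementTree escapes characters such as & and < in the text.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")
```

Calling `solr_add_xml([{"id": "rec1", "title": "Lucene in Action"}])` returns a string that could be POSTed to a Solr update handler. An XQuery doing the same job would construct the same element structure from the source records.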

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Art Rhyno
So you are replacing SQL calls with WebDAV? Can you explain this a bit further? Hi, No, WebDAV is, among other things, an XML representation of a folder structure, and we were using SQL to help build the XML needed for WebDAV support, not replacing one with the other. Voyager stores normalized

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Ross Singer
On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote: How do you switch to it? How do the pieces talk? This is the point of standards. If there is a standard way of addressing an index then you don't have to care what the newest greatest indexer is. This paragraph seems in contrast to your

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Gabriel Farrell
On Tue, Nov 28, 2006 at 10:27:22AM -0500, Ross Singer wrote: On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote: How do you switch to it? How do the pieces talk? This is the point of standards. If there is a standard way of addressing an index then you don't have to care what the newest

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy
Casey Durfee wrote: I thought that was the point of using interfaces? I guess I don't get why you need a standard to be compelled to do something you should be doing anyway -- coding to interfaces, not implementations. Interfaces work well with like products (a database abstraction

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Kevin S. Clarke
In this respect standard just means a programming interface. I'm suggesting using XQuery is like using interfaces in Java (a defined way of accessing something independent of implementation). You could do this in Java (there is an XQJ... I think you can use this independent of a textual XQuery

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Kevin S. Clarke
On 11/28/06, Erik Hatcher [EMAIL PROTECTED] wrote: Is there a standard for specifying how textual analysis works as well, so that tokenization can be standardized across these XQuery engines as well? Not that I know. What I've seen so far is that tokenization is implementation specific.

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy
Kevin S. Clarke wrote: Have you had a chance yet to evaluate the 1.1 development line? It is supposed to have solved the scaling issues. I haven't tried it myself (and remain skeptical that it can scale up to the level that we talk about with Lucene (but, as you point out, it is trying to do

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 5:44 PM, Kevin S. Clarke wrote: Is there a standard for specifying how textual analysis works as well, so that tokenization can be standardized across these XQuery engines as well? Not that I know. What I've seen so far is that tokenization is implementation specific.

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 3:28 PM, Andrew Nagy wrote: The major problem with it all is the ugly mess that is marcxml. This brings up an interesting point about just dropping our source XML data into an XML-savvy database and using XQuery on it. Maybe y'all have much cleaner data than I've seen, but

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Kevin S. Clarke
On 11/28/06, Erik Hatcher [EMAIL PROTECTED] wrote: And if XQuery on your raw data does what you need, by all means I recommend it. Well structured data and a good language for working with XML are two completely different things in my opinion. Even XQuery doesn't make MARCXML a pleasure to

[CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Bess Sadler
Since the request I sent out last week, I've received quite a lot of email expressing interest in a lucene pre-conference, but it hasn't been an overwhelming amount. Based on this, I think it's safe to reserve the smaller, wi-fi enabled room that we've been discussing, and to plan for a max of 40

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Edward Summers
On Nov 27, 2006, at 2:37 PM, Bess Sadler wrote: Comments? Suggestions? Sounds great! If you get a chance add a story to http://code4lib.org when things get solidified. //Ed

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy
Bess Sadler wrote: Enough people are interested in ILS related topics that it might be worth forming groups around specific ILS products. If you are one of these people, email the list if you're interested in setting up such a thing. Bess, this sounds like a great conversation. You can count

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy
Bess Sadler wrote: Hi, Andrew. Since this will be an all-day event, the session would be starting first thing in the morning on Feb 27. I'm thinking 9am, but I haven't confirmed that with anyone else. I'm just flying by the seat of my pants here. I wouldn't be able to make this then due to

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Binkley, Peter
Bess Sadler wrote: application. That way you can use solr / lucene for search, faceted browse, etc, and your XML database only for known item retrieval, which it is generally able to do without performance issues. I'm

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy
Binkley, Peter wrote: There would probably be a lot of optimizations you could do within Solr to help with this kind of thing. Art and I talked a little about this at the ILS symposium: why not nestle the XML db inside Solr alongside Lucene? Solr could then manage the indexing of the contents

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Binkley, Peter
Binkley, Peter wrote: There would probably be a lot of optimizations you could do within Solr to help with this kind of thing. Art and I talked a little about this at the ILS symposium: why not nestle the XML db inside Solr alongside Lucene

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Casey Durfee
Lucene has a pretty well-specified search syntax which is unlikely to change all that much, even though it's not a standard. It's not perfect, but I think it's pretty good. Overview here: http://lucene.apache.org/java/docs/queryparsersyntax.html I believe Solr adds a bit to the standard
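
To give a feel for the query syntax mentioned above, here is a minimal sketch that assembles a fielded Lucene query string. The helper names `escape_term` and `fielded_query` are hypothetical, and the escape set is an assumption based on the special characters listed in the query parser syntax documentation (`+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`); check it against the Lucene version in use.

```python
# Characters assumed to be special to the classic Lucene query parser.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(text: str) -> str:
    """Backslash-escape every character the query parser treats specially."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in text)


def fielded_query(**fields: str) -> str:
    """Join field:"phrase" clauses with AND, in the order they were given."""
    return " AND ".join(
        f'{name}:"{escape_term(value)}"' for name, value in fields.items()
    )
```

For example, `fielded_query(title="Lucene in Action", author="Hatcher")` yields `title:"Lucene in Action" AND author:"Hatcher"`, which is the kind of string a client would hand to a Lucene query parser or pass as a Solr `q` parameter.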

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:46 PM, Binkley, Peter wrote: You've got enough flexibility in the way you set up your Lucene index, and Lucene search results give you access to the term weights for each hit, It does? so you can tell which fields actually matched. You can? I'm curious how you're

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 6:12 PM, Binkley, Peter wrote: Fair point, and that's how my current solr-based project works. I'm thinking I would like the other advantages of an XML db: the ability to run xqueries, batch updates, etc., alongside the Lucene searching. And I want them integrated under the

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:04 PM, Jonathan Rochkind wrote: Bess Sadler wrote: application. That way you can use solr / lucene for search, faceted browse, etc, and your XML database only for known item retrieval, which it is generally able to do without performance issues. I'm hopping up and down

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Erik Hatcher
On Nov 27, 2006, at 5:49 PM, Andrew Nagy wrote: My only concern about lucene is the lack of a standard query language. I went down the native XML database path because of XQuery and XSL. Does something like lucene and solr offer a strong query language? Is it a standard? What if someone

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Kevin S. Clarke
On 11/27/06, Bess Sadler [EMAIL PROTECTED] wrote: At UVa we have been burned several times by poor search performance of XML native databases. In light of this, we're starting to look at the database and the index as separate but cooperative pieces of the application. That way you can use solr

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Kevin S. Clarke
On 11/27/06, Andrew Nagy [EMAIL PROTECTED] wrote: My only concern about lucene is the lack of a standard query language. I went down the native XML database path because of XQuery and XSL. Does something like lucene and solr offer a strong query language? Is it a standard? What if someone

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Art Rhyno
I'd agree... the nice thing about the WebDAV interface is that it works with all these backends! Whether SVN, a native XML database, or a file system... all have WebDAV interfaces. I've been very interested over the years watching Art (and Ross) experimenting with WebDAV and catalog records.