Erik Hatcher wrote:
At this point, I'm planning on winging it with the datasets. By late
February I will have (high on my TODO list now!) built a light-weight
Solr mechanism for bringing in MARC data, and perhaps more (iTunes
data files would make a fun one) and doing simple skinnable front-
Hi Bess,
Put me down for this too. I hadn't made up my mind until I heard you
and Eric and Art on Library Geeks. Lucene and Solr sound really
excellent.
I don't have a specific dataset that I'm working with; will there be
one provided or should I dredge something up? I'm not a systems
librarian
Hi, Tom.
I will put you down. I'm glad you'll be joining us. You don't have to
bring your own data set, but it might be cool if you did. Many people
are bringing MARCXML dumps of part of their library catalogue, for
example, or else another data set with which they work regularly.
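For anyone bringing a MARCXML dump, here is a minimal sketch of pulling a few fields out of one record with Python's standard library before indexing. It assumes records use the MARC 21 slim namespace. The mapping of 001 to `id`, 245 $a to `title` and 100 $a to `author`, the function name `marcxml_to_dict`, and the sample record are all illustrative choices, not part of any workshop setup.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 slim schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mapping from (tag, subfield code) to a flat field name.
FIELD_MAP = {("245", "a"): "title", ("100", "a"): "author"}


def marcxml_to_dict(record_xml: str) -> dict:
    """Flatten one MARCXML <record> into a dict of a few searchable fields."""
    record = ET.fromstring(record_xml)
    out = {}
    # Control field 001 carries the record's control number; use it as the ID.
    control = record.find("marc:controlfield[@tag='001']", NS)
    if control is not None and control.text:
        out["id"] = control.text.strip()
    for (tag, code), name in FIELD_MAP.items():
        path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
        sub = record.find(path, NS)
        if sub is not None and sub.text:
            out[name] = sub.text.strip()
    return out


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">000123</controlfield>
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane.</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title /</subfield></datafield>
</record>"""

print(marcxml_to_dict(SAMPLE))
```

Note the trailing ISBD punctuation (` /`, `.`) that survives extraction; cleaning that up is one reason MARC data usually needs some massaging before it makes a good index.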
It's for
Thom Hickey wrote:
Ralph LeVan here at OCLC has worked on an SRU interface to Lucene.
The combined indexer/SRU approach is the tack I've been taking
regarding search too.
I don't care a whole lot whether I use this indexer, that indexer, or
the other indexer as long as I can make sure I have
I don't care a whole lot whether I use this indexer, that indexer, or
the other indexer as long as I can make sure I have an SRU, OpenURL,
Z39.50, etc. interface to the index. This will always allow me to
swap out an older indexer for a newer one as it becomes available.
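The idea of coding against SRU rather than against a particular indexer can be sketched as a client that only knows how to build an SRU `searchRetrieve` request. This assumes an SRU 1.1 server; the base URL and the CQL index name `dc.title` are placeholders, and this is not the OCLC SRU interface to Lucene mentioned above.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, start: int = 1, max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL; the client never sees which indexer answers."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL, the query language defined alongside SRU
        "startRecord": start,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint: swapping the indexer behind it would not change this code.
print(sru_search_url("http://sru.example.org/catalog", 'dc.title = "lucene"', max_records=5))
```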
I am so behind in
Clay Redding wrote:
Hi Andrew (or anyone else that cares to answer),
I've missed out on hearing about incompatibilities between MARCXML and
NXDBs. Can you explain? Is this just eXist and Sleepycat, or are
there others? I seem to recall putting a few records in X-Hive with no
problems, but I
On Nov 29, 2006, at 10:27 AM, Art Rhyno wrote:
I am so behind in e-mail that I might be treading on ground that is worn
out on this, but I would add to Eric's list that I don't care about the
indexer if:
Here's how Lucene/Solr fares on these points:
* the indexer has an open and configurable
I think this is a data structure problem... MARC is well structured
for compact transmission (or was at one point) but not so much for
data (re)use (in my opinion).
One solution, as Erik has suggested, is to parse the data and build
intelligible indices. Another, as Andrew suggests (and which I
For all you Java-savvy folks out there, how about standards like
J2EE that make it easy to move an application from one vendor's
application server to another? That works for the simplest of
applications, but every vendor also has its own custom deployment descriptors.
One thing I really like about
-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Andrew Nagy
Sent: Wednesday, November 29, 2006 8:14 AM
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] code4lib lucene pre-conference
Clay Redding wrote:
Hi Andrew (or anyone else that cares to answer
Kevin S. Clarke wrote:
Fwiw Andrew, I'd suggest you are not seeing the true spirit of your
NXDB. Try to put MARC into an RDBMS and you are going to run into the
same problem. You have to index intelligently or reorganize the data
(which is the default when you put XML into an RDBMS anyway).
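Indexing intelligently usually means building an inverted index from the terms people actually search on back to record IDs, regardless of which store holds the records. A minimal sketch, assuming records have already been flattened into dicts; the record layout and the `build_field_index` name are hypothetical.

```python
import re
from collections import defaultdict


def build_field_index(records: dict, field: str) -> dict:
    """Map each lowercased word in `field` to the set of record IDs containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for term in re.findall(r"\w+", rec.get(field, "").lower()):
            index[term].add(rec_id)
    return dict(index)


RECORDS = {
    "r1": {"title": "Cataloging MARC records"},
    "r2": {"title": "Searching records with Lucene"},
}
title_index = build_field_index(RECORDS, "title")
print(sorted(title_index["records"]))
```

Lookups are then a dictionary access per term instead of a scan over every record's raw structure.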
On 11/27/06, Ross Singer [EMAIL PROTECTED] wrote:
On 11/27/06, Kevin S. Clarke [EMAIL PROTECTED] wrote:
Seriously, please don't get hung up on the 'proprietary'-ness of
Lucene's query syntax. It's open, it's widely used, and has been
ported to a handful of languages. I mean, why would you
Erik Hatcher wrote:
What-if games are mostly just guessing games in the high-tech
world. Agility is the trait our projects need. Software is just
that... soft. And malleable. Sure, we can code ourselves into a
corner, but generally we can code ourselves right back out of it
too. If
Art Rhyno wrote:
I made a big mistake along the way in trying to work with Voyager's call
number setup in Oracle, and dragged Ross along in an attempt to get past
Oracle's constant quibbles with rogue characters in call number ranges.
The idea was to expose the library catalogue as a series of
On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote:
it is just how you pass that along to Lucene that will be
different (e.g., do you do it in Java, in Ruby, in XML via Solr, do
you do it in XQuery, etc.)
By the way, I see a very interesting intersection between Solr and
XQuery because both
I'm sure most of you have seen this, but there is a lot of good work
going on regarding XQuery full text searching by the W3C. LC is pushing
a lot of the activity in this group, and using hefty document-centric
EAD examples in the testing.
http://www.w3.org/TR/xquery-full-text/
FWIW,
Kevin S. Clarke wrote:
By the way, I see a very interesting intersection between Solr and
XQuery because both are speaking XML. You may have XQueries that
generate the XML that makes Solr do its magic, for instance. This is
an alternative to fulltext in XQuery, sure... it is something that is
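Whatever produces it (XQuery, Java, Ruby), the XML that drives Solr indexing is a simple `<add><doc><field name="...">` update message. Here is a sketch that generates one in Python rather than XQuery. The field names `id` and `title` are assumptions that would have to match your Solr schema, and posting the result to Solr's update handler is not shown.

```python
import xml.etree.ElementTree as ET


def solr_add_message(docs: list) -> str:
    """Serialize dicts of field -> value into a Solr XML <add> update message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    # ElementTree escapes &, < and > in text, so raw catalog strings are safe here.
    return ET.tostring(add, encoding="unicode")


print(solr_add_message([{"id": "000123", "title": "Search & retrieval"}]))
```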
So you are replacing SQL calls with WebDAV? Can you explain this a bit
further?
Hi,
No, WebDAV is, among other things, an XML representation of a folder
structure, and we were using SQL to help build the XML needed for WebDAV
support, not replacing one with the other. Voyager stores normalized
On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote:
How do you switch to it? How do the pieces talk? This is the point
of standards. If there is a standard way of addressing an index then
you don't have to care what the newest greatest indexer is. This
paragraph seems in contrast to your
On Tue, Nov 28, 2006 at 10:27:22AM -0500, Ross Singer wrote:
On 11/28/06, Kevin S. Clarke [EMAIL PROTECTED] wrote:
How do you switch to it? How do the pieces talk? This is the point
of standards. If there is a standard way of addressing an index then
you don't have to care what the newest
Casey Durfee wrote:
I thought that was the point of using interfaces? I guess I don't get why you
need a standard to be compelled to do something you should be doing anyway --
coding to interfaces, not implementations.
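Coding to interfaces, not implementations, can be sketched with an abstract base class and two interchangeable stand-in engines. `SearchIndex`, both engine classes, and `find` are hypothetical names for illustration only.

```python
from abc import ABC, abstractmethod


class SearchIndex(ABC):
    """Hypothetical interface: callers depend on this, not on a concrete engine."""

    @abstractmethod
    def search(self, query: str) -> list:
        """Return the sorted IDs of matching records."""


class SubstringIndex(SearchIndex):
    """Stand-in engine 1: case-insensitive substring match."""

    def __init__(self, records: dict):
        self._records = records

    def search(self, query: str) -> list:
        q = query.lower()
        return sorted(rid for rid, text in self._records.items() if q in text.lower())


class WholeWordIndex(SearchIndex):
    """Stand-in engine 2: matches whole words only."""

    def __init__(self, records: dict):
        self._words = {rid: set(text.lower().split()) for rid, text in records.items()}

    def search(self, query: str) -> list:
        q = query.lower()
        return sorted(rid for rid, words in self._words.items() if q in words)


def find(index: SearchIndex, query: str) -> list:
    # Calling code is written once against the interface; engines are swappable.
    return index.search(query)


TITLES = {"r1": "Lucene in action", "r2": "Solr and Lucene"}
for engine in (SubstringIndex(TITLES), WholeWordIndex(TITLES)):
    print(type(engine).__name__, find(engine, "lucene"), find(engine, "luc"))
```

Both engines satisfy the same interface, yet they disagree on `"luc"`: an interface fixes how you call an index, not how it matches text.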
Interfaces work well with like products (a database abstraction
In this respect standard just means a programming interface. I'm
suggesting using XQuery is like using interfaces in Java (a defined
way of accessing something independent of implementation). You could
do this in Java (there is an XQJ... I think you can use this
independent of a textual XQuery
On 11/28/06, Erik Hatcher [EMAIL PROTECTED] wrote:
Is there a standard for specifying how textual analysis works as
well, so that tokenization can be standardized across these XQuery
engines as well?
Not that I know. What I've seen so far is that tokenization is
implementation specific.
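Why implementation-specific tokenization matters can be shown with two made-up analyzers applied to the same title; neither is the analyzer of any particular XQuery engine or of Lucene.

```python
import re


def whitespace_tokens(text: str) -> list:
    """Split on whitespace only; keeps case and punctuation."""
    return text.split()


def word_tokens(text: str) -> list:
    """Lowercase, then keep runs of word characters (drops punctuation, splits hyphens)."""
    return re.findall(r"\w+", text.lower())


TITLE = "Full-Text Search: A Primer."
print(whitespace_tokens(TITLE))  # ['Full-Text', 'Search:', 'A', 'Primer.']
print(word_tokens(TITLE))        # ['full', 'text', 'search', 'a', 'primer']
# A query for "text" matches under the second analyzer but not the first.
print("text" in whitespace_tokens(TITLE), "text" in word_tokens(TITLE))  # False True
```

The same query against the same data can therefore return different hits on two engines that both implement the same query language.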
Kevin S. Clarke wrote:
Have you had a chance yet to evaluate the 1.1 development line? It is
supposed to have solved the scaling issues. I haven't tried it myself
(and remain skeptical that it can scale up to the level that we talk
about with Lucene (but, as you point out, it is trying to do
On Nov 28, 2006, at 5:44 PM, Kevin S. Clarke wrote:
Is there a standard for specifying how textual analysis works as
well, so that tokenization can be standardized across these XQuery
engines as well?
Not that I know. What I've seen so far is that tokenization is
implementation specific.
On Nov 28, 2006, at 3:28 PM, Andrew Nagy wrote:
The major problem with it all is the ugly mess that is marcxml
This brings up an interesting point about just dropping our source
XML data into an XML-savvy database and using XQuery on it.
Maybe y'all have much cleaner data than I've seen, but
On 11/28/06, Erik Hatcher [EMAIL PROTECTED] wrote:
And if XQuery on your raw data does what you
need, by all means I recommend it.
Well structured data and a good language for working with XML are two
completely different things in my opinion. Even XQuery doesn't make
MARCXML a pleasure to
Since the request I sent out last week, I've received quite a lot of
email expressing interest in a lucene pre-conference, but it hasn't
been an overwhelming amount. Based on this, I think it's safe to
reserve the smaller, wi-fi enabled room that we've been discussing,
and to plan for a max of 40
On Nov 27, 2006, at 2:37 PM, Bess Sadler wrote:
Comments? Suggestions?
Sounds great! If you get a chance add a story to http://code4lib.org
when things get solidified.
//Ed
Bess Sadler wrote:
Enough people are interested in ILS related topics that it might be
worth forming groups around specific ILS products. If you are one of
these people, email the list if you're interested in setting up such
a thing.
Bess, this sounds like a great conversation. You can count
Bess Sadler wrote:
Hi, Andrew. Since this will be an all-day event, the session would be
starting first thing in the morning on Feb 27. I'm thinking 9am, but
I haven't confirmed that with anyone else. I'm just flying by the
seat of my pants here.
I wouldn't be able to make this then due to
To: CODE4LIB@listserv.nd.edu
Subject: Re: [CODE4LIB] code4lib lucene pre-conference
Bess Sadler wrote:
application. That way you can use solr / lucene for search, faceted
browse, etc, and your XML database only for known item retrieval,
which it is generally able to do without performance issues. I'm
Binkley, Peter wrote:
There would probably be a lot of optimizations you could do within Solr
to help with this kind of thing. Art and I talked a little about this at
the ILS symposium: why not nestle the XML db inside Solr alongside
Lucene? Solr could then manage the indexing of the contents
Subject: Re: [CODE4LIB] code4lib lucene pre-conference
Binkley, Peter wrote:
There would probably be a lot of optimizations you could do within Solr
to help with this kind of thing. Art and I talked a little about this
at the ILS symposium: why not nestle the XML db inside Solr alongside
Lucene
Lucene has a pretty well-specified search syntax which is unlikely to change
all that much, even though it's not a standard. It's not perfect, but I think
it's pretty good. Overview here:
http://lucene.apache.org/java/docs/queryparsersyntax.html
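As a small illustration of that syntax, here is a sketch that assembles a fielded query and backslash-escapes the special characters listed on the query syntax page. The helpers `escape_term` and `fielded_query` and the field names `title` and `isbn` are hypothetical; in Java, Lucene's `QueryParser.escape` serves the same purpose as `escape_term`.

```python
# Characters the classic Lucene query parser treats as syntax (per its
# query syntax page); && and || are operators, so & and | are escaped singly.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def fielded_query(clauses: dict, op: str = "AND") -> str:
    """Join field:value clauses; multi-word values become quoted phrases."""
    parts = []
    for field, value in clauses.items():
        if " " in value:
            # Inside a phrase only backslashes and double quotes need escaping.
            phrase = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{field}:"{phrase}"')
        else:
            parts.append(f"{field}:{escape_term(value)}")
    return f" {op} ".join(parts)


print(fielded_query({"title": "lucene in action", "isbn": "0-123"}))
# title:"lucene in action" AND isbn:0\-123
```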
I believe Solr adds a bit to the standard
On Nov 27, 2006, at 5:46 PM, Binkley, Peter wrote:
You've got enough flexibility in the way you
set up your Lucene index, and Lucene search results give you access to
the term weights for each hit,
It does?
so you can tell which fields actually
matched.
You can?
I'm curious how you're
On Nov 27, 2006, at 6:12 PM, Binkley, Peter wrote:
Fair point, and that's how my current solr-based project works. I'm
thinking I would like the other advantages of an XML db: the ability to
run xqueries, batch updates, etc., alongside the Lucene searching. And I
want them integrated under the
On Nov 27, 2006, at 5:04 PM, Jonathan Rochkind wrote:
Bess Sadler wrote:
application. That way you can use solr / lucene for search, faceted
browse, etc, and your XML database only for known item retrieval,
which it is generally able to do without performance issues. I'm
hopping up and down
On Nov 27, 2006, at 5:49 PM, Andrew Nagy wrote:
My only concern about lucene is the lack of a standard query language.
I went down the native XML database path because of XQuery and XSL.
Does something like lucene and solr offer a strong query language? Is it a
standard? What if someone
On 11/27/06, Bess Sadler [EMAIL PROTECTED] wrote:
At UVa we have been burned several times by poor search performance
of XML native databases. In light of this, we're starting to look at
the database and the index as separate but cooperative pieces of the
application. That way you can use solr
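The split described here, with the index answering queries and the XML database handling only known-item retrieval, can be sketched with two in-memory stand-ins. Both classes and `search_then_fetch` are hypothetical placeholders for Solr and an XML database, not their APIs.

```python
class RecordStore:
    """Stands in for the XML database: full records, fetched by ID only."""

    def __init__(self):
        self._records = {}

    def put(self, rec_id: str, xml: str) -> None:
        self._records[rec_id] = xml

    def get(self, rec_id: str) -> str:
        return self._records[rec_id]


class TitleIndex:
    """Stands in for Solr/Lucene: holds only what is needed to find IDs."""

    def __init__(self):
        self._titles = {}

    def add(self, rec_id: str, title: str) -> None:
        self._titles[rec_id] = title.lower()

    def search(self, word: str) -> list:
        return sorted(rid for rid, t in self._titles.items() if word.lower() in t.split())


def search_then_fetch(index: TitleIndex, store: RecordStore, word: str) -> list:
    # 1) the index answers the query with IDs; 2) the store does known-item retrieval.
    return [store.get(rid) for rid in index.search(word)]


store, index = RecordStore(), TitleIndex()
store.put("b1", "<record><title>Solr for libraries</title></record>")
index.add("b1", "Solr for libraries")
print(search_then_fetch(index, store, "solr"))
```

The store never has to run a search, which is the part native XML databases were reported to struggle with.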
On 11/27/06, Andrew Nagy [EMAIL PROTECTED] wrote:
My only concern about lucene is the lack of a standard query language.
I went down the native XML database path because of XQuery and XSL. Does
something like lucene and solr offer a strong query language? Is it a
standard? What if someone
I'd agree... the nice thing about the WebDAV interface is that it
works with all these backends! Whether SVN, a native XML database, or
a file system... all have WebDAV interfaces. I've been very
interested over the years watching Art (and Ross) experimenting with
WebDAV and catalog records.