I've done several big projects (tens of millions of documents) with Solr and
Python.

I think it's one of those things where if you're not familiar with Solr,
you're better off doing things by hand first as a way to learn how it works.
And if you are doing something complex or know Solr really well, you might
find a specific library more trouble than it's worth.
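
To give a sense of what "by hand" can look like, here is a minimal sketch
that builds a query URL for Solr's standard /select handler using only the
Python standard library. The host, port, core name ("mycore"), and field
name ("title") are assumptions for illustration, not anything from a real
setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Return the URL for a simple query against Solr's /select handler."""
    # q, rows, and wt are standard Solr query parameters.
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical local Solr instance and core name.
url = build_select_url("http://localhost:8983/solr/mycore", "title:python")

# Fetching is then a plain HTTP GET, for example:
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       results = json.load(resp)
```

Seeing the raw request and the raw JSON that comes back makes it much easier
to judge later what a client library is actually doing on your behalf.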

The main issues I encounter with Solr are:

1. Designing a good Solr data model
2. Detecting changed records in the DB and updating Solr efficiently
3. Dealing with Solr's rules about stopwords, stemming, tokenizing, etc. of
search terms and text and getting the right combo of them for the problem
you're trying to solve
4. Getting relevancy ranking to be good (tuning the weighting between
different query fields in Solr, and/or re-sorting results based on database
attributes after you get an initial rough result set from Solr.)
5. Massaging funky or incomplete data you want to index, munging character
sets, etc.

A client library isn't really going to solve any of those for you, I don't
think (except maybe #1 and #2, and probably not that well).  They might help
you get to a working solution marginally faster, but at the cost of you
having to go back and learn Solr anyway if what it gives you
automagically isn't good enough in any number of ways.
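
To make issues #2 and #4 a bit more concrete, here is a rough sketch of one
way to approach each. The first part detects changed records by comparing a
content hash against the hash stored at the last indexing run; the second
builds query parameters that weight fields using the qf parameter of Solr's
dismax/edismax query parser. The record layout, the "id" key, and the field
names and boost values are all assumptions for illustration; real boosts
have to be tuned against your own data.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content (keys sorted so order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def changed_records(records, seen):
    """Return the records whose fingerprint differs from the last run.

    `seen` maps record id -> fingerprint and is updated in place. In a real
    job you would persist it, and only after Solr accepted the update.
    """
    changed = []
    for record in records:
        digest = fingerprint(record)
        if seen.get(record["id"]) != digest:
            seen[record["id"]] = digest
            changed.append(record)
    return changed


def weighted_params(user_query, boosts):
    """Build edismax parameters that weight some query fields above others."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf, "wt": "json"}


# Hypothetical field names and boosts: title matches count 5x body matches.
params = weighted_params("solr python", {"title": 5, "body": 1})
```

Only the records returned by changed_records() need to be re-sent to Solr,
which matters a lot once the collection runs into the millions.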

Jython would probably only come into play in my book if you wanted to write
a tokenizer/filter/query analyzer for Solr to use and needed to plug into
someone else's Java code as a part of that.  Since you can pre-process the
data before you send it to Solr, however, I've never had occasion to write a
Solr plugin directly.  It makes more sense to me to have all of your
indexing logic in one place -- meaning do as much massaging as possible
before sending data to Solr, instead of half of your logic in shell scripts
and half in server-side Solr plugins.
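
As an illustration of keeping the massaging on the client side, here is a
minimal sketch of a cleanup step that runs before anything is sent to Solr.
It decodes bytes, normalizes Unicode, strips control characters, collapses
whitespace, and drops empty fields. The choice of UTF-8 and NFC and the
field names are assumptions; messy real-world data usually needs more
source-specific rules than this.

```python
import re
import unicodedata

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(value):
    """Return a normalized, whitespace-collapsed string for indexing."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Assumes UTF-8 input; undecodable bytes become U+FFFD.
        value = value.decode("utf-8", errors="replace")
    text = unicodedata.normalize("NFC", value)
    text = _CONTROL_CHARS.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


def to_solr_doc(row):
    """Build a document dict from a DB row, dropping empty text fields."""
    doc = {}
    for field, value in row.items():
        if value is None or isinstance(value, (str, bytes)):
            value = clean_text(value)
        if value != "":
            doc[field] = value
    return doc


# Hypothetical row with a decomposed accent, a stray NUL, and a missing field.
doc = to_solr_doc({"id": 7, "title": "  Cafe\u0301\x00 menu\n", "notes": None})
```

Because this is ordinary Python, the same function can be unit tested and
reused by both the full reindex and the incremental update path.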

Given how efficient Solr is at this point, and how much hassle it saves you
overall, I don't see a big advantage in interfacing directly with
(Py)Lucene.


--Casey

On Sat, Oct 15, 2011 at 10:53 AM, Christopher Bare <
[email protected]> wrote:

> Hi Pythonistas,
>
> Does anyone have experience accessing a Solr search engine from
> Python? There are several bindings out there, so if anyone has a
> recommendation, I'd appreciate it.
>
> Our needs are probably on the lighter end of the spectrum: moderate
> traffic, tens of thousands building to hundreds of thousands of search
> terms over time. Infrequent updates, accesses are mostly read.
>
> I looked briefly at Haystack and wasn't too excited by it. Too much
> "automagic" stuff going on. Plus, I like the idea of defining my own
> Solr schema, rather than directly mapping Django models into Solr.
> Sunburnt looks pretty good, at first glance.
>
> Any hints would be appreciated. Thanks!
>
> - Chris
>
