[SEAPY] October Meeting, thank you

jentzen mooney Wed, 19 Oct 2011 07:47:50 -0700

I saw on the meeting notes page that Wesley Chun will be posting his slides 
from his talk soon.
I just wanted to say thank you to Wesley Chun for speaking and sharing his 
slides.

:)
I hope it was a good meeting.
-Jentzen

________________________________
From: Casey Durfee <[email protected]>
To: Seattle Python Interest Group <[email protected]>
Sent: Monday, October 17, 2011 5:34 PM
Subject: Re: [SEAPY] solr django and python recommendations

I've done several big projects (tens of millions of documents) with Solr and 
Python.

I think it's one of those things where if you're not familiar with Solr, you're 
better off doing things by hand first as a way to learn how it works.  And if 
you are doing something complex or know Solr really well, you might find a 
specific library more trouble than it's worth.
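
To make "doing things by hand" concrete, here is a minimal sketch of querying Solr over plain HTTP using only the Python standard library. The host, port, core name ("docs"), and field names are assumptions for illustration, not anything from this thread; the only thing relied on is Solr's standard select handler returning JSON when asked with wt=json.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed for illustration: a local Solr instance with a core named "docs".
SOLR_SELECT_URL = "http://localhost:8983/solr/docs/select"


def build_select_url(query, rows=10, start=0, fields=("id", "title")):
    """Return the URL for a plain Solr select request."""
    params = {
        "q": query,               # Solr query string, e.g. "title:python"
        "rows": rows,             # page size
        "start": start,           # offset for paging
        "fl": ",".join(fields),   # fields to return (hypothetical names)
        "wt": "json",             # ask Solr for a JSON response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def search(query, **kwargs):
    """Run the query and return the list of matching documents."""
    with urlopen(build_select_url(query, **kwargs)) as response:
        payload = json.load(response)
    return payload["response"]["docs"]
```

Calling search("title:python", rows=5) would return a list of dicts, one per hit. Writing this once by hand shows exactly which parameters a client library would be setting for you.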

The main issues I encounter with Solr are:

1. Designing a good Solr data model
2. Detecting changed records in the DB and updating Solr efficiently
3. Dealing with Solr's rules about stopwords, stemming, tokenizing, etc. of 
search terms and text and getting the right combo of them for the problem 
you're trying to solve
4. Getting relevancy ranking to be good (tuning the weighting between different 
query fields in Solr, and/or re-sorting results based on database attributes 
after you get an initial rough result set from Solr.)
5. Massaging funky or incomplete data you want to index, munging character 
sets, etc.
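
As one illustration of issue 4, re-sorting a rough result set from Solr using a database attribute could be sketched as below. This assumes each Solr document carries "id" and "score" keys (the score is only returned if it is requested in the field list) and that popularity_by_id is a hypothetical mapping loaded from the database. The linear blend and its weight are arbitrary choices, not a recommended formula.

```python
def rerank(solr_docs, popularity_by_id, weight=0.5):
    """Re-sort Solr hits by blending Solr's score with a database attribute.

    solr_docs: list of dicts with "id" and "score" keys (assumed shape).
    popularity_by_id: hypothetical dict mapping document id to a number
        pulled from the database; missing ids count as 0.
    """
    def blended(doc):
        popularity = popularity_by_id.get(doc["id"], 0.0)
        return doc["score"] + weight * popularity

    # Highest blended score first; sorted() is stable, so ties keep Solr's order.
    return sorted(solr_docs, key=blended, reverse=True)
```

In practice you would fetch more rows from Solr than you intend to display, re-rank them, and then cut the list down to one page.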

A client library isn't really going to solve any of those for you, I don't 
think (except maybe #1 and #2, and probably not that well).  It might help 
you get to a working solution marginally faster, but at the cost of having 
to go back and learn Solr anyway if what it gives you automagically isn't 
good enough in any number of ways. 

Jython would probably only come into play in my book if you wanted to write a 
tokenizer/filter/query analyzer for Solr to use and needed to plug into someone 
else's Java code as a part of that.  Since you can pre-process the data before 
you send it to Solr, however, I've never had occasion to write a Solr plugin 
directly.  It makes more sense to me to have all of your indexing logic in one 
place -- meaning do as much massaging as possible before sending data to Solr, 
instead of half of your logic in shell scripts and half in server-side Solr 
plugins.  
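
A minimal sketch of keeping the massaging on the client side, so that Solr only ever sees clean documents. The field handling, the Latin-1 fallback for byte strings, and the choice of NFKC normalization are all assumptions for illustration. The resulting JSON array is the shape accepted by the JSON update handler in Solr versions that have one.

```python
import json
import unicodedata


def clean_record(record):
    """Normalize one raw record (a dict) into a Solr-ready document."""
    doc = {}
    for field, value in record.items():
        if value is None:
            continue  # drop missing values instead of indexing them
        if isinstance(value, bytes):
            # Assumption: legacy byte strings are Latin-1; adjust per source.
            value = value.decode("latin-1")
        if isinstance(value, str):
            # Fold compatibility characters and collapse runs of whitespace.
            value = unicodedata.normalize("NFKC", value)
            value = " ".join(value.split())
            if not value:
                continue  # drop fields that are empty after cleaning
        doc[field] = value
    return doc


def build_update_body(records):
    """Serialize cleaned records as a JSON array of documents."""
    return json.dumps([clean_record(r) for r in records])
```

Because all of the cleanup lives in one Python function, it can be unit tested without a running Solr server, which is harder to do with logic split between scripts and server-side plugins.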

Given how efficient Solr is at this point, and how much hassle it saves you 
overall, I don't see a big advantage in interfacing directly with (Py)Lucene.

--Casey

On Sat, Oct 15, 2011 at 10:53 AM, Christopher Bare <[email protected]> 
wrote:

>Hi Pythonistas,
>
>Does anyone have experience accessing a Solr search engine from
>Python? There are several bindings out there, so if anyone has a
>recommendation, I'd appreciate it.
>
>Our needs are probably on the lighter end of the spectrum: moderate
>traffic, tens of thousands building to hundreds of thousands of search
>terms over time. Infrequent updates, accesses are mostly read.
>
>I looked briefly at Haystack and wasn't too excited by it. Too much
>"automagic" stuff going on. Plus, I like the idea of defining my own
>Solr schema, rather than directly mapping Django models into Solr.
>Sunburnt looks pretty good, at first glance.
>
>Any hints would be appreciated. Thanks!
>
>- Chris
>
