I saw on the meeting notes page that Wesley Chun will be posting the slides from his talk soon. I just wanted to say thank you to Wesley for speaking and for sharing his slides.
:) I hope it was a good meeting.

-Jentzen

________________________________
From: Casey Durfee <[email protected]>
To: Seattle Python Interest Group <[email protected]>
Sent: Monday, October 17, 2011 5:34 PM
Subject: Re: [SEAPY] solr django and python recommendations

I've done several big projects (tens of millions of documents) with Solr and Python. I think it's one of those things where if you're not familiar with Solr, you're better off doing things by hand first as a way to learn how it works. And if you are doing something complex or know Solr really well, you might find a specific library more trouble than it's worth.

The main issues I encounter with Solr are:

1. Designing a good Solr data model
2. Detecting changed records in the DB and updating Solr efficiently
3. Dealing with Solr's rules about stopwords, stemming, tokenizing, etc. of search terms and text, and getting the right combo of them for the problem you're trying to solve
4. Getting relevancy ranking to be good (tuning the weighting between different query fields in Solr, and/or re-sorting results based on database attributes after you get an initial rough result set from Solr)
5. Massaging funky or incomplete data you want to index, munging character sets, etc.

A client library isn't really going to solve any of those for you, I don't think (except maybe #1 and #2, and probably not that well). It might help you get to a working solution marginally faster, but at the cost of having to go back and learn Solr anyway if what it gives you automagically isn't good enough in any number of ways.

Jython would probably only come into play in my book if you wanted to write a tokenizer/filter/query analyzer for Solr to use and needed to plug into someone else's Java code as a part of that. Since you can pre-process the data before you send it to Solr, however, I've never had occasion to write a Solr plugin directly.
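To make the pre-processing idea concrete, here is a rough sketch of cleaning records in Python before they go to Solr. It is only an illustration, not code from any of the projects mentioned above: the field names (id, title, body) and the helper names are made up, and it assumes a Solr update handler that accepts a JSON array of documents.

```python
import json
import unicodedata


def clean_text(value):
    """Normalize Unicode and collapse whitespace in a raw text field."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Assumption: stray byte strings are UTF-8; bad bytes are replaced.
        value = value.decode("utf-8", errors="replace")
    value = unicodedata.normalize("NFC", value)
    return " ".join(value.split())


def to_solr_doc(record):
    """Map a database row (a dict) onto a hypothetical Solr schema."""
    doc = {
        "id": str(record["id"]),
        "title": clean_text(record.get("title")),
        "body": clean_text(record.get("body")),
    }
    # Drop empty fields so incomplete records do not index blank values.
    return {key: value for key, value in doc.items() if value}


def build_update_body(records):
    """Serialize cleaned documents as a JSON array for an update request."""
    return json.dumps([to_solr_doc(r) for r in records], ensure_ascii=False)
```

The resulting string would be POSTed to the update handler with a JSON content type. The exact path depends on the Solr version and configuration, so it is left out here.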

It makes more sense to me to have all of your indexing logic in one place -- meaning do as much massaging as possible before sending data to Solr, instead of having half of your logic in shell scripts and half in server-side Solr plugins.

Given how efficient Solr is at this point, and how much hassle it saves you overall, I don't see a big advantage in interfacing directly with (Py)Lucene.

--Casey

On Sat, Oct 15, 2011 at 10:53 AM, Christopher Bare <[email protected]> wrote:

>Hi Pythonistas,
>
>Does anyone have experience accessing a Solr search engine from
>Python? There are several bindings out there, so if anyone has a
>recommendation, I'd appreciate it.
>
>Our needs are probably on the lighter end of the spectrum: moderate
>traffic, tens of thousands building to hundreds of thousands of search
>terms over time. Infrequent updates, accesses are mostly read.
>
>I looked briefly at Haystack and wasn't too excited by it. Too much
>"automagic" stuff going on. Plus, I like the idea of defining my own
>Solr schema, rather than directly mapping Django models into Solr.
>Sunburnt looks pretty good, at first glance.
>
>Any hints would be appreciated. Thanks!
>
>- Chris
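
For anyone who wants to try the "by hand" route suggested in this thread, a minimal sketch using only the standard library might look like the following. The host, port, and core name in the URL are assumptions, as are the function names and the popularity_by_id mapping, which stands in for whatever database attribute you would re-sort on. It relies on Solr's standard select handler returning JSON when wt=json is passed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance; host, port, and core name are made up.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_select_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a Solr select URL that asks for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


def search(query, rows=10):
    """Run the query over HTTP and return the list of matching documents."""
    with urlopen(build_select_url(query, rows)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["response"]["docs"]


def resort(docs, popularity_by_id):
    """Re-sort a rough result set by a database attribute, highest first.

    sorted() is stable, so ties keep Solr's original relevancy order.
    """
    return sorted(
        docs,
        key=lambda doc: popularity_by_id.get(doc["id"], 0),
        reverse=True,
    )
```

Building the URL separately from the HTTP call keeps the query logic easy to inspect and test without a running Solr server, which helps while you are still learning how Solr interprets your queries.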
