Sorry for the confusion, Jim. I'll update the documents to remove the mention of the default stemming.
--
Pat

On 21/07/2009, at 3:35 PM, jim wrote:
>
> aaaaaaaaaaaaaahhhhhhhhhhhhhhhh !!!!!!!!!!!!!!! :)
>
> On May 27, 5:15 pm, Pat Allan <[email protected]> wrote:
>> Okay, I've made the change. Anyone now installing Thinking Sphinx via
>> plugin or gem gets a warning, and morphology has a default of nil.
>>
>> I'll remove the warning at some point, maybe in a couple of months.
>>
>> Cheers
>>
>> --
>> Pat
>>
>> On 17/05/2009, at 10:08 PM, Pat Allan wrote:
>>
>>> You did indeed write a lot, but that's okay, provides a more thorough
>>> understanding.
>>>
>>> Grouping is probably the best way to do what you've done for the
>>> account names - so your approach seems right to me (even though it's
>>> not a perfect solution). As much as I can comprehend the problem at
>>> the moment, anyway :)
>>>
>>> As for alerting people to the removal of the default morphology, I
>>> like the idea of having messages when the plugin or gem is installed
>>> (and both of those should be doable, I'm almost certain).
>>>
>>> If you want to have a go at forking and patching, be my guest - for
>>> plugins, I think PLUGIN_ROOT/install.rb is what should hold code that
>>> gets run on installation (it might be housed under
>>> PLUGIN_ROOT/rails/install.rb since Rails 2.1). No idea what the
>>> process is for gems, but the rspec gem outputs a message, so TS
>>> should be able to as well.
>>>
>>> Otherwise, when I have the time and motivation, I'll attempt it myself
>>> - which is fine by me, but don't be afraid to give it a shot yourself.
>>>
>>> Cheers
>>>
>>> --
>>> Pat
>>>
>>> On 15/05/2009, at 12:35 PM, aitrus wrote:
>>>
>>>> Hi Pat,
>>>>
>>>> Thanks again for your work on TS. Sorry, I get worked up easily. To
>>>> answer your question, first:
>>>>
>>>> I'm doing some data warehouse-ish applications. I pull in lots of
>>>> data from various systems. Then I use things like account names,
>>>> group names, resource names, host names, etc., to find unique
>>>> records.
>>>>
>>>> When it comes to grouping, I have an association setup of
>>>> Personnel/Divisions <-- ownership --> Accounts.
>>>>
>>>> A person can have many accounts. However, each person has only one
>>>> personnel record. If I render a search in Sphinx, it paginates the
>>>> Personnel records--then if I try to display accounts, the pagination
>>>> is very strange.
>>>>
>>>> So, I needed to do a search in Sphinx on the Accounts table, (a)
>>>> eliminating duplicate account names, and (b) eliminating accounts with
>>>> no owner (took some digging to figure out I need to have a "has"
>>>> attribute).
>>>>
>>>> The way I eventually got this to work (after much whiskey and
>>>> self-mutilation) is to set up:
>>>>
>>>>   has staffs(:id), :as => :has_staffs, :type => :integer
>>>>   has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>>>>
>>>> in my define index. Then I run the following Sphinx search:
>>>>
>>>>   @staff_results = Account.search query, :conditions => conditions,
>>>>     :page => params[:page],
>>>>     :group_function => :attr, :group_by => "sort_account_name",
>>>>     :group_clause => sort, :without => {:has_staffs => 0}
>>>>
>>>> Which solves my biggest problem. I still have the issue that one
>>>> account can have many owners--but I have not begun that work. I also
>>>> just noticed, after reviewing some logs, that if ":sortable => true"
>>>> is enabled, you create a "<column>_sort" attribute. I haven't tried
>>>> using this in the above "group_by" entry, yet.
>>>>
>>>> The biggest use of Sphinx (for me) is that it lets me minimize the
>>>> size of my MySQL indexes (thus speeding up MySQL), and instead uses
>>>> Sphinx to quickly crawl text fields. For example, a unique Unix
>>>> account could be described as an account (case-sensitive) per server.
>>>> There are several platforms/accounts being warehoused. My account
>>>> database has 634,000 records.
>>>> A MySQL search for this account would be ungodly, since InnoDB lacks
>>>> fulltext indexing, etc.
>>>>
>>>> Another issue I've had is figuring out that I needed to set up the
>>>> Charset Table for Sphinx, so it would index various special
>>>> characters--some user/group/resource names can have those tucked
>>>> away. Of special note are @ (at-sign), $ (dollar-sign),
>>>> # (hash/pound-symbol), and parentheses, period, hyphen, underscore,
>>>> etc.
>>>>
>>>> I solved that in the sphinx.yml and it looks like:
>>>>
>>>>   development:
>>>>     morphology: "none"
>>>>     charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
>>>>
>>>> I'm saying some things you probably already know--but I'm hoping
>>>> google indexes my post and saves other developers from the
>>>> psychological trauma that I experienced.
>>>>
>>>> I'll be using Sphinx also as part of a web page, but every search term
>>>> will be literal--there's not much use for wordlists, stemming, etc,
>>>> even in that situation.
>>>>
>>>> Hope this gives good insight into my experience. As for how to
>>>> notify, that would be a question of how Rails plugin / gem install
>>>> stuff works.
>>>>
>>>> My first question would be if you can issue a notice on screen when
>>>> you first install a plugin. Or if "gem install" lets you output
>>>> something, similar to a license agreement.
>>>>
>>>> If there's no easy, verbose way to do it--then I think you should have
>>>> the next update look for a "sphinx.yml" file. If it doesn't exist,
>>>> create it with a boiler plate and have your current defaults remain
>>>> the default. But below them, comment out a line that overrides it.
>>>>
>>>> Another way is to intentionally break the existing plugin-install url
>>>> for Sphinx--so people have to go look at your webpage and pay
>>>> attention.
>>>>
>>>> I can think of more ideas.
>>>> I'd be happy to contribute to TS, but I'm still new to Ruby/Rails
>>>> (coming from Perl) and I want to avoid the risk of committing bad
>>>> code.
>>>>
>>>> I wrote a lot :( Thank you.
>>>>
>>>> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
>>>>> Fair points, even if you're a little worked up about it.
>>>>>
>>>>> When I was last doing some refactoring of the TS Configuration class,
>>>>> I considered removing the default morphology, but didn't because
>>>>> people were already using TS working on the (yes, barely documented)
>>>>> assumption that it *is* the default.
>>>>>
>>>>> So, I agree about the default being nothing, and people set it if they
>>>>> want one.. but how do we deprecate it cleanly? Beyond just removing
>>>>> it, which is easy to do, but a warning would be nice, except we don't
>>>>> want that warning appearing *every* time ts:in is run, or something
>>>>> like that.
>>>>>
>>>>> Suggestions welcome.
>>>>>
>>>>> Also, re: your grouping issue, care to elaborate?
>>>>>
>>>>> --
>>>>> Pat
>>>>>
>>>>> On 14/05/2009, at 12:04 PM, aitrus wrote:
>>>>>
>>>>>> Pat, I love Thinking Sphinx and I appreciate everything you've done
>>>>>> for Rails.
>>>>>>
>>>>>> Having said that.... for the love of god, please don't set defaults
>>>>>> like this. I didn't even know what was going on. I'm doing an import
>>>>>> on hundreds of thousands of records and the full-text search of
>>>>>> Sphinx makes this so much faster.
>>>>>>
>>>>>> But apparently you're setting the morphology to "stem_en" as a
>>>>>> default. I can't find anything about this behavior and it took me
>>>>>> forever to figure out that this was the actual issue. I have spent
>>>>>> hours trying to figure out why "AB0E" also matched "AB0S". In fact, I
>>>>>> didn't even realize this was an issue until after I developed
>>>>>> everything, and began to QA my records.
>>>>>>
>>>>>> Sweet jesus :( Please organize this in a way that is either obvious
>>>>>> or painstakingly documented.
>>>>>>
>>>>>> I had another issue with TS, where I was trying to group results
>>>>>> based on certain columns (via has_many and h_m:through). Such a
>>>>>> nightmare.
>>>>>>
>>>>>> I really appreciate your work, but there needs to be some kind of
>>>>>> emphasis on documenting various assumptions before implementation. Or
>>>>>> maybe, at least, just have:
>>>>>>
>>>>>>   rake ts:in --no-stems
>>>>>>
>>>>>> Sigh.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
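
For anyone who lands on this thread from a search: the question of how a gem can print a notice at install time can be answered with RubyGems' post_install_message field on the gem specification. The snippet below is only a sketch of that mechanism. The gem name, author, and wording of the notice are made up for illustration and are not taken from Thinking Sphinx; the "stem_en" value and the sphinx.yml file are the ones mentioned in this thread.

```ruby
require "rubygems"

# Hypothetical gem specification: the name, author and notice text are
# illustrative assumptions, not the real Thinking Sphinx gemspec.
spec = Gem::Specification.new do |s|
  s.name    = "example-sphinx-plugin"
  s.version = "0.0.1"
  s.summary = "Illustrative gem that prints a notice after installation"
  s.authors = ["Example Author"]
  s.files   = []

  # RubyGems prints this text once, right after `gem install` finishes,
  # so it does not show up on every indexing run.
  s.post_install_message = <<~NOTICE
    The default morphology is now nil (no stemming).
    To keep English stemming, set it explicitly in sphinx.yml:

      development:
        morphology: stem_en
  NOTICE
end

# Show what a user would see after installing the gem.
puts spec.post_install_message
```

Because the message lives in the gemspec, it only covers gem installs; a plugin install would still need its own install.rb hook as discussed earlier in the thread.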
