Another point is that I can have many accounts with different names. The biggest example is Active Directory and UNIX. Traditional UNIX limits logon IDs to 8 characters; however, my friends from India have clearly proven that AD blows such restrictions out of the water. So I might have 100 UNIX accounts and one AD account, and the search query should give me two results: one matching my UNIX ID and one matching my AD account. I might also have an AS/400 (or other) account whose name matches my UNIX account name. In that case I'd still get two results, because the search groups by the account_name field/attribute, so the AS/400 account falls into the same group as my UNIX accounts.
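A quick way to sanity-check the expected result count is a plain-Ruby model of the Accounts search quoted below (`:without => {:has_staffs => 0}`, `:group_by => "sort_account_name"`, `:page`). This is not Thinking Sphinx code. `IndexedAccount`, `staff_attr`, and `search_accounts` are hypothetical names, and the alphabetical sort stands in for whatever `:group_clause` is. The model also assumes that an account with no owner ends up with a staff id of 0 in the index.

```ruby
# Plain-Ruby model of the Accounts search: drop unowned accounts, group by
# lowercased name, then paginate over the groups. Illustrative only; this is
# not the Thinking Sphinx or Sphinx API.
IndexedAccount = Struct.new(:platform, :name, :staff_ids)

# Assumption: an account with no owners is indexed with a staff id of 0,
# which is what :without => {:has_staffs => 0} filters out.
def staff_attr(account)
  account.staff_ids.empty? ? [0] : account.staff_ids
end

def search_accounts(accounts, page:, per_page:)
  owned  = accounts.reject { |a| staff_attr(a).include?(0) } # like :without
  groups = owned.group_by { |a| a.name.downcase }           # like :group_by on LOWER(name)
  names  = groups.keys.sort                                 # stand-in for :group_clause
  names.each_slice(per_page).to_a.fetch(page - 1, [])       # like :page
end

# Hypothetical data: 100 UNIX accounts sharing one ID across servers, one AD
# account with a long name, an AS/400 account with the UNIX name, and one
# account with no owner.
accounts = Array.new(100) { IndexedAccount.new(:unix, "jsmith01", [42]) }
accounts << IndexedAccount.new(:ad, "john.smith.longname", [42])
accounts << IndexedAccount.new(:as400, "JSMITH01", [42])
accounts << IndexedAccount.new(:unix, "orphan", [])

p search_accounts(accounts, page: 1, per_page: 20) # => ["john.smith.longname", "jsmith01"]
p search_accounts(accounts, page: 2, per_page: 1)  # => ["jsmith01"]
```

Grouping happens before pagination, so each page counts distinct account names rather than raw rows. That is why paging over Accounts behaves sensibly where paging over Personnel did not.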
Thanks again :)

On May 15, 12:35 pm, aitrus wrote:
> Hi Pat,
>
> Thanks again for your work on TS. Sorry, I get worked up easily. To answer your question first:
>
> I'm doing some data-warehouse-ish applications. I pull in lots of data from various systems, then use things like account names, group names, resource names, host names, etc., to find unique records.
>
> When it comes to grouping, I have an association set up of Personnel/Divisions <-- ownership --> Accounts.
>
> A person can have many accounts, but each person has only one personnel record. If I run a search in Sphinx, it paginates the Personnel records, so when I then try to display accounts, the pagination is very strange.
>
> So I needed to do a search in Sphinx on the Accounts table, (a) eliminating duplicate account names, and (b) eliminating accounts with no owner (it took some digging to figure out that I needed a "has" attribute).
>
> The way I eventually got this to work (after much whiskey and self-mutilation) is to set up:
>
>   has staffs(:id), :as => :has_staffs, :type => :integer
>   has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>
> in my define_index. Then I run the following Sphinx search:
>
>   @staff_results = Account.search query, :conditions => conditions, :page => params[:page],
>     :group_function => :attr, :group_by => "sort_account_name",
>     :group_clause => sort, :without => {:has_staffs => 0}
>
> This solves my biggest problem. I still have the issue that one account can have many owners, but I have not begun that work. I also just noticed, after reviewing some logs, that if ":sortable => true" is enabled, you create a "<column>_sort" attribute. I haven't tried using that in the "group_by" entry above yet.
>
> The biggest use of Sphinx (for me) is that it lets me minimize the size of my MySQL indexes (thus speeding up MySQL) and instead use Sphinx to quickly crawl text fields.
> For example, a unique UNIX account could be described as an account name (case-sensitive) per server. There are several platforms/accounts being warehoused. My account database has 634,000 records. A MySQL search for an account would be ungodly, since InnoDB lacks full-text indexing, etc.
>
> Another issue I had was figuring out that I needed to set up the charset table for Sphinx so it would index various special characters; some user/group/resource names have those tucked away. Of special note are @ (at sign), $ (dollar sign), # (hash/pound symbol), and parentheses, period, hyphen, underscore, etc.
>
> I solved that in sphinx.yml, and it looks like:
>
>   development:
>     morphology: "none"
>     charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
>
> I'm saying some things you probably already know, but I'm hoping Google indexes my post and saves other developers from the psychological trauma I experienced.
>
> I'll also be using Sphinx as part of a web page, but every search term will be literal; there's not much use for wordlists, stemming, etc., even in that situation.
>
> Hope this gives good insight into my experience. As for how to notify people, that would be a question of how Rails plugin/gem installation works.
>
> My first question would be whether you can display a notice on screen when a plugin is first installed, or whether "gem install" lets you output something, similar to a license agreement.
>
> If there's no easy way to show a verbose notice, then I think the next update should look for a "sphinx.yml" file. If it doesn't exist, create it with boilerplate that keeps your current defaults as the defaults, with a commented-out line below them that overrides them.
>
> Another way is to intentionally break the existing plugin-install URL for Thinking Sphinx, so people have to go look at your web page and pay attention.
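The charset_table issue quoted above comes down to tokenizing. Characters that are not in the table act as word separators, so a name like `SVC$DB01` gets indexed as two separate words. The sketch below is a rough plain-Ruby model of that behaviour, not Sphinx's tokenizer. It does not parse real charset_table syntax, and `tokenize`, `DEFAULT_CHARS`, `EXTENDED_CHARS`, and the sample names are hypothetical.

```ruby
require "set"

# Rough model of Sphinx-style tokenizing: characters in the allowed set are
# kept (input is lowercased first, approximating A..Z->a..z); anything else
# acts as a word separator. Simplified illustration only.
DEFAULT_CHARS  = Set.new([*"0".."9", *"a".."z", "_"])
EXTENDED_CHARS = DEFAULT_CHARS | Set["$", "@", "#", ".", "-", "(", ")"]

def tokenize(text, allowed)
  tokens  = []
  current = String.new
  text.downcase.each_char do |ch|
    if allowed.include?(ch)
      current << ch                      # character belongs to the current word
    else
      tokens << current unless current.empty?
      current = String.new               # separator: start a new word
    end
  end
  tokens << current unless current.empty?
  tokens
end

p tokenize("SVC$DB01 admin@corp", DEFAULT_CHARS)  # => ["svc", "db01", "admin", "corp"]
p tokenize("SVC$DB01 admin@corp", EXTENDED_CHARS) # => ["svc$db01", "admin@corp"]
```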
>
> I can think of more ideas. I'd be happy to contribute to TS, but I'm still new to Ruby/Rails (coming from Perl) and I want to avoid the risk of committing bad code.
>
> I wrote a lot :( Thank you.
>
> On May 14, 11:58 pm, Pat Allan wrote:
> > Fair points, even if you're a little worked up about it.
> >
> > When I was last doing some refactoring of the TS Configuration class, I considered removing the default morphology, but didn't, because people were already using TS on the (yes, barely documented) assumption that it *is* the default.
> >
> > So I agree the default should be nothing, with people setting a morphology if they want one... but how do we deprecate it cleanly? Just removing it is easy to do; a warning would be nicer, except we don't want that warning appearing *every* time ts:in is run, or something like that.
> >
> > Suggestions welcome.
> >
> > Also, re: your grouping issue, care to elaborate?
> >
> > --
> > Pat
> >
> > On 14/05/2009, at 12:04 PM, aitrus wrote:
> > > Pat, I love Thinking Sphinx and I appreciate everything you've done for Rails.
> > >
> > > Having said that... for the love of god, please don't set defaults like this. I didn't even know what was going on. I'm doing an import of hundreds of thousands of records, and the full-text search of Sphinx makes this so much faster.
> > >
> > > But apparently you're setting the morphology to "stem_en" as a default. I can't find anything about this behavior, and it took me forever to figure out that this was the actual issue. I spent hours trying to figure out why "AB0E" also matched "AB0S". In fact, I didn't even realize this was an issue until after I had developed everything and begun to QA my records.
> > >
> > > Sweet jesus :( Please organize this in a way that is either obvious or painstakingly documented.
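Pat's constraint (a warning, but not on every `ts:in` run) fits together with the boilerplate-sphinx.yml idea quoted above. The approach is to warn only while an environment relies on the implicit default, and to stop as soon as `morphology` is set explicitly, even to "none". This is a minimal sketch that assumes a plain YAML file with no ERB. `check_morphology_config`, `BOILERPLATE`, and the message wording are hypothetical and not part of Thinking Sphinx.

```ruby
require "yaml"
require "fileutils"
require "tmpdir"

# Hypothetical boilerplate: keeps the current stem_en default visible, with a
# commented-out override below it, as suggested in the thread.
BOILERPLATE = <<~YAML
  development:
    morphology: "stem_en"
    # morphology: "none"   # uncomment to turn stemming off
YAML

# Creates the boilerplate if the file is missing, then returns a warning
# string if the given environment does not set morphology explicitly,
# or nil if it does. Assumes plain YAML (no ERB).
def check_morphology_config(path, env)
  unless File.exist?(path)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, BOILERPLATE)
  end

  config   = YAML.safe_load(File.read(path)) || {}
  settings = config[env] || {}
  return nil if settings.key?("morphology")

  "#{path}: morphology is not set for #{env}; relying on the stem_en " \
    "default is deprecated. Set it explicitly (\"none\" disables stemming)."
end

# Example in a throwaway directory, so nothing is written into a real project.
Dir.mktmpdir do |dir|
  path = File.join(dir, "config", "sphinx.yml")
  p check_morphology_config(path, "development") # => nil (boilerplate sets it)
  File.write(path, "development:\n  charset_table: \"0..9, a..z\"\n")
  puts check_morphology_config(path, "development") # prints the warning
end
```

With this design, existing users whose sphinx.yml omits morphology see the warning until they set it. New projects get the default written out where they can see it.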
> > >
> > > I had another issue with TS, where I was trying to group results based on certain columns (via has_many and has_many :through). Such a nightmare.
> > >
> > > I really appreciate your work, but there needs to be some kind of emphasis on documenting various assumptions before implementation. Or maybe, at least, just have:
> > >
> > >   rake ts:in --no-stems
> > >
> > > Sigh.

You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
