Another point is that I can have many accounts with different names.
The biggest example is Active Directory and UNIX.  Traditional UNIX
requirements stipulate an 8-character logon ID.  However, my friends
from India have clearly proven that AD blows such restrictions out of
the water.  So, I might have 100 UNIX accounts and one AD account.
The search query should give me two results.  One matching my UNIX ID,
and one matching my AD account.  Alternatively, I might also have an
AS/400 (or other) account that matches my UNIX account name.  In that
case, I'd again have two results--as it is grouping by the
account_name field/attribute.
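
To make the expected counts concrete, here is a rough sketch in plain Ruby (no Sphinx involved). The AccountRecord struct, the logon IDs and the host names are all made up for illustration; only the idea of grouping by account_name comes from the description above.

```ruby
# Hypothetical sample data: 100 UNIX accounts sharing one logon ID across
# hosts, one Active Directory account with a longer name, and one AS/400
# account that happens to reuse the UNIX logon ID.
AccountRecord = Struct.new(:account_name, :platform, :host)

accounts = (1..100).map { |n| AccountRecord.new("jdoe", "unix", "host#{n}") }
accounts << AccountRecord.new("john.doe.contractor", "ad", "corp-domain")
accounts << AccountRecord.new("jdoe", "as400", "as400-1")

# Group by account_name: one group per distinct name, whatever the platform.
groups = accounts.group_by(&:account_name)

groups.each do |name, members|
  puts "#{name}: #{members.size} account(s)"
end
```

Two groups come back: one for the shared UNIX/AS/400 name and one for the AD name.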

Thanks again :)

On May 15, 12:35 pm, aitrus <[email protected]> wrote:
> Hi Pat,
>
> Thanks again for your work on TS.  Sorry, I get worked up easily.  To
> answer your question, first:
>
> I'm doing some data warehouse-ish applications.  I pull in lots of
> data from various systems.  Then I use things like account names,
> group names, resource names, host names, etc., to find unique records.
>
> When it comes to grouping, I have an association setup of Personnel/
> Divisions <-- ownership --> Accounts.
>
> A person can have many accounts.  However, each person has only one
> personnel record.  If I render a search in Sphinx, it paginates the
> Personnel records--then if I try to display accounts, the pagination
> is very strange.
>
> So, I needed to do a search in Sphinx on the Accounts table, (a)
> eliminating duplicate account names, and (b) eliminating accounts with
> no owner (took some digging to figure out I need to have a "has"
> attribute).
>
> The way I eventually got this to work (after much whiskey and self-
> mutilation) is to set up:
>
> has staffs(:id),                  :as => :has_staffs,         :type => :integer
> has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name,  :type => :string
>
> in my define_index block.  Then I run the following Sphinx search:
>
>       @staff_results = Account.search query,
>                         :conditions => conditions, :page => params[:page],
>                         :group_function => :attr,  :group_by => "sort_account_name",
>                         :group_clause => sort,     :without => {:has_staffs => 0}
>
> Which solves my biggest problem.  I still have the issue that one
> account can have many owners--but I have not begun that work.  I also
> just noticed, after reviewing some logs, that if ":sortable => true"
> is enabled, you create a "<column>_sort" attribute.  I haven't tried
> using this in the above "group_by" entry, yet.
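
For anyone trying to picture what those search options add up to, here is a rough plain-Ruby model. It does not call Sphinx or Thinking Sphinx at all. OwnedAccount, grouped_page and the sample rows are hypothetical, and it assumes (as the search above implies) that an account with no owner is what the has_staffs value of 0 filters out.

```ruby
# Plain-Ruby model of the filter / group / sort / paginate steps.
OwnedAccount = Struct.new(:name, :staff_ids)

# Hypothetical rows; an empty staff_ids list stands in for "no owner".
accounts = [
  OwnedAccount.new("JDoe",   [12]),
  OwnedAccount.new("jdoe",   [12]),
  OwnedAccount.new("backup", []),
  OwnedAccount.new("asmith", [7, 9]),
]

def grouped_page(accounts, page:, per_page:)
  # Drop ownerless accounts, roughly what :without => {:has_staffs => 0} is for.
  owned = accounts.reject { |a| a.staff_ids.empty? }
  # Collapse duplicates on the lowercased name, like grouping on sort_account_name.
  grouped = owned.group_by { |a| a.name.downcase }
  # Order the groups, then slice out the requested page of group keys.
  grouped.keys.sort.each_slice(per_page).to_a.fetch(page - 1, [])
end

p grouped_page(accounts, page: 1, per_page: 20)
```

The point is that pagination happens over the grouped names, not over the raw rows, which is why the page counts come out sane.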
>
> The biggest use of Sphinx (for me) is that it lets me minimize the
> size of my MySQL indexes (thus speeding up MySQL), and instead uses
> Sphinx to quickly crawl text fields.  For example, a unique unix
> account could be described as an account (case-sensitive) per server.
> There are several platforms/accounts being warehoused.  My account
> database has 634,000 records.  A MySQL search for this account would
> be ungodly, since InnoDB lacks fulltext indexing.  etc.
>
> Another issue I've had is figuring out that I needed to set up the
> Charset Table for Sphinx, so it would index various special
> characters--some user/group/resource names can have those tucked
> away.  Of special note are @ (at-sign), $ (dollar-sign), # (hash/pound
> symbol), parentheses, period, hyphen, underscore, etc.
>
> I solved that in the sphinx.yml and it looks like:
>
> development:
>   morphology: "none"
>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
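
A very simplified way to see why the charset table matters: characters that are not in the table act as separators, so a name containing them gets split into several tokens. The sketch below is plain Ruby, not Sphinx's actual tokenizer. The tokenize helper, the EXTRA_CHARS constant and the sample account name are made up for illustration.

```ruby
# Simplified model: anything outside the allowed character set splits tokens.
# Letters, digits and underscore are always allowed; extra_chars adds more.
def tokenize(text, extra_chars = "")
  pattern = /[a-z0-9_#{Regexp.escape(extra_chars)}]+/
  text.downcase.scan(pattern)
end

# The special characters mentioned above, added to the allowed set.
EXTRA_CHARS = '$@*.-()#'

name = 'svc$backup@host-01'

p tokenize(name)               # split at every special character
p tokenize(name, EXTRA_CHARS)  # kept as a single token
```

With the smaller set, a literal search for the full name has nothing to match as one token; with the extended set the whole name is indexed as one token.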
>
> I'm saying some things you probably already know--but I'm hoping
> google indexes my post and saves other developers from the
> psychological trauma that I experienced.
>
> I'll be using Sphinx also as part of a web page, but every search term
> will be literal--there's not much use for wordlists, stemming, etc,
> even in that situation.
>
> Hope this gives good insight into my experience.  As for how to
> notify, that would be a question of how Rails plugin / gem install
> stuff works.
>
> My first question would be if you can issue a notice on screen when
> you first install a plugin.  Or if "gem install" lets you output
> something, similar to a license agreement.
>
> If there's no easy, verbose way to do it--then I think you should have
> the next update look for a "sphinx.yml" file.  If it doesn't exist,
> create it with a boilerplate and have your current defaults remain
> the default.  But below them, comment out a line that overrides it.
>
> Another way is to intentionally break the existing plugin-install url
> for Sphinx--so people have to go look at your webpage and pay
> attention.
>
> I can think of more ideas.  I'd be happy to contribute to TS, but I'm
> still new to Ruby/Rails (coming from Perl) and I want to avoid the
> risk of committing bad code.
>
> I wrote a lot :(  Thank you.
>
> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
>
>
>
> > Fair points, even if you're a little worked up about it.
>
> > When I was last doing some refactoring of the TS Configuration class,  
> > I considered removing the default morphology, but didn't because  
> > people were already using TS working on the (yes, barely documented)  
> > assumption that it *is* the default.
>
> > So, I agree about the default being nothing, and people set it if they  
> > want one.. but how do we deprecate it cleanly? Beyond just removing
> > it, which is easy to do, but a warning would be nice, except we don't  
> > want that warning appearing *every* time ts:in is run, or something  
> > like that.
>
> > Suggestions welcome.
>
> > Also, re: your grouping issue, care to elaborate?
>
> > --
> > Pat
>
> > On 14/05/2009, at 12:04 PM, aitrus wrote:
>
> > > Pat, I love Thinking Sphinx and I appreciate everything you've done
> > > for Rails.
>
> > > Having said that.... for the love of god, please don't set defaults
> > > like this.  I didn't even know what was going on.  I'm doing an import
> > > on hundreds of thousands of records and the full-text search of Sphinx
> > > makes this so much faster.
>
> > > But apparently you're setting the morphology to "stem_en" as a
> > > default.  I can't find anything about this behavior and it took me
> > > forever to figure out that this was the actual issue.  I have spent
> > > hours trying to figure out why "AB0E" also matched "AB0S".  In fact, I
> > > didn't even realize this was an issue until after I developed
> > > everything, and began to QA my records.
>
> > > Sweet jesus :(  Please organize this in a way that is either obvious
> > > or painstakingly documented.
>
> > > I had another issue with TS, where I was trying to group results based
> > > on certain columns (via has_many and h_m:through).  Such a nightmare.
>
> > > I really appreciate your work, but there needs to be some kind of
> > > emphasis on documenting various assumptions before implementation.  Or
> > > maybe, at least, just have:
>
> > > rake ts:in --no-stems
>
> > > Sigh.