Hi Pat,

Thanks again for your work on TS.  Sorry, I get worked up easily.  To
answer your question, first:

I'm doing some data warehouse-ish applications.  I pull in lots of
data from various systems.  Then I use things like account names,
group names, resource names, host names, etc., to find unique records.

When it comes to grouping, I have an association setup of Personnel/
Divisions <-- ownership --> Accounts.

A person can have many accounts.  However, each person has only one
personnel record.  If I render a search in Sphinx, it paginates the
Personnel records--then if I try to display accounts, the pagination
is very strange.

So, I needed to do a search in Sphinx on the Accounts table, (a)
eliminating duplicate account names, and (b) eliminating accounts with
no owner (took some digging to figure out I need to have a "has"
attribute).

The way I eventually got this to work (after much whiskey and
self-mutilation) was to set up:

has staffs(:id),                  :as => :has_staffs,        :type => :integer
has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string

in my define_index block.  Then I run the following Sphinx search:

      @staff_results = Account.search query,
                         :conditions     => conditions,
                         :page           => params[:page],
                         :group_function => :attr,
                         :group_by       => "sort_account_name",
                         :group_clause   => sort,
                         :without        => {:has_staffs => 0}

Which solves my biggest problem.  I still have the issue that one
account can have many owners--but I have not begun that work.  I also
just noticed, after reviewing some logs, that if ":sortable => true"
is enabled, you create a "<column>_sort" attribute.  I haven't tried
using this in the above "group_by" entry, yet.
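
For anyone reading along later, the effect of that search can be roughly modeled in plain Ruby. This is only a sketch of the idea and not Thinking Sphinx code: the Account struct, its staff_ids field, and the unique_owned_accounts helper are hypothetical names made up for illustration.

```ruby
# Plain-Ruby illustration only; this does not call Sphinx or Thinking Sphinx.
# Account and staff_ids are hypothetical stand-ins for the indexed model.
Account = Struct.new(:name, :staff_ids)

def unique_owned_accounts(accounts)
  accounts.
    reject   { |a| a.staff_ids.empty? }.   # roughly :without => {:has_staffs => 0}
    group_by { |a| a.name.downcase }.      # roughly grouping on sort_account_name
    map      { |_key, group| group.first }. # keep one record per group
    sort_by  { |a| a.name.downcase }
end

accounts = [
  Account.new("Root",   [1]),
  Account.new("root",   [2]),
  Account.new("orphan", []),
  Account.new("backup", [3])
]

unique_owned_accounts(accounts).each { |a| puts a.name }
```

The ownerless account drops out and the two spellings of "root" collapse into one entry, which is the behaviour the group_by plus :without combination is meant to give.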


The biggest use of Sphinx (for me) is that it lets me minimize the
size of my MySQL indexes (thus speeding up MySQL), and instead use
Sphinx to quickly crawl text fields.  For example, a unique unix
account could be described as an account (case-sensitive) per server.
There are several platforms/accounts being warehoused.  My account
database has 634,000 records.  A MySQL search for this account would
be ungodly, since InnoDB lacks fulltext indexing.

Another issue I've had is figuring out that I needed to set up the
charset table for Sphinx, so it would index various special
characters--some user/group/resource names can have those tucked
away.  Of special note are @ (at-sign), $ (dollar-sign), # (hash/pound
symbol), parentheses, period, hyphen, underscore, etc.

I solved that in the sphinx.yml and it looks like:

development:
  morphology: "none"
  charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""

I'm saying some things you probably already know--but I'm hoping
Google indexes my post and saves other developers from the
psychological trauma that I experienced.
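
To show why the charset table matters, here is a rough plain-Ruby model of the idea. It is not how Sphinx tokenizes internally, and both character sets are simplified assumptions (Latin letters only): characters in the allowed set stay inside a word, and anything else acts as a separator. The tokens helper and the sample account name are made up for illustration.

```ruby
# Rough model only, not Sphinx's tokenizer. Each regex matches runs of
# characters that are NOT in the allowed set, so they act as separators.
BASIC_SEPARATOR    = /[^a-z0-9_]+/               # letters, digits, underscore
EXTENDED_SEPARATOR = /[^a-z0-9_\$@*.\-()\#]+/    # plus $ @ * . - ( ) #

def tokens(text, separator)
  # Fold to lowercase (like A..Z->a..z), then split on disallowed characters.
  text.downcase.split(separator).reject { |t| t.empty? }
end

name = 'svc$backup@host-01'   # hypothetical account name
p tokens(name, BASIC_SEPARATOR)
p tokens(name, EXTENDED_SEPARATOR)
```

With the basic set the name breaks into several short words; with the extended set it stays one token, so a literal search for the full name can match it.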


I'll be using Sphinx also as part of a web page, but every search term
will be literal--there's not much use for wordlists, stemming, etc,
even in that situation.

Hope this gives good insight into my experience.  As for how to
notify, that would be a question of how Rails plugin / gem install
stuff works.

My first question would be if you can issue a notice on screen when
you first install a plugin.  Or if "gem install" lets you output
something, similar to a license agreement.

If there's no easy, verbose way to do it--then I think you should have
the next update look for a "sphinx.yml" file.  If it doesn't exist,
create it from a boilerplate and have your current defaults remain
the default.  But below them, add a commented-out line that overrides
them.
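
A minimal sketch of that "create it if missing" idea, in plain Ruby. Everything here is an assumption for illustration: ensure_sphinx_yml is a hypothetical helper and not part of Thinking Sphinx, and the boilerplate contents and the config/sphinx.yml path are just one way it could look.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical boilerplate: keeps the current default and shows, commented
# out, the line a user could enable to override it.
BOILERPLATE = <<-YAML
development:
  morphology: "stem_en"
  # morphology: "none"
YAML

# Hypothetical helper, not part of Thinking Sphinx.
# Writes the boilerplate only when the file is missing. Returns true if the
# file was created, false if an existing file was left untouched.
def ensure_sphinx_yml(path)
  return false if File.exist?(path)
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "w") { |f| f.write(BOILERPLATE) }
  true
end

# Demo in a throwaway directory so nothing real gets overwritten.
Dir.mktmpdir do |dir|
  path = File.join(dir, "config", "sphinx.yml")
  puts ensure_sphinx_yml(path)   # first call creates the file
  puts ensure_sphinx_yml(path)   # second call leaves it alone
end
```

Because the file is only written once, this would also sidestep the worry about a notice showing up every time ts:in runs.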

Another way is to intentionally break the existing plugin-install url
for Sphinx--so people have to go look at your webpage and pay
attention.

I can think of more ideas.  I'd be happy to contribute to TS, but I'm
still new to Ruby/Rails (coming from Perl) and I want to avoid the
risk of committing bad code.

I wrote a lot :(  Thank you.

On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
> Fair points, even if you're a little worked up about it.
>
> When I was last doing some refactoring of the TS Configuration class,  
> I considered removing the default morphology, but didn't because  
> people were already using TS working on the (yes, barely documented)  
> assumption that it *is* the default.
>
> So, I agree about the default being nothing, and people set it if they  
> want one.. but how do we deprecate it cleanly? Beyond just removing  
> it, which is easy to do, but a warning would be nice, except we don't  
> want that warning appearing *every* time ts:in is run, or something  
> like that.
>
> Suggestions welcome.
>
> Also, re: your grouping issue, care to elaborate?
>
> --
> Pat
>
> On 14/05/2009, at 12:04 PM, aitrus wrote:
>
> > Pat, I love Thinking Sphinx and I appreciate everything you've done
> > for Rails.
>
> > Having said that.... for the love of god, please don't set defaults
> > like this.  I didn't even know what was going on.  I'm doing an import
> > on hundreds of thousands of records and the full-text search of Sphinx
> > makes this so much faster.
>
> > But apparently you're setting the morphology to "stem_en" as a
> > default.  I can't find anything about this behavior and it took me
> > forever to figure out that this was the actual issue.  I have spent
> > hours trying to figure out why "AB0E" also matched "AB0S".  In fact, I
> > didn't even realize this was an issue until after I developed
> > everything, and began to QA my records.
>
> > Sweet jesus :(  Please organize this in a way that is either obvious
> > or painstakingly documented.
>
> > I had another issue with TS, where I was trying to group results based
> > on certain columns (via has_many and h_m:through).  Such a nightmare.
>
> > I really appreciate your work, but there needs to be some kind of
> > emphasis on documenting various assumptions before implementation.  Or
> > maybe, at least, just have:
>
> > rake ts:in --no-stems
>
> > Sigh.