Okay, I've made the change. Anyone now installing Thinking Sphinx via  
plugin or gem gets a warning, and morphology has a default of nil.

I'll remove the warning at some point, maybe in a couple of months.
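
For reference, RubyGems lets a gem print a one-time notice after
installation through the post_install_message attribute of its
specification. The following is only a minimal sketch of that mechanism:
the gem name, version, and message wording are hypothetical, and this is
not the actual Thinking Sphinx gemspec.

```ruby
require "rubygems"

# Hypothetical gem name, version, and wording, for illustration only.
spec = Gem::Specification.new do |s|
  s.name    = "example-sphinx-plugin"
  s.version = "0.0.1"
  s.summary = "Illustrates a one-time install notice"
  s.authors = ["Example Author"]
  s.files   = []

  # RubyGems prints this text once, after `gem install` completes.
  s.post_install_message = <<~NOTICE
    The default morphology is now nil (no stemming).
    To keep English stemming, set it explicitly in sphinx.yml:

      development:
        morphology: stem_en
  NOTICE
end

# Show the notice that would be displayed on install.
puts spec.post_install_message
```

Because the notice is tied to installation, it does not reappear each
time the index is rebuilt.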

Cheers

-- 
Pat

On 17/05/2009, at 10:08 PM, Pat Allan wrote:

>
> You did indeed write a lot, but that's okay, provides a more thorough
> understanding.
>
> Grouping is probably the best way to do what you've done for the
> account names - so your approach seems right to me (even though it's
> not a perfect solution). As much as I can comprehend the problem at
> the moment, anyway :)
>
> As for alerting people to the removal of the default morphology, I
> like the idea of having messages when the plugin or gem is installed
> (and both of those should be doable, I'm almost certain).
>
> If you want to have a go at forking and patching, be my guest - for
> plugins, I think PLUGIN_ROOT/install.rb is what should hold code that
> gets run on installation (it might be housed under
> PLUGIN_ROOT/rails/install.rb since Rails 2.1). No idea what the
> process is for gems, but the rspec gem outputs a message, so TS should
> be able to as well.
>
> Otherwise, when I have the time and motivation, I'll attempt it myself
> - which is fine by me, but don't be afraid to give it a shot yourself.
>
> Cheers
>
> -- 
> Pat
>
> On 15/05/2009, at 12:35 PM, aitrus wrote:
>
>>
>> Hi Pat,
>>
>> Thanks again for your work on TS.  Sorry, I get worked up easily.  To
>> answer your question, first:
>>
>> I'm doing some data warehouse-ish applications.  I pull in lots of
>> data from various systems.  Then I use things like account names,
>> group names, resource names, host names, etc., to find unique
>> records.
>>
>> When it comes to grouping, I have an association setup of
>> Personnel/Divisions <-- ownership --> Accounts.
>>
>> A person can have many accounts.  However, each person has only one
>> personnel record.  If I render a search in Sphinx, it paginates the
>> Personnel records--then if I try to display accounts, the pagination
>> is very strange.
>>
>> So, I needed to do a search in Sphinx on the Accounts table, (a)
>> eliminating duplicate account names, and (b) eliminating accounts
>> with no owner (took some digging to figure out I need to have a "has"
>> attribute).
>>
>> The way I eventually got this to work (after much whiskey and
>> self-mutilation) was to set up:
>>
>> has staffs(:id), :as => :has_staffs, :type => :integer
>> has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>>
>> in my define_index block.  Then I run the following Sphinx search:
>>
>>     @staff_results = Account.search query,
>>       :conditions     => conditions,
>>       :page           => params[:page],
>>       :group_function => :attr,
>>       :group_by       => "sort_account_name",
>>       :group_clause   => sort,
>>       :without        => {:has_staffs => 0}
>>
>> Which solves my biggest problem.  I still have the issue that one
>> account can have many owners--but I have not begun that work.  I also
>> just noticed, after reviewing some logs, that if ":sortable => true"
>> is enabled, you create a "<column>_sort" attribute.  I haven't tried
>> using this in the above "group_by" entry, yet.
>>
>>
>> The biggest use of Sphinx (for me) is that it lets me minimize the
>> size of my MySQL indexes (thus speeding up MySQL), and instead use
>> Sphinx to quickly crawl text fields.  For example, a unique unix
>> account could be described as an account (case-sensitive) per server.
>> There are several platforms/accounts being warehoused.  My account
>> database has 634,000 records.  A MySQL search for this account would
>> be ungodly, since InnoDB lacks fulltext indexing.
>>
>> Another issue I've had is figuring out that I needed to set up the
>> Charset Table for Sphinx, so it would index various special
>> characters--some user/group/resource names can have those tucked
>> away.  Of special note are @ (at-sign), $ (dollar-sign), #
>> (hash/pound-symbol), and parentheses, period, hyphen, underscore, etc.
>>
>> I solved that in the sphinx.yml and it looks like:
>>
>> development:
>>   morphology: "none"
>>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
>>
>> I'm saying some things you probably already know--but I'm hoping
>> google indexes my post and saves other developers from the
>> psychological trauma that I experienced.
>>
>>
>> I'll be using Sphinx also as part of a web page, but every search
>> term will be literal--there's not much use for wordlists, stemming,
>> etc., even in that situation.
>>
>> Hope this gives good insight into my experience.  As for how to
>> notify, that would be a question of how Rails plugin / gem install
>> stuff works.
>>
>> My first question would be if you can issue a notice on screen when
>> you first install a plugin.  Or if "gem install" lets you output
>> something, similar to a license agreement.
>>
>> If there's no easy, verbose way to do it--then I think you should
>> have the next update look for a "sphinx.yml" file.  If it doesn't
>> exist, create it with boilerplate and have your current defaults
>> remain the default.  But below them, comment out a line that
>> overrides it.
>>
>> Another way is to intentionally break the existing plugin-install url
>> for Sphinx--so people have to go look at your webpage and pay
>> attention.
>>
>> I can think of more ideas.  I'd be happy to contribute to TS, but I'm
>> still new to Ruby/Rails (coming from Perl) and I want to avoid the
>> risk of committing bad code.
>>
>> I wrote a lot :(  Thank you.
>>
>> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
>>> Fair points, even if you're a little worked up about it.
>>>
>>> When I was last doing some refactoring of the TS Configuration
>>> class, I considered removing the default morphology, but didn't
>>> because people were already using TS working on the (yes, barely
>>> documented) assumption that it *is* the default.
>>>
>>> So, I agree about the default being nothing, with people setting it
>>> if they want one... but how do we deprecate it cleanly? Just
>>> removing it is easy to do, but a warning would be nice, except we
>>> don't want that warning appearing *every* time ts:in is run, or
>>> something like that.
>>>
>>> Suggestions welcome.
>>>
>>> Also, re: your grouping issue, care to elaborate?
>>>
>>> --
>>> Pat
>>>
>>> On 14/05/2009, at 12:04 PM, aitrus wrote:
>>>
>>>> Pat, I love Thinking Sphinx and I appreciate everything you've done
>>>> for Rails.
>>>
>>>> Having said that.... for the love of god, please don't set defaults
>>>> like this.  I didn't even know what was going on.  I'm doing an
>>>> import on hundreds of thousands of records and the full-text search
>>>> of Sphinx makes this so much faster.
>>>
>>>> But apparently you're setting the morphology to "stem_en" as a
>>>> default.  I can't find anything about this behavior and it took me
>>>> forever to figure out that this was the actual issue.  I have spent
>>>> hours trying to figure out why "AB0E" also matched "AB0S".  In
>>>> fact, I didn't even realize this was an issue until after I
>>>> developed everything, and began to QA my records.
>>>
>>>> Sweet jesus :(  Please organize this in a way that is either
>>>> obvious or painstakingly documented.
>>>
>>>> I had another issue with TS, where I was trying to group results
>>>> based on certain columns (via has_many and h_m:through).  Such a
>>>> nightmare.
>>>
>>>> I really appreciate your work, but there needs to be some kind of
>>>> emphasis on documenting various assumptions before
>>>> implementation.  Or maybe, at least, just have:
>>>
>>>> rake ts:in --no-stems
>>>
>>>> Sigh.
>>>