Or I can just cut and paste that message here:

With the release of Thinking Sphinx 1.1.18, there is one important change to
note: previously, the default morphology for indexing was 'stem_en'. The new
default is nil, to avoid any unexpected behavior. If you wish to keep the old
value, though, you will need to add the following settings to your
config/sphinx.yml file:

development:
   morphology: stem_en
test:
   morphology: stem_en
production:
   morphology: stem_en

To understand morphologies/stemmers better, visit the following link:
http://www.sphinxsearch.com/docs/manual-0.9.8.html#conf-morphology
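A minimal sketch, for illustration only, of what the settings above look like once parsed. It uses Ruby's standard `yaml` library on an inline copy of the file rather than Thinking Sphinx's own loader (checking it this way is an assumption, not how TS reads the file). Each top-level key is a Rails environment with its own hash of Sphinx settings.

```ruby
require "yaml"

# Inline copy of the config/sphinx.yml settings from the message above.
SPHINX_YML = <<~YML
  development:
    morphology: stem_en
  test:
    morphology: stem_en
  production:
    morphology: stem_en
YML

CONFIG = YAML.safe_load(SPHINX_YML)

%w[development test production].each do |env|
  # Each environment key maps to its own hash of Sphinx settings.
  puts "#{env}: morphology = #{CONFIG.fetch(env)['morphology']}"
end

# An environment with no morphology key gets nothing from this file, so it
# would fall back to the default (nil as of Thinking Sphinx 1.1.18).
p CONFIG.fetch("staging", {})["morphology"]  # prints nil
```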


Hope this helps.

Cheers

-- 
Pat

On 21/07/2009, at 4:24 PM, jim wrote:

>
> Cool, but how do you turn on stemming? Sorry, I haven't read this entire
> post yet, but I was betting that when I did I'd see my answer. Plus, I
> think I remember seeing some info/post-install notes on the screen
> when I installed TS. I was going to re-install and look at that again
> real close.
>
> On Jul 21, 4:48 pm, Pat Allan <[email protected]> wrote:
>> Sorry for the confusion Jim. I'll update the documents to remove the
>> mention of the default stemming.
>>
>> --
>> Pat
>>
>> On 21/07/2009, at 3:35 PM, jim wrote:
>>
>>
>>
>>> aaaaaaaaaaaaaahhhhhhhhhhhhhhhh !!!!!!!!!!!!!!!    :)
>>
>>> On May 27, 5:15 pm, Pat Allan <[email protected]> wrote:
>>>> Okay, I've made the change. Anyone now installing Thinking Sphinx via
>>>> plugin or gem gets a warning, and morphology has a default of nil.
>>
>>>> I'll remove the warning at some point, maybe in a couple of months.
>>
>>>> Cheers
>>
>>>> --
>>>> Pat
>>
>>>> On 17/05/2009, at 10:08 PM, Pat Allan wrote:
>>
>>>>> You did indeed write a lot, but that's okay; it provides a more
>>>>> thorough understanding.
>>
>>>>> Grouping is probably the best way to do what you've done for the
>>>>> account names - so your approach seems right to me (even though it's
>>>>> not a perfect solution). As much as I can comprehend the problem at
>>>>> the moment, anyway :)
>>
>>>>> As for alerting people to the removal of the default morphology, I
>>>>> like the idea of having messages when the plugin or gem is installed
>>>>> (and both of those should be doable, I'm almost certain).
>>
>>>>> If you want to have a go at forking and patching, be my guest - for
>>>>> plugins, I think PLUGIN_ROOT/install.rb is what should hold code that
>>>>> gets run on installation (it might be housed under
>>>>> PLUGIN_ROOT/rails/install.rb since Rails 2.1). No idea what the process
>>>>> is for gems, but the rspec gem outputs a message, so TS should be able
>>>>> to as well.
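RubyGems does support the gem half of this: `Gem::Specification#post_install_message` is displayed after a successful `gem install`. A minimal sketch follows; the gem name, version and message text are hypothetical placeholders, not Thinking Sphinx's actual gemspec. For the plugin half, a plain `puts` in the `install.rb` described above would play the same role.

```ruby
require "rubygems"

# Hypothetical gemspec: every value here is a placeholder for illustration.
SPEC = Gem::Specification.new do |s|
  s.name    = "example-sphinx-plugin"
  s.version = "0.0.1"
  s.summary = "Illustration of a post-install notice"
  s.authors = ["Example Author"]
  s.files   = []
  # RubyGems prints this text once, right after the gem installs.
  s.post_install_message = <<~MSG
    NOTE: the default morphology is now nil (it used to be stem_en).
    To keep stemming, add `morphology: stem_en` to config/sphinx.yml.
  MSG
end

puts SPEC.post_install_message
```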
>>
>>>>> Otherwise, when I have the time and motivation, I'll attempt it
>>>>> myself - which is fine by me, but don't be afraid to give it a shot
>>>>> yourself.
>>
>>>>> Cheers
>>
>>>>> --
>>>>> Pat
>>
>>>>> On 15/05/2009, at 12:35 PM, aitrus wrote:
>>
>>>>>> Hi Pat,
>>
>>>>>> Thanks again for your work on TS. Sorry, I get worked up easily. To
>>>>>> answer your question, first:
>>
>>>>>> I'm doing some data warehouse-ish applications. I pull in lots of
>>>>>> data from various systems. Then I use things like account names,
>>>>>> group names, resource names, host names, etc., to find unique
>>>>>> records.
>>
>>>>>> When it comes to grouping, I have an association set up of
>>>>>> Personnel/Divisions <-- ownership --> Accounts.
>>
>>>>>> A person can have many accounts. However, each person has only one
>>>>>> personnel record. If I render a search in Sphinx, it paginates the
>>>>>> Personnel records--then if I try to display accounts, the pagination
>>>>>> is very strange.
>>
>>>>>> So, I needed to do a search in Sphinx on the Accounts table, (a)
>>>>>> eliminating duplicate account names, and (b) eliminating accounts with
>>>>>> no owner (it took some digging to figure out I needed a "has"
>>>>>> attribute).
>>
>>>>>> The way I eventually got this to work (after much whiskey and
>>>>>> self-mutilation) is to set up:
>>
>>>>>> has staffs(:id),                  :as => :has_staffs,        :type => :integer
>>>>>> has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>>
>>>>>> in my define_index block. Then I run the following Sphinx search:
>>
>>>>>>     @staff_results = Account.search query,
>>>>>>                        :conditions     => conditions,
>>>>>>                        :page           => params[:page],
>>>>>>                        :group_function => :attr,
>>>>>>                        :group_by       => "sort_account_name",
>>>>>>                        :group_clause   => sort,
>>>>>>                        :without        => {:has_staffs => 0}
>>
>>>>>> Which solves my biggest problem. I still have the issue that one
>>>>>> account can have many owners--but I have not begun that work. I also
>>>>>> just noticed, after reviewing some logs, that if ":sortable => true"
>>>>>> is enabled, you create a "<column>_sort" attribute. I haven't tried
>>>>>> using this in the above "group_by" entry yet.
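To make the grouping semantics concrete, here is a plain-Ruby emulation over hypothetical in-memory records. It involves no Sphinx and is not how Thinking Sphinx executes the query; it only mirrors the pieces of the search above: `:without => {:has_staffs => 0}` drops accounts with no owner, and grouping on the lower-cased name collapses case variants into one result. The group size it reports is similar in spirit to Sphinx's per-group count. The `Account` struct, its `staff_ids` field and the data are illustrative assumptions.

```ruby
# Hypothetical stand-ins for indexed Account rows.
Account = Struct.new(:name, :staff_ids)

ACCOUNTS = [
  Account.new("JDoe",  [1]),
  Account.new("jdoe",  [2]),
  Account.new("root",  []),   # no owner
  Account.new("Admin", [3, 4])
].freeze

# Returns [group_key, group_size] pairs, sorted by key.
def grouped_search(accounts)
  accounts
    .reject { |a| a.staff_ids.empty? }   # like :without => {:has_staffs => 0}
    .group_by { |a| a.name.downcase }    # like grouping on LOWER(accounts.name)
    .map { |key, group| [key, group.size] }
    .sort_by(&:first)                    # stand-in for the :group_clause sort
end

p grouped_search(ACCOUNTS)  # prints [["admin", 1], ["jdoe", 2]]
```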
>>
>>>>>> The biggest use of Sphinx (for me) is that it lets me minimize the
>>>>>> size of my MySQL indexes (thus speeding up MySQL) and instead use
>>>>>> Sphinx to quickly crawl text fields. For example, a unique Unix
>>>>>> account could be described as an account (case-sensitive) per server.
>>>>>> There are several platforms/accounts being warehoused. My account
>>>>>> database has 634,000 records. A MySQL search for this account would
>>>>>> be ungodly, since InnoDB lacks fulltext indexing, etc.
>>
>>>>>> Another issue I've had was figuring out that I needed to set up the
>>>>>> charset table (charset_table) for Sphinx, so it would index various
>>>>>> special characters--some user/group/resource names can have those
>>>>>> tucked away. Of special note are @ (at-sign), $ (dollar-sign),
>>>>>> # (hash/pound symbol), and parentheses, period, hyphen, underscore, etc.
>>
>>>>>> I solved that in the sphinx.yml and it looks like:
>>
>>>>>> development:
>>>>>>   morphology: "none"
>>>>>>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
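Why the charset table matters can be shown with a toy tokenizer: Sphinx treats characters that are not listed in `charset_table` as separators, so a name like `jdoe@host` is split into separate words unless `@` is listed. This sketch is a simplified model, not Sphinx's tokenizer. It ignores the range and mapping syntax (case folding is approximated with `downcase`), and the two character sets are illustrative.

```ruby
require "set"

# Roughly "0..9, a..z, _" (after case folding).
BASIC_CHARS = Set.new([*"0".."9", *"a".."z", "_"])
# The basic set plus the special characters mentioned above.
EXTENDED_CHARS = BASIC_CHARS | Set.new(%w[$ @ * . - ( ) #])

# Splits text into tokens: any character outside word_chars ends a token.
def tokenize(text, word_chars)
  tokens = []
  current = +""
  text.downcase.each_char do |ch|
    if word_chars.include?(ch)
      current << ch
    elsif !current.empty?
      tokens << current
      current = +""
    end
  end
  tokens << current unless current.empty?
  tokens
end

p tokenize("jdoe@host.example", BASIC_CHARS)     # prints ["jdoe", "host", "example"]
p tokenize("jdoe@host.example", EXTENDED_CHARS)  # prints ["jdoe@host.example"]
```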
>>
>>>>>> I'm saying some things you probably already know--but I'm hoping
>>>>>> Google indexes my post and saves other developers from the
>>>>>> psychological trauma that I experienced.
>>
>>>>>> I'll also be using Sphinx as part of a web page, but every search term
>>>>>> will be literal--there's not much use for wordlists, stemming, etc.,
>>>>>> even in that situation.
>>
>>>>>> Hope this gives good insight into my experience. As for how to
>>>>>> notify, that would be a question of how Rails plugin/gem install
>>>>>> stuff works.
>>
>>>>>> My first question would be whether you can issue a notice on screen
>>>>>> when you first install a plugin, or whether "gem install" lets you
>>>>>> output something, similar to a license agreement.
>>
>>>>>> If there's no easy, verbose way to do it, then I think you should have
>>>>>> the next update look for a "sphinx.yml" file. If it doesn't exist,
>>>>>> create it with a boilerplate and have your current defaults remain
>>>>>> the default. But below them, include a commented-out line that
>>>>>> overrides it.
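A minimal sketch of that suggestion, assuming the update gets to run some Ruby at install or upgrade time: write a boilerplate `config/sphinx.yml` only when none exists, keeping the current default with a commented-out override beneath it. The method name `ensure_sphinx_yml` and the file contents are hypothetical, not part of Thinking Sphinx.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical boilerplate: keep today's default, with the override commented out.
BOILERPLATE = %w[development test production].map { |env|
  "#{env}:\n" \
  "  morphology: stem_en\n" \
  "  # morphology: none   # use this line instead to turn stemming off\n"
}.join

# Creates ROOT/config/sphinx.yml only if it is missing.
# Returns true when a file was written, false when one already existed.
def ensure_sphinx_yml(root)
  path = File.join(root, "config", "sphinx.yml")
  return false if File.exist?(path)

  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, BOILERPLATE)
  true
end

Dir.mktmpdir do |root|
  p ensure_sphinx_yml(root)  # prints true (file created)
  p ensure_sphinx_yml(root)  # prints false (existing file left alone)
end
```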
>>
>>>>>> Another way is to intentionally break the existing plugin-install URL
>>>>>> for Sphinx, so people have to go look at your webpage and pay
>>>>>> attention.
>>
>>>>>> I can think of more ideas. I'd be happy to contribute to TS, but I'm
>>>>>> still new to Ruby/Rails (coming from Perl) and I want to avoid the
>>>>>> risk of committing bad code.
>>
>>>>>> I wrote a lot :(  Thank you.
>>
>>>>>> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
>>>>>>> Fair points, even if you're a little worked up about it.
>>
>>>>>>> When I was last doing some refactoring of the TS Configuration class,
>>>>>>> I considered removing the default morphology, but didn't because
>>>>>>> people were already using TS working on the (yes, barely documented)
>>>>>>> assumption that it *is* the default.
>>
>>>>>>> So, I agree about the default being nothing, and people set it if
>>>>>>> they want one... but how do we deprecate it cleanly? Beyond just
>>>>>>> removing it, which is easy to do, a warning would be nice, except we
>>>>>>> don't want that warning appearing *every* time ts:in is run, or
>>>>>>> something like that.
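One way to keep such a warning from appearing on every `ts:in` run is to record that it has already been shown. A minimal sketch, assuming a writable marker file is acceptable; the helper name `warn_once` and the marker location are hypothetical, not anything Thinking Sphinx provides.

```ruby
require "stringio"
require "tmpdir"

# Prints +message+ the first time it is called for a given marker file, then
# creates the marker so later calls stay silent. Returns true if it printed.
def warn_once(marker_path, message, io: $stderr)
  return false if File.exist?(marker_path)

  io.puts(message)
  File.write(marker_path, "shown\n")
  true
end

Dir.mktmpdir do |dir|
  marker = File.join(dir, ".morphology_notice_shown")
  out = StringIO.new
  2.times { warn_once(marker, "NOTE: the default morphology is now nil.", io: out) }
  print out.string  # the notice appears only once
end
```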
>>
>>>>>>> Suggestions welcome.
>>
>>>>>>> Also, re: your grouping issue, care to elaborate?
>>
>>>>>>> --
>>>>>>> Pat
>>
>>>>>>> On 14/05/2009, at 12:04 PM, aitrus wrote:
>>
>>>>>>>> Pat, I love Thinking Sphinx and I appreciate everything you've done
>>>>>>>> for Rails.
>>
>>>>>>>> Having said that.... for the love of god, please don't set defaults
>>>>>>>> like this. I didn't even know what was going on. I'm doing an import
>>>>>>>> on hundreds of thousands of records and the full-text search of
>>>>>>>> Sphinx makes this so much faster.
>>
>>>>>>>> But apparently you're setting the morphology to "stem_en" as a
>>>>>>>> default. I can't find anything about this behavior and it took me
>>>>>>>> forever to figure out that this was the actual issue. I have spent
>>>>>>>> hours trying to figure out why "AB0E" also matched "AB0S". In fact,
>>>>>>>> I didn't even realize this was an issue until after I developed
>>>>>>>> everything and began to QA my records.
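How two different tokens can end up matching under stemming can be seen with a toy suffix stripper implementing just two rules from the Porter stemmer: step 1a (plural `s` removal) and step 5a (final `e` removal). This is not Sphinx's `stem_en` code, and treating digits as consonants is an assumption. Under these two rules, though, both `AB0E` and `AB0S` reduce to `ab0`, which is consistent with the behavior described above.

```ruby
VOWELS = %w[a e i o u].freeze

# Porter's definition: a consonant is any letter other than a, e, i, o, u,
# and other than y preceded by a consonant. Digits count as consonants here
# (an assumption made for this toy).
def consonant?(word, i)
  ch = word[i]
  return false if VOWELS.include?(ch)
  return (i.zero? || !consonant?(word, i - 1)) if ch == "y"

  true
end

# Porter's measure m: the number of VC sequences in [C](VC){m}[V].
def measure(stem)
  pattern = (0...stem.length).map { |i| consonant?(stem, i) ? "c" : "v" }.join
  pattern.squeeze.scan("vc").length
end

# Porter's *o condition: stem ends consonant-vowel-consonant, last not w, x or y.
def ends_cvc?(stem)
  n = stem.length
  return false if n < 3

  consonant?(stem, n - 3) && !consonant?(stem, n - 2) &&
    consonant?(stem, n - 1) && !%w[w x y].include?(stem[-1])
end

# Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed).
def step1a(word)
  if word.end_with?("sses", "ies") then word[0...-2]
  elsif word.end_with?("ss") then word
  elsif word.end_with?("s") then word[0...-1]
  else word
  end
end

# Step 5a: drop a final e if m > 1, or if m == 1 and the stem is not *o.
def step5a(word)
  return word unless word.end_with?("e")

  stem = word[0...-1]
  m = measure(stem)
  (m > 1 || (m == 1 && !ends_cvc?(stem))) ? stem : word
end

def toy_stem(word)
  step5a(step1a(word.downcase))
end

# Prints: AB0E -> ab0, AB0S -> ab0, caresses -> caress, ponies -> poni, rate -> rate
%w[AB0E AB0S caresses ponies rate].each { |w| puts "#{w} -> #{toy_stem(w)}" }
```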
>>
>>>>>>>> Sweet Jesus :( Please organize this in a way that is either obvious
>>>>>>>> or painstakingly documented.
>>
>>>>>>>> I had another issue with TS, where I was trying to group results
>>>>>>>> based on certain columns (via has_many and h_m:through). Such a
>>>>>>>> nightmare.
>>
>>>>>>>> I really appreciate your work, but there needs to be some kind of
>>>>>>>> emphasis on documenting various assumptions before implementation.
>>>>>>>> Or maybe, at least, just have:
>>
>>>>>>>> rake ts:in --no-stems
>>
>>>>>>>> Sigh.
>>
>>


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
