Sorry for the confusion, Jim. I'll update the documentation to remove the
mention of the default stemming.

-- 
Pat

On 21/07/2009, at 3:35 PM, jim wrote:

>
> aaaaaaaaaaaaaahhhhhhhhhhhhhhhh !!!!!!!!!!!!!!!    :)
>
> On May 27, 5:15 pm, Pat Allan <[email protected]> wrote:
>> Okay, I've made the change. Anyone now installing Thinking Sphinx via
>> plugin or gem gets a warning, and morphology has a default of nil.
>>
>> I'll remove the warning at some point, maybe in a couple of months.
>>
>> Cheers
>>
>> --
>> Pat
>>
>> On 17/05/2009, at 10:08 PM, Pat Allan wrote:
>>
>>
>>
>>> You did indeed write a lot, but that's okay; it provides a more
>>> thorough understanding.
>>
>>> Grouping is probably the best way to do what you've done for the
>>> account names - so your approach seems right to me (even though it's
>>> not a perfect solution). As much as I can comprehend the problem at
>>> the moment, anyway :)
>>
>>> As for alerting people to the removal of the default morphology, I
>>> like the idea of having messages when the plugin or gem is installed
>>> (and both of those should be doable, I'm almost certain).
>>
>>> If you want to have a go at forking and patching, be my guest - for
>>> plugins, I think PLUGIN_ROOT/install.rb is what should hold code that
>>> gets run on installation (it might be housed under
>>> PLUGIN_ROOT/rails/install.rb since Rails 2.1). No idea what the
>>> process is for gems, but the rspec gem outputs a message, so TS should
>>> be able to as well.
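
For the gem side, one mechanism RubyGems offers is the `post_install_message` field of a gemspec, which is printed after `gem install` completes. The sketch below is only an illustration: the gem name, version, author, and message text are placeholders, not Thinking Sphinx's actual gemspec.

```ruby
require "rubygems"

# Hypothetical gemspec: every value here is a placeholder for illustration.
spec = Gem::Specification.new do |s|
  s.name    = "example-search-plugin"
  s.version = "0.0.1"
  s.summary = "Placeholder gem used to illustrate an install notice"
  s.authors = ["Example Author"]

  # RubyGems prints this text after `gem install` finishes.
  s.post_install_message = <<~MESSAGE
    The default morphology (stemming) setting has been removed.
    Set morphology explicitly in sphinx.yml if you rely on it.
  MESSAGE
end

puts spec.post_install_message
```

Because the notice is tied to installation, it avoids the problem of a warning appearing on every indexing run.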
>>
>>> Otherwise, when I have the time and motivation, I'll attempt it
>>> myself - which is fine by me, but don't be afraid to give it a shot
>>> yourself.
>>
>>> Cheers
>>
>>> --
>>> Pat
>>
>>> On 15/05/2009, at 12:35 PM, aitrus wrote:
>>
>>>> Hi Pat,
>>
>>>> Thanks again for your work on TS.  Sorry, I get worked up easily.
>>>> To answer your question, first:
>>
>>>> I'm doing some data warehouse-ish applications.  I pull in lots of
>>>> data from various systems.  Then I use things like account names,
>>>> group names, resource names, host names, etc., to find unique
>>>> records.
>>
>>>> When it comes to grouping, I have an association setup of
>>>> Personnel/Divisions <-- ownership --> Accounts.
>>
>>>> A person can have many accounts.  However, each person has only one
>>>> personnel record.  If I render a search in Sphinx, it paginates the
>>>> Personnel records--then if I try to display accounts, the pagination
>>>> is very strange.
>>
>>>> So, I needed to do a search in Sphinx on the Accounts table, (a)
>>>> eliminating duplicate account names, and (b) eliminating accounts
>>>> with no owner (took some digging to figure out I need to have a
>>>> "has" attribute).
>>
>>>> The way I eventually got this to work (after much whiskey and
>>>> self-mutilation) was to set up:
>>
>>>> has staffs(:id),                  :as => :has_staffs,        :type => :integer
>>>> has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>>
>>>> in my define_index block.  Then I run the following Sphinx search:
>>
>>>>    @staff_results = Account.search query,
>>>>                       :conditions     => conditions,
>>>>                       :page           => params[:page],
>>>>                       :group_function => :attr,
>>>>                       :group_by       => "sort_account_name",
>>>>                       :group_clause   => sort,
>>>>                       :without        => {:has_staffs => 0}
>>
>>>> Which solves my biggest problem.  I still have the issue that one
>>>> account can have many owners--but I have not begun that work.  I also
>>>> just noticed, after reviewing some logs, that if ":sortable => true"
>>>> is enabled, a "<column>_sort" attribute is created.  I haven't tried
>>>> using this in the above "group_by" entry yet.
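
To make the intent of the grouped search above concrete, here is a rough plain-Ruby sketch of the same two steps on in-memory data: drop accounts with no owner, then keep one row per lowercased account name. It does not call Thinking Sphinx at all; the `account` struct and the sample rows are invented for illustration.

```ruby
# Hypothetical in-memory stand-in for indexed account rows.
account = Struct.new(:name, :staff_ids)

accounts = [
  account.new("Root",   [1]),
  account.new("root",   [2]),
  account.new("backup", []),      # no owner
  account.new("deploy", [3, 4])
]

# Step (b) from the post: eliminate accounts with no owner.
owned = accounts.reject { |a| a.staff_ids.empty? }

# Step (a): group on the lowercased name and keep one row per group,
# then sort by that same key.
grouped = owned.group_by { |a| a.name.downcase }
               .map { |_key, rows| rows.first }
               .sort_by { |a| a.name.downcase }

grouped.each { |a| puts a.name }
```

Grouping before paginating is what keeps the page counts sane, since each page then holds distinct account names rather than repeated ones.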
>>
>>>> The biggest use of Sphinx (for me) is that it lets me minimize the
>>>> size of my MySQL indexes (thus speeding up MySQL), and instead use
>>>> Sphinx to quickly crawl text fields.  For example, a unique unix
>>>> account could be described as an account (case-sensitive) per server.
>>>> There are several platforms/accounts being warehoused.  My account
>>>> database has 634,000 records.  A MySQL search for this account would
>>>> be ungodly, since InnoDB lacks fulltext indexing.  etc.
>>
>>>> Another issue I've had is figuring out that I needed to set up the
>>>> Charset Table for Sphinx, so it would index various special
>>>> characters--some user/group/resource names can have those tucked
>>>> away.  Of special note are @ (at-sign), $ (dollar-sign),
>>>> # (hash/pound-symbol), and parentheses, period, hyphen, underscore,
>>>> etc.
>>
>>>> I solved that in the sphinx.yml and it looks like:
>>
>>>> development:
>>>>   morphology: "none"
>>>>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
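
Sphinx treats characters that are not in the charset table as separators, which is why names containing @, $, or - get split up unless those characters are added. The sketch below is a simplified plain-Ruby model of that behaviour, not Sphinx's tokenizer; the `tokenize` helper and the sample string are made up for illustration.

```ruby
require "set"

# Split text into tokens, treating any character outside `allowed` as a
# separator. Simplified model only; this is not how Sphinx is implemented.
def tokenize(text, allowed)
  tokens  = []
  current = ""
  text.downcase.each_char do |ch|
    if allowed.include?(ch)
      current += ch
    elsif !current.empty?
      tokens << current
      current = ""
    end
  end
  tokens << current unless current.empty?
  tokens
end

# Letters, digits, and underscore only.
basic = Set.new(("a".."z").to_a + ("0".."9").to_a + ["_"])

# The same set plus some of the special characters from the post.
extended = basic + Set.new(%w[@ $ # . - ( )])

p tokenize("svc-backup@host01 $admin", basic)
# => ["svc", "backup", "host01", "admin"]
p tokenize("svc-backup@host01 $admin", extended)
# => ["svc-backup@host01", "$admin"]
```

With the basic set, a search for "svc-backup@host01" cannot match literally, because the index only ever saw the separate pieces.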
>>
>>>> I'm saying some things you probably already know--but I'm hoping
>>>> google indexes my post and saves other developers from the
>>>> psychological trauma that I experienced.
>>
>>>> I'll be using Sphinx also as part of a web page, but every search
>>>> term will be literal--there's not much use for wordlists, stemming,
>>>> etc., even in that situation.
>>
>>>> Hope this gives good insight into my experience.  As for how to
>>>> notify, that would be a question of how Rails plugin / gem install
>>>> stuff works.
>>
>>>> My first question would be if you can issue a notice on screen when
>>>> you first install a plugin.  Or if "gem install" lets you output
>>>> something, similar to a license agreement.
>>
>>>> If there's no easy, verbose way to do it--then I think you should
>>>> have the next update look for a "sphinx.yml" file.  If it doesn't
>>>> exist, create it with boilerplate and have your current defaults
>>>> remain the default.  But below them, comment out a line that
>>>> overrides it.
>>
>>>> Another way is to intentionally break the existing plugin-install
>>>> URL for Sphinx--so people have to go look at your webpage and pay
>>>> attention.
>>
>>>> I can think of more ideas.  I'd be happy to contribute to TS, but I'm
>>>> still new to Ruby/Rails (coming from Perl) and I want to avoid the
>>>> risk of committing bad code.
>>
>>>> I wrote a lot :(  Thank you.
>>
>>>> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
>>>>> Fair points, even if you're a little worked up about it.
>>
>>>>> When I was last doing some refactoring of the TS Configuration
>>>>> class, I considered removing the default morphology, but didn't
>>>>> because people were already using TS working on the (yes, barely
>>>>> documented) assumption that it *is* the default.
>>
>>>>> So, I agree about the default being nothing, and people set it if
>>>>> they want one... but how do we deprecate it cleanly? Beyond just
>>>>> removing it, which is easy to do, a warning would be nice, except we
>>>>> don't want that warning appearing *every* time ts:in is run, or
>>>>> something like that.
>>
>>>>> Suggestions welcome.
>>
>>>>> Also, re: your grouping issue, care to elaborate?
>>
>>>>> --
>>>>> Pat
>>
>>>>> On 14/05/2009, at 12:04 PM, aitrus wrote:
>>
>>>>>> Pat, I love Thinking Sphinx and I appreciate everything you've done
>>>>>> for Rails.
>>
>>>>>> Having said that.... for the love of god, please don't set defaults
>>>>>> like this.  I didn't even know what was going on.  I'm doing an
>>>>>> import on hundreds of thousands of records and the full-text search
>>>>>> of Sphinx makes this so much faster.
>>
>>>>>> But apparently you're setting the morphology to "stem_en" as a
>>>>>> default.  I can't find anything about this behavior and it took me
>>>>>> forever to figure out that this was the actual issue.  I have spent
>>>>>> hours trying to figure out why "AB0E" also matched "AB0S".  In fact,
>>>>>> I didn't even realize this was an issue until after I developed
>>>>>> everything, and began to QA my records.
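
As a rough illustration of how stemming can make two distinct identifiers collide, the sketch below uses a deliberately crude suffix-stripping rule. It is an assumed stand-in, not the actual stem_en algorithm; the `toy_stem` and `match?` helpers are invented for this example.

```ruby
# Toy suffix stripper: NOT the real stem_en algorithm, only an illustration
# of how a stemmer can reduce different tokens to one indexed form.
def toy_stem(word)
  word.downcase.sub(/(s|e)\z/, "")
end

# Compare two terms either by their stems or literally (case-insensitive).
def match?(a, b, stemming:)
  if stemming
    toy_stem(a) == toy_stem(b)
  else
    a.downcase == b.downcase
  end
end

puts match?("AB0E", "AB0S", stemming: true)   # both reduce to "ab0"
puts match?("AB0E", "AB0S", stemming: false)  # literal comparison differs
```

For identifier-like data such as account names, literal matching (morphology set to none) is usually what is wanted.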
>>
>>>>>> Sweet jesus :(  Please organize this in a way that is either
>>>>>> obvious or painstakingly documented.
>>
>>>>>> I had another issue with TS, where I was trying to group results
>>>>>> based on certain columns (via has_many and has_many :through).
>>>>>> Such a nightmare.
>>
>>>>>> I really appreciate your work, but there needs to be some kind of
>>>>>> emphasis on documenting various assumptions before implementation.
>>>>>> Or maybe, at least, just have:
>>
>>>>>> rake ts:in --no-stems
>>
>>>>>> Sigh.
>>
>>


