Cool, but how do you turn on stemming? Sorry, I haven't read this entire
post yet, but I was betting that when I did I'd see my answer. Plus, I
think I remember seeing some info/post-install notes on the screen
when I installed TS. I was going to re-install and look at that again
real close.
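
A minimal sketch of one way to switch stemming back on, assuming
Thinking Sphinx picks up per-environment settings from config/sphinx.yml
(the file discussed further down this thread) and that "stem_en" is the
value the old default used. The helper name sphinx_settings is
hypothetical; it only builds the YAML that would go into that file:

```ruby
require "yaml"

# Hypothetical helper (not part of Thinking Sphinx): builds a hash with
# one entry per Rails environment, each carrying a morphology setting.
def sphinx_settings(environments, morphology)
  environments.each_with_object({}) do |env, settings|
    settings[env] = { "morphology" => morphology }
  end
end

# Assumption: "stem_en" is the English stemmer mentioned in this thread.
# Print YAML suitable for pasting into config/sphinx.yml.
puts sphinx_settings(%w[development test production], "stem_en").to_yaml
```

I'd expect the indexes to need rebuilding (e.g. rake ts:in) before a
changed morphology has any effect on search results.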

On Jul 21, 4:48 pm, Pat Allan <[email protected]> wrote:
> Sorry for the confusion Jim. I'll update the documents to remove the  
> mention of the default stemming.
>
> --
> Pat
>
> On 21/07/2009, at 3:35 PM, jim wrote:
>
>
>
> > aaaaaaaaaaaaaahhhhhhhhhhhhhhhh !!!!!!!!!!!!!!!    :)
>
> > On May 27, 5:15 pm, Pat Allan <[email protected]> wrote:
> >> Okay, I've made the change. Anyone now installing Thinking Sphinx via
> >> plugin or gem gets a warning, and morphology has a default of nil.
>
> >> I'll remove the warning at some point, maybe in a couple of months.
>
> >> Cheers
>
> >> --
> >> Pat
>
> >> On 17/05/2009, at 10:08 PM, Pat Allan wrote:
>
> >>> You did indeed write a lot, but that's okay, provides a more
> >>> thorough understanding.
>
> >>> Grouping is probably the best way to do what you've done for the
> >>> account names - so your approach seems right to me (even though it's
> >>> not a perfect solution). As much as I can comprehend the problem at
> >>> the moment, anyway :)
>
> >>> As for alerting people to the removal of the default morphology, I
> >>> like the idea of having messages when the plugin or gem is installed
> >>> (and both of those should be doable, I'm almost certain).
>
> >>> If you want to have a go at forking and patching, be my guest - for
> >>> plugins, I think PLUGIN_ROOT/install.rb is what should hold code that
> >>> gets run on installation (it might be housed under
> >>> PLUGIN_ROOT/rails/install.rb since Rails 2.1). No idea what the
> >>> process is for gems, but the rspec gem outputs a message, so TS
> >>> should be able to as well.
>
> >>> Otherwise, when I have the time and motivation, I'll attempt it
> >>> myself - which is fine by me, but don't be afraid to give it a shot
> >>> yourself.
>
> >>> Cheers
>
> >>> --
> >>> Pat
>
> >>> On 15/05/2009, at 12:35 PM, aitrus wrote:
>
> >>>> Hi Pat,
>
> >>>> Thanks again for your work on TS.  Sorry, I get worked up easily.
> >>>> To answer your question, first:
>
> >>>> I'm doing some data warehouse-ish applications.  I pull in lots of
> >>>> data from various systems.  Then I use things like account names,
> >>>> group names, resource names, host names, etc., to find unique
> >>>> records.
>
> >>>> When it comes to grouping, I have an association setup of
> >>>> Personnel/Divisions <-- ownership --> Accounts.
>
> >>>> A person can have many accounts.  However, each person has only one
> >>>> personnel record.  If I render a search in Sphinx, it paginates the
> >>>> Personnel records--then if I try to display accounts, the
> >>>> pagination is very strange.
>
> >>>> So, I needed to do a search in Sphinx on the Accounts table, (a)
> >>>> eliminating duplicate account names, and (b) eliminating accounts
> >>>> with no owner (took some digging to figure out I need to have a
> >>>> "has" attribute).
>
> >>>> The way I eventually got this to work (after much whiskey and
> >>>> self-mutilation) was to set up:
>
> >>>> has staffs(:id), :as => :has_staffs, :type => :integer
> >>>> has ["LOWER(`accounts`.`name`)"], :as => :sort_account_name, :type => :string
>
> >>>> in my define_index block.  Then I run the following Sphinx search:
>
> >>>>     @staff_results = Account.search query,
> >>>>       :conditions     => conditions,
> >>>>       :page           => params[:page],
> >>>>       :group_function => :attr,
> >>>>       :group_by       => "sort_account_name",
> >>>>       :group_clause   => sort,
> >>>>       :without        => {:has_staffs => 0}
>
> >>>> Which solves my biggest problem.  I still have the issue that one
> >>>> account can have many owners--but I have not begun that work.  I
> >>>> also just noticed, after reviewing some logs, that if
> >>>> ":sortable => true" is enabled, you create a "<column>_sort"
> >>>> attribute.  I haven't tried using this in the above "group_by"
> >>>> entry, yet.
>
> >>>> The biggest use of Sphinx (for me) is that it lets me minimize the
> >>>> size of my MySQL indexes (thus speeding up MySQL), and instead use
> >>>> Sphinx to quickly crawl text fields.  For example, a unique unix
> >>>> account could be described as an account (case-sensitive) per
> >>>> server.  There are several platforms/accounts being warehoused.  My
> >>>> account database has 634,000 records.  A MySQL search for this
> >>>> account would be ungodly, since InnoDB lacks fulltext indexing, etc.
>
> >>>> Another issue I've had is figuring out that I needed to set up the
> >>>> Charset Table for Sphinx, so it would index various special
> >>>> characters--some user/group/resource names can have those tucked
> >>>> away.  Of special note are @ (at-sign), $ (dollar-sign),
> >>>> # (hash/pound-symbol), and parentheses, period, hyphen, underscore,
> >>>> etc.
>
> >>>> I solved that in the sphinx.yml and it looks like:
>
> >>>> development:
> >>>>   morphology: "none"
> >>>>   charset_table: "0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F, U+0024, $, @, *, ., -, U+0028, (, U+0029, ), \"#\""
>
> >>>> I'm saying some things you probably already know--but I'm hoping
> >>>> google indexes my post and saves other developers from the
> >>>> psychological trauma that I experienced.
>
> >>>> I'll be using Sphinx also as part of a web page, but every search
> >>>> term will be literal--there's not much use for wordlists, stemming,
> >>>> etc, even in that situation.
>
> >>>> Hope this gives good insight into my experience.  As for how to
> >>>> notify, that would be a question of how Rails plugin / gem install
> >>>> stuff works.
>
> >>>> My first question would be if you can issue a notice on screen when
> >>>> you first install a plugin.  Or if "gem install" lets you output
> >>>> something, similar to a license agreement.
>
> >>>> If there's no easy, verbose way to do it--then I think you should
> >>>> have the next update look for a "sphinx.yml" file.  If it doesn't
> >>>> exist, create it with a boilerplate and have your current defaults
> >>>> remain the default.  But below them, comment out a line that
> >>>> overrides it.
>
> >>>> Another way is to intentionally break the existing plugin-install
> >>>> url for Sphinx--so people have to go look at your webpage and pay
> >>>> attention.
>
> >>>> I can think of more ideas.  I'd be happy to contribute to TS, but
> >>>> I'm still new to Ruby/Rails (coming from Perl) and I want to avoid
> >>>> the risk of committing bad code.
>
> >>>> I wrote a lot :(  Thank you.
>
> >>>> On May 14, 11:58 pm, Pat Allan <[email protected]> wrote:
> >>>>> Fair points, even if you're a little worked up about it.
>
> >>>>> When I was last doing some refactoring of the TS Configuration
> >>>>> class, I considered removing the default morphology, but didn't
> >>>>> because people were already using TS working on the (yes, barely
> >>>>> documented) assumption that it *is* the default.
>
> >>>>> So, I agree about the default being nothing, and people set it if
> >>>>> they want one... but how do we deprecate it cleanly? Just removing
> >>>>> it is easy to do, but a warning would be nice, except we don't
> >>>>> want that warning appearing *every* time ts:in is run, or
> >>>>> something like that.
>
> >>>>> Suggestions welcome.
>
> >>>>> Also, re: your grouping issue, care to elaborate?
>
> >>>>> --
> >>>>> Pat
>
> >>>>> On 14/05/2009, at 12:04 PM, aitrus wrote:
>
> >>>>>> Pat, I love Thinking Sphinx and I appreciate everything you've
> >>>>>> done for Rails.
>
> >>>>>> Having said that.... for the love of god, please don't set
> >>>>>> defaults like this.  I didn't even know what was going on.  I'm
> >>>>>> doing an import on hundreds of thousands of records and the
> >>>>>> full-text search of Sphinx makes this so much faster.
>
> >>>>>> But apparently you're setting the morphology to "stem_en" as a
> >>>>>> default.  I can't find anything about this behavior and it took
> >>>>>> me forever to figure out that this was the actual issue.  I have
> >>>>>> spent hours trying to figure out why "AB0E" also matched "AB0S".
> >>>>>> In fact, I didn't even realize this was an issue until after I
> >>>>>> developed everything, and began to QA my records.
>
> >>>>>> Sweet jesus :(  Please organize this in a way that is either
> >>>>>> obvious or painstakingly documented.
>
> >>>>>> I had another issue with TS, where I was trying to group results
> >>>>>> based on certain columns (via has_many and h_m:through).  Such a
> >>>>>> nightmare.
>
> >>>>>> I really appreciate your work, but there needs to be some kind of
> >>>>>> emphasis on documenting various assumptions before
> >>>>>> implementation.  Or maybe, at least, just have:
>
> >>>>>> rake ts:in --no-stems
>
> >>>>>> Sigh.
>
>