Hi Paul

Thanks for all the detail - makes it far easier to debug.

A few quick observations:

* If you want to match on specific fields (using :conditions), that requires 
:match_mode to be :extended. Indeed, Thinking Sphinx will automatically set it 
to that if you use :conditions. You've been overriding that with :any, though.

* Commas to separate field values in queries don't do anything - to get OR 
syntax on a specific field, you'd want to do something like this instead (using 
| for ORs):

  User.search(
    :with       => { :company_id => 1 },
    :conditions => { :skills => '(1 | 2)' }
  )

* Integers in fields aren't completely reliable (I've seen people have issues 
with four-or-more digits). However, putting those values in attributes instead 
means you probably lose any ability to weight based on the number of skill 
matches.
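
If the skill ids arrive as an array, a small helper along these lines could 
build that pipe-separated query for you. This is only a sketch: 
skill_ids_query is a made-up name, not something Thinking Sphinx provides.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): turns an array of skill
# ids into an extended-syntax OR group, so [1, 2] becomes "(1 | 2)".
def skill_ids_query(ids)
  # Integer() raises on anything non-numeric, so stray input can't sneak
  # query operators into the string.
  terms = ids.map { |id| Integer(id) }.uniq
  raise ArgumentError, 'at least one skill id is required' if terms.empty?

  "(#{terms.join(' | ')})"
end

# Usage (needs the User index from this thread):
#   User.search(
#     :with       => { :company_id => 1 },
#     :conditions => { :skills => skill_ids_query([1, 2]) }
#   )
```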

All that said, if you're switching skills to match on the text instead of the 
ids, then that becomes less exact with multi-word skills (it'll look for the 
words across the entire set of skills for that user - whether or not they 
belong to the same skill). For example, searching for a skill of Hardware 
Design will match users who have separate Hardware and Design skills, as well 
as those with the Hardware Design skill.

You can improve this by adding quotes around each skill you're searching for - 
but if two skills are next to each other in the concatenation during indexing 
(say, Hardware immediately followed by Design), then you'll hit the same 
problem. Still, the odds of that happening are perhaps acceptable in the 
bigger picture?

  User.search(
    :with       => { :company_id => 1 },
    :conditions => { :skills => '("SQL" | "Public Speaking")' }
  )
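
To build that quoted version from a list of titles, something like the 
following might work. Again a sketch only: skill_titles_query is a 
hypothetical name, and simply dropping embedded double quotes is my 
assumption about the safest way to handle them.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): wraps each skill title
# in double quotes so it is searched as a phrase, then ORs the phrases.
def skill_titles_query(titles)
  phrases = titles.map do |title|
    # Drop double quotes inside a title so they can't end the phrase early.
    cleaned = title.to_s.delete('"').strip
    "\"#{cleaned}\""
  end

  "(#{phrases.join(' | ')})"
end

# Usage (needs the User index from this thread, with skills indexed by title):
#   User.search(
#     :with       => { :company_id => 1 },
#     :conditions => { :skills => skill_titles_query(['SQL', 'Public Speaking']) }
#   )
```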

The catch with *this* approach is you'll need to be very careful about 
non-alphanumeric characters - as most won't get indexed by default. Eg: the # 
in C#. You can get around this though by using charset_table:
http://sphinxsearch.com/docs/manual-0.9.9.html#conf-charset-table
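
As a rough illustration, a trimmed-down ASCII table that also keeps the hash 
character might look like this. The constant name is made up, and I'm 
assuming set_property passes charset_table through to the generated config; 
you'd need to reindex after changing it.

```ruby
# Assumed starting point: digits, underscore and case-folded Latin letters,
# plus U+23 ('#') so that "C#" keeps its hash. U+23 is written as a code
# point because a literal '#' starts a comment in sphinx.conf.
SKILLS_CHARSET_TABLE = [
  '0..9',
  'A..Z->a..z',
  '_',
  'a..z',
  'U+23'
].join(', ')

# In the index definition (assumption: set_property accepts this setting):
#   define_index do
#     set_property :charset_table => SKILLS_CHARSET_TABLE
#     # rest of the index definition as before
#   end
```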

If the attribute filters approach is preferred (and personally, I like it - 
it's just that you lose the weightings), then you could try adding separate 
computed attributes for each skill you're searching on, via the :sphinx_select 
option:
http://sphinxsearch.com/docs/manual-0.9.9.html#api-func-setselect

Some combination of IF() and IN() functions should do the trick - repeat for 
each skill, combine the values somehow in your sort_mode (check out the 
expression sort mode for this):
http://sphinxsearch.com/docs/manual-0.9.9.html#sort-expr
http://freelancing-god.github.com/ts/en/searching.html#sorting

I've not played with the select stuff at all, and with expression sorting only 
a very little, so I'm unable to be much more helpful than that down this path.
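
For what it's worth, here is a rough sketch of what the select string could 
look like. Treat all of it as assumptions: skill_match_select is a made-up 
helper, skill_ids is a multi-value attribute that isn't in your index yet 
(something like has skills(:id), :as => :skill_ids), and I'm assuming IN() 
accepts a multi-value attribute in your Sphinx version. The usage note sorts 
on the computed alias with a plain :order instead of the expression sort 
mode, which is a swap on my part.

```ruby
# Hypothetical helper: builds a :sphinx_select string that counts how many
# of the wanted skills each user has, exposed as skill_matches.
# Assumes a multi-value attribute (default name skill_ids) on the index.
def skill_match_select(ids, attribute = 'skill_ids')
  # One IF(IN(...), 1, 0) term per skill; the sum is the number of matches.
  counts = ids.map { |id| "IF(IN(#{attribute}, #{Integer(id)}), 1, 0)" }
  "*, #{counts.join(' + ')} AS skill_matches"
end

# Possible usage (unverified against a real index):
#   User.search(
#     :with          => { :company_id => 1, :skill_ids => [1, 2] },
#     :sphinx_select => skill_match_select([1, 2]),
#     :order         => 'skill_matches DESC'
#   )
```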


Still, hopefully parts of this are helpful at the very least :)

Any questions, don't hesitate to ask.

Cheers

-- 
Pat

On 24/12/2010, at 11:51 PM, Paul Graham wrote:

> Below is a summary of what I've been trying to achieve and some sample
> code, along with what I've actually done that seems to produce the
> right sort of results. I'm still not entirely sure it's the right way
> to do it but hopefully you'll be able to steer me if I am indeed
> wrong. I had intended to raise issues with the :with parameter but
> that seems to be working fine this morning, although it seems not to
> like using indexes and only works with attributes - that may be
> designed behaviour, though I didn't spot that in the documentation.
> 
> Users in our application can be assigned "skills" - this is largely
> expected to be technical skills, such as "Java" and "MySQL" but can be
> pretty much anything. These skills are stored in a table and linked to
> a user through a simple has_and_belongs_to_many relationship (i.e., a
> user can have many skills and the same skill can belong to many
> users). The models and join table look a bit like this:
> 
> users:
> - id:integer
> - company_id:integer
> - forename:text
> - surname:text
> 
> skills:
> - id:integer
> - title:text
> 
> users_skills:
> - user_id:integer
> - skill_id:integer
> 
> The User model has the following declaration:
> - has_and_belongs_to_many :skills, :join_table => :users_skills
> 
> And the Skill model has a similar declaration:
> - has_and_belongs_to_many :users, :join_table => :users_skills
> 
> 
> I've got ThinkingSphinx all set up and working and I created the
> following index on the User model (not sure if indenting works on here
> so I'm doing it myself with invalid characters so please ignore them):
> - define_index do
> 
> -   set_property :enable_star => true
> -   set_property :min_prefix_len => 3
> 
> -   indexes :forename, :sortable => true
> -   indexes :surname, :sortable => true
> -   indexes "forename + ' ' + surname", :as => 'name', :sortable =>
> true
> -   indexes skills(:id), :as => 'skills'
> 
> -   has :id, :as => :user_id
> -   has :company_id
> -   set_property :delta => true
> 
> - end
> 
> 
> Now to the searching part. What I'm looking for is all those users
> within a particular company that match a set of skills I specify.
> Further, if I specify 2 skills and the user matches 1 out of the 2, I
> still want that user returned but obviously further down the list than
> a user that matches both. Here's a sample data set:
> 
> company:
> - id:1, name:Apple
> - id:2, name:Microsoft
> 
> users:
> - id:1, company_id:1, forename:John, surname:Doe, { associated skills:
> [ 1, 2 ] }
> - id:2, company_id:1, forename:Steve, surname:Jobs, { associated
> skills: [ 1, 2, 4 ] }
> - id:3, company_id:1, forename:Paul, surname:Graham, { associated
> skills: [ ] }
> - id:4, company_id:2, forename:Bill, surname:Gates, { associated
> skills: [ 2, 3, 4 ] }
> 
> skills:
> - id:1, title:Java
> - id:2, title:SQL
> - id:3, title:C#
> - id:4, title:Public Speaking
> 
> users_skills:
> - user_id:1, skill_id:1
> - user_id:1, skill_id:2
> 
> - user_id:2, skill_id:1
> - user_id:2, skill_id:2
> - user_id:2, skill_id:4
> 
> - user_id:4, skill_id:2
> - user_id:4, skill_id:3
> - user_id:4, skill_id:4
> 
> So, a simple scenario. We want to find those users from Apple that
> have SQL as a skill. Here's my search call:
> 
> - matching_users = User.search :with => { :company_id =>
> 1 }, :conditions => { :skills => '2' }, :match_mode => :any
> 
> Now, this morning, that seems to work fine, bringing back Steve Jobs
> and John Doe, with an equal weighting. Yesterday, however, I was also
> getting back Bill Gates on that search so the behaviour is a bit
> erratic. There is more, though. Remember Paul Graham (me), who has no
> skills? Well, let's try another search (this morning), this time for
> SQL and Java, but still only at Apple:
> 
> - matching_users = User.search :with => { :company_id =>
> 1 }, :conditions => { :skills => '1,2' }, :match_mode => :any
> 
> This time, I'm also getting Paul Graham back. Granted, he's at the
> bottom of the pile but why is he coming back at all? Worse, when I do
> a search for skills 3 and 4 within Apple (C# and Public Speaking) I
> STILL get back Paul Graham but this time I don't get back John Doe,
> which is correct. And then, even worse than that, when I do a search
> for skill 3 within Apple (C#), which no one has, the only one I get
> back is Paul Graham!
> 
> The behaviour seems very inconsistent and, in some cases, simply
> incorrect. Re-indexing and restarting ThinkingSphinx doesn't make any
> difference - the results always come back the same.
> 
> Another thing I've noticed is that the order in which the skills in
> the conditions are specified seems to be important. For example:
> 
> - matching_users = User.search :conditions => { :skills =>
> '1,2,3,4' }, :match_mode => :any
> 
> This should bring back John Doe, Steve Jobs, and Bill Gates (no
> filtering by company this time) and it does that... and also returns
> Paul Graham. However, the ordering is not what I would expect. Steve
> Jobs and Bill Gates both match 3 out of the 4 and John Doe matches 2
> (Paul Graham matches none, of course). However, the weighting looks
> like this:
> - Bill Gates => weight:53
> - Steve Jobs => weight:28
> - John Doe => weight:2
> - Paul Graham => weight:1
> 
> Okay, that's not the end of the world seeing as you could argue that
> there's some built-in ordering around forename and surname as well,
> which adjusts the weights. However, if we reverse the order in which
> those skills are specified:
> 
> - matching_users = User.search :conditions => { :skills =>
> '4,3,2,1' }, :match_mode => :any
> 
> We get the following weightings and ordering:
> - John Doe => weight:27
> - Steve Jobs => weight:3
> - Bill Gates => weight:3
> - Paul Graham => weight:1
> 
> John Doe definitely shouldn't be appearing at the top of that list.
> And why have Steve Jobs and Bill Gates reversed their order? Perhaps
> my idea about the forename and surname ordering was incorrect after
> all.
> 
> As you can see, it all looks very peculiar to me and some of the
> results coming back look like they simply shouldn't be there.
> 
> 
> I HAVE managed to get more consistent results by switching
> to :match_mode => :extended and :rank_mode => :wordcount, along with
> switching the :skills condition to being a pipe-separated list of
> skill_ids. I don't know if this is actually correct but without that
> rank_mode, the ordering depends entirely on the order in which the
> skill_ids are specified (as above) and without that match_mode, the
> results that are returned often include ones that shouldn't be there.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/thinking-sphinx?hl=en.
> 
