Below is a summary of what I've been trying to achieve and some sample
code, along with what I've actually done that seems to produce the
right sort of results. I'm still not entirely sure it's the right way
to do it but hopefully you'll be able to steer me if I am indeed
wrong. I had intended to raise issues with the :with parameter but
that seems to be working fine this morning, although it appears to
work only with attributes (has) and not with fields (indexes). That may
be by design, though I didn't spot it in the documentation.
Users in our application can be assigned "skills" - this is largely
expected to be technical skills, such as "Java" and "MySQL" but can be
pretty much anything. These skills are stored in a table and linked to
a user through a simple has_and_belongs_to_many relationship (i.e., a
user can have many skills and the same skill can belong to many
users). The models and join table look a bit like this:
users:
- id:integer
- company_id:integer
- forename:text
- surname:text
skills:
- id:integer
- title:text
users_skills:
- user_id:integer
- skill_id:integer
The User model has the following declaration:
- has_and_belongs_to_many :skills, :join_table => :users_skills
And the Skill model has a similar declaration:
- has_and_belongs_to_many :users, :join_table => :users_skills
I've got ThinkingSphinx all set up and working and I created the
following index on the User model (not sure if indenting works on here
so I'm doing it myself with invalid characters so please ignore them):
- define_index do
- set_property :enable_star => true
- set_property :min_prefix_len => 3
- indexes :forename, :sortable => true
- indexes :surname, :sortable => true
- indexes "forename + ' ' + surname", :as => 'name', :sortable => true
- indexes skills(:id), :as => 'skills'
- has :id, :as => :user_id
- has :company_id
- set_property :delta => true
- end
Now to the searching part. What I'm looking for is all those users
within a particular company that match a set of skills I specify.
Further, if I specify 2 skills and the user matches 1 out of the 2, I
still want that user returned but obviously further down the list than
a user that matches both. Here's a sample data set:
company:
- id:1, name:Apple
- id:2, name:Microsoft
users:
- id:1, company_id:1, forename:John, surname:Doe, { associated skills: [ 1, 2 ] }
- id:2, company_id:1, forename:Steve, surname:Jobs, { associated skills: [ 1, 2, 4 ] }
- id:3, company_id:1, forename:Paul, surname:Graham, { associated skills: [ ] }
- id:4, company_id:2, forename:Bill, surname:Gates, { associated skills: [ 2, 3, 4 ] }
skills:
- id:1, title:Java
- id:2, title:SQL
- id:3, title:C#
- id:4, title:Public Speaking
users_skills:
- user_id:1, skill_id:1
- user_id:1, skill_id:2
- user_id:2, skill_id:1
- user_id:2, skill_id:2
- user_id:2, skill_id:4
- user_id:4, skill_id:2
- user_id:4, skill_id:3
- user_id:4, skill_id:4
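To make the result I'm after concrete, here is a plain-Ruby sketch (no Sphinx involved at all) of the matching and ordering I'd expect from the sample data above. USERS and expected_matches are names I've made up for this illustration; they aren't part of our app or of ThinkingSphinx.

```ruby
# Sample data from above: each user with their company and associated skill ids.
USERS = [
  { :name => 'John Doe',    :company_id => 1, :skills => [1, 2] },
  { :name => 'Steve Jobs',  :company_id => 1, :skills => [1, 2, 4] },
  { :name => 'Paul Graham', :company_id => 1, :skills => [] },
  { :name => 'Bill Gates',  :company_id => 2, :skills => [2, 3, 4] }
]

# Hypothetical helper: returns [name, match_count] pairs for users who have at
# least one of the wanted skills, most matches first. Pass company_id = nil
# (the default) to skip the company filter.
def expected_matches(skill_ids, company_id = nil)
  pool = company_id ? USERS.select { |u| u[:company_id] == company_id } : USERS
  scored = pool.map { |u| [u[:name], (u[:skills] & skill_ids).size] }
  scored = scored.select { |_, count| count > 0 }
  # Ties are broken by name so the output is deterministic.
  scored.sort_by { |name, count| [-count, name] }
end

expected_matches([2], 1)  # John Doe and Steve Jobs, one match each
expected_matches([3], 1)  # empty: nobody at Apple has C#
```

In other words, a user with no matching skills should never come back, and the company filter should exclude everyone outside that company.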
So, a simple scenario. We want to find those users from Apple that
have SQL as a skill. Here's my search call:
- matching_users = User.search :with => { :company_id => 1 },
-   :conditions => { :skills => '2' }, :match_mode => :any
Now, this morning, that seems to work fine, bringing back Steve Jobs
and John Doe, with an equal weighting. Yesterday, however, I was also
getting back Bill Gates on that search so the behaviour is a bit
erratic. There is more, though. Remember Paul Graham (me), who has no
skills? Well, let's try another search (this morning), this time for
SQL and Java, but still only at Apple:
- matching_users = User.search :with => { :company_id => 1 },
-   :conditions => { :skills => '1,2' }, :match_mode => :any
This time, I'm also getting Paul Graham back. Granted, he's at the
bottom of the pile but why is he coming back at all? Worse, when I do
a search for skills 3 and 4 within Apple (C# and Public Speaking) I
STILL get back Paul Graham but this time I don't get back John Doe,
which is correct. And then, even worse than that, when I do a search
for skill 3 within Apple (C#), which no one has, the only one I get
back is Paul Graham!
The behaviour seems very inconsistent and, in some cases, simply
incorrect. Re-indexing and restarting ThinkingSphinx doesn't make any
difference - the results always come back the same.
Another thing I've noticed is that the order in which the skills in
the conditions are specified seems to be important. For example:
- matching_users = User.search :conditions => { :skills => '1,2,3,4' },
-   :match_mode => :any
This should bring back John Doe, Steve Jobs, and Bill Gates (no
filtering by company this time) and it does that... and also returns
Paul Graham. However, the ordering is not what I would expect. Steve
Jobs and Bill Gates both match 3 out of the 4 and John Doe matches 2
(Paul Graham matches none, of course). However, the weighting looks
like this:
- Bill Gates => weight:53
- Steve Jobs => weight:28
- John Doe => weight:2
- Paul Graham => weight:1
Okay, that's not the end of the world seeing as you could argue that
there's some built-in ordering around forename and surname as well,
which adjusts the weights. However, if we reverse the order in which
those skills are specified:
- matching_users = User.search :conditions => { :skills => '4,3,2,1' },
-   :match_mode => :any
We get the following weightings and ordering:
- John Doe => weight:27
- Steve Jobs => weight:3
- Bill Gates => weight:3
- Paul Graham => weight:1
John Doe definitely shouldn't be appearing at the top of that list.
And why have Steve Jobs and Bill Gates reversed their order? Perhaps
my idea about the forename and surname ordering was incorrect after
all.
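For reference, the ordering I'd expect depends only on how many of the requested skills each user has, so the order of the ids shouldn't matter. A quick plain-Ruby check of that expectation (this says nothing about how Sphinx actually computes weights; SKILLS_BY_USER and match_counts are names invented here):

```ruby
# Skill ids per user, copied from the sample data.
SKILLS_BY_USER = {
  'John Doe'    => [1, 2],
  'Steve Jobs'  => [1, 2, 4],
  'Paul Graham' => [],
  'Bill Gates'  => [2, 3, 4]
}

# Hypothetical helper: takes the comma-separated string used in :conditions
# and returns a hash of user name => number of requested skills they have.
def match_counts(skills_string)
  wanted = skills_string.split(',').map { |id| id.to_i }
  counts = {}
  SKILLS_BY_USER.each { |name, ids| counts[name] = (ids & wanted).size }
  counts
end

forward  = match_counts('1,2,3,4')
backward = match_counts('4,3,2,1')
# Both orderings give the same counts: Steve Jobs and Bill Gates 3,
# John Doe 2, Paul Graham 0.
```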
As you can see, it all looks very peculiar to me and some of the
results coming back look like they simply shouldn't be there.
I HAVE managed to get more consistent results by switching
to :match_mode => :extended and :rank_mode => :wordcount, along with
switching the :skills condition to being a pipe-separated list of
skill_ids. I don't know if this is actually correct but without that
rank_mode, the ordering depends entirely on the order in which the
skill_ids are specified (as above) and without that match_mode, the
results that are returned often include ones that shouldn't be there.
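Roughly, the options I'm now passing look like the sketch below. The helper name skill_search_options is made up for this post, and I'm assuming the hash can be handed straight to User.search; the snippet itself only builds the hash and doesn't touch Sphinx.

```ruby
# Hypothetical helper: builds the search options from a company id and a
# list of skill ids.
def skill_search_options(company_id, skill_ids)
  {
    :with       => { :company_id => company_id },
    # Pipe-separated ids; "|" is the OR operator in extended match mode.
    :conditions => { :skills => skill_ids.join('|') },
    :match_mode => :extended,
    :rank_mode  => :wordcount
  }
end

options = skill_search_options(1, [1, 2])
# In the app this would then be used as: matching_users = User.search(options)
```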