On 01/11/2013 05:23 PM, Gora Mohanty wrote:
You are still thinking of Solr as an RDBMS, which you should not
do. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents:
<userA> <c1 name> <c1 startdate>...<l1> <l1 writing skill>...
<userA> <c1 name> <c1 startdate>...<l2> <l2 writing skill>...
<userA> <c2 name> <c2 startdate>...<l1> <l1 writing skill>...
....
<userB> <c2 name> <c2 startdate>...<l2> <l2 writing skill>...
<userB> <c3 name> <c3 startdate>...<l2> <l2 writing skill>...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB.
Actually, that is what you would get when doing a join in an RDBMS: the
cross-product of your tables. This is NOT AT ALL what you typically do
in Solr.
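To make the blow-up concrete, here is a minimal Python sketch of that
cross-product for the two example users. The dictionary layout and the
function name flatten are made up for illustration; nothing here is a
Solr API.

```python
from itertools import product

# Example data from the quoted mail: courses and languages per user.
users = {
    "userA": {"courses": ["c1", "c2", "c3"], "languages": ["l1", "l2"]},
    "userB": {"courses": ["c2", "c3"], "languages": ["l2"]},
}


def flatten(users):
    """Return one (user, course, language) row per course/language pair."""
    rows = []
    for user, data in users.items():
        for course, language in product(data["courses"], data["languages"]):
            rows.append((user, course, language))
    return rows


rows = flatten(users)
print(len(rows))  # number of flattened documents for both users together
```

Every additional table attached to the user multiplies the row count
again, which is why one document per user scales better here.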
It is better to start the other way around: think of Solr as a retrieval
system, not a storage system. What are your queries? What do you want to
find, and what criteria do you use to search for it?
If your intention is to find users that match certain criteria, each
entry should be a user (with ALL associated information, e.g. all
courses, all language skills, etc.). If you want to retrieve courses,
each entry should be a course.
Let's say you want to find users who have certain language skills. You
would then have a schema that describes a user:
- user id
- user name
- language
- ...
In the language field, you could store values like: en|reading|high
es|writing|low, etc. It could be a multivalued field, or a single field
with everything separated by spaces and a tokenizer that splits on
whitespace.
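As a rough sketch of what one such user entry could look like before it
is sent to Solr, assuming Python on the client side: the helper names
and the keys "id", "name" and "language" are my own placeholders, not
anything your schema or Solr requires.

```python
def language_tokens(skills):
    """Encode (language, skill, level) triples as 'lang|skill|level' tokens."""
    return ["|".join(triple) for triple in skills]


def user_document(user_id, user_name, skills):
    """Build one document per user, carrying ALL of that user's skills."""
    return {
        "id": user_id,          # hypothetical field name
        "name": user_name,      # hypothetical field name
        "language": language_tokens(skills),  # one value per skill
    }


doc = user_document(
    "userA",
    "Alice",
    [("en", "reading", "high"), ("es", "writing", "low")],
)

# Single-field variant: join the tokens with spaces and let a
# whitespace tokenizer split them again at index time.
single_field_value = " ".join(doc["language"])
```

Either form gives you one entry per user, no matter how many languages
that user has.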
Now you can query:
- language:es* -- return all users with some Spanish skills
- language:en|writing|high -- return all users with high English writing
skills
- +(language:es* language:fr*) +language:en|writing|high -- return users
with high English writing skills and some knowledge of French or Spanish
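To show what these three queries are meant to select, here is a small
in-memory imitation in Python. It is not Solr and does not use the Solr
query parser; has_prefix and has_term are hypothetical helpers that only
mimic a prefix term (es*) and an exact term on already-split tokens, and
the three sample users are invented.

```python
def has_prefix(doc, prefix):
    """Mimic a prefix term such as language:es* on one document."""
    return any(token.startswith(prefix) for token in doc["language"])


def has_term(doc, term):
    """Mimic an exact term such as language:en|writing|high."""
    return term in doc["language"]


# Invented sample users, one entry each.
docs = [
    {"id": "u1", "language": ["en|writing|high", "es|writing|low"]},
    {"id": "u2", "language": ["en|writing|high"]},
    {"id": "u3", "language": ["fr|reading|high", "en|reading|low"]},
]

# language:es*
some_spanish = [d["id"] for d in docs if has_prefix(d, "es")]

# language:en|writing|high
high_english_writing = [d["id"] for d in docs if has_term(d, "en|writing|high")]

# +(language:es* language:fr*) +language:en|writing|high
combined = [
    d["id"]
    for d in docs
    if (has_prefix(d, "es") or has_prefix(d, "fr"))
    and has_term(d, "en|writing|high")
]
```

Note that u3 knows some French but is dropped from the last result
because both "+" clauses are required.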
If you want to avoid wildcard queries (which are more costly), you can
just add plain "en", "es", etc. to your field, so "language:es" will
match anybody with Spanish skills.
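A possible way to add those plain codes when you build the field values,
again only as a Python sketch with a made-up helper name:

```python
def with_language_codes(tokens):
    """Append the bare language code of each 'lang|skill|level' token once."""
    codes = []
    for token in tokens:
        code = token.split("|", 1)[0]  # text before the first '|'
        if code not in codes:
            codes.append(code)
    return tokens + codes


values = with_language_codes(
    ["en|reading|high", "es|writing|low", "en|writing|high"]
)
```

With the extra values in place, an exact term like "es" is enough and no
wildcard is needed.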
Best,
Jens