On 01/11/2013 05:23 PM, Gora Mohanty wrote:
You are still thinking of Solr as an RDBMS, which you should not
do. In your case, it is easiest to flatten out the data. This increases
the size of the index, but that should not really be of concern. As
your courses and languages tables are connected only to user, the
schema that I described earlier should suffice. To extend my
earlier example, given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it such that you get the following Solr documents:
<userA> <c1 name> <c1 startdate>...<l1> <l1 writing skill>...
<userA> <c1 name> <c1 startdate>...<l2> <l2 writing skill>...
<userA> <c2 name> <c2 startdate>...<l1> <l1 writing skill>...
....
<userB> <c2 name> <c2 startdate>...<l2> <l2 writing skill>...
<userB> <c3 name> <c3 startdate>...<l2> <l2 writing skill>...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB

Actually, that is what you would get from a join in an RDBMS: the cross-product of your tables. This is NOT AT ALL what you typically do in Solr.
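
To make that concrete, here is a minimal Python sketch of the cross-product for the two example users. The dict layout and the field names `user`, `course` and `language` are assumptions for illustration only.

```python
from itertools import product

# Sample data from the quoted example; the dict layout is an assumption.
users = {
    "userA": {"courses": ["c1", "c2", "c3"], "languages": ["l1", "l2"]},
    "userB": {"courses": ["c2", "c3"], "languages": ["l2"]},
}

def flatten(users):
    """Emit one flat document per (user, course, language) combination."""
    docs = []
    for user, data in users.items():
        for course, language in product(data["courses"], data["languages"]):
            docs.append({"user": user, "course": course, "language": language})
    return docs

docs = flatten(users)
print(len(docs))  # 3*2 + 2*1 = 8
```

Every additional table attached to the user multiplies the document count again, which is why this layout grows so quickly.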

It is best to start the other way around: think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it?

If your intention is to find users that match certain criteria, each entry should be a user (with ALL associated information, e.g. all courses, all language skills, etc.). If you want to retrieve courses, each entry should be a course.

Let's say you want to find users who have certain language skills. You would then have a schema that describes a user:
- user id
- user name
- languages
- ...

In languages, you could store values such as: en|reading|high es|writing|low, etc. It could be a multivalued field, or a single field with everything separated by spaces and a tokenizer that splits on whitespace.
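
As a rough sketch of what one such user entry might look like before it is sent to Solr, here it is as a plain Python dict. The helper names, the sample user and the `id`/`name` keys are hypothetical; the field is called `language` here to match the field name used in the queries.

```python
def language_tokens(skills):
    """Encode (language, skill, level) triples as 'lang|skill|level' tokens."""
    return [f"{lang}|{skill}|{level}" for lang, skill, level in skills]

def build_user_doc(user_id, name, skills):
    # One document per user, carrying ALL of that user's language skills.
    return {"id": user_id, "name": name, "language": language_tokens(skills)}

doc = build_user_doc(
    "u1", "Alice", [("en", "reading", "high"), ("es", "writing", "low")]
)
print(doc["language"])  # ['en|reading|high', 'es|writing|low']

# Single-field variant: join the tokens with spaces for a whitespace tokenizer.
print(" ".join(doc["language"]))  # en|reading|high es|writing|low
```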

Now you can query:

- language:es* -- return all users with some Spanish skills
- language:en|writing|high -- return all users with high English writing skills
- +(language:es* language:fr*) +language:en|writing|high -- return users with high English writing skills and some knowledge of French or Spanish
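
The semantics of these queries can be mimicked in a few lines of plain Python. This is only a sketch of the matching logic, not how Solr evaluates queries internally; the sample users and the helper names `has_term` and `has_prefix` are made up for illustration, and each document is assumed to hold its tokens in a `language` list.

```python
def has_term(doc, term):
    """Exact token match, like language:en|writing|high."""
    return term in doc["language"]

def has_prefix(doc, prefix):
    """Prefix match, like language:es*."""
    return any(token.startswith(prefix) for token in doc["language"])

# Hypothetical sample documents, one per user.
users = [
    {"id": "u1", "language": ["en|writing|high", "es|reading|low"]},
    {"id": "u2", "language": ["en|writing|high"]},
    {"id": "u3", "language": ["fr|writing|high", "en|writing|low"]},
]

# language:es*
spanish = [u["id"] for u in users if has_prefix(u, "es")]

# +(language:es* language:fr*) +language:en|writing|high
combined = [
    u["id"]
    for u in users
    if (has_prefix(u, "es") or has_prefix(u, "fr"))
    and has_term(u, "en|writing|high")
]

print(spanish)   # ['u1']
print(combined)  # ['u1']
```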

If you want to avoid wildcard queries (which are more costly), you can also add plain "en", "es", etc. to your field, so that "language:es" will match anybody with Spanish skills.
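
One possible way to add those plain codes at indexing time, sketched in Python. The function name `with_plain_codes` is hypothetical, and it assumes the language code is always the part before the first "|".

```python
def with_plain_codes(tokens):
    """Return the tokens plus each bare language code, without duplicates."""
    out = list(tokens)
    for token in tokens:
        code = token.split("|", 1)[0]  # part before the first '|'
        if code not in out:
            out.append(code)
    return out

tokens = with_plain_codes(["en|reading|high", "en|writing|high", "es|writing|low"])
print(tokens)
# ['en|reading|high', 'en|writing|high', 'es|writing|low', 'en', 'es']
```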

Best,
Jens
