Re: configuring schema to match database
On 01/11/2013 06:14 PM, Gora Mohanty wrote:
> On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
> [...]
>> Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr.
>> Best start the other way around, think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it?
> [...]
> Um, he did describe his desired queries, and there was a reason that I proposed the above schema design.

He said he wants queries such as users who have taken courseA and are fluent in english, which is exactly one case I was describing.

> UserA has taken courseA, courseB and courseC and has writingskill good verbalskill good for english and writingskill excellent verbalskill excellent for spanish
> UserB has taken courseA, courseF, courseG and courseH and has writingskill fluent verbalskill fluent for english and writingskill good verbalskill good for italian
>
> Unless the index is becoming huge, I feel that it is better to flatten everything out rather than combine fields, and post-process the results.

Then please show me the query to find users that are fluent in spanish and english. Bonus points if you manage to not retrieve the same user several times. (Hint, your schema stores only one language skill per row).

Regards,
Jens
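The hint above can be illustrated with a small Python sketch. It does not talk to Solr; it only models the assumption that a query is evaluated against one document at a time. The sample users (userC, userD), the field values and the helper names `flatten` and `matches` are hypothetical and chosen for illustration only.

```python
from itertools import product

# Hypothetical sample data (not from the thread): userC is fluent in two languages.
users = {
    "userC": {
        "courses": ["courseA", "courseB"],
        "languages": {"english": "fluent", "spanish": "fluent"},
    },
    "userD": {
        "courses": ["courseA"],
        "languages": {"english": "fluent"},
    },
}


def flatten(users):
    """Build one document per (user, course, language) combination."""
    docs = []
    for userid, data in users.items():
        for course, (language, skill) in product(
            data["courses"], data["languages"].items()
        ):
            docs.append(
                {
                    "userid": userid,
                    "coursename": course,
                    "language": language,
                    "verbalskill": skill,
                }
            )
    return docs


def matches(doc, **criteria):
    """True only if every criterion holds within this single document."""
    return all(doc.get(field) == value for field, value in criteria.items())


flat_docs = flatten(users)

# "fluent in spanish AND fluent in english", checked per document: each row
# holds exactly one language, so no single row can satisfy both clauses.
both = [
    d["userid"]
    for d in flat_docs
    if matches(d, language="spanish", verbalskill="fluent")
    and matches(d, language="english", verbalskill="fluent")
]

# A single-language query does match, but returns the user once per course.
english = [
    d["userid"]
    for d in flat_docs
    if matches(d, language="english", verbalskill="fluent")
]

print(both)
print(english)
```

In this model the two-language query comes back empty even though userC is fluent in both languages, and the single-language query lists userC twice, which is the duplication the "bonus points" remark refers to.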
Re: configuring schema to match database
On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
> [...]
> Then please show me the query to find users that are fluent in spanish and english. Bonus points if you manage to not retrieve the same user several times. (Hint, your schema stores only one language skill per row).

Doh! You are right, of course. Brainfart from my side.

Regards,
Gora
Re: configuring schema to match database
On 01/14/2013 12:50 PM, Gora Mohanty wrote:
> On 14 January 2013 16:59, Jens Grivolla j+...@grivolla.net wrote:
> [...]
>> Then please show me the query to find users that are fluent in spanish and english. Bonus points if you manage to not retrieve the same user several times. (Hint, your schema stores only one language skill per row).
> Doh! You are right, of course. Brainfart from my side.

Ok, I was starting to wonder if I was the one missing something. Re-reading what I wrote, I see I may have sounded a bit rude. That was not my intention, sorry.

Best,
Jens
Re: configuring schema to match database
On 14 January 2013 17:28, Jens Grivolla j+...@grivolla.net wrote:
> On 01/14/2013 12:50 PM, Gora Mohanty wrote:
> [...]
>> Doh! You are right, of course. Brainfart from my side.
> Ok, I was starting to wonder if I was the one missing something. Re-reading what I wrote I see I may have sounded a bit rude, that was not my intention, sorry.

Did not take it as rude, and in any case am willing to tolerate a lot of impoliteness when someone shows me that I was wrong. Must have been half-asleep when I wrote my original reply, and was then trying to defend it. At least that's my story, and I am sticking to it :-)

Regards,
Gora
configuring schema to match database
Hi!

I'm quite new to Solr and am trying to understand how to create a schema from our Postgres database, so that we can search for the content in Solr instead of querying the db. My question should be really easy; it has most likely been asked many times, but I'm still not able to google an answer to it.

To make it easy, I have 3 columns: users, courses and languages

Users has columns userid, firstname, lastname
Courses has columns coursename, startdate, enddate
Languages has columns language, writingskill, verbalskill

UserA has taken courseA, courseB and courseC, and has writingskill good, verbalskill good for english, and writingskill excellent, verbalskill excellent for spanish.
UserB has taken courseA, courseF, courseG and courseH, and has writingskill fluent, verbalskill fluent for english, and writingskill good, verbalskill good for italian.

I would like to put this data into Solr so I can search for all users who have taken courseA and are fluent in english. Can I do that?

The problem is that I'm not sure how to flatten this database into a schema. It's easy to understand the users column, for example:

  <field name="userid" type="string" indexed="true" />
  <field name="firstname" type="string" indexed="true" />
  <field name="lastname" type="string" indexed="true" />

But then I'm not so sure what the schema should look like for courses and languages:

  <field name="userid" type="string" indexed="true" />
  <field name="coursename" type="string" indexed="true" />
  <field name="startdate" type="string" indexed="true" />
  <field name="enddate" type="string" indexed="true" />

Thanks for any help
/Niklas
Re: configuring schema to match database
Hi Niklas,

Maybe this link helps: http://www.coderthing.com/solr-with-multicore-and-database-hook-part-1/

D.
SV: configuring schema to match database
Thinking about it some more: perhaps I could have coursename and such as multivalued fields? Or should I have separate indexes for users, courses and languages? I get the feeling both would work, but I'm not sure which way is best.

When a user updates, removes or adds a course, it would be nice not to have to query the database for the user's courses and languages and update everything, but just update a course document.

But perhaps I'm thinking too much in database terms? I'm still unsure what the schema should look like.

Thanks
/Niklas
SV: configuring schema to match database
Hmm, I noticed I wrote "I have 3 columns: users, courses and languages". I of course mean I have 3 tables: users, courses and languages.

/Niklas
SV: configuring schema to match database
Hi Dariusz,

To me this example has one table, user, whereas I have many tables that connect to one user, and that is what I'm unsure how to do.

/Niklas
Re: configuring schema to match database
Hi,

No, it actually has two tables, User and Item. The example shown on the blog is for one table, because you repeat the same thing for the other table. Only your data-import.xml file changes. For the rest, just copy and paste it into the conf directory. If you are running Solr on Linux, you can work with symlinks.

D.
SV: configuring schema to match database
Ahh sorry, now I understand. Ok, that seems like a good solution. I just need to understand how to query multiple cores now :)
Re: configuring schema to match database
I don't know how to query multiple cores, or whether it's possible to query them at once, but otherwise I would create a JOIN SQL script if you need values from multiple tables.

D.
Re: configuring schema to match database
On 11 January 2013 19:57, Niklas Langvig niklas.lang...@globesoft.com wrote:
> Ahh sorry, Now I understand, Ok seems like a good solution, I just know need to understand how to query multiple cores now :)

There is no need to use multiple cores in your setup. Going back to your original problem statement, it can easily be handled with a single core, and it actually makes more sense to do it that way. You will need to give us more details.

> My question should be really easy, it has most likely been asked many times but still I'm not able to google any answer to it.
> To make it easy, I have 3 columns: users, courses and languages

Presumably, you mean three tables, as you describe each as having columns. How are the tables connected? Is there a foreign key relationship between them? Is the relationship one-to-one, one-to-many, or what?

> Users has columns , userid, firstname, lastname
> Courses has column coursename, startdate, enddate
> Languages has column language, writingskill, verbalskill
> [...]
> I would like to put this data into solr so I can search for all users how have taken courseA and are fluent in english. Can I do that?

1. Your schema for the single core is quite straightforward, and along the lines of what you had described (one field for each database column in each table), e.g.,

   <field name="userid" type="string" indexed="true" />
   <field name="firstname" type="string" indexed="true" />
   <field name="lastname" type="string" indexed="true" />
   <field name="coursename" type="string" indexed="true" />
   <field name="startdate" type="date" indexed="true" />
   <field name="enddate" type="" indexed="true" />
   <field name="language" type="string" indexed="true" />
   <field name="writingskill" type="string" indexed="true" />
   <field name="verbalskill" type="string" indexed="true" />

   Pay attention to the type. Dates should typically be solr.DateField. The others can be strings, but if they are integers in the database, you might benefit from making these integers in Solr also.

2. One has to stop thinking of Solr as a RDBMS. Instead, one flattens out data from a typical RDBMS structure. It is difficult to give you complete instructions unless you describe the database relationships, but, e.g., if one has userA with course1, course2, and course3, and userB with course2, course4, the Solr documents would be:

   userA course1 details for course1...
   userA course2 details for course2...
   userA course3 details for course3...
   userB course2 details for course2...
   userB course4 details for course4...

   This scheme could also be extended to languages, depending on how the tables are related.

3. While indexing into Solr, one has to select from the database, and flatten out the data as above. The two main ways of doing this are using a library like SolrJ for Java (other languages have other libraries, e.g., django-haystack is easy to get started with if one is using Python/Django), or the Solr DataImportHandler (please see http://wiki.apache.org/solr/DataImportHandler ) with nested entities.

4. With such a structure, querying Solr should be simple.

Regards,
Gora
SV: configuring schema to match database
It sounds good not to use more than one core; for sure I do not want to overcomplicate this.

Yes, I meant tables. It's pretty simple. The courses and languages tables each have their own primary key, courseseqno and languagesseqno. Both also have a foreign key userid that references the userid column of the users table. The relationships from users to courses and from users to languages are one-to-many.

But I guess I'm thinking wrong, because my idea would be to have a block of fields connected with one id:

  <field name="coursename" type="string" indexed="true" />
  <field name="startdate" type="date" indexed="true" />
  <field name="enddate" type="" indexed="true" />

These three are connected with a

  <field name="courseseqno" type="int" indexed="true" />

But should I also have a

  <field name="userid" type="int" indexed="true" />

to connect to a specific user?

Thanks
/Niklas
Re: configuring schema to match database
On 11 January 2013 21:13, Niklas Langvig niklas.lang...@globesoft.com wrote:
> It sounds good not to use more than one core, for sure I do not want to over complicate this.
> [...]

Yes, not only are multiple cores unnecessarily complicated here, your searches will also be less complex, and faster.

> Both table courses and languages has it's own primary key courseseqno and languagesseqno

There is no need to index these.

> Both also have a foreign key userid that references the users table with column userid
> The relationship from users to courses and languages are one-to-many.
> but I guess I'm thinking wrong because my idead whould be to have a block of fields connected with one id
>   <field name="coursename" type="string" indexed="true" />
>   <field name="startdate" type="date" indexed="true" />
>   <field name="enddate" type="" indexed="true" />
> These three are connected with a
>   <field name="courseseqno" type="int" indexed="true" />
> But also have a
>   <field name="userid" type="int" indexed="true" />
> To connect to a specific user?
> [...]

You are still thinking of Solr as a RDBMS, where you should not be. In your case, it is easiest to flatten out the data. This increases the size of the index, but that should not really be of concern. As your courses and languages tables are connected only to user, the schema that I described earlier should suffice. To extend my earlier example, given:

* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2

you should flatten it such that you get the following Solr documents:

  userA c1 name c1 startdate...l1 l1 writing skill...
  userA c1 name c1 startdate...l2 l2 writing skill...
  userA c2 name c2 startdate...l1 l1 writing skill...
  ...
  userB c2 name c2 startdate...l2 l2 writing skill...
  userB c3 name c3 startdate...l2 l2 writing skill...

i.e., a total of 3 courses x 2 languages = 6 documents for userA, and 2 courses x 1 language = 2 documents for userB.

In order to get this form of flattened data into Solr, I would suggest using the DataImportHandler with nested entities. Please see the earlier link to DIH. Also, a Google search for "Solr dataimporthandler nested entities" turns up many examples, including: http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/

Please give it a try, and post here with your attempts if you run into any issues.

Regards,
Gora
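The document counts quoted in this message can be checked with a short Python sketch. It only reproduces the arithmetic of the courses x languages flattening, using the user, course and language labels from the example; the tuple layout is an assumption made for illustration and is not how Solr stores documents.

```python
from collections import Counter
from itertools import product

# Data taken from the example in the message above.
users = {
    "userA": {"courses": ["c1", "c2", "c3"], "languages": ["l1", "l2"]},
    "userB": {"courses": ["c2", "c3"], "languages": ["l2"]},
}

# One flattened record per (user, course, language) combination.
docs = [
    (userid, course, language)
    for userid, data in users.items()
    for course, language in product(data["courses"], data["languages"])
]

# Number of flattened records each user contributes to the index.
docs_per_user = Counter(userid for userid, _, _ in docs)

for userid, count in docs_per_user.items():
    print(userid, count)
```

The number of records per user grows as the product of that user's courses and languages, so every additional one-to-many table flattened this way multiplies the count again.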
Re: configuring schema to match database
On 01/11/2013 05:23 PM, Gora Mohanty wrote:
> You are still thinking of Solr as a RDBMS, where you should not be. In your case, it is easiest to flatten out the data. This increases the size of the index, but that should not really be of concern. As your courses and languages tables are connected only to user, the schema that I described earlier should suffice. To extend my earlier example, given:
> * userA with courses c1, c2, c3, and languages l1, l2
> * userB with c2, c3, and l2
> you should flatten it such that you get the following Solr documents
>   userA c1 name c1 startdate...l1 l1 writing skill...
>   userA c1 name c1 startdate...l2 l2 writing skill...
>   userA c2 name c2 startdate...l1 l1 writing skill...
>   userB c2 name c2 startdate...l2 l2 writing skill...
>   userB c3 name c3 startdate...l2 l2 writing skill...
> i.e., a total of 3 courses x 2 languages = 6 documents for userA, and 2 courses x 1 language = 2 documents for userB

Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr.

Best start the other way around, think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it?

If your intention is to find users that match certain criteria, each entry should be a user (with ALL associated information, e.g. all courses, all language skills, etc.), if you want to retrieve courses, each entry should be a course.

Let's say you want to find users who have certain language skills, you would have a schema that describes a user:

- user id
- user name
- languages
- ...

In languages, you could store e.g. things like: en|reading|high es|writing|low, etc. It could be a multivalued field or just have everything separated by space and a tokenizer that splits on whitespace.

Now you can query:

- language:es* -- return all users with some spanish skills
- language:en|writing|high -- return all users with high english writing skills
- +(language:es* language:fr*) +language:en|writing|high -- return users with high english writing skills and some knowledge of french or spanish

If you want to avoid wildcard queries (more costly) you can just add plain en and es, etc. to your field so language:es will match anybody with spanish skills.

Best,
Jens
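A minimal Python sketch of the one-document-per-user layout described in this message follows. It does not call Solr; it imitates exact-term and prefix matching on a multivalued field with plain string operations, which is an assumption about how the three queries behave rather than a statement about Solr's query parser. The sample users and the helper names `has_token`, `has_prefix` and `find` are hypothetical.

```python
# One document per user; "language" is a multivalued field holding tokens in
# the "code|skill|level" form suggested above. Sample values are hypothetical.
user_docs = [
    {"userid": "userA", "language": ["en|writing|high", "es|writing|low"]},
    {"userid": "userB", "language": ["en|writing|low", "fr|writing|high"]},
    {"userid": "userC", "language": ["en|writing|high", "it|writing|high"]},
]


def has_token(doc, token):
    """Exact term match, in the spirit of language:en|writing|high."""
    return token in doc["language"]


def has_prefix(doc, prefix):
    """Prefix match, in the spirit of language:es*."""
    return any(value.startswith(prefix) for value in doc["language"])


def find(docs, predicate):
    """Return the ids of the documents that satisfy the predicate."""
    return [doc["userid"] for doc in docs if predicate(doc)]


# language:es*
some_spanish = find(user_docs, lambda d: has_prefix(d, "es"))

# language:en|writing|high
high_english = find(user_docs, lambda d: has_token(d, "en|writing|high"))

# +(language:es* language:fr*) +language:en|writing|high
combined = find(
    user_docs,
    lambda d: (has_prefix(d, "es") or has_prefix(d, "fr"))
    and has_token(d, "en|writing|high"),
)

print(some_spanish)
print(high_english)
print(combined)
```

Because all of a user's language tokens live in one document, a condition spanning several languages can be satisfied by a single document, and each matching user is returned once.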
Re: configuring schema to match database
On 11 January 2013 22:30, Jens Grivolla j+...@grivolla.net wrote:
> [...]
> Actually, that is what you would get when doing a join in an RDBMS, the cross-product of your tables. This is NOT AT ALL what you typically do in Solr.
> Best start the other way around, think of Solr as a retrieval system, not a storage system. What are your queries? What do you want to find, and what criteria do you use to search for it?
> [...]

Um, he did describe his desired queries, and there was a reason that I proposed the above schema design.

> UserA has taken courseA, courseB and courseC and has writingskill good verbalskill good for english and writingskill excellent verbalskill excellent for spanish
> UserB has taken courseA, courseF, courseG and courseH and has writingskill fluent verbalskill fluent for english and writingskill good verbalskill good for italian

Unless the index is becoming huge, I feel that it is better to flatten everything out rather than combine fields, and post-process the results.

Regards,
Gora