Well, avoiding flattening the DB into a flat table sounds like a great plan. I found this solution: http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
It imports through a join, without handling a flat table.

On Tue, Jun 18, 2013 at 5:53 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> You can in fact have multiple collections in Solr and do a limited amount
> of joining, and Solr has multivalued fields as well, but none of those
> techniques should be used to avoid the process of flattening and
> denormalizing a relational data model. It is hard work, but yes, it is
> required to use Solr effectively.
>
> Again, start with the queries - what problem are you trying to solve?
> Nobody stores data just for the sake of storing it - how will the data be
> used?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mysurf Mail
> Sent: Tuesday, June 18, 2013 9:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to define my data in schema.xml
>
> Hi Jack,
> Thanks for your kind comment.
>
> I am truly at the beginning of modeling my schema over an existing
> working DB.
> I have used the school-teachers-student DB as an example scenario.
> (a. I wrote that as a disclaimer in my first post. b. I really do not
> know anyone who has 300 hobbies either.)
>
> In real life my DB is obviously much different.
> I just used this as an example of the potential pitfalls that will occur if I
> use my old DB data modeling notions.
> Obviously, the old relational modeling idioms do not apply here.
>
> Now, my question was referring to the fact that I would really like to
> avoid a flat table/join/view because of the reason listed above.
> So, my scenario is answering a plain user-generated text search over an
> MSSQL DB that contains a few 1:n relations (and a few 1:n:n relationships).
>
> So, I come here for tips. Should I use one combined index (treat it as a
> NoSQL source), separate indices, or something else? Are there any other
> ways to define relational data?
> Thanks.
>
> On Tue, Jun 18, 2013 at 4:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> It sounds like you still have a lot of work to do on your data model.
>> No matter how you slice it, 8 billion rows/fields/whatever is still way too
>> much for any engine to search on a single server. If you have 8 billion of
>> anything, a heavily sharded SolrCloud cluster is probably warranted. Don't
>> plan ahead to put more than 100 million rows on a single node; plan on a
>> proof of concept implementation to determine that number.
>>
>> When we in Solr land say "flattened" or "denormalized", we mean in an
>> intelligent, "smart", thoughtful sense, not a mindless, mechanical
>> flattening. It is an opportunity for you to reconsider your data models,
>> both old and new.
>>
>> Maybe data modeling is beyond your skill set. If so, have a chat with your
>> boss and ask for some assistance, training, whatever.
>>
>> Actually, I am suspicious of your 8 billion number - change each of those
>> 300's to realistic, average numbers. Each teacher teaches 300 courses?
>> Right. Each Student has 300 hobbies? If you say so, but...
>>
>> Don't worry about schema.xml until you get your data model under control.
>>
>> For an initial focus, try envisioning the use cases for user queries. That
>> will guide you in thinking about how the data would need to be organized to
>> satisfy those user queries.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Mysurf Mail
>> Sent: Tuesday, June 18, 2013 2:20 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to define my data in schema.xml
>>
>> Thanks for your reply.
>> I have tried the simplest approach and it works absolutely fantastic.
>> Huge table - 0s to result.
>>
>> Two problems, as I described earlier, and that is what I am trying to solve:
>> 1. I create a flat table just for Solr. This requires maintenance and
>> development. Can I run Solr over my regular tables?
>> This is my simplest approach: working over my relational tables.
>> 2. When you query a flat table by school name, as I described, if the
>> school has 300 students, 300 teachers, 300 teacherCourses per teacher and
>> 300 studentHobbies per student,
>> you get 8.1 billion rows (300*300*300*300). As I am sure this will work
>> great on Solr - searching for the school name will retrieve 8.1 B rows.
>> 3. Let's say all my searches are user-generated free-text searches that
>> search the name and comments columns.
>> Thanks.
>>
>> On Tue, Jun 18, 2013 at 7:32 AM, Gora Mohanty <g...@mimirtech.com> wrote:
>>
>> On 18 June 2013 01:10, Mysurf Mail <stammail...@gmail.com> wrote:
>>
>>> > Thanks for your quick reply. Here are some notes:
>>> >
>>> > 1. Consider that all tables in my example have two columns: Name &
>>> > Description which I would like to index and search.
>>> > 2. I have no other reason to create a flat table other than for Solr. So I
>>> > would like to see if I can avoid it.
>>> > 3. If in my example I will have a flat table then obviously it will hold a
>>> > lot of rows for a single school.
>>> > By searching the exact school name I will likely receive a lot of rows.
>>> > (my flat table has its own pk)
>>>
>>> Yes, all of this is definitely the case, but in practice
>>> it does not matter. Solr can efficiently search through
>>> millions of rows. To start with, just try the simplest
>>> approach, and only complicate things as and when
>>> needed.
>>>
>>> > That is something I would like to avoid and I thought I can avoid this
>>> > by defining teachers and students as multiple value or something like this
>>> > and then teacherCourses and studentHobbies as 1:n respectively.
>>> > This is quite similar to my real life demand, so I came here to get
>>> > some tips as a Solr noob.
>>>
>>> You have still not described what are the searches that
>>> you would want to do. Again, I would suggest starting
>>> with the most straightforward approach.
>>>
>>> Regards,
>>> Gora
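
The row-count arithmetic discussed in this thread can be checked with a minimal Python sketch. All function and field names here are hypothetical, chosen only for illustration. The per-entity count and the multivalued school document are just two possible alternatives to a mechanical flat join, not a layout anyone in the thread prescribes.

```python
def flat_join_rows(students, teachers, courses_per_teacher, hobbies_per_student):
    """Rows one school contributes to a fully flattened join of all child tables."""
    return students * teachers * courses_per_teacher * hobbies_per_student


def per_entity_docs(students, teachers, courses_per_teacher, hobbies_per_student):
    """Documents for one school if every row of every table is indexed exactly once."""
    return (
        1  # the school itself
        + students
        + teachers
        + teachers * courses_per_teacher
        + students * hobbies_per_student
    )


def school_document(school, teacher_names, student_names):
    """Hypothetical single document using multivalued fields for the 1:n children."""
    return {
        "id": f"school-{school['id']}",
        "name": school["name"],
        "description": school["description"],
        "teacher_names": list(teacher_names),  # multivalued field (assumed name)
        "student_names": list(student_names),  # multivalued field (assumed name)
    }


if __name__ == "__main__":
    print(flat_join_rows(300, 300, 300, 300))   # 8100000000
    print(per_entity_docs(300, 300, 300, 300))  # 180601
```

With the example figures from the thread, the mechanical join yields 8.1 billion rows per school, while indexing each source row once yields 180,601 documents. Which layout actually fits depends on the queries that need to be answered.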