Re: need best solution for indexing and searching multiple, related database tables
Fyi, block join query doesnt require denormalization, performant, but has own limitations, of course. Many to many is the most painful point. I deal with it but quite far from contributing generally applicable approach. 29.09.2012 5:21 пользователь Biff Baxter tom.bren...@acmedata.net написал: Hi Walter, I have bought into the denormalize approach. My remaining questions are around how to construct the denormlized view and any solr functions that would support issues related to a) minimizing the denormalization explosion for 3 or more tables and b) handling many to many relationships. One issue I am concerned with is, if I search for IBM and Steve Jones in my example above, no records should be returned. How do I manage that with the equivalent of the one denormalized record approach? I appreciate your help. Biff -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011010.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
Hi jimtronic, I think you have the answers I need. I have several questions. As a rough example, when you create your view, if you have one parent record and three child records, do you end up with three denormalized records or one record that has data for three child records consolidated into each child field? To illustrate, if you start with two tables that look like this: http://lucene.472066.n3.nabble.com/file/n4011006/tables.jpg does your view look like this: http://lucene.472066.n3.nabble.com/file/n4011006/denormalized.jpg or this: http://lucene.472066.n3.nabble.com/file/n4011006/consolidated_fields.jpg Biff -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011006.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
Hi Jack, The queries we need to do have a lot in common with SQL queries in that they are field specific, but we need lots of fuzzy stuff so SQL won't do. Speed is also an issue. I will provide sample queries and data if needed. For now, I am waiting to hear back from jimtronic. Thanks for your reply. Biff -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011007.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
Denormalize. Think about a view that shows your results page. Put that view in Solr. Query and get ready for speed. wunder On Sep 28, 2012, at 5:34 PM, Biff Baxter wrote: Hi Jack, The queries we need to do have a lot in common with SQL queries in that they are field specific, but we need lots of fuzzy stuff so SQL won't do. Speed is also an issue. I will provide sample queries and data if needed. For now, I am waiting to hear back from jimtronic. Thanks for your reply. Biff -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011007.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
Hi Walter, I have bought into the denormalize approach. My remaining questions are around how to construct the denormlized view and any solr functions that would support issues related to a) minimizing the denormalization explosion for 3 or more tables and b) handling many to many relationships. One issue I am concerned with is, if I search for IBM and Steve Jones in my example above, no records should be returned. How do I manage that with the equivalent of the one denormalized record approach? I appreciate your help. Biff -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011010.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
I'm not sure if this will be relevant for you, but this is roughly what I do. Apologies if it's too basic. I have a complex view that normalizes all the data that I need to be together -- from over a dozen different tables. For one to many and many to many relationships, I have sql turn the data into a comma delimited string which the data import handler and the RegexTransformer will split into a multi-valued field. So, you might have a schema like this: id123/id name_sJohn Smith/name_s attr_products strpython/str strjava/str strjavascript/str /attr_products Often I've found that I don't really need to the data together into one solr core and it works better to just create a separate core just for that schema. -- View this message in context: http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4009879.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need best solution for indexing and searching multiple, related database tables
Could supply some sample user queries and some sample data the queries should match? In other words, how do your users expect to view the data? If you are simply trying to replicate full SQL queries in Solr, you're probably going to be disappointed, but if you look at what queries your users are likely to want to enter, maybe it won't be so bad. And maybe Solr's limited join capabilities might be sufficient to bridge the gap between a single flat schema and many relational tables. http://wiki.apache.org/solr/Join Join support is there, but don't leap before you think carefully. -- Jack Krupansky -Original Message- From: Thomas J. Brennan Sent: Monday, September 24, 2012 10:23 AM To: solr-user@lucene.apache.org Subject: need best solution for indexing and searching multiple, related database tables I have a requirement to search multiple, related database tables. Sometimes I need to join two tables, sometimes three or four and possibly more. The tables will generally store structured data related to individuals or organizations. This will include things like company, contact and address tables and may include other child tables like products, assets, etc. It is something of a moving target. Record counts are commonly in the tens of millions but can be upwards of a few hundred million or even much more. My understanding is that denormalization is most commonly the preferred solution. For two tables that is pretty straightforward. For three or four or more tables, or many to many relationships, and depending on the record counts, this can generate a lot of redundant data, indexing time, etc. Any information on the best way to design a single approach to this problem or any options I might employ like faceted search, NoSQL (based on my limited research I am guessing this is not a solution), etc. would be greatly appreciated. Answers that are terribly obvious, even to a newb, are a tiny bit annoying. Things like you should test several scenarios or there is no one good solution. I really do appreciate any suggestions that would help me solve this problem. Biff P.S. - I did search the existing posts and found some related topics but nothing as specific as I was looking for.