Re: need best solution for indexing and searching multiple, related database tables

2012-09-29 Thread Mikhail Khludnev
Fyi, block join query doesnt require denormalization, performant, but has
own limitations, of course. Many to many is the most painful point. I deal
with it but quite far from contributing generally applicable approach.
 29.09.2012 5:21 пользователь Biff Baxter tom.bren...@acmedata.net
написал:

 Hi Walter,

 I have bought into the denormalize approach.  My remaining questions are
 around how to
 construct the denormlized view and any solr functions that would support
 issues related
 to a) minimizing the denormalization explosion for 3 or more tables and b)
 handling many
 to many relationships.

 One issue I am concerned with is, if I search for IBM and Steve Jones in my
 example
 above, no records should be returned.  How do I manage that with the
 equivalent of the
 one denormalized record approach?

 I appreciate your help.

 Biff



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011010.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: need best solution for indexing and searching multiple, related database tables

2012-09-28 Thread Biff Baxter
Hi jimtronic,

I think you have the answers I need.  I have several questions.

As a rough example, when you create your view, if you have one parent record
and three child records, do you end up with three denormalized records or
one
record that has data for three child records consolidated into each child
field?

To illustrate, if you start with two tables that look like this:

http://lucene.472066.n3.nabble.com/file/n4011006/tables.jpg 

does your view look like this:

http://lucene.472066.n3.nabble.com/file/n4011006/denormalized.jpg 

or this:

http://lucene.472066.n3.nabble.com/file/n4011006/consolidated_fields.jpg 

Biff



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011006.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need best solution for indexing and searching multiple, related database tables

2012-09-28 Thread Biff Baxter
Hi Jack,

The queries we need to do have a lot in common with SQL queries in that they
are field specific, but we need lots of fuzzy stuff so SQL won't do.  Speed
is
also an issue.

I will provide sample queries and data if needed.  For now, I am waiting to
hear
back from jimtronic.

Thanks for your reply.

Biff



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need best solution for indexing and searching multiple, related database tables

2012-09-28 Thread Walter Underwood
Denormalize. Think about a view that shows your results page. Put that view in 
Solr. Query and get ready for speed.

wunder

On Sep 28, 2012, at 5:34 PM, Biff Baxter wrote:

 Hi Jack,
 
 The queries we need to do have a lot in common with SQL queries in that they
 are field specific, but we need lots of fuzzy stuff so SQL won't do.  Speed
 is
 also an issue.
 
 I will provide sample queries and data if needed.  For now, I am waiting to
 hear
 back from jimtronic.
 
 Thanks for your reply.
 
 Biff
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011007.html
 Sent from the Solr - User mailing list archive at Nabble.com.







Re: need best solution for indexing and searching multiple, related database tables

2012-09-28 Thread Biff Baxter
Hi Walter,

I have bought into the denormalize approach.  My remaining questions are
around how to
construct the denormlized view and any solr functions that would support
issues related
to a) minimizing the denormalization explosion for 3 or more tables and b)
handling many
to many relationships.

One issue I am concerned with is, if I search for IBM and Steve Jones in my
example
above, no records should be returned.  How do I manage that with the
equivalent of the
one denormalized record approach?

I appreciate your help.

Biff



--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4011010.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread jimtronic
I'm not sure if this will be relevant for you, but this is roughly what I do.
Apologies if it's too basic. 

I have a complex view that normalizes all the data that I need to be
together -- from over a dozen different tables. For one to many and many to
many relationships, I have sql turn the data into a comma delimited string
which the data import handler and the RegexTransformer will split into a
multi-valued field.

So, you might have a schema like this:

id123/id
name_sJohn Smith/name_s
attr_products
  strpython/str
  strjava/str
  strjavascript/str
/attr_products

Often I've found that I don't really need to the data together into one solr
core and it works better to just create a separate core just for that
schema. 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-best-solution-for-indexing-and-searching-multiple-related-database-tables-tp4009857p4009879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need best solution for indexing and searching multiple, related database tables

2012-09-24 Thread Jack Krupansky
Could supply some sample user queries and some sample data the queries 
should match? In other words, how do your users expect to view the data? 
If you are simply trying to replicate full SQL queries in Solr, you're 
probably going to be disappointed, but if you look at what queries your 
users are likely to want to enter, maybe it won't be so bad.


And maybe Solr's limited join capabilities might be sufficient to bridge 
the gap between a single flat schema and many relational tables.


http://wiki.apache.org/solr/Join

Join support is there, but don't leap before you think carefully.

-- Jack Krupansky

-Original Message- 
From: Thomas J. Brennan

Sent: Monday, September 24, 2012 10:23 AM
To: solr-user@lucene.apache.org
Subject: need best solution for indexing and searching multiple, related 
database tables






I have a requirement to search multiple, related database tables.  Sometimes
I need to join two tables, sometimes three or four and possibly more.  The
tables will generally store structured data related to individuals or
organizations.  This will include things like company, contact and address
tables and may include other child tables like products, assets, etc.  It is
something of a moving target.  Record counts are commonly in the tens of
millions but can be upwards of a few hundred million or even much more.

My understanding is that denormalization is most commonly the preferred
solution.  For two tables that is pretty straightforward.  For three or four
or more tables, or many to many relationships, and depending on the record
counts, this can generate a lot of redundant data, indexing time, etc.

Any information on the best way to design a single approach to this problem
or any options I might employ like faceted search, NoSQL (based on my
limited research I am guessing this is not a solution), etc. would be
greatly appreciated.

Answers that are terribly obvious, even to a newb, are a tiny bit annoying.
Things like you should test several scenarios or there is no one good
solution.  I really do appreciate any suggestions that would help me solve
this problem.

Biff

P.S. - I did search the existing posts and found some related topics but
nothing as specific as I was looking for.