Re: Solr best practices for many to many relations...

2016-04-17 Bastien Latard - MDPI AG
Thanks, everybody. Your answers are very interesting; however, I'm not sure I'm understanding them properly (sorry, I'm not an expert... it might be obvious to you). When you speak about denormalization, does it mean: 1. something like that? -> I think that the answer ...

Re: Solr best practices for many to many relations...

2016-04-15 Jack Krupansky
And it may also be that there are whole classes of users for whom denormalization is just too heavy a cross to bear, and for whom a little extra money spent on more hardware is a great tradeoff. And... Lucene's indexing may be superior to your average SQL database, so that a Solr JOIN could be so...

Re: Solr best practices for many to many relations...

2016-04-15 Joel Bernstein
I think people are going to be surprised, though, by the speed of the joins. The joins also get faster as the number of shards, replicas and worker nodes in the cluster grows. So we may see people building out large clusters and using the joins in OLTP scenarios.

Re: Solr best practices for many to many relations...

2016-04-15 Jack Krupansky
And of course it depends on the specific queries, both in terms of which fields will be searched and which fields need to be returned. Yes, OLAP is the clear sweet spot, where taking 500 ms to 2 or even 20 seconds for a complex query may be just fine, vs. OLTP/search, where under 150 ms is the target...
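Whether a given query type lands on the OLAP side or the OLTP/search side of those targets can be checked by comparing a tail percentile of measured latencies against each budget. A minimal sketch, assuming latencies have already been collected in milliseconds; the 150 ms and 20 s budgets come from the message above, while using p95 and the nearest-rank method are assumptions of this example:

```python
import math

# Latency budgets quoted in the discussion: under 150 ms for OLTP/search,
# and up to roughly 20 seconds for a complex OLAP query.
SEARCH_BUDGET_MS = 150
OLAP_BUDGET_MS = 20_000


def percentile(samples_ms, pct):
    """Nearest-rank percentile of a non-empty list of latencies in ms."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def classify(samples_ms, pct=95):
    """Return the pct-th percentile latency and which budget it fits."""
    p = percentile(samples_ms, pct)
    if p < SEARCH_BUDGET_MS:
        return p, "fits search/OLTP budget"
    if p <= OLAP_BUDGET_MS:
        return p, "fits OLAP budget only"
    return p, "exceeds both budgets"


if __name__ == "__main__":
    # Hypothetical measurements for one query type, in milliseconds.
    measured = [40, 55, 60, 62, 70, 85, 90, 120, 140, 900]
    print(classify(measured))  # -> (900, 'fits OLAP budget only')
```

A single slow outlier is enough to push p95 out of the search budget here, which is why tail percentiles rather than averages are the usual yardstick for interactive search.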

Re: Solr best practices for many to many relations...

2016-04-15 Joel Bernstein
In general, the Streaming Expression joins are designed for interactive OLAP-type workloads, so BI and data warehousing scenarios are the sweet spot. There may be scenarios where high-QPS search applications will work with the distributed joins, particularly if the joins themselves are not huge. ...

Re: Solr best practices for many to many relations...

2016-04-15 Jack Krupansky
It will be interesting to see which use cases work best with the new streaming JOIN vs. which will remain best with full denormalization, or whether you simply have to try both and benchmark them. My impression had been that streaming JOIN would be ideal for bulk operations rather than traditional...
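"Try both and benchmark them" can start with a harness that first checks both data models return the same answer and only then times them. The sketch below is purely in-memory and uses hypothetical toy data (articles, authors, and a link table); its timings say nothing about how Solr itself will behave, and only the shape of the comparison is meant to carry over to real queries:

```python
import time

# Hypothetical toy data: 1000 articles, 200 authors, two authors per article.
ARTICLES = {a: f"Article {a}" for a in range(1000)}
AUTHORS = {u: f"Author {u}" for u in range(200)}
LINKS = [(a, a % 200) for a in range(1000)] + [(a, (a + 1) % 200) for a in range(1000)]


def join_at_query_time(author_name):
    """Normalized model: resolve author -> links -> articles on every query."""
    author_ids = {u for u, name in AUTHORS.items() if name == author_name}
    article_ids = {a for a, u in LINKS if u in author_ids}
    return sorted(ARTICLES[a] for a in article_ids)


# Denormalized model: built once at "index time", one document per article.
DOCS = {}
for art_id, auth_id in LINKS:
    doc = DOCS.setdefault(art_id, {"title": ARTICLES[art_id], "authors": []})
    doc["authors"].append(AUTHORS[auth_id])


def denormalized_lookup(author_name):
    """Denormalized model: a single pass over self-contained documents."""
    return sorted(d["title"] for d in DOCS.values() if author_name in d["authors"])


def bench(fn, arg, repeat=50):
    """Return fn(arg) and its mean wall-clock time in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = fn(arg)
    elapsed_ms = (time.perf_counter() - start) * 1000 / repeat
    return result, elapsed_ms


if __name__ == "__main__":
    joined, t_join = bench(join_at_query_time, "Author 5")
    flat, t_flat = bench(denormalized_lookup, "Author 5")
    assert joined == flat  # both strategies must answer the same question
    print(f"join: {t_join:.3f} ms, denormalized: {t_flat:.3f} ms, hits: {len(flat)}")
```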

Re: Solr best practices for many to many relations...

2016-04-15 Joel Bernstein
You may also want to keep an eye on SOLR-8925, which supports distributed, cross-collection graph traversals. This may be useful for traversing the relationships.
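A graph traversal over relationships stored as documents can be pictured as a breadth-first search in which each hop is one lookup of edges whose source is in the current frontier. The sketch below illustrates only that idea in plain Python over hypothetical edge records with `from`/`to` fields; it does not use Solr and is not the SOLR-8925 implementation:

```python
# Hypothetical edge "documents": e.g. article-cites-article links stored
# as flat records, the way they might sit in a separate collection.
EDGES = [
    {"from": "a1", "to": "a2"},
    {"from": "a1", "to": "a3"},
    {"from": "a2", "to": "a4"},
    {"from": "a3", "to": "a4"},
    {"from": "a4", "to": "a5"},
]


def neighbours(frontier, edges):
    """One hop: like querying edges whose 'from' value is in the frontier."""
    return {e["to"] for e in edges if e["from"] in frontier}


def traverse(start, edges, max_depth):
    """Breadth-first traversal returning {node: depth}; each node is visited once."""
    depth = {start: 0}
    frontier = {start}
    for level in range(1, max_depth + 1):
        # Drop already-visited nodes so cycles cannot loop forever.
        nxt = neighbours(frontier, edges) - set(depth)
        if not nxt:
            break
        for node in nxt:
            depth[node] = level
        frontier = nxt
    return depth


if __name__ == "__main__":
    print(traverse("a1", EDGES, max_depth=2))
```

Batching the whole frontier into one lookup per hop, rather than one lookup per node, is what keeps the number of round trips proportional to the traversal depth.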

Re: Solr best practices for many to many relations...

2016-04-15 Joel Bernstein
Solr now has full distributed join capabilities as part of the Streaming Expression library. Keep in mind that these are distributed joins, so they shuffle records to worker nodes to perform the joins. These are comparable to joins done by SQL over MapReduce systems, but they are very responsive and...
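The "shuffle records to worker nodes" step works because records with equal join keys are routed to the same worker, so each worker can join its share locally. A minimal in-process sketch of that pattern follows; it uses a hash join per worker for brevity, which is an assumption of this example (Solr's own join expressions may instead merge streams sorted on the join key), and the `people`/`pets` data is hypothetical:

```python
import zlib
from collections import defaultdict


def partition(records, key, num_workers):
    """Shuffle step: route each record to a worker chosen by hashing its join key."""
    buckets = [[] for _ in range(num_workers)]
    for rec in records:
        # crc32 is stable across processes, unlike Python's salted str hash().
        worker = zlib.crc32(str(rec[key]).encode("utf-8")) % num_workers
        buckets[worker].append(rec)
    return buckets


def local_hash_join(left, right, key):
    """Per-worker step: build a hash table on `right`, probe it with `left`."""
    table = defaultdict(list)
    for rrec in right:
        table[rrec[key]].append(rrec)
    return [{**lrec, **rrec} for lrec in left for rrec in table.get(lrec[key], [])]


def distributed_join(left, right, key, num_workers=4):
    """Equal keys hash to the same worker, so the local joins give the full result."""
    left_parts = partition(left, key, num_workers)
    right_parts = partition(right, key, num_workers)
    joined = []
    for w in range(num_workers):  # on a cluster these would run in parallel
        joined.extend(local_hash_join(left_parts[w], right_parts[w], key))
    return joined


if __name__ == "__main__":
    people = [{"personId": 1, "name": "Ann"}, {"personId": 2, "name": "Bob"}]
    pets = [
        {"personId": 1, "pet": "cat"},
        {"personId": 1, "pet": "dog"},
        {"personId": 3, "pet": "fish"},
    ]
    for row in sorted(distributed_join(people, pets, "personId"), key=lambda r: r["pet"]):
        print(row)
```

Adding workers shrinks each partition, which is one way to read the later remark in this thread that the joins get faster as worker nodes are added.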

Re: Solr best practices for many to many relations...

2016-04-15 Dennis Gove
The Streaming API with Streaming Expressions (or Parallel SQL, if you want to use SQL) can give you the functionality you're looking for. See https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions and https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface. SQL ...
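As a concrete shape for such a query, the sketch below only builds an `innerJoin` Streaming Expression string and the GET URL for a collection's `/stream` handler; it sends nothing. The `innerJoin(..., on=...)` and `search(..., qt="/export", ...)` forms follow the Streaming Expressions reference, but the host, collections (`articles`, `article_authors`) and fields are hypothetical, and the helpers do no escaping:

```python
from urllib.parse import urlencode


def search_expr(collection, q, fl, sort):
    """Render a search() stream over the /export handler.

    /export streams the full sorted result set and expects docValues on the
    fl and sort fields. No escaping: values must not contain double quotes.
    """
    return f'search({collection}, q="{q}", qt="/export", fl="{fl}", sort="{sort}")'


def inner_join_expr(left, right, on):
    """Render innerJoin(); both input streams must be sorted on the join key."""
    return f'innerJoin({left}, {right}, on="{on}")'


def stream_url(base_url, collection, expr):
    """GET URL for a collection's /stream handler, expression in the expr param."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


if __name__ == "__main__":
    # Hypothetical collections and fields for an article/author model.
    left = search_expr("articles", "*:*", "article_id,title", "article_id asc")
    right = search_expr("article_authors", "author_name:Smith",
                        "article_id,author_name", "article_id asc")
    expr = inner_join_expr(left, right, "article_id")
    print(expr)
    print(stream_url("http://localhost:8983/solr", "articles", expr))
```

Both `search()` clauses sort on `article_id asc` because `innerJoin` merges its inputs on the join key; for long expressions, sending `expr` in a POST body avoids URL length limits.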

Re: Solr best practices for many to many relations...

2016-04-15 Bastien Latard - MDPI AG
'Would I then be able to query a specific field of articles or other "tables" (with the same OR BETTER performance)?' -> And especially, would I be able to get only 1 article in the result...
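If the index held one document per article/author pair, a search could return the same article several times. Solr's Collapsing query parser (`fq={!collapse field=...}`) keeps one document per group, by default the highest-scoring one. The plain-Python sketch below only mimics that behaviour on an in-memory result list; the `article_id` and `score` fields and the hits are hypothetical:

```python
def collapse(results, field):
    """Keep the highest-scoring document for each value of `field`.

    Ties keep the document seen first; output is ordered by descending score.
    """
    best = {}
    for doc in results:
        key = doc[field]
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


if __name__ == "__main__":
    # Hypothetical hits from an index with one document per article/author pair.
    hits = [
        {"id": "1-10", "article_id": 1, "author": "Alice", "score": 2.5},
        {"id": "1-11", "article_id": 1, "author": "Bob", "score": 1.9},
        {"id": "2-11", "article_id": 2, "author": "Bob", "score": 1.7},
    ]
    for doc in collapse(hits, "article_id"):
        print(doc["article_id"], doc["id"])  # -> "1 1-10", then "2 2-11"
```

Indexing one document per article with multivalued author fields avoids the duplicates in the first place; collapsing is the fallback when the finer-grained documents are needed for other queries.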

Re: Solr best practices for many to many relations...

2016-04-15 Bastien Latard - MDPI AG
Thanks, Jack. I know that Solr is a search engine, but it replaces a search in my MySQL DB with this model: My goal is to improve my environment (and my performance at the same time). Yes, I have a Solr data model... but at the moment I have created 4 different indexes for "similar service usage".

Re: Solr best practices for many to many relations...

2016-04-14 Jack Krupansky
Solr is a search engine, not a database. JOINs? Although Solr does have some limited JOIN capabilities, they are more for special situations, not the front-line go-to technique for data modeling for search. Rather, denormalization is the front-line go-to technique for data modeling in Solr. ...
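For a many-to-many relation such as articles and authors, denormalization means flattening the join table into each document at indexing time. A minimal sketch, assuming hypothetical tables `ARTICLES`, `AUTHORS` and `ARTICLE_AUTHOR` and hypothetical Solr field names (`author_id`, `author_name` as multivalued fields):

```python
import json

# Hypothetical normalized tables, as rows might come out of MySQL.
ARTICLES = [
    {"id": 1, "title": "Joins in distributed search"},
    {"id": 2, "title": "Modeling many-to-many data"},
]
AUTHORS = [
    {"id": 10, "name": "Alice"},
    {"id": 11, "name": "Bob"},
]
ARTICLE_AUTHOR = [(1, 10), (1, 11), (2, 11)]  # the join table


def denormalize(articles, authors, links):
    """One flat document per article; linked author data is copied into
    multivalued fields so no join is needed at query time."""
    authors_by_id = {a["id"]: a for a in authors}
    docs = {
        art["id"]: {"id": str(art["id"]), "title": art["title"],
                    "author_id": [], "author_name": []}
        for art in articles
    }
    for article_id, author_id in links:
        author = authors_by_id[author_id]
        docs[article_id]["author_id"].append(author["id"])
        docs[article_id]["author_name"].append(author["name"])
    return list(docs.values())


if __name__ == "__main__":
    # A JSON array of documents like this can be posted to Solr's /update handler.
    print(json.dumps(denormalize(ARTICLES, AUTHORS, ARTICLE_AUTHOR), indent=2))
```

A query such as `author_name:Bob` then matches each article document directly, and each article appears once in the results. The cost is duplication: Bob's name is stored in both article documents, so renaming an author means reindexing every article linked to them.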

Solr best practices for many to many relations...

2016-04-14 Bastien Latard - MDPI AG
Hi guys, I am upgrading from Solr 4.2 to 6.0. I successfully (after some time) migrated the config files and other parameters... Now I'm just wondering whether my indexes follow best practices (they probably don't :-) ). What would be best if we have this kind of SQL...