Re: Performance of cross join vs block join

mihaela olteanu Fri, 12 Jul 2013 01:20:46 -0700

Hi Mikhail,

I used the term block join incorrectly. When I said block join I was referring 
to a join performed within a single core, versus a cross join performed across 
multiple cores.
But I saw your benchmark (from cache) and it seems that block join performs 
better. Is this functionality available in Solr 4.3.1? I did not find any such 
examples on Solr's wiki page.
Does this functionality require a special schema or special indexing? How 
would I need to index the data from my tables? In my case all the indices 
share a common schema anyway, since I am using dynamic fields, so I can easily 
add all documents from all tables to one Solr core and add a discriminator 
field to each document.

Could you point me to some more documentation?

Thanks in advance,
Mihaela

________________________________
 From: Mikhail Khludnev <mkhlud...@griddynamics.com>
To: solr-user <solr-user@lucene.apache.org>; mihaela olteanu 
<mihaela...@yahoo.com> 
Sent: Thursday, July 11, 2013 2:25 PM
Subject: Re: Performance of cross join vs block join

Mihaela,

To me it seems reasonable that a single-core join takes the same time as a
cross-core one. I just can't see what gain could be obtained in the former
case.
I can hardly comment on the join code; I looked into it, and it's not
trivial, at least. Block join doesn't need to obtain parentId term
values/numbers and look up parents by them. Both of these actions are
expensive. Also, block join works as an iterator, whereas join needs to
allocate memory for a parents bitset and populate it out of order, which
hurts scalability.
Also, in None scoring mode BJQ doesn't need to walk through all children; it
only needs to hit the first one. Another nice feature is 'both-side
leapfrog': if a highly restrictive filter/query intersects with BJQ, it can
skip many parents and children as well. That's not possible in Join, which
has a fairly 'full-scan' nature.
The main performance factor for Join is the number of child docs.
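
To make the difference concrete, here is a toy sketch in Python. It is not
the Solr or Lucene code. It assumes a block layout in which children are
stored immediately before their parent, and every name in it (DOCS,
term_join, block_join, is_red) is made up for illustration. The dict in
term_join merely stands in for the term lookup that a real join performs.

```python
from bisect import bisect_left

# Toy index (hypothetical): each block holds the children, then their parent.
DOCS = [
    {"type": "child", "parent_id": "p1", "color": "red"},    # doc 0
    {"type": "child", "parent_id": "p1", "color": "blue"},   # doc 1
    {"type": "parent", "id": "p1"},                          # doc 2
    {"type": "child", "parent_id": "p2", "color": "green"},  # doc 3
    {"type": "parent", "id": "p2"},                          # doc 4
    {"type": "child", "parent_id": "p3", "color": "red"},    # doc 5
    {"type": "child", "parent_id": "p3", "color": "red"},    # doc 6
    {"type": "parent", "id": "p3"},                          # doc 7
]


def is_red(doc):
    """Example child query: match children whose color is red."""
    return doc.get("color") == "red"


def term_join(docs, child_matches):
    """Key-based join: visit every matching child, read its parent_id,
    look the parent up by that key, and set a bit for it."""
    # Stand-in for the term dictionary lookup of parent ids.
    id_to_doc = {d["id"]: i for i, d in enumerate(docs) if d["type"] == "parent"}
    bits = [False] * len(docs)  # allocated up front, sized by the whole index
    for d in docs:
        if d["type"] == "child" and child_matches(d):
            bits[id_to_doc[d["parent_id"]]] = True  # one lookup per child hit
    return [i for i, b in enumerate(bits) if b]


def block_join(docs, child_matches):
    """Block join without scoring: walk in doc order, map the first matching
    child to the next parent position, then skip the rest of that block."""
    parents = [i for i, d in enumerate(docs) if d["type"] == "parent"]  # sorted
    i = 0
    while i < len(docs):
        d = docs[i]
        if d["type"] == "child" and child_matches(d):
            parent = parents[bisect_left(parents, i)]  # next parent at or after i
            yield parent
            i = parent + 1  # remaining children of this block are never visited
        else:
            i += 1


print(term_join(DOCS, is_red))         # parent doc numbers found via key lookup
print(list(block_join(DOCS, is_red)))  # same parents, produced lazily in order
```

Both functions return the same parents. In the sketch, the key-based version
touches every matching child and fills a bitset sized by the index, while the
block version yields parents lazily and skips doc 6 once doc 5 has matched.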
I'm not sure I got all your questions; please specify them in more detail
if something is still unclear.
Have you seen my benchmark
http://blog.griddynamics.com/2012/08/block-join-query-performs.html ?

On Thu, Jul 11, 2013 at 1:52 PM, mihaela olteanu <mihaela...@yahoo.com> wrote:

> Hello,
>
> Does anyone know of any performance measurements for cross joins compared
> to joins inside a single index?
>
> Is a join inside a single index that stores all documents of various types
> (from the parent table or from child tables) with a discriminator field
> faster than a cross join (where each document type resides in its own
> index)?
>
> I have performed some tests, but to me it seems that a join in a single
> (bigger) index does not bring much of a speed improvement compared to
> cross joins.
>
> Why would a block join be faster than a cross join if this is the case?
> What are the variables that count when trying to improve the query
> execution time?
>
> Thanks!
> Mihaela

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
