Re: nested documents performance

2019-04-15 Thread Emir Arnautović
Hi Roi,
I don’t know the details of your test, so I am guessing at how it is set up in 
order to explain what you observed. In your flat test you are denormalising the 
data, which duplicates it, so the resulting document set is larger. That means 
more fields and text for Solr/Lucene to analyse and write to disk. With 
parent/child you are normalising the data to some degree, so there is less 
data, less analysis, and fewer disk writes. You should observe similar 
behaviour with an RDBMS, and as with an RDBMS, you pay the price at query time. 
The difference is that an RDBMS is built to work with relational/normalised 
data while Solr is not, so joins in Solr are not as fast as in an RDBMS.
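
To make the size argument concrete, here is a minimal sketch that builds the same logical data both ways and compares the serialised payloads. It assumes the flat test copies parent-level fields into every child, which is a guess about the test setup. The field names (parent_name, parent_description, child_value) and the helper functions are hypothetical; _childDocuments_ is the key Solr's JSON update format uses for nested children.

```python
import json

# Hypothetical parent-level fields; in the flat layout they are copied
# into every child document, in the nested layout they are stored once.
PARENT_FIELDS = {"parent_name": "Acme Corp", "parent_description": "x" * 200}


def build_flat(num_parents, children_per_parent):
    """Denormalised layout: each child carries a copy of its parent's fields."""
    docs = []
    for p in range(num_parents):
        for c in range(children_per_parent):
            doc = {"id": f"{p}-{c}", "child_value": c}
            doc.update(PARENT_FIELDS)
            docs.append(doc)
    return docs


def build_nested(num_parents, children_per_parent):
    """Normalised layout: parent fields appear once, children hold only their own."""
    docs = []
    for p in range(num_parents):
        parent = {"id": str(p), **PARENT_FIELDS}
        parent["_childDocuments_"] = [
            {"id": f"{p}-{c}", "child_value": c}
            for c in range(children_per_parent)
        ]
        docs.append(parent)
    return docs


def payload_bytes(docs):
    """Size of the JSON payload that would be sent for indexing."""
    return len(json.dumps(docs).encode("utf-8"))


flat_docs = build_flat(10, 10)
nested_docs = build_nested(10, 10)
print("flat bytes:  ", payload_bytes(flat_docs))
print("nested bytes:", payload_bytes(nested_docs))
```

The nested layout still produces one index document per child plus one per parent, so the document count is slightly higher, but there is far less text to analyse and write. If the flat test did not duplicate parent fields, this explanation would not apply.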

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






nested documents performance

2019-04-14 Thread Roi Wexler
Hi,
We're in the process of testing Solr for its indexing speed, which is very 
important to our application.
We've witnessed strange behavior that we wish to understand before using it.
When we indexed 1M docs it took about 63 seconds, but when we indexed the same 
documents nested as 1000 parents with 1000 child documents each, it took only 
27 seconds.

We know that Lucene doesn't support nested documents, since it has a flat 
object model, and we do see that it in fact indexes each child document as a 
separate document.
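
As an illustration of that flat model, here is a minimal sketch of how one nested document can be viewed as a contiguous block of ordinary documents, with the children first and the parent last, which is the ordering Lucene's block join relies on. The flatten_block helper and the doc_type field are hypothetical and only model the layout; this is not Solr's actual code path.

```python
def flatten_block(parent):
    """Return one parent block as a flat list: children first, parent last."""
    children = parent.get("_childDocuments_", [])
    # The parent itself becomes a plain document without the nested key.
    parent_only = {k: v for k, v in parent.items() if k != "_childDocuments_"}
    return [dict(child) for child in children] + [parent_only]


nested = {
    "id": "p1",
    "doc_type": "parent",
    "_childDocuments_": [
        {"id": "p1-c1", "doc_type": "child"},
        {"id": "p1-c2", "doc_type": "child"},
    ],
}

block = flatten_block(nested)
for doc in block:
    print(doc["id"], doc["doc_type"])
```

One parent with 1000 children therefore still becomes 1001 separate index documents; the nesting is only a convention about their order within the block.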

Our tests show that we get the same results whether we index all documents 
flat (without children) or as 1000 parents with 1000 nested documents each.

Are we missing something here?
Why does it behave like that?
What constraints do child documents have, or what price do we pay for this 
better indexing speed?
We're trying to establish whether this is a valid way to get better indexing 
performance.

Any help will be appreciated.