Re: Solr-Cloud, join and collection collocation

Mikhail Khludnev Wed, 16 Oct 2019 05:33:19 -0700

Note: adding score=none as a local param. Turns another algorithm dragging
by from side join.


On Wed, Oct 16, 2019 at 11:37 AM Nicolas Paris <[email protected]>
wrote:

> Sadly, the join performances are poor.
> The joined collection is 12M documents, and the performances are 6k ms
> versus 60ms when I compare to the denormalized field.
>
> Apparently, the performances does not change when the filter on the
> joined collection is changed. It is still 6k ms when the subset is 12M
> or 1 document in size. So the performance of join looks correlated to
> size of joined collection and not the kind of filter applied to it.
>
> I will explore the streaming expressions
>
> On Wed, Oct 16, 2019 at 08:00:43AM +0200, Nicolas Paris wrote:
> > > You can certainly replicate the joined collection to every shard. It
> > > must fit in one shard and a replica of that shard must be co-located
> > > with every replica of the “to” collection.
> >
> > Yes, I found this in the documentation, with a clear example just after
> > this mail. I will test it today. I also read your blog about join
> > performances[1] and I suspect the performance impact of joins will be
> > huge because the joined collection is about 10M documents (only two
> > fields, unique id and an array of longs and a filter applied to the
> > array, join key is 10M unique IDs).
> >
> > > Have you looked at streaming and “streaming expressions"? It does not
> > > have the same problem, although it does have its own limitations.
> >
> > I never tested them, and I am not very confortable yet in how to test
> > them. Is it possible to mix query parsers and streaming expression in
> > the client call via http parameters - or is streaming expression apply
> > programmatically only ?
> >
> > [1] https://lucidworks.com/post/solr-and-joins/
> >
> > On Tue, Oct 15, 2019 at 07:12:25PM -0400, Erick Erickson wrote:
> > > You can certainly replicate the joined collection to every shard. It
> must fit in one shard and a replica of that shard must be co-located with
> every replica of the “to” collection.
> > >
> > > Have you looked at streaming and “streaming expressions"? It does not
> have the same problem, although it does have its own limitations.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Oct 15, 2019, at 6:58 PM, Nicolas Paris <[email protected]>
> wrote:
> > > >
> > > > Hi
> > > >
> > > > I have several large collections that cannot fit in a standalone solr
> > > > instance. They are split over multiple shards in solr-cloud mode.
> > > >
> > > > Those collections are supposed to be joined to an other collection to
> > > > retrieve subset. Because I am using distributed collections, I am not
> > > > able to use the solr join feature.
> > > >
> > > > For this reason, I denormalize the information by adding the joined
> > > > collection within every collections. Naturally, when I want to update
> > > > the joined collection, I have to update every one of the distributed
> > > > collections.
> > > >
> > > > In standalone mode, I only would have to update the joined
> collection.
> > > >
> > > > I wonder if there is a way to overcome this limitation. For example,
> by
> > > > replicating the joined collection to every shard - or other method I
> am
> > > > ignoring.
> > > >
> > > > Any thought ?
> > > > --
> > > > nicolas
> > >
> >
> > --
> > nicolas
> >
>
> --
> nicolas
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Solr-Cloud, join and collection collocation

Reply via email to