Re: Aligning Shards from different Collections on the same Solr server based on Date Range

Matt Kuiper Fri, 09 Jul 2021 08:30:39 -0700

Thanks Joel!

On my list is to investigate Block Joins and Nested Child docs.


https://solr.apache.org/guide/8_8/other-parsers.html#block-join-query-parsers

https://solr.apache.org/guide/8_8/indexing-nested-documents.html#indexing-nested-documents

However, it looks like you are not suggesting using nested docs, but
specifying a type field to differentiate between types of docs and then a
join field.  Not having to build nested docs prior to updates would be an
advantage.  And it makes sense that the join field would allow for reliable
routing to the appropriate shard for both doc types.
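
A rough sketch of how that single-collection layout might look, assuming the collection uses the default compositeId router so that an id of the form `joinkey!rest` is hashed on the join key. The field name `join_key_s` and both helper functions are hypothetical; only `rec_type` comes from this thread. The filter uses the standard `{!join}` query parser, which is not necessarily the optimized self join Joel mentions.

```python
def make_doc(rec_type, join_key, local_id, **fields):
    """Build a document dict whose id is prefixed with the join key.

    With the compositeId router, the part before '!' decides the shard,
    so both record types that share a join key land on the same shard.
    """
    doc = {
        "id": f"{join_key}!{rec_type}-{local_id}",
        "rec_type": rec_type,        # differentiates the two kinds of records
        "join_key_s": join_key,      # hypothetical field name for the join key
    }
    doc.update(fields)
    return doc


def self_join_params(main_query, from_query, join_field="join_key_s"):
    """Build query params for a self join within one collection.

    The fq selects docs whose join field value also appears on a doc
    matching from_query; q then restricts results to the wanted type.
    """
    return {
        "q": main_query,
        "fq": f"{{!join from={join_field} to={join_field}}}{from_query}",
    }


if __name__ == "__main__":
    print(make_doc("A", "order42", "1", date_dt="2021-07-09T00:00:00Z"))
    print(self_join_params("rec_type:A", "rec_type:B"))
```

The resulting dicts could be sent to Solr with any HTTP client or client library; nothing here builds nested docs ahead of an update.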

I will take a further look and see if this approach will work, and get back
if more info is needed on the optimized self join.

Thanks again,
Matt


On Fri, Jul 9, 2021 at 7:01 AM Joel Bernstein <[email protected]> wrote:

> Can you solve this problem by adding all documents into the same collection
> and performing self joins? You could add a field called rec_type to
> differentiate between the records.
>
> There are two good reasons for wanting to do this.
>
> 1) This allows you to route by the join key and easily co-locate records.
>
> 2) There is an optimized self join which is extremely fast that you could
> take advantage of if you did this.
>
> Let me know if this might be an option for you and we can discuss the
> optimized self join in more detail.
>
> Joel
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Fri, Jul 2, 2021 at 6:28 PM Matt Kuiper <[email protected]> wrote:
>
> > After some research, it appears the following approach may help in this
> > situation and relieve the requirement of co-locating indexes for joins. It
> > appears one drawback may be the types of fields supported for the join
> > field.
> >
> >
> https://solr.apache.org/guide/8_8/other-parsers.html#cross-collection-join
> >
> > Matt
> >
> > On Wed, Jun 30, 2021 at 11:59 AM Matt Kuiper <[email protected]> wrote:
> >
> > > Hi Solr Group,
> > >
> > > I am not sure the following is a viable use case, so I welcome input and
> > > any implementation recommendations.
> > >
> > > I would like to perform joins over two sharded collections, where docs
> > > are routed to specific shards based on a date range and the date ranges
> > > are the same for the corresponding shards in each collection.
> > >
> > > I understand that this means that the replicas from each collection that
> > > hold data to be joined need to be co-located on the same Solr server.  I
> > > have read solutions that use ADD REPLICA to add a Collection B replica to
> > > all Solr servers, assuming Collection B has only one shard.  For my use
> > > case I need Collection B to have multiple shards.
> > >
> > > *Collection A                Collection B              SolrServer *
> > > Shard1_2020              Shard1_2020           172.33.0.1:8983_solr
> > > Shard2_2021              Shard2_2021           172.33.0.2:8983_solr
> > > Shard3_2022              Shard3_2022           172.33.0.3:8983_solr
> > >
> > > I think my question comes down to this: how do I break shards by a date
> > > range, and do it in a way that both Collections A and B are defined by
> > > the same date range?  If I could reliably break shards by date, and know
> > > the date range of each shard, I think I could use the ADD REPLICA API to
> > > align them.
> > >
> > > I am not sure a compositeId routing approach would work, and I think
> > > implicit routing may be hard to manage over time.
> > >
> > > Is an approach like this viable?  I am a bit concerned about
> > > maintenance.  Are there other ideas to support this join?
> > >
> > > Note: I am considering this within Time series collections...
> > >
> > > Matt
> > >
> >
>
