Could you solve this problem by adding all documents to the same collection
and performing self joins? You could add a field called rec_type to
differentiate between the record types.

There are two good reasons for wanting to do this.

1) This allows you to route by the join key and easily co-locate records.

2) There is an optimized self join, which is extremely fast, that you could
take advantage of if you did this.
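
As a rough illustration of the single-collection layout, here is a minimal
Python sketch. It assumes the collection uses the default compositeId router
(so an id of the form key!docid routes by the key prefix) and uses the
standard {!join} query parser, which is not necessarily the optimized self
join mentioned above. The join_key field name, the make_doc and
self_join_params helpers, and the sample values are all hypothetical.

```python
from urllib.parse import urlencode


def make_doc(join_key, doc_id, rec_type, **fields):
    """Build a document whose id carries the join key as a route prefix.

    Assumes the compositeId router: documents sharing the prefix before
    '!' are hashed to the same shard, so joined records are co-located.
    """
    doc = {
        "id": f"{join_key}!{doc_id}",
        "join_key": join_key,   # hypothetical field name
        "rec_type": rec_type,   # distinguishes the two record types
    }
    doc.update(fields)
    return doc


def self_join_params(from_type, to_type, join_field="join_key"):
    """Query params returning to_type docs that share a join key
    with at least one from_type doc (standard join query parser)."""
    return {
        "q": f"{{!join from={join_field} to={join_field}}}rec_type:{from_type}",
        "fq": f"rec_type:{to_type}",
    }


# Two records of different types that share one join key.
docs = [
    make_doc("order42", "a1", "A", title="parent record"),
    make_doc("order42", "b7", "B", note="child record"),
]
params = self_join_params("B", "A")
print(urlencode(params))
```

The docs list could be posted to the collection's update handler and params
sent to its select handler; both steps are left out here because they depend
on your setup.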

Let me know if this might be an option for you and we can discuss the
optimized self join in more detail.

Joel

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jul 2, 2021 at 6:28 PM Matt Kuiper <[email protected]> wrote:

> After some research, it appears the following approach may help in this
> situation and relieve the requirement of co-locating indexes for joins.  It
> appears one drawback may be the types of fields supported for the join
> field.
>
> https://solr.apache.org/guide/8_8/other-parsers.html#cross-collection-join
>
> Matt
>
> On Wed, Jun 30, 2021 at 11:59 AM Matt Kuiper <[email protected]> wrote:
>
> > Hi Solr Group,
> >
> > I am not sure the following is a viable use case, so I welcome input and
> > any implementation recommendations.
> >
> > I would like to perform joins over two sharded collections, where docs
> > are routed to specific shards based on a date range, and the date ranges
> > are the same for the corresponding shards in each collection.
> >
> > I understand that this means that the replicas from each collection that
> > hold data to be joined need to be co-located on the same Solr server.  I
> > have read solutions that use ADD REPLICA to add a Collection B replica to
> > all Solr servers, assuming Collection B has only one shard.  For my use
> > case I need Collection B to have multiple shards.
> >
> > Collection A             Collection B          SolrServer
> > Shard1_2020              Shard1_2020           172.33.0.1:8983_solr
> > Shard2_2021              Shard2_2021           172.33.0.2:8983_solr
> > Shard3_2022              Shard3_2022           172.33.0.3:8983_solr
> >
> > I think my question comes down to how do I break shards by a date range,
> > and do it in a way that both Collections A and B would be defined by the
> > same date range?  If I could reliably break shards by date, and know the
> > date range of the shard, I think I could use the ADD REPLICA API to align.
> >
> > Not sure a compositeId routing approach would work, but thinking an
> > implicit id may be hard to manage over time.
> >
> > Is an approach like this viable? I am a bit concerned about
> > maintenance. Are there other ideas to support this join?
> >
> > Note: I am considering this within Time series collections...
> >
> > Matt
> >
>
