I tried to reproduce the join on the coordinator node, and the test passed:
https://github.com/apache/solr/pull/4184/changes

I propose double-checking the cluster setup and how the coordinator node is used:
https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#the-work-flow-in-a-coordinator-node

Once again, the exception above might only occur on the data node holding the "to" side, where the query parser is actually executed.
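For the cluster check, here is a minimal sketch (not part of the PR) of how shard alignment could be verified from the Collections API CLUSTERSTATUS output. It assumes the usual response layout, cluster -> collections -> <name> -> shards -> <shard> -> replicas -> <replica> -> node_name. The function names, the node URL, and the main collection name "mainData" are placeholders of mine, not taken from your setup.

```python
import json
import urllib.request


def replica_nodes(cluster_status, collection):
    """Return the set of node names hosting any replica of `collection`."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return {
        replica["node_name"]
        for shard in shards.values()
        for replica in shard["replicas"].values()
    }


def nodes_missing_from_side(cluster_status, to_collection, from_collection):
    """List nodes that host the "to" collection but no replica of the "from" one.

    A non-co-located join can only work on nodes that hold both sides,
    so any node listed here is a candidate for the co-location error.
    """
    to_nodes = replica_nodes(cluster_status, to_collection)
    from_nodes = replica_nodes(cluster_status, from_collection)
    return sorted(to_nodes - from_nodes)


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as JSON from one Solr node (base_url is an assumption)."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Something like nodes_missing_from_side(fetch_cluster_status("http://localhost:8983/solr"), "mainData", "fromData") should return an empty list if every data node that serves the main collection also has a fromData replica. Coordinator nodes host no replicas, so they never show up in this comparison.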

On Tue, Mar 3, 2026 at 8:00 PM Endika Posadas <[email protected]> wrote:

> Sorry, I'll add more context. The main collection is a sharded collection
> with over ten shards, where each shard has 2 replicas. The from
> collection (fromData) has a single shard and one replica on each of the
> Solr nodes.
> The query I send is a JSON query, looking like:
>
> {
>   "filter":[{"join":{
>     "query":{"lucene":{
>       "query":"\"test\"",
>       "df":"value_s"}},
>     "from":"id",
>     "to":"to_s",
>     "fromIndex":"fromData"}},
>   ],
>   "offset":0,
>   "query":"*:*",
>   "limit":1,
>   "params":{
>     "TZ":"GMT+01:00",
>     "timeAllowed":1800000},
>   "fields":["id"]
> }
>
> It works perfectly fine when sending it to any random Solr node, but it
> fails when it gets sent through the coordinator node. Every other query that
> doesn't have a join works fine, or at least I haven't found any other
> problems.
>
> Thanks
>
> On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev <[email protected]> wrote:
>
> > Hello,
> > I'm in doubt. Assuming you use
> > https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
> > Please confirm.
> > There's no exact coordinator test for shard joins here
> > https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
> > But it creates 5 nodes for 3-shard collections, and I believe it picks a
> > coordinator randomly. So, we may expect it's working.
> > Then, the error you provide might occur at the "to" node when it didn't find
> > the expected co-shard.
> > I'm afraid we need to check shard alignment across the cluster, and detailed
> > request logs across nodes: what exactly happened at the coordinator and
> > subordinate nodes.
> > Regarding shard allocation: even if there's a node with shard1 of the "to"
> > collection co-located with "from" shard1, nothing will stop the coordinator
> > from attempting to search "to" shard1 on another node where "from" shard1
> > is absent, and getting an error like this.
> >
> > On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We're running dedicated coordinator nodes for query performance, with
> > > collections that are properly co-located across data nodes.
> > >
> > > When sending a join query (fromIndex pointing to a co-located collection)
> > > through the coordinator, we get an error:
> > >
> > > "error":{
> > >   "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > >   "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
> > >   "code":400
> > > }
> > >
> > > The same query works fine when sent directly to a data node.
> > >
> > > It seems like the coordinator is trying to resolve the join instead of
> > > delegating it to the data nodes. Is there a workaround for this?
> > >
> > > Thanks
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
>

--
Sincerely yours
Mikhail Khludnev
