https://github.com/apache/solr/pull/4186

There seems to be a difference. I modified the tests to create a dedicated coordinator node: they fail when I target the coordinator but succeed when I target the data nodes. I'll continue on GitHub.
Thanks

On Tue, 3 Mar 2026 at 22:11, Mikhail Khludnev <[email protected]> wrote:

> I tried to reproduce join on the coord node, and test passed
> https://github.com/apache/solr/pull/4184/changes
> I propose to double check the cluster setup, and usage of the coord node
> https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#the-work-flow-in-a-coordinator-node
> Once again the exception above might only occur in the data node with
> "to"-side where query parser is actually executed.
>
> On Tue, Mar 3, 2026 at 8:00 PM Endika Posadas <[email protected]>
> wrote:
>
> > Sorry, I'll add more context. The main collection is a sharded collection
> > with over ten shards and where each shard has 2 replicas. The from
> > collection (fromData) has a single shard and one replica in each of the
> > solr nodes.
> > The query I send is a Json Query, looking like:
> >
> > {
> >   "filter":[{"join":{
> >       "query":{"lucene":{
> >           "query":"\"test\"",
> >           "df":"value_s"}},
> >       "from":"id",
> >       "to":"to_s",
> >       "fromIndex":"fromData"}},
> >   ],
> >   "offset":0,
> >   "query":"*:*",
> >   "limit":1,
> >   "params":{
> >     "TZ":"GMT+01:00",
> >     "timeAllowed":1800000},
> >   "fields":["id"]
> > }
> >
> > It works perfectly fine when sending it to any random solr node, but it
> > fails when it gets sent from the coordinator query. Every other query that
> > doesn't have a join works fine, or at least I haven't found any other
> > problems.
> >
> > Thanks
> >
> > On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev <[email protected]> wrote:
> >
> > > Hello,
> > > I'm in doubt. Assuming you use
> > > https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
> > > Please confirm.
> > > There's no exact coordinator test for shard joins here
> > > https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
> > > But it creates 5 nodes for 3 shard collections, and I believe picks a
> > > coordinator randomly. So, we may expect it's working.
> > > Then, the error you provide might occur at the "to"-node when it didn't
> > > find the expected co-shard.
> > > I'm afraid we need to check shard alignment across the cluster, and the
> > > detailed request log across nodes: what exactly happened at the
> > > coordinator and subordinate nodes.
> > > Regarding shard allocation: even if there's a node with a shard1 of the
> > > "to" collection collocated with "from" shard1, nothing will stop the
> > > coordinator from attempting to search "to" shard1 at another node where
> > > "from" shard1 is absent, and getting an error like this.
> > >
> > > On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We're running dedicated coordinator nodes for query performance, with
> > > > collections that are properly co-located across data nodes.
> > > >
> > > > When sending a join query (fromIndex pointing to a co-located
> > > > collection) through the coordinator, we get an error:
> > > >
> > > > "error":{
> > > >   "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > > >   "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
> > > >   "code":400
> > > > }
> > > >
> > > > The same query works fine when sent directly to a data node.
> > > >
> > > > It seems like the coordinator is trying to resolve the join instead of
> > > > delegating it to the data nodes. Is there a workaround for this?
> > > >
> > > > Thanks
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev
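
For anyone trying to reproduce the comparison discussed in this thread, here is a minimal Python sketch that rebuilds the JSON join request from the quoted message and prints the two endpoints it would be posted to. It only constructs the request and sends nothing. The host names (coordinator, datanode1) and the collection name mainCollection are hypothetical placeholders, not taken from the thread. The optional "method" key is an untested assumption based solely on the hint in the error message (method=crossCollection); cross-collection joins may need additional parameters, so check the join query parser documentation before relying on it.

```python
import json


def build_join_request(term, df="value_s", from_field="id", to_field="to_s",
                       from_index="fromData", method=None):
    """Build the JSON Request API body for the join filter quoted in the thread.

    `method` is optional and an assumption: the error message suggests
    method=crossCollection, but that variant is not verified here.
    """
    join = {
        # Phrase query against the "from" collection, as in the original message.
        "query": {"lucene": {"query": '"%s"' % term, "df": df}},
        "from": from_field,
        "to": to_field,
        "fromIndex": from_index,
    }
    if method is not None:
        join["method"] = method
    return {
        "query": "*:*",
        "filter": [{"join": join}],
        "offset": 0,
        "limit": 1,
        "fields": ["id"],
        "params": {"TZ": "GMT+01:00", "timeAllowed": 1800000},
    }


def query_url(base_url, collection):
    """Return the /query endpoint URL for a collection on a given node."""
    return "%s/solr/%s/query" % (base_url.rstrip("/"), collection)


if __name__ == "__main__":
    body = build_join_request("test")
    # Hypothetical node addresses: one coordinator, one data node.
    for node in ("http://coordinator:8983", "http://datanode1:8983"):
        print("POST", query_url(node, "mainCollection"))
    print(json.dumps(body, indent=2))
```

Posting the same body to both URLs (for example with curl and a Content-Type of application/json) should show whether the 400 error appears only on the coordinator, which is the behaviour reported above.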
