Gotcha! test passed https://github.com/apache/solr/pull/4186/changes/eb8a483bf52a987ea98e10b78b6b90c4fed2f383
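The eager-versus-lazy parsing problem described in the quoted Mar 5 message can be modelled in a few lines of plain Java. This is a minimal sketch, not Solr's code. `Node`, `resolveFromCore`, `parseEager` and `parseLazy` are hypothetical stand-ins, and a node is reduced to the set of collections it hosts locally. Eager parsing resolves the "from" side immediately, so a data-less coordinator fails even though it would throw the query away after fanning out. Lazy parsing only captures a `Supplier`, so the local-replica check runs where the query is actually executed.

```java
import java.util.Set;
import java.util.function.Supplier;

final class LazyJoinSketch {

    // Hypothetical node model: just the names of collections with a replica on this node.
    record Node(String name, Set<String> localCollections) {}

    // Simplified stand-in for the local-replica lookup seen in the stack trace
    // (NOT Solr's implementation): fail when fromIndex has no replica on this node.
    static String resolveFromCore(Node node, String fromIndex) {
        if (!node.localCollections().contains(fromIndex)) {
            throw new IllegalStateException("SolrCloud join: To join with a collection that"
                    + " might not be co-located, use method=crossCollection.");
        }
        return fromIndex + "@" + node.name();
    }

    // Eager parsing: the "from" side is resolved while the query is being parsed.
    static String parseEager(Node node, String fromIndex) {
        return "JoinQuery(from=" + resolveFromCore(node, fromIndex) + ")";
    }

    // Lazy parsing: only capture how to build the query; resolution runs on get().
    static Supplier<String> parseLazy(Node node, String fromIndex) {
        return () -> "JoinQuery(from=" + resolveFromCore(node, fromIndex) + ")";
    }

    public static void main(String[] args) {
        Node coordinator = new Node("coordinator", Set.of());           // data-less node
        Node dataNode = new Node("data1", Set.of("main", "fromData"));  // co-located data node

        // The coordinator parses but never executes: the lazy query is simply dropped.
        parseLazy(coordinator, "fromData");
        System.out.println("coordinator, lazy: parsed without error");

        // The data node actually builds and runs the query.
        System.out.println("data node, lazy: " + parseLazy(dataNode, "fromData").get());

        // Eager parsing on the coordinator reproduces the reported failure.
        try {
            parseEager(coordinator, "fromData");
        } catch (IllegalStateException e) {
            System.out.println("coordinator, eager: " + e.getMessage());
        }
    }
}
```

The only design change in the sketch is moving the failing check from parse time to execution time. Whether Solr's parsing lifecycle can accommodate that cleanly is the open question raised in the thread.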
Endika, to move forward, you can deploy it as a separate query parser plugin.
@Dev, what's the preferred way to handle this case?

On Thu, Mar 5, 2026 at 10:22 AM Mikhail Khludnev wrote:
> The coordinator should fan out per-shard requests, and that's what happens in
> https://github.com/apache/solr/pull/4186
> but not in https://github.com/apache/solr/pull/4184, and now I barely know
> how #4184 works; probably it forwards the request to a data node.
> With regard to https://github.com/apache/solr/pull/4186,
> the stack trace of the failure is:
>
> org.apache.solr.common.SolrException: SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.
>     at org.apache.solr.search.join.ScoreJoinQParserPlugin.getLocalSingleShard(ScoreJoinQParserPlugin.java:523)
>     at org.apache.solr.search.join.ScoreJoinQParserPlugin.findLocalReplicaForFromIndex(ScoreJoinQParserPlugin.java:391)
>     at org.apache.solr.search.join.ScoreJoinQParserPlugin.getCoreName(ScoreJoinQParserPlugin.java:346)
>     at org.apache.solr.search.join.ScoreJoinQParserPlugin$1.createQuery(ScoreJoinQParserPlugin.java:277)
>     at org.apache.solr.search.join.ScoreJoinQParserPlugin$1.parse(ScoreJoinQParserPlugin.java:253)
>     at org.apache.solr.search.JoinQParserPlugin$1.parse(JoinQParserPlugin.java:227)
>     at org.apache.solr.search.QParser.getQuery(QParser.java:196)
>     at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:191)
>     at org.apache.solr.handler.component.SearchHandler.prepareComponents(SearchHandler.java:427)
>     at org.apache.solr.handler.component.SearchHandler.processComponents(SearchHandler.java:406)
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:260)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2953)
>     at org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:719)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:484)
>     at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:183)
>
> It occurs on the real, data-less coordinator node. It's caused by an awkward
> flow: QueryComponent triggers query parsing even though the parsed Lucene query
> will be thrown away once the request steps into the distributed process
> (fanning out per-shard requests). It would be great to redesign this old flaw.
> Another part of the trouble is that JoinQP is too eager: it checks indices at
> parse time even though the query will be thrown away on a coordinator node.
> Meanwhile, I'll think about quickly hacking JoinQP to make it lazy, deferring
> query creation.
>
> On Wed, Mar 4, 2026 at 4:58 PM Gus Heck wrote:
>> That begins to sound like it should have a JIRA. A coordinator node should
>> probably be forwarding the request without any sort of interference.
>>
>> On Wed, Mar 4, 2026 at 7:05 AM Endika Posadas wrote:
>>> https://github.com/apache/solr/pull/4186 There seems to be a difference. I
>>> have modified the tests by creating a dedicated coordinator node, and they
>>> fail when I target the coordinator but succeed when I target the data
>>> nodes. I'll continue on GitHub.
>>>
>>> Thanks
>>>
>>> On Tue, 3 Mar 2026 at 22:11, Mikhail Khludnev wrote:
>>>> I tried to reproduce the join on the coord node, and the test passed:
>>>> https://github.com/apache/solr/pull/4184/changes
>>>> I propose double-checking the cluster setup and the usage of the coord node:
>>>> https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#the-work-flow-in-a-coordinator-node
>>>> Once again, the exception above might only occur on the data node with the
>>>> "to" side, where the query parser is actually executed.
>>>>
>>>> On Tue, Mar 3, 2026 at 8:00 PM Endika Posadas wrote:
>>>>> Sorry, I'll add more context. The main collection is a sharded collection
>>>>> with over ten shards, where each shard has 2 replicas. The "from"
>>>>> collection (fromData) has a single shard and one replica on each of the
>>>>> Solr nodes.
>>>>> The query I send is a JSON query, looking like:
>>>>>
>>>>> {
>>>>>   "filter":[{"join":{
>>>>>     "query":{"lucene":{
>>>>>       "query":"\"test\"",
>>>>>       "df":"value_s"}},
>>>>>     "from":"id",
>>>>>     "to":"to_s",
>>>>>     "fromIndex":"fromData"}}
>>>>>   ],
>>>>>   "offset":0,
>>>>>   "query":"*:*",
>>>>>   "limit":1,
>>>>>   "params":{
>>>>>     "TZ":"GMT+01:00",
>>>>>     "timeAllowed":1800000},
>>>>>   "fields":["id"]
>>>>> }
>>>>>
>>>>> It works perfectly fine when I send it to any random Solr node, but it
>>>>> fails when it gets sent through the coordinator. Every other query that
>>>>> doesn't have a join works fine, or at least I haven't found any other
>>>>> problems.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev wrote:
>>>>>> Hello,
>>>>>> I'm in doubt. I assume you use
>>>>>> https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
>>>>>> Please confirm.
>>>>>> There's no dedicated coordinator test for shard joins here:
>>>>>> https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
>>>>>> But it creates 5 nodes for 3 shard collections, and I believe it picks a
>>>>>> coordinator randomly. So, we may expect it's working.
>>>>>> Then, the error you provided might occur on the "to" node when it doesn't
>>>>>> find the expected co-located shard.
>>>>>> I'm afraid we need to check shard alignment across the cluster, and the
>>>>>> detailed request logs across nodes: what exactly happened on the
>>>>>> coordinator and the subordinate nodes.
>>>>>> Regarding shard allocation: even if there's a node with shard1 of the "to"
>>>>>> collection co-located with "from" shard1, nothing will stop the coordinator
>>>>>> from attempting to search "to" shard1 on another node where "from" shard1
>>>>>> is absent, and getting an error like this.
>>>>>>
>>>>>> On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We're running dedicated coordinator nodes for query performance, with
>>>>>>> collections that are properly co-located across data nodes.
>>>>>>>
>>>>>>> When sending a join query (fromIndex pointing to a co-located collection)
>>>>>>> through the coordinator, we get an error:
>>>>>>>
>>>>>>> "error":{
>>>>>>>   "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
>>>>>>>   "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
>>>>>>>   "code":400
>>>>>>> }
>>>>>>>
>>>>>>> The same query works fine when sent directly to a data node.
>>>>>>>
>>>>>>> It seems like the coordinator is trying to resolve the join instead of
>>>>>>> delegating it to the data nodes. Is there a workaround for this?
>>>>>>>
>>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>
>> --
>> http://www.needhamsoftware.com (work)
>> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev
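The shard-alignment check suggested in the Mar 3 replies can be scripted against a cluster layout. This is a minimal sketch. It assumes the layout has already been fetched (for example from the Collections API CLUSTERSTATUS response) and reduced to plain maps, and the node and shard names are hypothetical. It also assumes a single-shard "from" collection, as in the setup described above, so every node that serves a "to" replica should also host a "from" replica. Note that a data-less coordinator hosts neither, which is why it only fails when it tries to resolve the join locally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ColocationCheck {

    // For a single-shard "from" collection: return each node that serves a "to"
    // replica but hosts no "from" replica, mapped to the affected "to" shards.
    static Map<String, List<String>> misaligned(Map<String, List<String>> toReplicaNodesByShard,
                                                Set<String> fromReplicaNodes) {
        Map<String, List<String>> result = new TreeMap<>();
        // Iterate shards in sorted order so the output is deterministic.
        for (Map.Entry<String, List<String>> shard : new TreeMap<>(toReplicaNodesByShard).entrySet()) {
            for (String node : shard.getValue()) {
                if (!fromReplicaNodes.contains(node)) {
                    result.computeIfAbsent(node, n -> new ArrayList<>()).add(shard.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical layout: "to" shard -> nodes hosting one of its replicas.
        Map<String, List<String>> to = Map.of(
                "shard1", List.of("node1", "node2"),
                "shard2", List.of("node2", "node3"));
        // Hypothetical nodes hosting a replica of the single-shard "from" collection.
        Set<String> from = Set.of("node1", "node2");

        System.out.println(misaligned(to, from)); // prints {node3=[shard2]}
    }
}
```

An empty result means every data node can resolve the "from" side locally. Any entry points to a replica where a shard-to-shard join would hit the same "might not be co-located" error.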
