Gotcha! test passed
https://github.com/apache/solr/pull/4186/changes/eb8a483bf52a987ea98e10b78b6b90c4fed2f383

Endika, to move forward you can deploy it as a separate query parser
plugin.
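
For example (the name and class below are hypothetical placeholders, not an
existing plugin), a custom parser can be registered in solrconfig.xml and
then selected per request:

    <!-- hypothetical plugin name and class -->
    <queryParser name="lazyJoin"
                 class="com.example.solr.LazyJoinQParserPlugin"/>

and used in a query as {!lazyJoin from=id to=to_s fromIndex=fromData}.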

@Dev, what's the preferred way to handle this case?

On Thu, Mar 5, 2026 at 10:22 AM Mikhail Khludnev <[email protected]> wrote:

> The coordinator should fan out per-shard requests, and that is what happens in
> https://github.com/apache/solr/pull/4186
> but not in https://github.com/apache/solr/pull/4184; by now I barely know
> how #4184 works, probably it forwards to a data node.
> Regarding https://github.com/apache/solr/pull/4186
> the stack trace of the failure is:
> org.apache.solr.common.SolrException: SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.
> at org.apache.solr.search.join.ScoreJoinQParserPlugin.getLocalSingleShard(ScoreJoinQParserPlugin.java:523)
> at org.apache.solr.search.join.ScoreJoinQParserPlugin.findLocalReplicaForFromIndex(ScoreJoinQParserPlugin.java:391)
> at org.apache.solr.search.join.ScoreJoinQParserPlugin.getCoreName(ScoreJoinQParserPlugin.java:346)
> at org.apache.solr.search.join.ScoreJoinQParserPlugin$1.createQuery(ScoreJoinQParserPlugin.java:277)
> at org.apache.solr.search.join.ScoreJoinQParserPlugin$1.parse(ScoreJoinQParserPlugin.java:253)
> at org.apache.solr.search.JoinQParserPlugin$1.parse(JoinQParserPlugin.java:227)
> at org.apache.solr.search.QParser.getQuery(QParser.java:196)
> at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:191)
> at org.apache.solr.handler.component.SearchHandler.prepareComponents(SearchHandler.java:427)
> at org.apache.solr.handler.component.SearchHandler.processComponents(SearchHandler.java:406)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:260)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2953)
> at org.apache.solr.servlet.HttpSolrCall.executeCoreRequest(HttpSolrCall.java:719)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:484)
> at org.apache.solr.servlet.SolrDispatchFilter.dispatch(SolrDispatchFilter.java:183)
>
> It occurs on a real data-less coordinator node. It's caused by an awkward
> flow: QueryComponent triggers query parsing even though it will throw away
> the parsed Lucene query once it steps into distributed processing (fanning
> out per-shard requests). It would be great to redesign this old flaw.
> Another part of the trouble is that JoinQParserPlugin is too eager,
> checking indices at parse time even though the query will be thrown away
> on a coordinator node. Meanwhile, I'll think about quickly hacking
> JoinQParserPlugin to make it lazy, deferring query creation.
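>
> Roughly, the lazy hack could look like this (a hypothetical sketch, not
> current Solr code): wrap the delegate query in a supplier so the
> co-location check moves from parse time to the moment the query is
> actually used, e.g.
>
>     // sketch only: defer the index/co-location check out of parse(), so
>     // a coordinator that discards the parsed query never performs it
>     class LazyJoinQuery extends Query {
>       private final Supplier<Query> delegate; // runs the check on first use
>       ...
>     }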
>
> On Wed, Mar 4, 2026 at 4:58 PM Gus Heck <[email protected]> wrote:
>
>> That begins to sound like it should have a JIRA. A coordinator node should
>> probably be forwarding the request without any sort of interference.
>>
>> On Wed, Mar 4, 2026 at 7:05 AM Endika Posadas <[email protected]>
>> wrote:
>>
>> > https://github.com/apache/solr/pull/4186 There seems to be a
>> > difference. I have modified the tests by creating a dedicated
>> > coordinator node, and then they fail when I target the coordinator but
>> > succeed when I target the data nodes. I'll continue on GitHub.
>> >
>> > Thanks
>> >
>> > On Tue, 3 Mar 2026 at 22:11, Mikhail Khludnev <[email protected]> wrote:
>> >
>> > > I tried to reproduce the join on the coordinator node, and the test passed:
>> > > https://github.com/apache/solr/pull/4184/changes
>> > > I propose double-checking the cluster setup and the usage of the
>> > > coordinator node:
>> > > https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#the-work-flow-in-a-coordinator-node
>> > > Once again, the exception above might only occur on the data node with
>> > > the "to"-side, where the query parser is actually executed.
>> > >
>> > > On Tue, Mar 3, 2026 at 8:00 PM Endika Posadas <[email protected]>
>> > > wrote:
>> > >
>> > > > Sorry, I'll add more context. The main collection is a sharded
>> > > > collection with over ten shards, where each shard has 2 replicas. The
>> > > > "from" collection (fromData) has a single shard and one replica on
>> > > > each of the Solr nodes.
>> > > > The query I send is a JSON query, looking like:
>> > > >
>> > > > {
>> > > >   "filter":[{"join":{
>> > > >         "query":{"lucene":{
>> > > >             "query":"\"test\"",
>> > > >             "df":"value_s"}},
>> > > >         "from":"id",
>> > > >         "to":"to_s",
>> > > >         "fromIndex":"fromData"}}
>> > > >     ],
>> > > >   "offset":0,
>> > > >   "query":"*:*",
>> > > >   "limit":1,
>> > > >   "params":{
>> > > >     "TZ":"GMT+01:00",
>> > > >     "timeAllowed":1800000},
>> > > >   "fields":["id"]
>> > > > }
>> > > >
>> > > > It works perfectly fine when sent to any random Solr node, but it
>> > > > fails when it is sent to the coordinator node. Every other query
>> > > > that doesn't have a join works fine, or at least I haven't found any
>> > > > other problems.
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Tue, 3 Mar 2026 at 17:38, Mikhail Khludnev <[email protected]>
>> wrote:
>> > > >
>> > > > > Hello,
>> > > > > I'm in doubt. I assume you use
>> > > > > https://solr.apache.org/guide/solr/latest/query-guide/join-query-parser.html#joining-multiple-shard-collections
>> > > > > Please confirm.
>> > > > > There's no exact coordinator test for shard joins here:
>> > > > > https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/search/join/ShardToShardJoinAbstract.java#L58
>> > > > > But it creates 5 nodes for 3-shard collections, and I believe it
>> > > > > picks a coordinator randomly. So we may expect it to work.
>> > > > > Then, the error you provided might occur at the "to"-node when it
>> > > > > didn't find the expected co-located shard.
>> > > > > I'm afraid we need to check shard alignment across the cluster, and
>> > > > > detailed request logs across nodes: what exactly happened at the
>> > > > > coordinator and the subordinate nodes.
>> > > > > Regarding shard allocation: even if there's a node where shard1 of
>> > > > > the "to" collection is co-located with "from" shard1, nothing stops
>> > > > > the coordinator from attempting to search "to" shard1 at another
>> > > > > node where "from" shard1 is absent, producing an error like this.
>> > > > >
>> > > > > On Tue, Mar 3, 2026 at 6:02 PM Endika Posadas <
>> [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > We're running dedicated coordinator nodes for query performance,
>> > with
>> > > > > > collections that are properly co-located across data nodes.
>> > > > > >
>> > > > > >
>> > > > > > When sending a join query (fromIndex pointing to a co-located
>> > > > collection)
>> > > > > > through the coordinator, we get an error:
>> > > > > >
>> > > > > > "error":{
>> > > > > >     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
>> > > > > >     "msg":"SolrCloud join: To join with a collection that might not be co-located, use method=crossCollection.",
>> > > > > >     "code":400
>> > > > > >   }
>> > > > > >
>> > > > > >
>> > > > > > The same query works fine when sent directly to a data node.
>> > > > > >
>> > > > > > It seems like the coordinator is trying to resolve the join
>> > > > > > itself instead of delegating it to the data nodes. Is there a
>> > > > > > workaround for this?
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Sincerely yours
>> > > > > Mikhail Khludnev
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours
>> > > Mikhail Khludnev
>> > >
>> >
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Sincerely yours
Mikhail Khludnev
