[jira] [Commented] (SOLR-11292) Querying against an alias can lead to incorrect routing

2017-10-17 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208761#comment-16208761
 ] 

David Smiley commented on SOLR-11292:
-

I noticed that logic too [~varunthacker] while working on SOLR-11444.  
[~ichattopadhyaya] I recall you added the stateProvider stuff.  Would there be 
a performance problem if CloudSolrClient resolved aliases first, thereby doing 
an HTTP fetch for aliases that would otherwise have never occurred if you're 
not even using the aliases?  Probably not a big deal but the thought crossed my 
mind.
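
To make the cost question concrete, here is a minimal sketch in plain Java. It does not use SolrJ types: {{CountingAliasProvider}} and its methods are hypothetical stand-ins, and the counter only models the extra alias fetch, it does not measure a real HTTP call.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical provider that counts alias lookups; not the SolrJ state provider. */
final class CountingAliasProvider {
  private final Set<String> collections;            // names of real collections
  private final Map<String, List<String>> aliases;  // alias name -> target collections
  private int aliasFetches = 0;

  CountingAliasProvider(Set<String> collections, Map<String, List<String>> aliases) {
    this.collections = collections;
    this.aliases = aliases;
  }

  boolean hasCollection(String name) {
    return collections.contains(name);
  }

  List<String> getAlias(String name) {
    aliasFetches++; // stands in for the remote alias fetch
    List<String> targets = aliases.get(name);
    return targets == null ? Collections.<String>emptyList() : targets;
  }

  int aliasFetches() {
    return aliasFetches;
  }

  /** Collection-first order: aliases are consulted only when no collection has this name. */
  List<String> resolveCollectionFirst(String name) {
    if (hasCollection(name)) {
      return Collections.singletonList(name);
    }
    return getAlias(name);
  }

  /** Alias-first order: aliases are always consulted, then the collection is the fallback. */
  List<String> resolveAliasFirst(String name) {
    List<String> targets = getAlias(name);
    if (!targets.isEmpty()) {
      return targets;
    }
    return hasCollection(name) ? Collections.singletonList(name) : Collections.<String>emptyList();
  }
}
```

In this model a user with no aliases pays zero alias lookups with the collection-first order and one per resolution with the alias-first order, which is the overhead being asked about.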

> Querying against an alias can lead to incorrect routing
> ---
>
> Key: SOLR-11292
> URL: https://issues.apache.org/jira/browse/SOLR-11292
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>
> collection1 has 2 shards and 1 replica
> collection2 has 8 shards and 1 replica
> I have 8 nodes, so collection2 is spread across all 8, while collection1 is 
> hosted by two nodes.
> Suppose we create an alias called "collection1" and point it to "collection2".
> Querying against the alias "collection1" works as expected, but what I noticed 
> was that the top level queries would only hit 2 out of the 8 JVMs when querying 
> using SolrJ.
> It turns out that SolrJ is using the state.json of collection1 (the actual 
> collection) and routing queries to only those nodes.
> There are two negatives to this:
>  - If those two nodes are down, all queries fail.
>  - Top level queries are only routed to those two nodes, thus causing a skew 
> in the top level requests.
> The obvious solution would be to use the state.json file of the underlying 
> collection that the alias is pointing to. But if we have the alias pointing 
> to multiple collections then this might get tricky?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11292) Querying against an alias can lead to incorrect routing

2017-10-17 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208425#comment-16208425
 ] 

Varun Thacker commented on SOLR-11292:
--

I think the logic *only* fails when we have a collection and an alias with 
the same name:

{code}
  private LinkedHashSet<String> resolveAliasesAndValidateExistence(List<String> inputCollections) {
    LinkedHashSet<String> collectionNames = new LinkedHashSet<>(); // consistent ordering
    // validate collections
    for (String collectionName : inputCollections) {
      if (stateProvider.getState(collectionName) == null) {
        // perhaps it's an alias
        List<String> aliasedCollections = stateProvider.getAlias(collectionName);
        if (aliasedCollections.isEmpty()) {
          throw new SolrException(ErrorCode.BAD_REQUEST, "Collection not found: " + collectionName);
        }
        collectionNames.addAll(aliasedCollections);
      } else {
        collectionNames.add(collectionName);
      }
    }
    return collectionNames;
  }
{code}

We first check whether the collection exists. Only if it doesn't exist do we 
resolve the name as an alias.

Maybe we should always resolve it as an alias first?
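
As a rough illustration of that idea, here is a sketch in plain Java. {{AliasFirstResolver}} and its map-backed state are hypothetical stand-ins for the state provider, not the SolrJ code, and the exception type is swapped for a JDK one so the snippet compiles on its own.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the lookup logic; not the SolrJ implementation. */
final class AliasFirstResolver {
  private final Set<String> collections;            // names of real collections
  private final Map<String, List<String>> aliases;  // alias name -> target collections

  AliasFirstResolver(Set<String> collections, Map<String, List<String>> aliases) {
    this.collections = collections;
    this.aliases = aliases;
  }

  /** Resolves each input name as an alias first, falling back to a collection lookup. */
  LinkedHashSet<String> resolve(List<String> inputCollections) {
    LinkedHashSet<String> collectionNames = new LinkedHashSet<>(); // consistent ordering
    for (String name : inputCollections) {
      List<String> aliased = aliases.get(name);
      if (aliased != null && !aliased.isEmpty()) {
        // The alias wins, even if a collection with the same name exists.
        collectionNames.addAll(aliased);
      } else if (collections.contains(name)) {
        collectionNames.add(name);
      } else {
        throw new IllegalArgumentException("Collection not found: " + name);
      }
    }
    return collectionNames;
  }
}
```

With collections c1 and c2 and an alias c1 pointing to c2, resolving "c1" in this sketch yields only c2, whereas the collection-first order above would yield c1.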




[jira] [Commented] (SOLR-11292) Querying against an alias can lead to incorrect routing

2017-10-17 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208347#comment-16208347
 ] 

Varun Thacker commented on SOLR-11292:
--

Hi David,

I applied SOLR-11444 and I still don't think it's working.

Here is how I tested:

Built solr from master after applying the patch.
Started solr with {{bin/solr start -e cloud -noprompt}}
Created c1 on node 8983
Created c2 on node 7574
Created an alias called c1 pointing to c2 - 
{{admin/collections?action=createalias&name=c1&collections=c2}}

From my IDE, which had the patch applied, I tried these two snippets and the 
results were the same:

{code}
CloudSolrClient cloudSolrClient = new CloudSolrClient.Builder().withZkHost("localhost:9983").build();

for (int i=0; i<10; i++) {
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  cloudSolrClient.query("c1", params);
}
cloudSolrClient.close();
{code}

{code}
CloudSolrClient cloudSolrClient = new CloudSolrClient.Builder().withZkHost("localhost:9983").build();
cloudSolrClient.setDefaultCollection("c1");

for (int i=0; i<10; i++) {
  ModifiableSolrParams params = new ModifiableSolrParams();
  params.add("q", "*:*");
  cloudSolrClient.query(params);
}
cloudSolrClient.close();
{code}

I only see log entries on node1 (8983), which was hosting c1, but the top level 
query should have gone only to node2 in my setup:

{code}
INFO  - 2017-10-17 21:10:28.964; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=28
INFO  - 2017-10-17 21:10:28.994; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:28.996; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:28.999; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.001; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.005; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.008; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.011; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.013; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
INFO  - 2017-10-17 21:10:29.015; [c:c2 s:shard1 r:core_node2 
x:c2_shard1_replica_n1] org.apache.solr.core.SolrCore; [c2_shard1_replica_n1]  
webapp=/solr path=/select params={q=*:*&_stateVer_=c1:4=javabin=2} 
hits=0 status=0 QTime=0
{code}

Looking at {{_stateVer_=c1}} in the request params, it looks like we are still 
passing only the state of c1 from the client?

[jira] [Commented] (SOLR-11292) Querying against an alias can lead to incorrect routing

2017-10-14 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204813#comment-16204813
 ] 

Varun Thacker commented on SOLR-11292:
--

Hi David,

Let me apply SOLR-11444 and see if it fixes the routing.




[jira] [Commented] (SOLR-11292) Querying against an alias can lead to incorrect routing

2017-10-13 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204203#comment-16204203
 ] 

David Smiley commented on SOLR-11292:
-

I think it's bizarre that you can create an alias with a name that is also that 
of a collection.  It seems ripe for problems.

I'm not sure how this particular issue is happening.  Maybe SOLR-11444 fixes 
it?  CloudSolrClient.sendRequest should resolve the collection list and 
aliases to a list of target collections.  Then it should loop over the slices 
across all of them to build a list of URLs to the nodes it will communicate 
with.  SOLR-11444 improves the clarity of this logic substantially IMO; I'm not 
sure if there is a change in behavior with respect to the issue here.  
[~varunthacker] might you apply SOLR-11444 and see if there is an impact?
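
The two steps described above (resolve names to target collections, then walk the slices of each to collect node URLs) can be sketched roughly as follows. This is plain Java under stated assumptions: {{TargetUrlBuilder}}, the nested-map shape of the cluster state, and the "baseUrl/collection" URL form are all hypothetical and are not taken from CloudSolrClient.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the routing step; the map shapes are assumptions, not SolrJ types. */
final class TargetUrlBuilder {
  private TargetUrlBuilder() {}

  /**
   * @param targetCollections  collections left after alias resolution
   * @param slicesByCollection collection -> slice name -> base URLs of the replicas in that slice
   * @return de-duplicated URLs ("baseUrl/collection") across all slices of all target collections
   */
  static List<String> buildUrls(Set<String> targetCollections,
                                Map<String, Map<String, List<String>>> slicesByCollection) {
    Set<String> urls = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
    for (String collection : targetCollections) {
      Map<String, List<String>> slices = slicesByCollection.get(collection);
      if (slices == null) {
        throw new IllegalArgumentException("Collection not found: " + collection);
      }
      for (List<String> replicaBaseUrls : slices.values()) {
        for (String baseUrl : replicaBaseUrls) {
          urls.add(baseUrl + "/" + collection);
        }
      }
    }
    return new ArrayList<>(urls);
  }
}
```

If the alias has already been resolved to the underlying collection, the URL list in this sketch covers every node hosting that collection rather than only the nodes of the same-named collection.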
