[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-02-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366452#comment-16366452
 ] 

ASF subversion and git services commented on SOLR-11739:


Commit dfb0803bbb972d730627fbdcd4df66558d06f13a in lucene-solr's branch 
refs/heads/branch_7x from [~tomasflobbe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dfb0803 ]

SOLR-11739: Remove cast no longer needed


> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-02-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366451#comment-16366451
 ] 

ASF subversion and git services commented on SOLR-11739:


Commit f6b6f5070270c93fa7d8604ed456c9df041e7454 in lucene-solr's branch 
refs/heads/branch_7x from [~tomasflobbe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f6b6f50 ]

SOLR-11739: Don't accept duplicate async IDs in collection API operations


> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-02-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366450#comment-16366450
 ] 

ASF subversion and git services commented on SOLR-11739:


Commit 250e5b2aba2c25540f597f4436c619fd97febd0d in lucene-solr's branch 
refs/heads/master from [~tomasflobbe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=250e5b2 ]

SOLR-11739: Remove cast no longer needed


> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-02-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366446#comment-16366446
 ] 

ASF subversion and git services commented on SOLR-11739:


Commit 61ea8f60b1c69ba9ed753fe533d571fcbb452887 in lucene-solr's branch 
refs/heads/master from [~tomasflobbe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=61ea8f6 ]

SOLR-11739: Don't accept duplicate async IDs in collection API operations


> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-02-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351606#comment-16351606
 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
--

Thanks for the thorough review Anshum.
{quote}We should add back the check in old places instead of leaving that as a 
TODO
{quote}
Yes, that was my plan. I'll upload a patch now with the back compat support.
{quote}L296: It would be good to log the reason for the failure to offer the 
request to the OverseerCollectionQueue
{quote}
Note that the exception will be thrown in this case
{quote}Are the log4j.properties changes intentional ?
{quote}
For my testing, I'll revert before committing
{quote}SizeLimitedDistributedMap.OnChildDeleteObserver interface doesn’t need 
the keyword static also can you add some javadoc there?
{quote}
Good catch. Changed.
{quote}As I understand, you would also want to override clear() and call 
onDeleteObserver.onChildDelete in those cases too.
{quote}
Really, I was thinking on the observer to be called only in the case of 
overflow, not for a regular delete (it's also not called on remove). I modified 
the name to {{OnOverflowObserver}} of the observer to be clearer about this.
{quote}Javadocs for claim/clearAsyncId. It’d be good to mention that it returns 
false when it tries to clear an ID that doesn’t exist any more.
{quote}
Added

Also added some more tests to the new code and to {{DistributedMap}} since I 
couldn't find any.
{quote}...How about when the Overseer is elected, it establishes a source of 
entropy (Random initialized from time) and uses that to issue UUID's...
{quote}
That is another option. Although note that there is a client option to call 
with an auto-generated asyncId, which should prevent collisions. We could also 
switch to a model in which we don't accept client-provided async IDs, but I 
guess that should be another Jira, there would be much more changes for that, 
including an API change

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-01-31 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348004#comment-16348004
 ] 

Gus Heck commented on SOLR-11739:
-

[~hossman], To me it seems that with your 3 command example things are a little 
worse than you say... There would be a window where id 222 could be seen as the 
success of CREATESOMETHING, and someone checking on DOWHATEVER might think 
DOWHATEVER had been done successfully (yay, go home, throw a party... ) but 
then DOWHATEVER fails (Monday's gonna be less fun...), and then some automated 
process checks on 222 to verify that it did actually CREATESOMETHING, but sees 
a failure... (drat, do it again... and again and again.. continually failing 
because SOMETHING now exists). 

Sure, it's their fault for not coordinating their ID's but... why help them 
make that mistake?

I think any ID that is not unique is more or less useless. I haven't used async 
requests and haven't previously paid much attention to it, don't know the 
history, and I might be missing something, but I find it shocking that Solr is 
not generating the id and ensuring it's uniqueness. How about when the Overseer 
is elected, it establishes a source of entropy (Random initialized from time) 
and uses that to issue UUID's. There's only one overseer at a time, and the 
cases where 2 or more overseers are started at exactly the same time and 
coexist is a bug, right? If there's no overseer, commands can't be run anyway...

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2018-01-05 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314192#comment-16314192
 ] 

Anshum Gupta commented on SOLR-11739:
-

Thanks for taking this up [~tomasflobbe]! This looks good and not super 
invasive. 

Here's some feedback:
* CollectionsHandler.java
** We should add back the check in old places instead of leaving that as a TODO
** L296: It would be good to log the reason for the failure to offer the 
request to the OverseerCollectionQueue
* Are the log4j.properties changes intentional ?
* SizeLimitedDistributedMap.OnChildDeleteObserver interface doesn’t need the 
keyword static also can you add some javadoc there?
* As I understand, you would also want to override clear() and call 
onDeleteObserver.onChildDelete in those cases too.
* Javadocs for claim/clearAsyncId. It’d be good to mention that it returns 
false when it tries to clear an ID that doesn’t exist any more.

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2017-12-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298901#comment-16298901
 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
--

bq.  feel like my suggestion was orthognal to to the concern you suggested in 
response. 
No, I meant "Solr won't re-execute the same request twice" *even if it receives 
it twice*. What I say is that, by Solr rejecting the duplicate async ID it 
makes the admin request idempotent, you could do something like:
{code}
while (!success) {
try {
performRequest(asyncId=1=CREATESOMETHING)
success = true
} catch (e) {
//backoff
}
 }
{code}

If there is some error back from {{performRequest}}, you don’t know if the 
request was scheduled or not, but you don’t care if you know Solr won’t 
re-schedule it, you can just send it again. Another use case could be workers 
watching a queue and executing actions, if the queue delivers a message more 
than once, you don’t have to worry about sending the command to Solr multiple 
times, and can assume a “request already exists” response from Solr to be a 
success.

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2017-12-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297890#comment-16297890
 ] 

Hoss Man commented on SOLR-11739:
-

i have no strong opinions ... at a high level i think the approach you describe 
(i didn't review the patch) is fine from a stability/reliability and back 
compat standpoint, but I have no idea how it affects performance (although it 
should go w/o saying that i favor stability/reliability over performance when 
discussing "admin" actions)

To clarify one comment where i suspect we may have been misscommunicating...

{quote}
bq. Solr should happily let you specify the same asyncId multiple times

hmm I'm not sure. I think the fact that Solr won't re-execute the same request 
twice makes it much easier to write workflows that do collection operations. 
{quote}

I feel like my suggestion was orthoginal to to the concern you suggested in 
response.  What i was suggesting is that, in theory: Solr could be agnostic to 
the asyncId and *only* use them for  tracking results.  Example: if these 4 
commands come in...

* asyncId=222=CREATESOMETHING
* asyncId=1=DOSOMETHINGELSE
* asyncId=222=DOWHATEVER

...then Solr could in fact execute those 3 commands, in that sequence, 
reliably, and w/o any risk of any of them being executed more or less then 
exactly 1 time.  The fact that 2 of them have the same asyncId should in no way 
impact Solr's execution of those commands.  The only impact that the duplicated 
asyncId should have is that once the DOWHATEVER cmd is executed, it will no 
longer be possible for a client to ever check the status of the 
CREATESOMETHING, because the results of the DOWHATEVER command should overwrite 
the results of the CREATESOMETHING ... that could be solr's *sole* internal use 
of the asyncId

(ie: only track all commands to process in an ordered queue, use the asyncId 
only in a map we don't care about internally, other then if a user asks for 
status info)

that was the crux of my suggestion ... assuming we want a policy of solr being 
"agnostic" to duplicate asyncIds in order to reduce the zk overhead of 
tracking/rejecting dups.  (BUt i don't feel strongly about it)


> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch, SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2017-12-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286410#comment-16286410
 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
--

bq. Solr should assign the asyncIds and guarantee that they are unique
We now do something like this when async IDs are not provided, but *on the 
client side*. See 
https://github.com/apache/lucene-solr/blob/41644bdcdcc0734115ce08ec24d6b408e1f8cf28/solr/solrj/src/java/org/apache/solr/client/solrj/request/CollectionAdminRequest.java#L150-L152
bq. It sounds like tomas is saying that there is existing code in solr that 
tries to reject duplicate asyncIds
Yes, here: 
https://github.com/apache/lucene-solr/blob/41644bdcdcc0734115ce08ec24d6b408e1f8cf28/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L282-L286
bq. Solr should happily let you specify the same asyncId multiple times
hmm I'm not sure. I think the fact that Solr won't re-execute the same request 
twice makes it much easier to write workflows that do collection operations. 
bq. isn't the amount of work / zk writes needed to generate a universally 
unique asyncId on the server side essentially the same as the amount needed to 
tell the client that the asyncId they specified isn't unique?
We now do a bunch of reads to check if the async IDs are in any of the current 
queues/maps (see the code I linked above). I was considering either writing to 
some ZooKeeper path the async ID  as part of a lock, before checking the 
existing queues and then deleting the lock, or moving completely the async IDs 
to it's own path, and starting using this as the source of truth of asyncIDs. 
The later would then make it easier to check if an async ID is currently in 
use, however it's a much bigger change and we'll need to consider back 
compatibility

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2017-12-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286346#comment-16286346
 ] 

Hoss Man commented on SOLR-11739:
-

My off the cuff, uneducated, impression w/o knowing much about the internals or 
the history of how the existing code go to the state it's currently in is 
that...

Either:
# Solr should assign the asyncIds and guarantee that they are unique
# The user should assign the asyncIds, and Solr should make no assumptions 
about them, nor use them for *anything* other then reporting status.

#1 seems a lot harder since it's essentially a distributed unique key 
generation problem, which IIUC is why asyncId wasn't implemented that way in 
the first place.

For #2, from my perspective, It sounds like tomas is saying that there is 
existing code in solr that tries to reject duplicate asyncIds -- and I would 
argue (as a straw man) that Solr making any attempt at doing that is where the 
real bug lies ... Solr should happily let you specify the same asyncId multiple 
times, and that should have no affect at all on the reliability of the commands 
being executed in the order recieved.  the only thing it should affect is that 
requesting status info on the commands may give unexpected results (depending 
on what the client is expecting) ... i would expect that requesting status for 
the id would return the status of the "1st" instance, until the "2nd" instance 
finishes at which point the status info is overridden.

that way if a user wants to re-use the exact same asyncId for every request, 
they are welcome to put that bullet in their foot as many times as they want -- 
it keeps things simpler for us internally, and we're not trying to coddle them 
for doing something (very advanced) in a silly way.



If we're going to coddle them, then we should coddle them all the way -- isn't 
the amount of work / zk writes needed to generate a universally unique asyncId 
on the server side essentially the same as the amount needed to tell the client 
that the asyncId they specified isn't unique?

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Assignee: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

2017-12-08 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284147#comment-16284147
 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
--

I thought about three options
1. Fix the actual race condition, don't let duplicate async IDs at all.
2. Fix the Overseer so that it checks before running each task if one with the 
same ID was completed before.
3. Let the Overseer re-run the tasks (leave it as it is now). Maybe just add 
logging, or a way to show the error (failed tasks)

#3 can be dangerous, since the task could be something like a DELETEREPLICA. If 
the duplicate ID was caused by some broken retry logic on the client side, Solr 
could be deleting many replicas with what the client thought was a single 
command. 

#2 may be OK, the problem I see with that is that it gives an inconsistent 
behavior to the user (sometimes the duplicate IDs are rejected, and sometimes 
not). Also, this would make the Overseer silently drop tasks (yes, we can add 
some sort of failure in the logs but we can’t assume anyone is going to 
notice). 

#1 is the correct fix from the functional stand point, however I can’t think of 
a way to really fix the race condition without adding an extra write to 
ZooKeeper, which we’d have to do for every collection request with an asyncID. 
And this is to cover from a client misuse edge case. 

I think (and I discussed this offline with [~anshumg], he thinks this too) #1 
is the way to go. I’ll put up a patch.

> Solr can accept duplicated async IDs
> 
>
> Key: SOLR-11739
> URL: https://issues.apache.org/jira/browse/SOLR-11739
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Tomás Fernández Löbbe
>Priority: Minor
> Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs, however, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is ran and and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org