This is due to multiple offers for the same agent, rather than duplicate
offers.  I don't see a specific bug in the suspect code
(OfferManager.java), but it does stand out as subject to races.
Specifically, there is a lack of synchronization when checking for an offer
exists for a given agent ID and subsequently removing that offer.

Can you file a bug?

On Thu, Sep 28, 2017 at 1:56 PM, Mohit Jaggi <[email protected]> wrote:

> Folks,
>
> I saw the following crash in my scheduler. It appears to be due to
> duplicates offers. Any insights appreciated!
>
> Mohit.
>
> *Code:*
>
> https://github.com/apache/aurora/blob/master/src/main/
> java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145
>
> *Logs:*
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00
> PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service
> PreemptorService [FAILED] has failed in the RUNNING state.
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: 
> java.lang.IllegalArgumentException:
> Multiple entries with same key: 1ed038e0-a3ef-4476-adfd-
> 70c86241c5f7-S102=HostOffer{offer=id {
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value:
> "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202"
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: }
>
>
> ...
>
> ...
>
>
> ep 28 18:09:00 machine1163 aurora-scheduler[14266]: , hostAttributes=
> IHostAttributes{host=compute606-dca1.prod.uber.internal,
> attributes=[IAttribute{name=host, values=[compute606-dca1]},
> IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]},
> IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE,
> slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple
> values under a key, use Multimaps.index.
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> com.google.common.collect.Maps.uniqueIndex(Maps.java:1251)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> com.google.common.collect.Maps.uniqueIndex(Maps.java:1208)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$
> run$0(PendingTaskProcessor.java:146)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.mybatis.guice.
> transactional.TransactionalMethodInterceptor.invoke(
> TransactionalMethodInterceptor.java:101)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.common.inject.TimedInterceptor.
> invoke(TimedInterceptor.java:83)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.storage.log.LogStorage.read(
> LogStorage.java:562)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(
> CallOrderEnforcingStorage.java:113)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(
> PendingTaskProcessor.java:135)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.common.inject.TimedInterceptor.
> invoke(TimedInterceptor.java:83)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.
> runOneIteration(PreemptorModule.java:161)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> com.google.common.util.concurrent.AbstractScheduledService$
> ServiceDelegate$Task.run(AbstractScheduledService.java:188)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.ScheduledThreadPoolExecutor$
> ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
> ScheduledThreadPoolExecutor.java:294)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at
> java.lang.Thread.run(Thread.java:748)
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316
> [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55]
> Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown.
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
> [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - -
> [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200 95
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
> [PreemptorService RUNNING, Lifecycle:84] Shutting down application
>
>
>
> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
> [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77]
> Executing 4 shutdown commands.
>
>
>
>
>

Reply via email to