This is due to multiple offers for the same agent, rather than duplicate offers. I don't see a specific bug in the suspect code (OfferManager.java), but it does stand out as subject to races. Specifically, there is a lack of synchronization when checking for an offer exists for a given agent ID and subsequently removing that offer.
Can you file a bug? On Thu, Sep 28, 2017 at 1:56 PM, Mohit Jaggi <[email protected]> wrote: > Folks, > > I saw the following crash in my scheduler. It appears to be due to > duplicates offers. Any insights appreciated! > > Mohit. > > *Code:* > > https://github.com/apache/aurora/blob/master/src/main/ > java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145 > > *Logs:* > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00 > PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service > PreemptorService [FAILED] has failed in the RUNNING state. > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: > java.lang.IllegalArgumentException: > Multiple entries with same key: 1ed038e0-a3ef-4476-adfd- > 70c86241c5f7-S102=HostOffer{offer=id { > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value: > "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202" > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: } > > > ... > > ... > > > ep 28 18:09:00 machine1163 aurora-scheduler[14266]: , hostAttributes= > IHostAttributes{host=compute606-dca1.prod.uber.internal, > attributes=[IAttribute{name=host, values=[compute606-dca1]}, > IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]}, > IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE, > slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple > values under a key, use Multimaps.index. > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.collect.Maps.uniqueIndex(Maps.java:1251) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.collect.Maps.uniqueIndex(Maps.java:1208) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$ > run$0(PendingTaskProcessor.java:146) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.mybatis.guice. > transactional.TransactionalMethodInterceptor.invoke( > TransactionalMethodInterceptor.java:101) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.common.inject.TimedInterceptor. > invoke(TimedInterceptor.java:83) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.log.LogStorage.read( > LogStorage.java:562) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read( > CallOrderEnforcingStorage.java:113) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run( > PendingTaskProcessor.java:135) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.common.inject.TimedInterceptor. > invoke(TimedInterceptor.java:83) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService. > runOneIteration(PreemptorModule.java:161) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.util.concurrent.AbstractScheduledService$ > ServiceDelegate$Task.run(AbstractScheduledService.java:188) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.util.concurrent.Callables$4.run(Callables.java:122) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ScheduledThreadPoolExecutor$ > ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run( > ScheduledThreadPoolExecutor.java:294) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.lang.Thread.run(Thread.java:748) > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316 > [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55] > Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown. > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - - > [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200 95 > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [PreemptorService RUNNING, Lifecycle:84] Shutting down application > > > > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] > Executing 4 shutdown commands. > > > > >
