Will do. If the fix involves making the map of offers by agent id a concurrent map...I can contribute that.
On Fri, Sep 29, 2017 at 9:09 AM, Bill Farner <wfar...@apache.org> wrote: > This is due to multiple offers for the same agent, rather than duplicate > offers. I don't see a specific bug in the suspect code > (OfferManager.java), but it does stand out as subject to races. > Specifically, there is a lack of synchronization when checking for an offer > exists for a given agent ID and subsequently removing that offer. > > Can you file a bug? > > On Thu, Sep 28, 2017 at 1:56 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote: > >> Folks, >> >> I saw the following crash in my scheduler. It appears to be due to >> duplicates offers. Any insights appreciated! >> >> Mohit. >> >> *Code:* >> >> https://github.com/apache/aurora/blob/master/src/main/java/ >> org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145 >> >> *Logs:* >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00 >> PM com.google.common.util.concurrent.ServiceManager$ServiceListener >> failed >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service >> PreemptorService [FAILED] has failed in the RUNNING state. >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: >> java.lang.IllegalArgumentException: Multiple entries with same key: >> 1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102=HostOffer{offer=id { >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value: >> "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202" >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: } >> >> >> ... >> >> ... >> >> >> ep 28 18:09:00 machine1163 aurora-scheduler[14266]: , >> hostAttributes=IHostAttributes{host=compute606-dca1.prod.uber.internal, >> attributes=[IAttribute{name=host, values=[compute606-dca1]}, >> IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]}, >> IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE, >> slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple >> values under a key, use Multimaps.index. >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> com.google.common.collect.Maps.uniqueIndex(Maps.java:1251) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> com.google.common.collect.Maps.uniqueIndex(Maps.java:1208) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0( >> PendingTaskProcessor.java:146) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.mybatis.guice.transactional.TransactionalMethodIntercept >> or.invoke(TransactionalMethodInterceptor.java:101) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.common.inject.TimedInterceptor.invoke( >> TimedInterceptor.java:83) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.storage.log.LogStorage.read(LogS >> torage.java:562) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.storage.CallOrderEnforcingStorag >> e.read(CallOrderEnforcingStorage.java:113) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor. >> run(PendingTaskProcessor.java:135) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.common.inject.TimedInterceptor.invoke( >> TimedInterceptor.java:83) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> org.apache.aurora.scheduler.preemptor.PreemptorModule$Preemp >> torService.runOneIteration(PreemptorModule.java:161) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> com.google.common.util.concurrent.AbstractScheduledService$S >> erviceDelegate$Task.run(AbstractScheduledService.java:188) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> com.google.common.util.concurrent.Callables$4.run(Callables.java:122) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >> tureTask.access$301(ScheduledThreadPoolExecutor.java:180) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFu >> tureTask.run(ScheduledThreadPoolExecutor.java:294) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool >> Executor.java:1142) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo >> lExecutor.java:617) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at >> java.lang.Thread.run(Thread.java:748) >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316 >> [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55] >> Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown. >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 >> [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - - >> [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200 >> 95 >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 >> [PreemptorService RUNNING, Lifecycle:84] Shutting down application >> >> >> >> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 >> [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] >> Executing 4 shutdown commands. >> >> >> >> >> >