Will do. If the fix involves making the map of offers by agent ID a
concurrent map, I can contribute that.
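
Roughly what I have in mind, as a sketch only. The class and method names
below are made up for illustration and are not the actual OfferManager code;
offers and agents are reduced to plain strings. The idea is that the "is
there already an offer for this agent?" check and the update happen as one
atomic map operation instead of two separate steps:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for an agent-keyed offer index; NOT the real OfferManager.
final class AgentOfferIndex {
  // agentId -> offerId. Plain strings are an assumption made for brevity.
  private final ConcurrentMap<String, String> offerByAgent = new ConcurrentHashMap<>();

  // Stores the new offer and returns the offer it displaced, if any.
  // ConcurrentHashMap.put() does the lookup and the replacement atomically,
  // so the map can never hold two entries for the same agent.
  Optional<String> addOffer(String agentId, String offerId) {
    return Optional.ofNullable(offerByAgent.put(agentId, offerId));
  }

  // Removes the entry only if the agent is still mapped to this exact offer.
  // remove(key, value) is an atomic compare-and-remove, so a stale cancel
  // cannot delete a newer offer that arrived in the meantime.
  boolean cancelOffer(String agentId, String offerId) {
    return offerByAgent.remove(agentId, offerId);
  }

  int size() {
    return offerByAgent.size();
  }
}
```

With this shape, a second offer for the same agent replaces the first and the
caller gets the displaced offer back, so it can decide what to do with it.
What the scheduler should actually do with the displaced offer is a separate
question that this sketch does not try to answer.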

On Fri, Sep 29, 2017 at 9:09 AM, Bill Farner <wfar...@apache.org> wrote:

> This is due to multiple offers for the same agent, rather than duplicate
> offers.  I don't see a specific bug in the suspect code
> (OfferManager.java), but it does stand out as subject to races.
> Specifically, there is a lack of synchronization between checking whether an
> offer exists for a given agent ID and subsequently removing that offer.
>
> Can you file a bug?
>
> On Thu, Sep 28, 2017 at 1:56 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:
>
>> Folks,
>>
>> I saw the following crash in my scheduler. It appears to be due to
>> duplicate offers. Any insights appreciated!
>>
>> Mohit.
>>
>> *Code:*
>>
>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145
>>
>> *Logs:*
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00
>> PM com.google.common.util.concurrent.ServiceManager$ServiceListener
>> failed
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service
>> PreemptorService [FAILED] has failed in the RUNNING state.
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]:
>> java.lang.IllegalArgumentException: Multiple entries with same key:
>> 1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102=HostOffer{offer=id {
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value:
>> "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202"
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: }
>>
>>
>> ...
>>
>> ...
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: ,
>> hostAttributes=IHostAttributes{host=compute606-dca1.prod.uber.internal,
>> attributes=[IAttribute{name=host, values=[compute606-dca1]},
>> IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]},
>> IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE,
>> slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple
>> values under a key, use Multimaps.index.
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.collect.Maps.uniqueIndex(Maps.java:1251)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.collect.Maps.uniqueIndex(Maps.java:1208)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:146)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:161)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.lang.Thread.run(Thread.java:748)
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316
>> [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55]
>> Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown.
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
>> [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - -
>> [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200
>> 95
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
>> [PreemptorService RUNNING, Lifecycle:84] Shutting down application
>>
>>
>>
>> Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316
>> [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77]
>> Executing 4 shutdown commands.
>>
>>
>>
>>
>>
>
