[ 
https://issues.apache.org/jira/browse/MESOS-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-10008:
---------------------------------------

    Assignee: Benjamin Mahler

> Invalid quota config can crash master
> -------------------------------------
>
>                 Key: MESOS-10008
>                 URL: https://issues.apache.org/jira/browse/MESOS-10008
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Andrei Sekretenko
>            Assignee: Benjamin Mahler
>            Priority: Major
>
> We are observing the following crash on the 1.9.1 master:
> {code}
> I1008 10:12:15.148486  4687 http.cpp:1115] HTTP POST for 
> /master/api/v1?_ts=1570529541073&UPDATE_QUOTA from 10.0.7.253:35410 with 
> User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) Ap>
> I1008 10:12:15.148665  4687 http.cpp:263] Processing call UPDATE_QUOTA
> I1008 10:12:15.148756  4687 quota_handler.cpp:1136] Authorizing principal 
> 'bootstrapuser' to update quota config for role 's1'
> I1008 10:12:15.149169  4685 registrar.cpp:487] Applied 1 operations in 
> 56277ns; attempting to update the registry
> I1008 10:12:15.149338  4681 coordinator.cpp:348] Coordinator attempting to 
> write APPEND action at position 13
> I1008 10:12:15.149467  4689 replica.cpp:541] Replica received write request 
> for position 13 from __req_res__(29)@10.0.7.253:5050
> I1008 10:12:15.151820  4683 replica.cpp:695] Replica received learned notice 
> for position 13 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.153559  4679 registrar.cpp:544] Successfully updated the 
> registry in 4.348928ms
> I1008 10:12:15.153592  4678 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 14
> I1008 10:12:15.153715  4679 hierarchical.cpp:1619] Updated quota for role 
> 's1',  guarantees: {} limits: cpus:2; disk:-9.22337203685478e+15; gpus:3; 
> mem:1000000000000
> I1008 10:12:15.153796  4677 replica.cpp:541] Replica received write request 
> for position 14 from __req_res__(30)@10.0.7.253:5050
> I1008 10:12:15.155380  4691 replica.cpp:695] Replica received learned notice 
> for position 14 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.249722  4677 authenticator.cpp:324] dstip=10.0.7.253 
> type=audit timestamp=2019-10-08 10:12:15.249673984+00:00 reason="Valid 
> authentication token" uid="bootstrapuser" obje>
> I1008 10:12:15.249956  4682 http.cpp:1115] HTTP GET for 
> /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414 with 
> User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebK>
> I1008 10:12:15.250633  4691 http.cpp:1132] HTTP GET for 
> /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414: '200 OK' after 
> 1.72621ms
> I1008 10:12:15.570379  4689 hierarchical.cpp:1908] Before allocation, 
> required quota headroom is {} and available quota headroom is cpus:0.9; 
> disk:75853; mem:5507
> F1008 10:12:15.570580  4689 resource_quantities.cpp:330] Check failed: scalar 
> >= Value::Scalar() (-9.22337203685478e+15 vs. 0)
> *** Check failure stack trace: ***
>     @     0x7fc786f0148d  google::LogMessage::Fail()
>     @     0x7fc786f036e8  google::LogMessage::SendToLog()
>     @     0x7fc786f01023  google::LogMessage::Flush()
>     @     0x7fc786f04029  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fc785954dfa  mesos::ResourceQuantities::add()
>     @     0x7fc785954fb6  mesos::ResourceQuantities::fromScalarResource()
>     @     0x7fc78595e135  mesos::shrinkResources()
>     @     0x7fc785a874a9  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::__allocate()
>     @     0x7fc785a88089  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_allocate()
>     @     0x7fc785a93882  
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingN5mesos8internal6master9allocator8internal28Hier>
>     @     0x7fc786e49e21  process::ProcessBase::consume()
>     @     0x7fc786e6141b  process::ProcessManager::resume()
>     @     0x7fc786e670b6  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
>     @     0x7fc782a28b22  (unknown)
>     @     0x7fc7821be94a  (unknown)
>     @     0x7fc781eef07f  clone
> {code}
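> Note that the disk quota limit is *logged* as a negative value (-9.22337203685478e+15) when the quota is updated (hierarchical.cpp:1619). The same value then fails the non-negativity check in resource_quantities.cpp:330 during the next allocation cycle.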
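
The log shows that the quota update was written to the registry before the allocator hit the failing check. One way to avoid that ordering is to reject unusable limits while the UPDATE_QUOTA call is still being validated. The following is a minimal sketch of that idea, not Mesos code: the function name `validateQuotaLimits` and the plain `std::map` representation of limits are assumptions made for illustration, and the report does not say what form the actual fix takes.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical helper (not part of Mesos): checks that every quota limit
// could be used as a scalar resource quantity. Returns an empty string if
// all limits are acceptable, otherwise a message naming the first bad one.
std::string validateQuotaLimits(const std::map<std::string, double>& limits)
{
  for (const auto& entry : limits) {
    const std::string& name = entry.first;
    const double value = entry.second;

    // NaN and +/-infinity cannot be meaningful limits.
    if (!std::isfinite(value)) {
      return "limit for '" + name + "' is not a finite number";
    }

    // Mirrors the condition of the failed check in the log
    // (scalar >= Value::Scalar()), but reports an error instead of aborting.
    if (value < 0.0) {
      return "limit for '" + name + "' is negative";
    }
  }

  return "";
}
```

With the limits from the log above (cpus:2, disk:-9.22337203685478e+15, gpus:3, mem:1000000000000), this sketch would return an error for 'disk', so the request could be answered with an error response instead of being persisted.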



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
