FWIW, the only time I've seen this happen here was when someone accidentally cleared the work dir (default: /tmp/mesos). I'd personally advise putting it somewhere else, where rogue people or processes are less likely to delete things by accident. Could it be that? Although... in our case the tasks were marked 'lost' at that point, so it differs slightly (same general outcome, not quite the same symptoms).
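
To make the work dir point concrete, here's a rough sketch of a sanity check you could run against whatever path you pass to the agent's --work_dir flag. This is my own illustration, not anything shipped with Mesos: the helper name, the list of "volatile" roots, and /var/lib/mesos as an alternative location are all assumptions. It only compares paths lexically and doesn't resolve symlinks.

```python
from pathlib import PurePosixPath

# Assumption: directories that tmp cleaners or reboots commonly wipe.
VOLATILE_ROOTS = (PurePosixPath("/tmp"), PurePosixPath("/var/tmp"))


def is_volatile(work_dir: str) -> bool:
    """Return True if work_dir is, or sits under, one of VOLATILE_ROOTS."""
    path = PurePosixPath(work_dir)
    return any(path == root or root in path.parents for root in VOLATILE_ROOTS)


if __name__ == "__main__":
    # "/tmp/mesos" is the default mentioned above; "/var/lib/mesos" is
    # only a hypothetical alternative location.
    for candidate in ("/tmp/mesos", "/var/lib/mesos"):
        verdict = "volatile, consider moving" if is_volatile(candidate) else "looks ok"
        print(f"{candidate}: {verdict}")
```

If the check flags your current setting, moving the work dir out of /tmp would at least rule out this particular cause.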
On Tue, Apr 5, 2016 at 11:35 PM, Justin Ryan <[email protected]> wrote:

> An interesting fact I left out, the count of “Running” tasks remains
> intact, while absolutely no history remains in the dashboard.
>
> From: Justin Ryan <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, April 5, 2016 at 12:29 PM
> To: "[email protected]" <[email protected]>
> Subject: Disappearing tasks
>
> Hiya folks!
>
> I’ve spent the past few weeks prototyping a new data cluster with Mesos,
> Kafka, and Flume delivering data to HDFS which we plan to interact with via
> Spark. In the prototype environment, I had a fairly high volume of test
> data flowing for some weeks with little to no major issues except for
> learning about tuning Kafka and Flume.
>
> I’m launching kafka with the github.com/mesos/kafka project, and flume is
> run via marathon.
>
> Yesterday morning, I came in and my flume jobs had disappeared from the
> task list in Mesos, though I found the actual processes still running when
> I searched the cluster ’ps’ output. Later in the day, I had the same
> happen to my kafka brokers. In some cases, the only way I’ve found to
> recover from this is to shut everything down and clear the zookeeper data,
> which would be fairly drastic if it happened in production, and
> particularly if we had many tasks / frameworks that were fine, but one or
> two disappeared.
>
> I’d appreciate any help sorting through this, I’m using latest Mesos and
> CDH5 installed via community Chef cookbooks.

