[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2015-05-27 Thread Aaron Bentley
It has returned: http://reports.vapour.ws/releases/2691/job/run-unit-tests-trusty-amd64/attempt/2554 ** Changed in: juju-core Status: Fix Released => Triaged ** Changed in: juju-core Milestone: 1.21-alpha1 => 1.25.0 ** Information type changed from Public to Private -- You received

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2015-04-06 Thread Curtis Hovey
** Changed in: juju-core (Ubuntu) Status: Triaged => Fix Released -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1318366 Title: jujud on state server panic misses transaction in queue To

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-09-08 Thread Curtis Hovey
** Changed in: juju-core Status: Fix Committed => Fix Released

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-08-18 Thread Curtis Hovey
** Changed in: juju-core/1.20 Status: Fix Committed => Fix Released

Re: [Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Mark Shuttleworth
Is there a way to run Go processes under a debugger and generate very high-resolution debugging output? I'm seeing this every second or third attempt to build a cloud. It might be that debugging overhead makes the problem vanish (yay Heisenberg) but it might give us a useful picture.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Gustavo Niemeyer
As posted in juju-dev last night: Okay, I couldn't resist investigating a bit. I've been looking at the database dump from earlier today and it's smelling like a simpler bug in the txn package, and I might have found the cause already. Here is a quick walkthrough while debugging the problem, to

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Gustavo Niemeyer
Alright, the guess last night was correct, and the candidate fix as well. I've managed to reproduce the problem by stressing out the scenario described with 4 concurrent runners running the following two operations, meanwhile the chaos mechanism injects random slowdowns in various critical points:

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Mark Shuttleworth
Thanks Gustavo, this is 50% of the issues I see on cloud builds so am excited to get a build of the tools with this fix applied. Curtis, think we can spin a build through CI asap that would show up in the testing tools bucket on S3?

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Ian Booth
** Also affects: juju-core/1.20 Importance: Undecided Status: New ** Changed in: juju-core/1.20 Milestone: None => 1.20.2 ** Changed in: juju-core/1.20 Importance: Undecided => High ** Changed in: juju-core/1.20 Status: New => In Progress ** Changed in: juju-core

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-30 Thread Ian Booth
** Changed in: juju-core/1.20 Status: In Progress => Fix Committed

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Mark Shuttleworth
Yes, I saw the same restarting of Mongo, looks like every 10-15 seconds.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Mark Shuttleworth
Attached is a dump of the Juju database in this case. ** Attachment added: dump.tgz https://bugs.launchpad.net/juju-core/+bug/1318366/+attachment/4164924/+files/dump.tgz

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Mark Shuttleworth
After some experiments in compression options, here are all the Juju logs you could ever want :) http://people.canonical.com/~mark/juju-server-crash-logs.tar.xz 68M compressed, about 1.9G uncompressed. That's /var/log/juju/ from machine 0.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Kapil Thangavelu
And the db dump (732k) of the juju db in mongo on Mark's state server is at http://chinstrap.canonical.com/~kapil/bug-13183656-juju-db.dump.tbz2

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Mark Shuttleworth
Here is a snippet of syslog showing two cycles of Mongo starts and restarts. This is happening constantly! Gustavo and I are wondering whether the numactl advice might be relevant. ** Attachment added: syslog.mongorestarts.log

Re: [Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Adam Collard
On 29 July 2014 20:12, Mark Shuttleworth 1318...@bugs.launchpad.net wrote:
> Gustavo and I are wondering whether the numactl advice might be relevant.
I humbly suggest that's a red herring. Note this happened on an Orange Box (see log attached in

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Mark Shuttleworth
Digging in further, it appears that jujud is writing to /etc/init/juju-db.conf (the Upstart job for its database) every few seconds. I'll file a separate bug about this because it plausibly is the root cause of the mongo restarts we're seeing.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-29 Thread Menno Smits
Summarising what we know so far and adding a few more details: 1. We understand why mongo was continually restarting. jujud currently restarts mongo every time it starts so every time jujud panicked (~ every 10s), upstart would restart jujud and jujud would restart mongo. This explains the

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-28 Thread Mark Shuttleworth
I've got this in a live environment from the cloud-installer too. How do I know where to point mongodump?

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Adam Collard
** Tags added: orange-box

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Curtis Hovey
** Changed in: juju-core Milestone: next-stable => 1.21-alpha1

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Nate Finch
I'm looking at this right now.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Nate Finch
Is there a more complete log than the one posted on james-page's link? That one appears to be cut off at a million lines, and contains neither the full panic output nor anything before the panic, which would be very helpful.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Nate Finch
Note: I received full logs privately, as they may have sensitive info.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Gustavo Niemeyer
This error should never happen on a healthy database. The only case I've debugged with such an issue was on a system that had a corrupted database due to an out-of-space situation. The reason why this should never happen is clear in the code of the txn package: before anything is ever done with

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Gustavo Niemeyer
@John, it's definitely a bad idea to have transactions in a capped collection for that sort of reason, but as far as I can see the _txns_ collection, the one holding the transactions themselves, is not capped. Having missing logs for a transaction would not cause this issue.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Adam Collard
If anyone hits this again, you should grab a dump of the Mongo db (mongodump pointed at juju-db address).

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-24 Thread Gustavo Niemeyer
Looking at the logs from Adam that Nate forwarded to me, I can see the database is being terminated and restarted over and over and over, every few seconds. Looking at logs around it, it looks like at least rsyslogd is also being refreshed on the same cadence. By itself, this should not be an

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-22 Thread Adam Collard
** Summary changed: - provisioning large environments with complex relations caused jujud on bootstrap node to fail + jujud on state server panic misses transaction in queue

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-22 Thread Adam Collard
Note my panic was from Juju 1.20.1. I have the full log, but am worried about leaking sensitive information.

[Bug 1318366] Re: jujud on state server panic misses transaction in queue

2014-07-22 Thread Adam Collard
Please see attached mongo log; it seems to be dying lots. ** Attachment added: grep for mongod in syslog https://bugs.launchpad.net/juju-core/+bug/1318366/+attachment/4160060/+files/mongod-syslog
