> On July 15, 2016, 1:03 p.m., Sid Wagle wrote:
> > Ship It!
>
> Sid Wagle wrote:
>     General question: Any reason why we started to see this now? Is it
>     possible that Postgres version 9.2 does not suffer from this? We still
>     seem to be installing postgresql-server-8.4.20-6

Yeah, I asked myself that same question. The Postgres instance this was seen on was remote (truly remote) and rather slow. I don't think it has to do with the version of Postgres (I think it was 9.1 on Debian). But I don't really know why this happened all of a sudden; perhaps all of our large upgrade tests have been on local Postgres up until now? I looked through the code which was making queries to the DB during the transaction, and it hadn't changed in a while. If we had recently introduced a massive call to the DB during the upgrade creation, that could have done it. But I think this is just a case of "it's always been there, but we never had the right circumstances to show it".

Feel free to look at the Jira; it has my analysis in detail.

- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142395
-----------------------------------------------------------


On July 15, 2016, 4:06 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50079/
> -----------------------------------------------------------
>
> (Updated July 15, 2016, 4:06 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.
>
>
> Bugs: AMBARI-17738
>     https://issues.apache.org/jira/browse/AMBARI-17738
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Reproduced as part of creating a rolling upgrade on a large cluster.
>
> Initially appearing as a deadlock, it's caused by Postgres holding the
> socket open indefinitely. We have a write lock being held while the socket is
> open. Jstack dumps taken many minutes apart show the same thread is stuck in
> a socket read. Investigating on Postgres shows that there is a lock blocking
> the thread which is waiting.
>
> The sequence query is currently stuck in the {{idle in transaction}} state,
> which is why it's blocking the other query. The transaction isn't being ended
> by EclipseLink.
>
> The cause is that we begin a transaction and then hammer the database for 2-3
> minutes, during which time Postgres must keep track of all kinds of
> hostcomponentstate updates isolated from our current transaction. When we go
> to commit the upgrade, Postgres eventually ends in a deadlock where it
> doesn't think that the transaction ended.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java a5db0f0
>
> Diff: https://reviews.apache.org/r/50079/diff/
>
>
> Testing
> -------
>
> Fixed on a live cluster where it was 100% reproducible.
>
>
> Thanks,
>
> Jonathan Hurley
>
>