> On July 15, 2016, 5:03 p.m., Sid Wagle wrote:
> > Ship It!
> 
> Sid Wagle wrote:
>     General question: Any reason why we started to see this now? Is it 
> possible that Postgres version 9.2 does not suffer from this? We seem to be 
> still installing postgresql-server-8.4.20-6
> 
> Jonathan Hurley wrote:
>     Yeah, I asked myself that same question. The Postgres instance this was 
> seen on was remote (truly remote) and rather slow. I don't think it has to do 
> with the version of Postgres (I think it was 9.1 on Debian).
>     
>     But I don't really know why this happened all of a sudden - perhaps all 
> of our large upgrade tests have been on local Postgres up until now? I looked 
> through the code which was making queries to the DB during the transaction 
> and it hadn't changed in a while. If we had recently introduced a massive 
> call to DB during the upgrade creation, that could have done it. But I think 
> this is just a case of "it's always been there, but we never had the right 
> circumstances to show it".
>     
>     Feel free to look at the Jira; it has my analysis in detail.

Thanks for the detailed explanation on the Jira.


- Sid


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142395
-----------------------------------------------------------


On July 15, 2016, 8:06 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50079/
> -----------------------------------------------------------
> 
> (Updated July 15, 2016, 8:06 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-17738
>     https://issues.apache.org/jira/browse/AMBARI-17738
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Reproduced as part of creating a rolling upgrade on a large cluster.
> 
> Initially appearing as a deadlock, it's caused by Postgres holding the 
> socket open indefinitely. We have a write lock being held while the socket is 
> open. Jstack dumps taken many minutes apart show the same thread is stuck in 
> a socket read. Investigating on Postgres shows that there is a lock blocking 
> the thread which is waiting.
> 
> The sequence query is currently stuck in the {{idle in transaction}} state 
> which is why it's blocking the other query. The transaction isn't being ended 
> by EclipseLink.
> 
> The cause is that we begin a transaction and then hammer the database for 2-3 
> minutes, during which time Postgres must keep track of all kinds of 
> hostcomponentstate updates isolated from our current transaction. When we go 
> to commit the upgrade, Postgres eventually ends up in a deadlock where it 
> doesn't think that the transaction ended.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java a5db0f0 
> 
> Diff: https://reviews.apache.org/r/50079/diff/
> 
> 
> Testing
> -------
> 
> Fixed on a live cluster where it was 100% reproducible.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>
