Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Sid Wagle


> On July 15, 2016, 5:03 p.m., Sid Wagle wrote:
> > Ship It!
> 
> Sid Wagle wrote:
> General question: Any reason why we started to see this now? Is it 
> possible that Postgres version 9.2 does not suffer from this? We still seem 
> to be installing postgresql-server-8.4.20-6.
> 
> Jonathan Hurley wrote:
> Yeah, I asked myself that same question. The Postgres instance this was 
> seen on was remote (truly remote) and rather slow. I don't think it has to do 
> with the version of Postgres (I think it was 9.1 on Debian).
> 
> But I don't really know why this happened all of a sudden - perhaps all 
> of our large upgrade tests have been on local Postgres up until now? I looked 
> through the code which was making queries to the DB during the transaction 
> and it hadn't changed in a while. If we had recently introduced a massive 
> call to DB during the upgrade creation, that could have done it. But I think 
> this is just a case of "it's always been there, but we never had the right 
> circumstances to show it".
> 
> Feel free to look at the Jira; it has my analysis in detail.

Thanks for the detailed explanation on the Jira.


- Sid


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142395
---


On July 15, 2016, 8:06 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50079/
> ---
> 
> (Updated July 15, 2016, 8:06 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-17738
> https://issues.apache.org/jira/browse/AMBARI-17738
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Reproduced as part of creating a rolling upgrade on a large cluster.
> 
> Although this initially appears to be a deadlock, it is caused by Postgres 
> holding the socket open indefinitely. We hold a write lock while the socket 
> is open. Jstack dumps taken many minutes apart show the same thread stuck in 
> a socket read. Investigating on Postgres shows that a lock is blocking the 
> waiting thread.
> 
> The sequence query is currently stuck in the {{idle in transaction}} state 
> which is why it's blocking the other query. The transaction isn't being ended 
> by EclipseLink.
> 
> The cause is that we begin a transaction and then hammer the database for 2-3 
> minutes, during which time Postgres must keep track of all kinds of 
> hostcomponentstate updates isolated from our current transaction. When we go 
> to commit the upgrade, Postgres eventually ends up in a deadlock where it 
> doesn't think that the transaction ended.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java a5db0f0
> 
> Diff: https://reviews.apache.org/r/50079/diff/
> 
> 
> Testing
> ---
> 
> Fixed on a live cluster where it was 100% reproducible.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Jonathan Hurley

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/
---

(Updated July 15, 2016, 4:06 p.m.)


Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.


Changes
---

Added test coverage.


Bugs: AMBARI-17738
https://issues.apache.org/jira/browse/AMBARI-17738


Repository: ambari


Description
---

Reproduced as part of creating a rolling upgrade on a large cluster.

Although this initially appears to be a deadlock, it is caused by Postgres 
holding the socket open indefinitely. We hold a write lock while the socket is 
open. Jstack dumps taken many minutes apart show the same thread stuck in a 
socket read. Investigating on Postgres shows that a lock is blocking the 
waiting thread.
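
As a rough illustration of why a write lock held across a blocking read stalls
every other writer, here is a minimal sketch. It is not Ambari code: the class
name, method name, and durations are hypothetical, and Thread.sleep only stands
in for the socket read against a slow database.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteLockStallSketch {

  /**
   * Returns true if a second thread obtains the write lock within waitMillis
   * while a first thread holds it for holdMillis (hypothetical helper).
   */
  static boolean secondWriterGetsLock(long holdMillis, long waitMillis)
      throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch lockHeld = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      lock.writeLock().lock();
      try {
        lockHeld.countDown();
        // Stand-in for the blocking socket read against a slow database.
        Thread.sleep(holdMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.writeLock().unlock();
      }
    }, "holder");
    holder.start();

    // Wait until the holder really owns the lock before competing for it.
    lockHeld.await();
    boolean acquired =
        lock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
    if (acquired) {
      lock.writeLock().unlock();
    }

    // Wake the holder if it is still sleeping, then wait for it to finish.
    holder.interrupt();
    holder.join();
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("acquired while holder is blocked: "
        + secondWriterGetsLock(2000, 200));
    System.out.println("acquired once holder has finished: "
        + secondWriterGetsLock(50, 2000));
  }
}
```

With an unbounded read in place of the sleep, the second writer would wait
indefinitely, which matches what the repeated jstack dumps suggest.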

The sequence query is currently stuck in the {{idle in transaction}} state 
which is why it's blocking the other query. The transaction isn't being ended 
by EclipseLink.
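
For anyone wanting to look for such sessions themselves, the sketch below
builds a query against pg_stat_activity. It is only an assumption about how one
might check, not the query used in this investigation; the class and method
names are hypothetical. PostgreSQL 9.2 renamed procpid/current_query to
pid/query and added a state column, so the query depends on the server version.

```java
public class IdleInTransactionQuerySketch {

  /**
   * Builds a pg_stat_activity query listing sessions that are idle inside an
   * open transaction, oldest transaction first (hypothetical helper).
   */
  static String idleInTransactionQuery(int major, int minor) {
    boolean hasStateColumn = major > 9 || (major == 9 && minor >= 2);
    if (hasStateColumn) {
      return "SELECT pid, xact_start, state, query FROM pg_stat_activity "
          + "WHERE state = 'idle in transaction' ORDER BY xact_start";
    }
    // Before 9.2 the state is reported through the current_query text itself.
    return "SELECT procpid, xact_start, current_query FROM pg_stat_activity "
        + "WHERE current_query = '<IDLE> in transaction' ORDER BY xact_start";
  }

  public static void main(String[] args) {
    // Print both forms so they can be pasted into psql as needed.
    System.out.println(idleInTransactionQuery(9, 1));
    System.out.println(idleInTransactionQuery(9, 2));
  }
}
```

A session that stays in this list for minutes, with an old xact_start, is a
candidate for the transaction that is holding locks open.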

The cause is that we begin a transaction and then hammer the database for 2-3 
minutes, during which time Postgres must keep track of all kinds of 
hostcomponentstate updates isolated from our current transaction. When we go to 
commit the upgrade, Postgres eventually ends up in a deadlock where it doesn't 
think that the transaction ended.


Diffs (updated)
---

  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java a5db0f0

Diff: https://reviews.apache.org/r/50079/diff/


Testing (updated)
---

Fixed on a live cluster where it was 100% reproducible.


Thanks,

Jonathan Hurley



Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Alejandro Fernandez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142398
---


Ship It!

- Alejandro Fernandez


On July 15, 2016, 3:56 p.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50079/
> ---
> 
> (Updated July 15, 2016, 3:56 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-17738
> https://issues.apache.org/jira/browse/AMBARI-17738
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Reproduced as part of creating a rolling upgrade on a large cluster.
> 
> Although this initially appears to be a deadlock, it is caused by Postgres 
> holding the socket open indefinitely. We hold a write lock while the socket 
> is open. Jstack dumps taken many minutes apart show the same thread stuck in 
> a socket read. Investigating on Postgres shows that a lock is blocking the 
> waiting thread.
> 
> The sequence query is currently stuck in the {{idle in transaction}} state 
> which is why it's blocking the other query. The transaction isn't being ended 
> by EclipseLink.
> 
> The cause is that we begin a transaction and then hammer the database for 2-3 
> minutes, during which time Postgres must keep track of all kinds of 
> hostcomponentstate updates isolated from our current transaction. When we go 
> to commit the upgrade, Postgres eventually ends up in a deadlock where it 
> doesn't think that the transaction ended.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889
> 
> Diff: https://reviews.apache.org/r/50079/diff/
> 
> 
> Testing
> ---
> 
> Fixed on a live cluster where it was 100% reproducible.
> 
> UNIT TESTS PENDING...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>



Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Sid Wagle


> On July 15, 2016, 5:03 p.m., Sid Wagle wrote:
> > Ship It!

General question: Any reason why we started to see this now? Is it possible 
that Postgres version 9.2 does not suffer from this? We still seem to be 
installing postgresql-server-8.4.20-6.


- Sid


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142395
---





Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Sid Wagle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142395
---


Ship It!

- Sid Wagle





Re: Review Request 50079: EclipseLink Sequence Query Stuck Inside of Transaction And Blocks Other Threads

2016-07-15 Thread Nate Cole

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50079/#review142388
---


Ship It!

- Nate Cole


On July 15, 2016, 11:56 a.m., Jonathan Hurley wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50079/
> ---
> 
> (Updated July 15, 2016, 11:56 a.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Nate Cole, and Sid Wagle.
> 
> 
> Bugs: AMBARI-17738
> https://issues.apache.org/jira/browse/AMBARI-17738
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> Reproduced as part of creating a rolling upgrade on a large cluster.
> 
> Although this initially appears to be a deadlock, it is caused by Postgres 
> holding the socket open indefinitely. We hold a write lock while the socket 
> is open. Jstack dumps taken many minutes apart show the same thread stuck in 
> a socket read. Investigating on Postgres shows that a lock is blocking the 
> waiting thread.
> 
> The sequence query is currently stuck in the {{idle in transaction}} state 
> which is why it's blocking the other query. The transaction isn't being ended 
> by EclipseLink.
> 
> The cause is that we begin a transaction and then hammer the database for 2-3 
> minutes, during which time Postgres must keep track of all kinds of 
> hostcomponentstate updates isolated from our current transaction. When we go 
> to commit the upgrade, Postgres eventually ends up in a deadlock where it 
> doesn't think that the transaction ended.
> 
> 
> Diffs
> ---
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 2e976ba
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeEntity.java db27ea5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeGroupEntity.java 96f96d5
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/UpgradeItemEntity.java 6e4a889
> 
> Diff: https://reviews.apache.org/r/50079/diff/
> 
> 
> Testing
> ---
> 
> Fixed on a live cluster where it was 100% reproducible.
> 
> UNIT TESTS PENDING...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>