[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-06 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
   Resolution: Fixed
Fix Version/s: 2.6.6
   Status: Resolved  (was: Patch Available)

Committed this to branch-2.6. Closing this.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-02 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Fix Version/s: 2.7.4

Committed branch-2.7 patch. I'll run full HDFS tests locally with the 
branch-2.6 patch and then commit it.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180-branch-2.7.01.patch

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: (was: HDFS-11180-branch-2.7.01.patch)

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180-branch-2.6.01.patch

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180-branch-2.7.01.patch

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.7.01.patch, HDFS-11180-branch-2.8.01.patch, 
> HDFS-11180.00.patch, HDFS-11180.01.patch, HDFS-11180.02.patch, 
> HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Fix Version/s: 2.8.0

Committed branch-2 patch to branch-2/2.8. I'll provide a branch-2.7 patch 
shortly.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180-branch-2.8.01.patch

The branch-2 patch can be applied to branch-2.8 cleanly. Renamed the patch to 
see what happens on branch-2.8.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, 
> HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-12-01 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Fix Version/s: 3.0.0-alpha2

Committed v4 patch to trunk.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11180-branch-2.01.patch, HDFS-11180.00.patch, 
> HDFS-11180.01.patch, HDFS-11180.02.patch, HDFS-11180.03.patch, 
> HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-30 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180-branch-2.01.patch

Rebased for branch-2.
I'll commit the 04 patch to trunk tonight (JST) if there are no objections.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180-branch-2.01.patch, HDFS-11180.00.patch, 
> HDFS-11180.01.patch, HDFS-11180.02.patch, HDFS-11180.03.patch, 
> HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-30 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Release Note:   (was: 
After this fix, the value of the following metrics/JMX can be incorrect.
* LastWrittenTransactionId (FSNameSystem)
* TotalSyncCount (FSNameSystem)
* TransactionsSinceLastLogRoll (FSNameSystem)
* TransactionsSinceLastCheckpoint (FSNameSystem)
* NameJournalStatus (NameNodeMXBean)
* JournalTransactionInfo (NameNodeMXBean)

This is because the FSEditLog lock is no longer held when reading the 
values of these metrics/JMX.)

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-30 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Release Note: 

After this fix, the value of the following metrics/JMX can be incorrect.
* LastWrittenTransactionId (FSNameSystem)
* TotalSyncCount (FSNameSystem)
* TransactionsSinceLastLogRoll (FSNameSystem)
* TransactionsSinceLastCheckpoint (FSNameSystem)
* NameJournalStatus (NameNodeMXBean)
* JournalTransactionInfo (NameNodeMXBean)

This is because the FSEditLog lock is no longer held when reading the 
values of these metrics/JMX.
 Component/s: namenode

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-29 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180.04.patch

04 patch:
* Ignored findbugs warnings.
* Some metrics use the newly unsynchronized FSEditLog methods, and those 
metrics are also read by non-metrics code. I added new methods that use the 
synchronized FSEditLog methods for the non-metrics code.
* Removed the try-with-resources statement from the regression test. This 
makes backporting to branch-2.6/2.7 easier.
* Fixed some issues, such as calling Preconditions.checkState in an 
unsynchronized method.

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-28 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180.03.patch

03 patch:
* Added a regression test
* Updated FSEditLog and some metric-related code to pass the regression test
* TODO: Ignore findbugs warnings
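
One way a regression test for this kind of bug can be structured is sketched below. It is not the test from the 03 patch, whose contents are not shown in this thread. The idea is to hold an object's monitor in the test thread, run a metrics-style read in a second thread, and check that the read finishes before a timeout. LockHeldProbe, ProbeTarget and their methods are hypothetical names made up for this illustration.

```java
// Helper that checks whether a task can finish while the caller holds a monitor.
final class LockHeldProbe {
    static boolean completesWhileLockHeld(Object monitor, Runnable task,
                                          long timeoutMillis) {
        Thread t = new Thread(task, "metrics-probe");
        t.setDaemon(true); // do not keep the JVM alive if the task stays blocked
        synchronized (monitor) {
            t.start();
            try {
                // Wait for the task while we still own the monitor.
                t.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            return !t.isAlive();
        }
    }
}

// Hypothetical target with one locked and one lock-free read path.
final class ProbeTarget {
    private volatile long txid = 42;

    synchronized long lockedRead() { return txid; } // needs the monitor
    long unlockedRead() { return txid; }            // never waits for the monitor
}
```

With this helper, a lock-free read is expected to complete while the monitor is held, and a synchronized read is expected to time out. A real test would use the NameNode's own locks and MBean getters instead of ProbeTarget.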

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, HDFS-11180.03.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-28 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Priority: Blocker  (was: Major)

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-28 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Assignee: Akira Ajisaka
Target Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
  Status: Patch Available  (was: Open)

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>Assignee: Akira Ajisaka
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-28 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180.02.patch

Thanks Abhishek and Kihwal for the comments.
02 patch: 00 patch + make the txids volatile
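
As a rough illustration of what "make the txids volatile" buys, here is a minimal sketch. It is not the 02 patch itself. VolatileTxIds and its method names are hypothetical. Writers still synchronize, so the increment stays safe, while a lock-free reader is guaranteed an atomic, up-to-date read of each long because the fields are volatile.

```java
// Hypothetical stand-in for the transaction id fields of an edit log.
final class VolatileTxIds {
    private volatile long txid;           // last written transaction id
    private volatile long curSegmentTxId; // first transaction id of the segment

    synchronized void startSegment(long firstTxId) {
        curSegmentTxId = firstTxId;
        txid = firstTxId - 1;
    }

    // txid++ is a read-modify-write, so writers must still hold the lock.
    synchronized void logEdit() {
        txid++;
    }

    // Lock-free reads: volatile makes each read atomic and visible across
    // threads, but two separate reads are not a consistent snapshot.
    long getLastWrittenTxIdWithoutLock() { return txid; }
    long getCurSegmentTxIdWithoutLock() { return curSegmentTxId; }
}
```

Each getter on its own returns a valid value, but a caller that combines both values may see them from different moments.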

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, 
> HDFS-11180.02.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-28 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180.01.patch

01: Use ReentrantReadWriteLock.
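
For readers unfamiliar with the approach named above, a minimal sketch of guarding a counter with java.util.concurrent.locks.ReentrantReadWriteLock follows. It is not the 01 patch, and RwLockedTxIds is a hypothetical class. Readers share the read lock with each other, but a reader still waits while a writer holds the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an edit log counter guarded by a read-write lock.
final class RwLockedTxIds {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long txid; // last written transaction id

    void logEdit() {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            txid++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long getLastWrittenTxId() {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            return txid;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```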

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, HDFS-11180.01.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-27 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Attachment: HDFS-11180.00.patch

Attaching a sample patch.
I think removing synchronized from FSNameSystem.getLastWrittenTxId and 
FSNameSystem.getCurSegmentTxId would be problematic, so I created separate 
methods without locking and used them for metrics.
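
The shape of that idea can be sketched as follows. This is an illustration only, not the attached patch. EditLogSketch and the ForMetrics method names are hypothetical. The existing synchronized getters stay as they are, and separate lock-free getters are added for the metrics path.

```java
// Hypothetical stand-in for an edit log. This is not the real FSEditLog.
final class EditLogSketch {
    private long txid;           // last written transaction id
    private long curSegmentTxId; // first transaction id of the current segment

    synchronized void startSegment(long firstTxId) {
        curSegmentTxId = firstTxId;
        txid = firstTxId - 1;
    }

    synchronized void logEdit() {
        txid++;
    }

    // Locked getters: callers that need a consistent view keep using these.
    synchronized long getLastWrittenTxId() { return txid; }
    synchronized long getCurSegmentTxId() { return curSegmentTxId; }

    // Lock-free variants intended for metrics only. They never wait for the
    // monitor, but the value they return may be stale.
    long getLastWrittenTxIdForMetrics() { return txid; }
    long getCurSegmentTxIdForMetrics() { return curSegmentTxId; }
}
```

The trade-off is that a metrics reader can no longer be blocked by whoever holds the edit log lock, at the cost of possibly reporting a slightly out-of-date value.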

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>  Labels: high-availability
> Attachments: HDFS-11180.00.patch, jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-27 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11180:
-
Affects Version/s: 2.6.0

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Abhishek Modi
>  Labels: high-availability
> Attachments: jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.






[jira] [Updated] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.

2016-11-27 Thread Abhishek Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated HDFS-11180:
-
Attachment: jstack.log

> Intermittent deadlock in NameNode when failover happens.
> 
>
> Key: HDFS-11180
> URL: https://issues.apache.org/jira/browse/HDFS-11180
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Abhishek Modi
>  Labels: high-availability
> Attachments: jstack.log
>
>
> It is happening due to metrics getting updated at the same time when failover 
> is happening. Please find attached jstack at that point of time.


