[jira] [Updated] (HDFS-8925) Move BlockReader to hdfs-client

2015-08-19 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8925:

Description: This jira tracks the effort of moving the {{BlockReader}} 
class into the hdfs-client module.  (was: This jira tracks the effort of moving 
the {{DfsClientConf}} class into the hdfs-client module.)

 Move BlockReader to hdfs-client
 ---

 Key: HDFS-8925
 URL: https://issues.apache.org/jira/browse/HDFS-8925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 This jira tracks the effort of moving the {{BlockReader}} class into the 
 hdfs-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8925) Move BlockReader to hdfs-client

2015-08-19 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-8925:
---

 Summary: Move BlockReader to hdfs-client
 Key: HDFS-8925
 URL: https://issues.apache.org/jira/browse/HDFS-8925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


This jira tracks the effort of moving the {{DfsClientConf}} class into the 
hdfs-client module.





[jira] [Updated] (HDFS-8925) Move BlockReader to hdfs-client

2015-08-19 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8925:

Hadoop Flags:   (was: Reviewed)

 Move BlockReader to hdfs-client
 ---

 Key: HDFS-8925
 URL: https://issues.apache.org/jira/browse/HDFS-8925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 This jira tracks the effort of moving the {{DfsClientConf}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8934:

Attachment: HDFS-8934.001.patch

Hi [~wheat9], thanks for your insightful feedback!

I fixed the first two comments in the v1 patch. I also checked the trailing 
whitespace; it was not caused by my change, but I fixed it anyway, and 
hopefully Jenkins won't report any more trailing whitespace.

I'll open new jiras for the last two comments and add you to the watch list 
when they're ready.

 Move ShortCircuitShm to hdfs-client
 ---

 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8934.000.patch, HDFS-8934.001.patch


 This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8938:

Attachment: HDFS-8938.001.patch

The v1 patch checks that a scheduled {{ReplicationWork}} is non-null before 
adding it to the {{work}} list in the {{computeReplicationWork}} method.
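The guard described above can be sketched in plain Java. Note that the class and method names below are illustrative stand-ins, not the actual {{BlockManager}} code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins for the HDFS types; not the real BlockManager code.
public class ReplicationWorkGuard {
    public static class ReplicationWork { }

    // May return null when no replication can be scheduled for a block.
    static ReplicationWork scheduleReplication(boolean schedulable) {
        return schedulable ? new ReplicationWork() : null;
    }

    // The guard the v1 patch adds: only non-null scheduled work enters the list.
    public static List<ReplicationWork> computeReplicationWork(boolean[] blocks) {
        List<ReplicationWork> work = new ArrayList<>();
        for (boolean schedulable : blocks) {
            ReplicationWork w = scheduleReplication(schedulable);
            if (w != null) {  // skip blocks for which nothing was scheduled
                work.add(w);
            }
        }
        return work;
    }

    public static void main(String[] args) {
        // Two schedulable blocks, one not: only two items enter the list.
        System.out.println(computeReplicationWork(new boolean[]{true, false, true}).size()); // prints 2
    }
}
```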

 Refactor BlockManager in blockmanagement
 

 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch


 This jira tracks the effort of refactoring the {{BlockManager}} in the 
 {{hdfs.server.blockmanagement}} package.





[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8938:

Attachment: HDFS-8938.000.patch

The v0 patch:
* Moves the inner classes {{BlockToMarkCorrupt}} and {{ReplicationWork}} to 
separate files in the same package
* Extracts the code sections that schedule replication and validate replication 
work in the {{computeReplicationWorkForBlocks}} method into respective helper 
methods
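The extract-method step above can be sketched as follows. The class, method names, and logic are hypothetical stand-ins that only illustrate the refactoring shape, not the real {{BlockManager}} code:

```java
// Illustrative extract-method refactoring in the spirit of the v0 patch.
public class ExtractMethodSketch {
    // Before the refactoring, scheduling and validation were inlined in one
    // long method; afterwards each concern lives in its own helper.
    public static int computeWorkForBlocks(int[] blockIds) {
        int scheduled = scheduleWork(blockIds);  // helper 1: schedule replication
        return validateWork(scheduled);          // helper 2: validate the work
    }

    private static int scheduleWork(int[] blockIds) {
        int count = 0;
        for (int id : blockIds) {
            if (id > 0) count++;  // pretend positive ids are schedulable
        }
        return count;
    }

    private static int validateWork(int scheduled) {
        return Math.max(scheduled, 0);  // pretend validation clamps the result
    }

    public static void main(String[] args) {
        System.out.println(computeWorkForBlocks(new int[]{1, -2, 3})); // prints 2
    }
}
```

The observable behavior is unchanged; only the structure improves, which is why such patches typically need no new tests.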

 Refactor BlockManager in blockmanagement
 

 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8938.000.patch


 This lira tracks the effort of refactoring the {{BlockManager}} in 
 {{hdfs.server.blockmanagement}} package.





[jira] [Created] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-21 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-8938:
---

 Summary: Refactor BlockManager in blockmanagement
 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu


This jira tracks the effort of refactoring the {{BlockManager}} in the 
{{hdfs.server.blockmanagement}} package.





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Description: The two failing unit tests are {{}}  (was: This jira tracks 
the effort of moving the {{ShortCircuitShm}} class into the hdfs-client module.)

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 The two failing unit tests are {{}}





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Description: 
The two failing unit tests fail with {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in the ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and to dump all the log information for debugging. 

  was:The two failing unit tests are {{}}


 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 
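The failure mode described above can be simulated with plain Java: test code casts a logger to one concrete backend while the runtime instance is another. The tiny classes below are self-contained stand-ins for {{Log4JLogger}} and {{Log4jLoggerAdapter}}, not the real library types:

```java
// Self-contained simulation of the ClassCastException described above.
public class LoggerCastSketch {
    public interface Log { }  // stand-in for the commons-logging Log interface
    public static class Log4JLogger implements Log { }   // old concrete backend
    public static class Slf4jAdapter implements Log { }  // backend after refactoring

    // Returns true when the cast the old test code performed still succeeds.
    public static boolean castToLog4j(Log log) {
        try {
            Log4JLogger concrete = (Log4JLogger) log;  // what the tests effectively did
            return concrete != null;
        } catch (ClassCastException e) {
            return false;  // the path the failing tests now hit
        }
    }

    public static void main(String[] args) {
        System.out.println(castToLog4j(new Log4JLogger()));  // true: before the refactoring
        System.out.println(castToLog4j(new Slf4jAdapter())); // false: the ClassCastException path
    }
}
```

The fix is therefore to stop casting to a concrete backend and set log levels through a backend-neutral API instead.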





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

   Flags: Patch
Hadoop Flags:   (was: Reviewed)

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0

 Attachments: HDFS-8948.000.patch


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Status: Patch Available  (was: Open)

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0

 Attachments: HDFS-8948.000.patch


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Created] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-8948:
---

 Summary: Fix failing tests in TestPread and 
TestReplaceDatanodeOnFailure
 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
hdfs-client module.





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Description: 
The two failing unit tests fail with {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in the ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and to dump all the log information for debugging. 

  was:
The two failing unit tests are {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and dump all the log information for debugging. 


 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Description: 
The two failing unit tests fail with {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in the ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and dump all the log information for debugging. 

  was:
The two failing unit tests are {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and dumping all the log information for debugging. 


 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and dump all the log information for debugging. 





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Description: 
The two failing unit tests fail with {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in the ShortCircuitShm refactoring code.

This jira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and to dump all the log information for debugging. 

  was:
The two failing unit tests are {{java.lang.ClassCastException: 
org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
change from log4j to slf4j in ShortCircuitShm refactoring code.

This lira tracks the effort of fixing the test failures. The goal is to make 
the tests pass and dumping all the log information for debugging. 


 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Commented] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709792#comment-14709792
 ] 

Mingliang Liu commented on HDFS-8934:
-

Thanks [~wheat9] for reviewing this code.
Thanks [~hitliuyi] for pointing out the failing test. Please refer to 
[HDFS-8948|https://issues.apache.org/jira/browse/HDFS-8948].

 Move ShortCircuitShm to hdfs-client
 ---

 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0

 Attachments: HDFS-8934.000.patch, HDFS-8934.001.patch


 This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Attachment: HDFS-8948.000.patch

The v0 patch uses the GenericTestUtils to set the log level.
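Hadoop's {{GenericTestUtils.setLogLevel}} hides the logging backend from the test, so no cast to a concrete logger class is needed. As a hedged, self-contained analogue of the same idea ("set the level through an API, not a cast"), here is the pattern with stdlib {{java.util.logging}}; this is an illustration, not the actual HDFS test code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stdlib analogue of setting a test log level through a backend-neutral API.
public class LogLevelSketch {
    // Raise the logger to the most verbose level and return the level set.
    public static Level raiseToAll(Logger logger) {
        logger.setLevel(Level.ALL);  // no cast to any concrete backend class
        return logger.getLevel();
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("test");
        System.out.println(raiseToAll(log)); // prints ALL
    }
}
```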

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0

 Attachments: HDFS-8948.000.patch


 The two failing unit tests are {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Updated] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8934:

Attachment: HDFS-8934.000.patch

The v0 patch:
* Moves the {{ShortCircuitShm}}, {{DfsClientShm}}, {{DfsClientShmManager}} and 
their dependent source files to hdfs-client
* Moves some helper methods used by {{ShortCircuitShm}} from {{PBHelper}} to a 
new file {{PBHelperClient}}
* Replaces log4j usage with slf4j
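The "move static helpers to a client-side class" step above has a common shape: the helper moves to the new client class, and the server-side class can delegate so existing callers keep compiling. A minimal sketch, with hypothetical class and method names (the real {{PBHelper}} helpers and their signatures are not reproduced here):

```java
// Hypothetical client-side home for a helper needed by client code.
public class PBHelperClientSketch {
    // Illustrative helper: pack a shm id and slot index into one long.
    public static long packSlotId(int shmId, int slotIdx) {
        return ((long) shmId << 32) | (slotIdx & 0xffffffffL);
    }
}

// The server-side class delegates, so existing callers are unaffected.
class PBHelperSketch {
    public static long packSlotId(int shmId, int slotIdx) {
        return PBHelperClientSketch.packSlotId(shmId, slotIdx);
    }
}
```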

 Move ShortCircuitShm to hdfs-client
 ---

 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0

 Attachments: HDFS-8934.000.patch


 This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
 hdfs-client module.





[jira] [Created] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-21 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-8934:
---

 Summary: Move ShortCircuitShm to hdfs-client
 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


This jira tracks the effort of moving the {{BlockReader}} class into the 
hdfs-client module.





[jira] [Updated] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8934:

Description: This jira tracks the effort of moving the {{ShortCircuitShm}} 
class into the hdfs-client module.  (was: This jira tracks the effort of moving 
the {{BlockReader}} class into the hdfs-client module.)

 Move ShortCircuitShm to hdfs-client
 ---

 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8934) Move ShortCircuitShm to hdfs-client

2015-08-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8934:

Status: Patch Available  (was: Open)

 Move ShortCircuitShm to hdfs-client
 ---

 Key: HDFS-8934
 URL: https://issues.apache.org/jira/browse/HDFS-8934
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Fix For: 2.8.0


 This jira tracks the effort of moving the {{ShortCircuitShm}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8803) Move DfsClientConf to hdfs-client

2015-08-19 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8803:

Attachment: HDFS-8803.003.patch

This patch rebases onto trunk commit 71aedfa.

 Move DfsClientConf to hdfs-client
 -

 Key: HDFS-8803
 URL: https://issues.apache.org/jira/browse/HDFS-8803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-8803.000.patch, HDFS-8803.001.patch, 
 HDFS-8803.002.patch, HDFS-8803.003.patch


 This jira tracks the effort of moving the {{DfsClientConf}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8938:

Attachment: HDFS-8938.002.patch

The v2 patch rebases onto the latest {{trunk}} branch and resolves conflicts.

 Refactor BlockManager in blockmanagement
 

 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch, 
 HDFS-8938.002.patch


 This jira tracks the effort of refactoring the {{BlockManager}} in the 
 {{hdfs.server.blockmanagement}} package.





[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8938:

Attachment: HDFS-8938.005.patch

 Refactor BlockManager in blockmanagement
 

 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch, 
 HDFS-8938.002.patch, HDFS-8938.003.patch, HDFS-8938.004.patch, 
 HDFS-8938.005.patch


 This jira tracks the effort of refactoring the {{BlockManager}} in the 
 {{hdfs.server.blockmanagement}} package.





[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8938:

Attachment: HDFS-8938.003.patch

The v3 patch rebases onto the trunk branch and fixes the findbugs warning. No 
new tests are needed because this patch introduces no new code; it is purely a 
refactoring.

 Refactor BlockManager in blockmanagement
 

 Key: HDFS-8938
 URL: https://issues.apache.org/jira/browse/HDFS-8938
 Project: Hadoop HDFS
  Issue Type: Task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch, 
 HDFS-8938.002.patch, HDFS-8938.003.patch


 This jira tracks the effort of refactoring the {{BlockManager}} in the 
 {{hdfs.server.blockmanagement}} package.





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Target Version/s: 2.8.0

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8948.000.patch


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Updated] (HDFS-8948) Fix failing tests in TestPread and TestReplaceDatanodeOnFailure

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8948:

Fix Version/s: (was: 2.8.0)

 Fix failing tests in TestPread and TestReplaceDatanodeOnFailure
 ---

 Key: HDFS-8948
 URL: https://issues.apache.org/jira/browse/HDFS-8948
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8948.000.patch


 The two failing unit tests fail with {{java.lang.ClassCastException: 
 org.slf4j.impl.Log4jLoggerAdapter cannot be cast to 
 org.apache.commons.logging.impl.Log4JLogger}}. This is because of the logger 
 change from log4j to slf4j in the ShortCircuitShm refactoring code.
 This jira tracks the effort of fixing the test failures. The goal is to make 
 the tests pass and to dump all the log information for debugging. 





[jira] [Created] (HDFS-8951) Move shortcircuit to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-8951:
---

 Summary: Move shortcircuit to hdfs-client
 Key: HDFS-8951
 URL: https://issues.apache.org/jira/browse/HDFS-8951
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu


This jira tracks the effort of moving the {{BlockReader}} class into the 
hdfs-client module.





[jira] [Updated] (HDFS-8951) Move shortcircuit to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8951:

Description: This jira tracks the effort of moving the {{shortcircuit}} 
package into the hdfs-client module.  (was: This jira tracks the effort of 
moving the {{BlockReader}} class into the hdfs-client module.)

 Move shortcircuit to hdfs-client
 

 Key: HDFS-8951
 URL: https://issues.apache.org/jira/browse/HDFS-8951
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu

 This jira tracks the effort of moving the {{shortcircuit}} package into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8951) Move shortcircuit to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8951:

Attachment: HDFS-8951.000.patch

The v0 patch moves all classes in {{hdfs/shortcircuit}} to 
{{hdfs-client/shortcircuit}}. Specifically, it:
* Replaces the {{log4j}} logger with the {{slf4j}} logger in these classes
* Moves the {{isLocalAddress()}} method from {{hdfs/DFSClient}} to 
{{hdfs-client/DFSUtilClient}}
* Removes the {{HdfsConfiguration}} dependency in {{BlockMetadataHeader}}
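An "is this address local?" helper in the spirit of the moved {{isLocalAddress()}} can be sketched with only the JDK networking API; this is an illustration of the idea, not the actual HDFS implementation:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Self-contained sketch: an address is local if it is loopback/wildcard,
// or if some network interface on this host owns it.
public class LocalAddressSketch {
    public static boolean isLocalAddress(InetAddress addr) {
        // Loopback and wildcard addresses are trivially local.
        if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
            return true;
        }
        // Otherwise, local iff a local network interface is bound to it.
        try {
            return NetworkInterface.getByInetAddress(addr) != null;
        } catch (SocketException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLocalAddress(InetAddress.getLoopbackAddress())); // prints true
    }
}
```

Moving such a helper to the client side makes sense because short-circuit reads are only attempted when the client and datanode share a host.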

 Move shortcircuit to hdfs-client
 

 Key: HDFS-8951
 URL: https://issues.apache.org/jira/browse/HDFS-8951
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8951.000.patch


 This jira tracks the effort of moving the {{shortcircuit}} package into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8951) Move shortcircuit to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8951:

Status: Patch Available  (was: Open)

 Move shortcircuit to hdfs-client
 

 Key: HDFS-8951
 URL: https://issues.apache.org/jira/browse/HDFS-8951
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
 Attachments: HDFS-8951.000.patch


 This jira tracks the effort of moving the {{shortcircuit}} package into the 
 hdfs-client module.





[jira] [Updated] (HDFS-8925) Move BlockReader to hdfs-client

2015-08-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-8925:

Target Version/s: 2.8.0
   Fix Version/s: (was: 2.8.0)

 Move BlockReader to hdfs-client
 ---

 Key: HDFS-8925
 URL: https://issues.apache.org/jira/browse/HDFS-8925
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu

 This jira tracks the effort of moving the {{BlockReader}} class into the 
 hdfs-client module.





[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Status: Open  (was: Patch Available)

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 
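For background on why constructing {{HdfsConfiguration}} matters: Hadoop's {{Configuration}} loads the default resources that subclasses register in static initializers, so merely loading the subclass puts {{hdfs-default.xml}} and {{hdfs-site.xml}} on the resource list. The sketch below mimics that pattern with hypothetical names ({{BaseConfig}}, {{ClientConfig}}); it is not the actual Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified mimic of Hadoop's Configuration.addDefaultResource mechanism.
// All class and method names here are illustrative; only the static-init
// registration pattern matches what HdfsConfiguration does.
class BaseConfig {
    static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    static void addDefaultResource(String name) {
        // The real Hadoop implementation also reloads every live Configuration.
        DEFAULT_RESOURCES.add(name);
    }
}

class ClientConfig extends BaseConfig {
    // Runs once when the class is loaded: merely constructing (or referencing)
    // ClientConfig guarantees these resources are registered for all configs.
    static {
        addDefaultResource("hdfs-default.xml");
        addDefaultResource("hdfs-site.xml");
    }
}
```

This is why `new HdfsConfiguration()` was the reliable way for a client to get the hdfs default and site resources, including security settings, onto the configuration.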





[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Status: Patch Available  (was: Open)

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 





[jira] [Created] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9304:
---

 Summary: Add HdfsClientConfigKeys class to 
TestHdfsConfigFields#configurationClasses
 Key: HDFS-9304
 URL: https://issues.apache.org/jira/browse/HDFS-9304
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mingliang Liu
Assignee: Mingliang Liu


*tl;dr* Since {{HdfsClientConfigKeys}} holds the client-side config keys, we need 
to add this class to {{TestHdfsConfigFields#configurationClasses}}.

For now, the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
still contains all the client-side config keys, though they are marked 
@deprecated. As we add new client config keys (e.g. [HDFS-9259]), the unit test 
will fail with the following error:
{quote}
hdfs-default.xml has 1 properties missing in class 
org.apache.hadoop.hdfs.DFSConfigKeys
{quote}

If the intent is for {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} together to 
cover all config keys in {{hdfs-default.xml}}, we need to put both of them in 
{{TestHdfsConfigFields#configurationClasses}}.
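The check described above can be sketched as follows (a simplified, hypothetical mimic of what {{TestHdfsConfigFields}} does; the nested key classes and method names are illustrative, not Hadoop APIs): reflectively collect the {{String}} constants from every registered key class, then flag any {{hdfs-default.xml}} property that none of them declares.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical version of the coverage check: every property in
// hdfs-default.xml must be declared as a String constant in at least one of
// the registered config-key classes.
final class ConfigFieldCoverage {

    // Hypothetical stand-ins for DFSConfigKeys / HdfsClientConfigKeys.
    static class ServerKeys {
        public static final String REPLICATION = "dfs.replication";
    }

    static class ClientKeys {
        public static final String SEND_BUF = "dfs.client.socket.send.buffer.size";
    }

    /** Collects the values of all public static String constants. */
    static Set<String> declaredKeys(Class<?>... keyClasses)
            throws IllegalAccessException {
        Set<String> keys = new HashSet<>();
        for (Class<?> c : keyClasses) {
            for (Field f : c.getFields()) {
                if (Modifier.isStatic(f.getModifiers())
                        && f.getType() == String.class) {
                    keys.add((String) f.get(null));
                }
            }
        }
        return keys;
    }

    /** Returns the xml property names that no key class declares. */
    static List<String> missing(List<String> xmlProperties, Set<String> declared) {
        List<String> result = new ArrayList<>(xmlProperties);
        result.removeAll(declared);
        return result;
    }
}
```

With only {{ServerKeys}} registered, a key declared solely in {{ClientKeys}} would show up as "missing", which is exactly the failure mode the jira describes.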






[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9259:

Attachment: HDFS-9259.000.patch

The v0 patch is a first effort to address this issue. It:
 - Adds a new client-side config key, {{dfs.client.socket.send.buffer.size}}
 - Makes the {{DFSClient}}-side socket {{sendBufferSize}} configurable, falling 
back to TCP auto-tuning for non-positive values
 - Adds the config key description to {{hdfs-default.xml}}
 - Adds a new unit test, {{TestDFSClientSocketSize}}, to cover common cases
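A minimal sketch of the send-buffer behavior described above (a hypothetical helper, not the actual patch): a positive configured value is applied as an explicit SO_SNDBUF hint, while a non-positive value leaves the socket untouched so the kernel's TCP auto-tuning applies.

```java
import java.net.Socket;
import java.net.SocketException;

final class SendBufferUtil {
    /**
     * Applies a configured SO_SNDBUF size to a socket.
     * Returns true if an explicit size was set; false means the configured
     * value was non-positive and the socket was left to TCP auto-tuning.
     */
    static boolean applySendBufferSize(Socket socket, int configuredSize)
            throws SocketException {
        if (configuredSize > 0) {
            socket.setSendBufferSize(configuredSize); // explicit SO_SNDBUF hint
            return true;
        }
        return false; // do not touch the socket: the kernel auto-tunes the buffer
    }
}
```

Not calling {{setSendBufferSize}} at all is what enables scenario c above: the OS grows the send buffer as needed on a high-latency cross-DC path.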

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC hdfs writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs writes. The test ran "hadoop fs 
> -copyFromLocal" of a 256MB file across DCs with different SendBufferSize and 
> ReceiveBufferSize values. The results showed that c is much faster than b, 
> and b is faster than a:
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both).
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].





[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9259:

Status: Patch Available  (was: Open)

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC hdfs writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs writes. The test ran "hadoop fs 
> -copyFromLocal" of a 256MB file across DCs with different SendBufferSize and 
> ReceiveBufferSize values. The results showed that c is much faster than b, 
> and b is faster than a:
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both).
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].





[jira] [Updated] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9245:

Attachment: HDFS-9245.001.patch

Per offline discussion with [~wheat9] and [~gtCarrera9], the {{volatile}} is 
considered premature optimization. The v1 patch simply uses synchronized blocks 
for the accessors. The main observation is that the synchronized read is not on 
the critical path.
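The approach can be illustrated with a minimal sketch (only the {{offset}} field name is borrowed from {{WriteCtx}}; the class is otherwise hypothetical): every read and write goes through a synchronized accessor, so the field is consistently guarded and no longer needs {{volatile}}.

```java
// Simplified illustration of the v1 approach: synchronized accessors instead
// of a volatile field. Findbugs then sees the field locked 100% of the time,
// so the IS2_INCONSISTENT_SYNC warning goes away.
class WriteState {
    private long offset; // guarded by `this`; no volatile needed

    synchronized long getOffset() {
        return offset;
    }

    synchronized void setOffset(long offset) {
        this.offset = offset;
    }
}
```

Since the read is off the critical path, the cost of acquiring the monitor on every access is acceptable here.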

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch, HDFS-9245.001.patch
>
>
> There are findbugs warnings as follows, brought by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0"
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366"
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0"
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366"
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx"
> name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}
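For reference, the filter rule suggested in the description would look roughly like this in {{findbugsExcludeFile.xml}} (a sketch in the standard findbugs filter format; the exact match criteria are an assumption):

```xml
<FindBugsFilter>
  <!-- Suppress the IS2_INCONSISTENT_SYNC warnings on WriteCtx fields -->
  <Match>
    <Class name="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
```

In the end the warnings were fixed in code rather than filtered, but this is the shape such an exclusion would take.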





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.008.patch

The v8 patch addresses one checkstyle warning and one failing unit test from the 
v7 patch. The other checkstyle warnings are pre-existing and cannot be resolved 
in this patch. The findbugs warning is unrelated, and the other failing unit 
test seems flaky. I am working on a new unit test, {{TestBlockManagerSafeMode}}, 
for the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.





[jira] [Commented] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-22 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969524#comment-14969524
 ] 

Mingliang Liu commented on HDFS-9245:
-

Per offline discussion with [~brandonli], the {{volatile}} works just fine.

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch
>
>
> There are findbugs warnings as follows, brought by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0"
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366"
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0"
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366"
> instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <Class classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" primary="true">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java" end="314">
>       <Message>At WriteCtx.java:[lines 40-314]</Message>
>     </SourceLine>
>     <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   </Class>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx"
> name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx"
> sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java"
> sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}





[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Attachment: HDFS-9241.004.patch

The v2 patch builds locally (on Mac and Linux), but the v3 patch did not 
trigger Jenkins successfully (cancelling and resubmitting the patch did not 
trigger it either).

The v4 patch rebases onto trunk.

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 





[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Attachment: HDFS-9241.002.patch

Thanks for your review [~ste...@apache.org] and [~wheat9]. The v2 patch puts 
the deprecated keys in a nested interface of {{HdfsClientConfigKeys}}.

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 





[jira] [Updated] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9304:

Status: Patch Available  (was: Open)

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.





[jira] [Commented] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-25 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973448#comment-14973448
 ] 

Mingliang Liu commented on HDFS-9259:
-

The failing test {{TestHdfsConfigFields}} is tracked by [HDFS-9304].

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC hdfs writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs writes. The test ran "hadoop fs 
> -copyFromLocal" of a 256MB file across DCs with different SendBufferSize and 
> ReceiveBufferSize values. The results showed that c is much faster than b, 
> and b is faster than a:
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both).
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].





[jira] [Updated] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-25 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9304:

Attachment: HDFS-9304.000.patch

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.





[jira] [Assigned] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9307:
---

Assignee: Mingliang Liu

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file





[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Attachment: HDFS-9307.000.patch

Thanks for reporting this. I also think it should be private. The patch simply 
makes it {{static}}.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file





[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974847#comment-14974847
 ] 

Mingliang Liu commented on HDFS-9304:
-

Thanks, [~wheat9], for the review and commit.

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.





[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Status: Patch Available  (was: Open)

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file





[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Attachment: HDFS-9307.001.patch

Thanks for your review, [~cmccabe]. The v1 patch addresses this and also 
refines the comments for the functions.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch, HDFS-9307.001.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file





[jira] [Commented] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975453#comment-14975453
 ] 

Mingliang Liu commented on HDFS-9313:
-

Thanks for filing and working on this, [~mingma]. The patch makes sense to me. 
The warning is much better than an NPE.

{code}
+// no replica can't be chosen as the excessive replica as
{code}
Do you mean "no replica *can* be chosen as the excessive replica as"?

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when storage policy changes around the same time balancer 
> moves the block. When this happens, it will cause NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that it isn't found in any production clusters. Instead, it is found 
> from new unit tests. In addition, the issue has been there before HDFS-8647.





[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9259:

Attachment: HDFS-9259.001.patch

Thanks for your review [~mingma]!

The v1 patch addresses the format problems.

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch, HDFS-9259.001.patch
>
>
> We recently found that cross-DC hdfs writes could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs writes. The test ran "hadoop fs 
> -copyFromLocal" of a 256MB file across DCs with different SendBufferSize and 
> ReceiveBufferSize values. The results showed that c is much faster than b, 
> and b is faster than a:
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128k, ReceiveBufferSize=not set (TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both).
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].





[jira] [Commented] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975595#comment-14975595
 ] 

Mingliang Liu commented on HDFS-9313:
-

Will this be covered by [HDFS-9314]?

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when storage policy changes around the same time balancer 
> moves the block. When this happens, it will cause NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that it isn't found in any production clusters. Instead, it is found 
> from new unit tests. In addition, the issue has been there before HDFS-8647.





[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975703#comment-14975703
 ] 

Mingliang Liu commented on HDFS-9129:
-

Thanks to [~anu] for the review and comment.

Anu and I are discussing the potential conflicts with [HDFS-4015] offline, and 
I'll update the patch soon once we reach consensus.

{quote}
 I think it might be a good idea to move BytesInFutureBlocks also to the same 
class.
{quote}
Agreed. Let's address that separately, as this jira is scoped to moving the 
safemode block count from {{FSNamesystem}} to {{BlockManager}}. I'll file a 
follow-up jira once this one is resolved and invite you to review that work.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.





[jira] [Commented] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975620#comment-14975620
 ] 

Mingliang Liu commented on HDFS-9313:
-

Agreed.

+1 (non-binding), pending Jenkins.

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when the storage policy changes around the same time the 
> balancer moves the block. When this happens, it causes a NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that it hasn't been seen in any production cluster; instead, it was 
> found by new unit tests. In addition, the issue existed before HDFS-8647.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.011.patch

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;

...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
How do these two variables synchronize? Is the system in consistent state in 
the middle of the execution?
{quote}

Per offline discussion with [~wheat9], the {{volatile}} keyword is considered 
premature optimization. Making {{isInResourceLowSafeMode}} short-circuit 
{{isInManualSafeMode}} is a bad design for upcoming changes. The v11 patch uses 
a synchronized block to keep the system in a consistent state in the middle of 
the execution.

The class-level comment for {{BlockManagerSafeMode}} is also refined.
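As a rough illustration of the v11 approach described above (a simplified, hypothetical {{SafeModeFlags}} class, not the actual patch code), guarding both writes and reads with the same monitor prevents a reader from observing the transient state where both flags are false during a transition:

```java
// Simplified sketch, hypothetical class names: both flags are updated and
// read under the same lock, so a reader never observes the transient state
// where both are false while a mode transition is in progress.
public class SafeModeFlags {
    private boolean inManualSafeMode = false;
    private boolean inResourceLowSafeMode = false;

    // Enter safe mode; which flag is set depends on why we entered.
    public synchronized void enter(boolean resourcesLow) {
        if (resourcesLow) {
            inResourceLowSafeMode = true;
        } else {
            inManualSafeMode = true;
        }
    }

    public synchronized void leave() {
        inManualSafeMode = false;
        inResourceLowSafeMode = false;
    }

    // Readers take the same lock, so they see a consistent pair of flags.
    public synchronized boolean isInSafeMode() {
        return inManualSafeMode || inResourceLowSafeMode;
    }

    public static void main(String[] args) {
        SafeModeFlags f = new SafeModeFlags();
        f.enter(false);                        // manual safe mode
        System.out.println(f.isInSafeMode());  // true
        f.enter(true);                         // resources low; still in safe mode
        System.out.println(f.isInSafeMode());  // true
        f.leave();
        System.out.println(f.isInSafeMode());  // false
    }
}
```

Compared to two {{volatile}} flags, this trades a little lock overhead on the read path for atomic visibility of the flag pair.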

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981330#comment-14981330
 ] 

Mingliang Liu commented on HDFS-9236:
-

The latest patch looks good to me overall. One minor comment: is it possible to 
assert that the expected exception is thrown (e.g. by checking its error message)?

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rStates (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. When 
> this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even one with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if the client goes away the NN won't be able to recover the lease and close 
> the file, because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.
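A minimal sketch of the kind of sanity check argued for in the description (hypothetical helper name {{checkedLength}}, not the committed patch): refuse to commit a length that no replica actually reported, such as the untouched {{Long.MAX_VALUE}} sentinel.

```java
// Hypothetical sketch of the sanity check argued for above: if no replica
// matched bestState, minLength keeps its Long.MAX_VALUE sentinel, and the
// recovery must be aborted instead of committing a bogus block length.
import java.io.IOException;

public class BlockLengthCheck {
    static long checkedLength(long minLength, String blockId) throws IOException {
        if (minLength == Long.MAX_VALUE || minLength < 0) {
            throw new IOException("Found no replica participating in recovery of "
                + blockId + "; refusing to commit length " + minLength);
        }
        return minLength;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(checkedLength(4096L, "blk_1")); // valid length passes
        try {
            checkedLength(Long.MAX_VALUE, "blk_2");        // sentinel is rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```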





[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981458#comment-14981458
 ] 

Mingliang Liu commented on HDFS-9236:
-

Sorry for the confusion.

By "assert expected exception thrown (e.g. by error message)", I meant 
{{assertTrue(ioe.getMessage().contains("ooxx"));}} in the test, not in the DN 
code. I'm with you: throwing an exception is correct, and an assert would be 
wrong in this case.
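The suggested test-side assertion could look like the following sketch (plain Java with a stand-in {{recoverBlock}} method and an invented message; Hadoop tests often use {{GenericTestUtils.assertExceptionContains}} for the same purpose):

```java
// Sketch of asserting on an expected exception's message. recoverBlock and
// its message are stand-ins, not the real DataNode code.
import java.io.IOException;

public class ExpectedExceptionExample {
    // Stand-in for the DN code under test; throws with a known message.
    static void recoverBlock(long length) throws IOException {
        if (length == Long.MAX_VALUE) {
            throw new IOException("Found fewer than minimum replicas");
        }
    }

    public static void main(String[] args) {
        try {
            recoverBlock(Long.MAX_VALUE);
            throw new AssertionError("expected IOException was not thrown");
        } catch (IOException ioe) {
            // Assert on the message, as suggested in the review comment.
            if (!ioe.getMessage().contains("fewer than minimum")) {
                throw new AssertionError("unexpected message: " + ioe.getMessage());
            }
            System.out.println("caught expected: " + ioe.getMessage());
        }
    }
}
```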

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rStates (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. When 
> this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even one with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if the client goes away the NN won't be able to recover the lease and close 
> the file, because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.016.patch

The failing tests pass on my local machine. The v16 patch addresses the 
findbugs warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Assigned] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9343:
---

Assignee: Mingliang Liu

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>
> A caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Created] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-9343:
---

 Summary: Empty caller context considered invalid
 Key: HDFS-9343
 URL: https://issues.apache.org/jira/browse/HDFS-9343
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Mingliang Liu


A caller context with an empty context string is considered invalid, and it 
should not appear in the audit log.

Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Updated] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9343:

Status: Patch Available  (was: Open)

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9343.000.patch
>
>
> A caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Updated] (HDFS-9343) Empty caller context considered invalid

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9343:

Attachment: HDFS-9343.000.patch

> Empty caller context considered invalid
> ---
>
> Key: HDFS-9343
> URL: https://issues.apache.org/jira/browse/HDFS-9343
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9343.000.patch
>
>
> A caller context with an empty context string is considered invalid, and it 
> should not appear in the audit log.
> Meanwhile, a signature that is too long will not be written to the audit log.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-29 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.017.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery

2015-10-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981604#comment-14981604
 ] 

Mingliang Liu commented on HDFS-9236:
-

+1 (non-binding), pending Jenkins.

> Missing sanity check for block size during block recovery
> -
>
> Key: HDFS-9236
> URL: https://issues.apache.org/jira/browse/HDFS-9236
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, 
> HDFS-9236.003.patch, HDFS-9236.004.patch, HDFS-9236.005.patch
>
>
> Ran into an issue while running test against faulty data-node code. 
> Currently in DataNode.java:
> {code:java}
>   /** Block synchronization */
>   void syncBlock(RecoveringBlock rBlock,
>  List<BlockRecord> syncList) throws IOException {
> …
> // Calculate the best available replica state.
> ReplicaState bestState = ReplicaState.RWR;
> …
> // Calculate list of nodes that will participate in the recovery
> // and the new block size
> List<BlockRecord> participatingList = new ArrayList<>();
> final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
> -1, recoveryId);
> switch(bestState) {
> …
> case RBW:
> case RWR:
>   long minLength = Long.MAX_VALUE;
>   for(BlockRecord r : syncList) {
> ReplicaState rState = r.rInfo.getOriginalReplicaState();
> if(rState == bestState) {
>   minLength = Math.min(minLength, r.rInfo.getNumBytes());
>   participatingList.add(r);
> }
>   }
>   newBlock.setNumBytes(minLength);
>   break;
> …
> }
> …
> nn.commitBlockSynchronization(block,
> newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
> datanodes, storages);
>   }
> {code}
> This code is called by the DN coordinating the block recovery. In the above 
> case, it is possible for none of the rStates (reported by DNs with copies of 
> the replica being recovered) to match the bestState. This can be caused 
> either by faulty DN code or by stale/modified/corrupted files on the DN. When 
> this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
> Unfortunately there is no check on the NN for replica length. See 
> FSNamesystem.java:
> {code:java}
>   void commitBlockSynchronization(ExtendedBlock oldBlock,
>   long newgenerationstamp, long newlength,
>   boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
>   String[] newtargetstorages) throws IOException {
> …
>   if (deleteblock) {
> Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
> boolean remove = iFile.removeLastBlock(blockToDel) != null;
> if (remove) {
>   blockManager.removeBlock(storedBlock);
> }
>   } else {
> // update last block
> if(!copyTruncate) {
>   storedBlock.setGenerationStamp(newgenerationstamp);
>   
>   // XXX block length is updated without any check <<<
>   storedBlock.setNumBytes(newlength);
> }
> …
> if (closeFile) {
>   LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
>   + ", file=" + src
>   + (copyTruncate ? ", newBlock=" + truncatedBlock
>   : ", newgenerationstamp=" + newgenerationstamp)
>   + ", newlength=" + newlength
>   + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
> } else {
>   LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
> }
>   }
> {code}
> After this point the block length becomes Long.MAX_VALUE. Any subsequent 
> block report (even one with the correct length) will cause the block to be 
> marked as corrupted. Since this block could be the last block of the file, 
> if the client goes away the NN won't be able to recover the lease and close 
> the file, because the last block is under-replicated.
> I believe we need a sanity check for block size on both the DN and the NN to 
> prevent such a case from happening.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-24 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.010.patch

Thank you [~wheat9] for your review. The v10 patch addresses the comments and 
rebases onto the {{trunk}} branch. As there are conflicts due to [HDFS-4015], 
I also invited [~arpitagarwal] and [~anu] to review the patch.

See inline responses to [~wheat9]'s comments.
{quote}
1. It does not give much information compared to figuring out the issues on the 
code directly. What does "thresholds met" / "extensions reached" mean? It 
causes more confusions than explanations.
{quote}
The motivation is that the diagram is helpful at a glance, provided we define 
"thresholds met" / "extension reached". The v10 patch adds more explanation in 
the comments.

{quote}
{code}
LOG.error("Non-recognized block manager safe mode status: {}", status);
{code}
2. Should be an assert.
{quote}
True. I'll simply use {{assert false : "some comment"}}.

{quote}
{code}
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;

...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
3. How do these two variables synchronize? Is the system in consistent state in 
the middle of the execution?
{quote}
Good question. Actually, it is not in a consistent state in the middle of the 
execution. If {{resourcesLow}} is true and the name node was previously in 
manual safe mode, {{isInSafeMode}} will return false in the middle of the 
execution, which means safe mode appears to be OFF. The main reason is that 
writing the two variables (i.e. entering/leaving safe mode) is guarded by the 
FSNamesystem write lock, while reading is not.

The enum-typed state was replaced with two boolean flags in the v7 patch, 
because the two-layer state machine was cumbersome, per offline discussion with 
[~wheat9] and [~jingzhao]. Guarding all the reads looks expensive, and bitwise 
operations on a flag variable seem tricky.

The new design goes back to the {{trunk}} logic, which keeps the block manager 
in safe mode in the middle of the execution:
{code}
  if (resourcesLow) {
isInResourceLowSafeMode = true;
  } else {
isInManualSafeMode = true;
  }
{code}
In case both {{isInManualSafeMode}} and {{isInResourceLowSafeMode}} are true, 
{{isInResourceLowSafeMode}} short-circuits {{isInManualSafeMode}}, consistent 
with the current logic, e.g. in {{getTurnOffTip()}}.

{quote}
{code}
+// INITIALIZED -> THRESHOLD
+bmSafeMode.setBlockTotal(BLOCK_TOTAL);
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+assertTrue(bmSafeMode.isInSafeMode());
{code}
4. It makes sense to put it in a test instead of in the @Before method.
{quote}
That makes sense to me. I'll add a new test called {{testSetBlockTotal}}.

{quote}
{code}
+// EXTENSION -> OFF
+Whitebox.setInternalState(bmSafeMode, "status", 
BMSafeModeStatus.EXTENSION);
+reachBlockThreshold();
+reachExtension();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.OFF, getSafeModeStatus());
{code}
5. Please refactor the code – you can reuse the getSafeModeStatus() that is 
defined by the class.
{quote}
Actually, the first statement of each state-transition test is to *set* the 
internal state.
I'll make the fields package-private so that Whitebox is not needed. See the 
end of this comment.

{quote}
{code}
assertEquals(getBlockSafe(), i);
{code}
6. The expected value should go first according to the function signature.
{quote}
Nice catch. I will revise the whole test to fix all similar ones.
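For reference, JUnit's {{assertEquals}} takes the expected value first; swapping the arguments does not change whether the test passes, but it produces a misleading failure message. A tiny stand-in (JUnit itself is not assumed here) shows why:

```java
// Illustrates why argument order matters for assertEquals: the failure
// message labels the first argument "expected". Minimal stand-in for
// JUnit's assertEquals (JUnit is not on the classpath here).
public class AssertOrderExample {
    static void assertEquals(long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        long blockSafe = 7;          // hypothetical value under test
        try {
            // Swapped order: reads as "expected:<7> but was:<10>" even though
            // 10 was the intended expectation -- the message is misleading.
            assertEquals(blockSafe, 10);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
        assertEquals(10, 10);        // correct order: expected value first
        System.out.println("done");
    }
}
```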

{quote}
7. A higher level question is that why it needs so many getInternalState() 
statements? It looks to me that many of these behaviors can be observed outside 
without whitebox testing.
{quote}
The reason was that *all* non-static fields in {{BlockManagerSafeMode}} are 
private. By design, I made them _private_ because {{BlockManager}} does not 
need access to the internal state.

In the new design, the v10 patch simply makes the fields package-private so the 
test is more straightforward.
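The package-private approach can be sketched as follows (hypothetical {{BlockCounter}} class, not the actual patch): a test in the same package reads the state directly, with no reflection helper such as Whitebox.

```java
// Hypothetical sketch: making state package-private lets a test class in the
// same package inspect it directly, instead of using reflective
// Whitebox.getInternalState-style access.
class BlockCounter {
    long blockSafe = 0;              // package-private, visible to tests

    void incrementSafeBlockCount() {
        blockSafe++;
    }
}

public class BlockCounterTest {
    public static void main(String[] args) {
        BlockCounter c = new BlockCounter();
        for (int i = 0; i < 3; i++) {
            c.incrementSafeBlockCount();
        }
        // Direct field access -- no reflection needed.
        if (c.blockSafe != 3) {
            throw new AssertionError("expected 3, got " + c.blockSafe);
        }
        System.out.println("blockSafe = " + c.blockSafe);
    }
}
```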

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> 

[jira] [Commented] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977072#comment-14977072
 ] 

Mingliang Liu commented on HDFS-9307:
-

Thank you [~cmccabe] for reporting this jira, and for reviewing and committing 
the final patch.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HDFS-9307.000.patch, HDFS-9307.001.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file





[jira] [Commented] (HDFS-9164) hdfs-nfs connector fails on O_TRUNC

2015-10-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977067#comment-14977067
 ] 

Mingliang Liu commented on HDFS-9164:
-

You may need to click "Submit Patch" to trigger a Jenkins build.

> hdfs-nfs connector fails on O_TRUNC
> ---
>
> Key: HDFS-9164
> URL: https://issues.apache.org/jira/browse/HDFS-9164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Constantine Peresypkin
> Attachments: HDFS-9164.1.patch
>
>
> The Linux NFS client will issue `open(..., O_TRUNC); write()` when 
> overwriting a file that's in the NFS client cache (probably to avoid 
> evicting the inode), which fails spectacularly on hdfs-nfs with an I/O error.
> Example:
> $ cp /some/file /to/hdfs/mount/
> $ cp /some/file /to/hdfs/mount/
> I/O error
> The first write will pass if the file is not in the cache; the second one 
> will always fail.





[jira] [Updated] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9245:

Attachment: HDFS-9245.002.patch

The v2 patch addresses the checkstyle warnings.

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch, HDFS-9245.001.patch, 
> HDFS-9245.002.patch
>
>
> There are findbugs warnings as follows, introduced by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file.
> {code:xml}
> <BugInstance instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" 
>     priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
>     instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
>     org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time</LongMessage>
>   <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
>     sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
>     sourcefile="WriteCtx.java" end="314">
>     <Message>At WriteCtx.java:[lines 40-314]</Message>
>   </SourceLine>
>   <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
> </BugInstance>
> {code}
> and
> {code:xml}
> <BugInstance instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" 
>     priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
>     instanceOccurrenceMax="0">
>   <ShortMessage>Inconsistent synchronization</ShortMessage>
>   <LongMessage>Inconsistent synchronization of 
>     org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time</LongMessage>
>   <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" start="40" 
>     sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
>     sourcefile="WriteCtx.java" end="314">
>     <Message>At WriteCtx.java:[lines 40-314]</Message>
>   </SourceLine>
>   <Message>In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx</Message>
>   <Field classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
>       name="originalCount" primary="true" signature="I">
>     <SourceLine classname="org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx" 
>       sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
>       sourcefile="WriteCtx.java">
>       <Message>In WriteCtx.java</Message>
>     </SourceLine>
>     <Message>Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount</Message>
>   </Field>
> </BugInstance>
> {code}





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.012.patch

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.





[jira] [Commented] (HDFS-9168) Move client side unit test to hadoop-hdfs-client

2015-10-28 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977836#comment-14977836
 ] 

Mingliang Liu commented on HDFS-9168:
-

Thanks for working on this.

+1 (non-binding)

> Move client side unit test to hadoop-hdfs-client
> 
>
> Key: HDFS-9168
> URL: https://issues.apache.org/jira/browse/HDFS-9168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9168.000.patch, HDFS-9168.001.patch, 
> HDFS-9168.002.patch, HDFS-9168.003.patch, HDFS-9168.004.patch
>
>
> We need to identify and move the unit tests on the client of hdfs to the 
> hdfs-client module.





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.015.patch

The v15 patch fixes another conflict with [HDFS-4015]. Per offline discussion 
with [~arpitagarwal] and [~anu], the {{smmthread}} should not repeatedly 
report:
{quote}
Refusing to leave safe mode without a force flag. Exiting safe mode will cause 
a deletion of 590683116 byte(s). Please use -forceExit flag to exit safe mode 
forcefully if data loss is acceptable.
{quote}

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.013.patch

The v13 patch revisits the synchronized methods in {{BlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-28 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.014.patch

The v14 patch resolves the conflicts with [HDFS-4015]. Thanks to [~anu] for 
kindly pointing this out and helping me reproduce the bug. A new unit test in 
{{TestBlockManagerSafeMode}} is added as well.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-22 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970332#comment-14970332
 ] 

Mingliang Liu commented on HDFS-9129:
-

s/this/v9

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Attachment: HDFS-9241.005.patch

The failing tests pass locally; it seems Jenkins is broken.

The v5 patch fixes the checkstyle warnings and rebases on the {{trunk}} branch.

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch, 
> HDFS-9241.005.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9184:

Attachment: HDFS-9184.009.patch

The v9 patch rebases on the {{trunk}} branch and resolves trivial conflicts.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch, 
> HDFS-9184.008.patch, HDFS-9184.009.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper level job issues it. The upper level callers may be specific 
> Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode 
> (NN) is abused/spammed, the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places interconnected 
> like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which 
> introduces significant overhead. Moreover, passing additional information 
> (via annotations) other than span id from root of the tree to leaf is a 
> significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> is some related discussion on this topic. The final patch implemented the 
> tracking id as a part of delegation token. This protects the tracking 
> information from being changed or impersonated. However, kerberos 
> authenticated connections or insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all the scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat HDFS audit 
> log as a good place for after-the-fact root cause analysis. We propose to put 
> the caller id (e.g. Hive query id) in threadlocals. Specifically, on the 
> client side the threadlocal object is passed to the NN as a part of the RPC 
> header (optional), while on the server side the NN retrieves it from the 
> header and puts it into {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the 
> caller context for each operation. In this way, the existing code is not 
> affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-22 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.009.patch

In the v8 patch, the findbugs warning is unrelated, and the checkstyle 
warnings are for method or file length, which we cannot address in this patch. 
The overall test results look good.

Per offline discussion with [~jingzhao] and [~wheat9], this patch adds new 
white-box unit test class named {{TestBlockManagerSafeMode}}.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-23 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970512#comment-14970512
 ] 

Mingliang Liu commented on HDFS-9129:
-

The failing tests pass locally (Linux and Mac) and seem unrelated.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs

2015-10-23 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970571#comment-14970571
 ] 

Mingliang Liu commented on HDFS-9184:
-

The failing tests pass locally (Linux and Mac) and seem unrelated.

> Logging HDFS operation's caller context into audit logs
> ---
>
> Key: HDFS-9184
> URL: https://issues.apache.org/jira/browse/HDFS-9184
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, 
> HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, 
> HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch, 
> HDFS-9184.008.patch, HDFS-9184.009.patch
>
>
> For a given HDFS operation (e.g. delete file), it's very helpful to track 
> which upper level job issues it. The upper level callers may be specific 
> Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode 
> (NN) is abused/spammed, the operator may want to know immediately which MR 
> job should be blamed so that she can kill it. To this end, the caller context 
> contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the user of the operation, which 
> is obviously not enough. It's common that the same user issues multiple jobs 
> at the same time. Even for a single top level task, tracking back to a 
> specific caller in a chain of operations of the whole workflow (e.g. Oozie -> 
> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information 
> across multiple layers. The span is created in many places interconnected 
> like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which 
> introduces significant overhead. Moreover, passing additional information 
> (via annotations) other than span id from root of the tree to leaf is a 
> significant additional work.
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there 
> is some related discussion on this topic. The final patch implemented the 
> tracking id as a part of delegation token. This protects the tracking 
> information from being changed or impersonated. However, kerberos 
> authenticated connections or insecure connections don't have tokens. 
> [HADOOP-8779] proposes to use tokens in all the scenarios, but that might 
> mean changes to several upstream projects and is a major change in their 
> security implementation.
> We propose another approach to address this problem. We also treat HDFS audit 
> log as a good place for after-the-fact root cause analysis. We propose to put 
> the caller id (e.g. Hive query id) in threadlocals. Specifically, on the 
> client side the threadlocal object is passed to the NN as a part of the RPC 
> header (optional), while on the server side the NN retrieves it from the 
> header and puts it into {{Handler}}'s threadlocals. Finally, in 
> {{FSNamesystem}}, the HDFS audit logger will record the 
> caller context for each operation. In this way, the existing code is not 
> affected.
> It is still challenging to keep a "lying" client from abusing the caller 
> context. Our proposal is to add a {{signature}} field to the caller context. 
> The client may choose to provide its signature along with the caller id. The 
> operator may need to validate the signature at the time of offline analysis. 
> The NN is not responsible for validating the signature online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks

2015-10-23 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971836#comment-14971836
 ] 

Mingliang Liu commented on HDFS-4015:
-

I reviewed the safe mode part and it looks good to me.

+1 for the latest patch.

> Safemode should count and report orphaned blocks
> 
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, 
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, 
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks 
> compared to the total number of blocks referenced by the namespace. However, 
> it does not report the inverse: blocks which are reported by datanodes but 
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can 
> be confusing: safemode and fsck will show "corrupt files", which are the 
> files which actually have been deleted but got resurrected by restarting from 
> the old image. This will convince them that they can safely force leave 
> safemode and remove these files -- after all, they know that those files 
> should really have been deleted. However, they're not aware that leaving 
> safemode will also unrecoverably delete a bunch of other block files which 
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100 
> blocks have been reported. Additionally, 1 blocks have been reported 
> which do not correspond to any file in the namespace. Forcing exit of 
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is 
> the logical next step, but just reporting it as a warning seems easy enough 
> to accomplish and worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.007.patch

Thank you [~wheat9] for your detailed comments. The v7 patch addresses most of 
them; responses are inline.

{quote}
{code}
  if (status == BMSafeModeStatus.OFF) {
return;
  }
{code}
1. There are multiple cases like the above. They should be asserts.
{quote}

It is safe for these methods to be called multiple times. If safe mode is not 
currently on, this is a no-op. Previously we checked whether the {{safeMode}} 
was null before calling the respective methods, e.g.:
{code:title=previous volatile and null check to allow calling multiple times}
  public void checkSafeMode() {
  // safeMode is volatile, and may be set to null at any time
SafeModeInfo safeMode = this.safeMode;
if (safeMode != null) {
  safeMode.checkMode();
}
  }
{code}
As we have moved the startup safe mode to {{BlockManager}} and maintain a 
state machine for the safe mode status, the volatile-and-null {{safeMode}} 
trick is no longer needed. Meanwhile, we must allow {{BlockManagerSafeMode}}'s 
methods to be called multiple times without side effects.
I will explicitly document this in the comments for these methods.
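To make the idea concrete, here is a minimal, hypothetical sketch (the class and enum names below are illustrative, not the actual HDFS ones) of a state-machine method that is safe to call repeatedly: once the status reaches *OFF*, further calls are no-ops, so call sites need no volatile-and-null guard:

```java
// Illustrative sketch only; BlockManagerSafeModeSketch and Status are made-up names.
public class BlockManagerSafeModeSketch {
    public enum Status { PENDING_THRESHOLD, EXTENSION, OFF }

    private Status status = Status.PENDING_THRESHOLD;

    /**
     * Safe to call multiple times: once safe mode is OFF this is a no-op,
     * so callers do not need a null/volatile check before invoking it.
     */
    public synchronized void checkSafeMode() {
        if (status == Status.OFF) {
            return; // repeated calls have no side effect
        }
        // ... evaluate block/datanode thresholds here; assume they are met ...
        status = Status.OFF;
    }

    public synchronized Status getStatus() {
        return status;
    }

    public static void main(String[] args) {
        BlockManagerSafeModeSketch sm = new BlockManagerSafeModeSketch();
        sm.checkSafeMode();
        sm.checkSafeMode(); // harmless second call
        System.out.println(sm.getStatus()); // prints OFF
    }
}
```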

{quote}
2. namesystem.isHaEnabled() does not change in the lifecycle of the process.
{quote}
This is a very good point. I considered this but the {{blockManager}}, which 
will create the {{blockManagerSafeMode}}, is constructed before the 
{{namesystem#haEnabled}} is initialized. Per offline discussion with [~wheat9] 
and [~jingzhao], we can initialize the {{namesystem#haEnabled}} before 
constructing the {{blockManager}}. This way, the {{namesystem.isHaEnabled}} is 
not called repeatedly in the critical path.

{quote}
3. It's better to document the conditions of state transition.
{quote}
Yes, it makes a lot of sense to document the state machine transitions. In the 
v7 patch, I added a diagram in the comments.

{quote}
{code}
+needExtension = extension > 0 &&
+(blockThreshold > 0 || datanodeThreshold > 0);
{code}
4. This can be moved under the THRESHOLD statement and become a local variable.
{quote}
Actually it's hard, if not impossible, largely because {{needExtension}} 
should be initialized in the start status, aka {{INITIALIZED}}. There is a 
regression test for this case, 
{{TestHASafeMode#testBlocksRemovedBeforeStandbyRestart}}, introduced by 
[HDFS-2692].

{quote}
5. initializeReplQueuesIfNecessary() should be called only once.
{quote}
Yes, we should initialize the replication queues only once. On the 
first call, {{BlockManager#initializeReplQueues}} sets a flag indicating that 
the replication queues are initialized. We check this flag in 
{{isPopulatingReplQueues}} before calling {{initializeReplQueues}} again. As it 
is of great importance to guarantee this, I'll double-check and fix it in the 
next patch.
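The flag-guarded, at-most-once initialization described above can be sketched as follows (the class and method names here are illustrative stand-ins, not the actual {{BlockManager}} code):

```java
// Illustrative sketch; ReplQueuesSketch is a made-up name, not the HDFS class.
public class ReplQueuesSketch {
    private boolean initialized = false;
    private int initCount = 0; // counts real initializations, for demonstration

    /** Populate the replication queues only on the first call. */
    public synchronized void initializeReplQueuesIfNecessary() {
        if (initialized) {
            return; // already populated; nothing to do
        }
        initCount++;
        // ... scan the block map and fill the under/over-replicated queues ...
        initialized = true;
    }

    public synchronized boolean isPopulatingReplQueues() {
        return initialized;
    }

    public synchronized int getInitCount() {
        return initCount;
    }
}
```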

{quote}
6. safeModeStatus = SafeModeStatus.OFF should be moved to BlockManagerSafeMode.
{quote}
The {{safeModeStatus}} was for the {{FSNamesystem}} and the *OFF* here 
indicates both {{FSNamesystem}} and {{BlockManager}} leave the safe mode. 
{{BlockManagerSafeMode}}'s internal {{status}} was maintained in its own 
{{leaveSafeMode}} method.

Per offline discussion with [~wheat9] and [~jingzhao], the better design is to 
simplify the {{Namesystem}} safe mode to two flags indicating _manual_ or 
_resource low_ safe mode. This way, the safe mode check is straightforward. A 
side benefit is that extending the current safe mode status only requires one 
more flag, without breaking the existing code.
{code:title=new manual and resource low safe mode flag}
  private volatile boolean isInManualSafeMode = false;
  private volatile boolean isInResourceLowSafeMode = false;
  ...
  @Override
  public boolean isInSafeMode() {
return isInManualSafeMode ||
isInResourceLowSafeMode ||
blockManager.isInSafeMode();
  }
{code}

{quote}
7. A cleaner approach is to put the {{reached}} timestamp into the constructor 
of {{SafeModeMonitor()}}.
{quote}
It's a good point to define the {{reached}} value in the monitor. The v7 patch 
initializes it when the monitor starts. As the {{reached}} timestamp is partly 
used outside {{SafeModeMonitor}}, e.g. in {{getSafeModeTip}} and {{checkMode}}, 
the easy (though maybe not the best) way is to treat it as a class field.

{quote}
8. It might be good to have additional unit tests for BlockManagerSafeMode.
{quote}
That makes perfect sense to me. I'll add a new unit test class named 
{{TestBlockManagerSafeMode}} in the next patch.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>

[jira] [Updated] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances

2015-10-21 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9241:

Attachment: HDFS-9241.003.patch

The v3 patch addresses the checkstyle warnings.

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9241
> URL: https://issues.apache.org/jira/browse/HDFS-9241
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9332) Fix Precondition failures from NameNodeEditLogRoller while saving namespace

2015-10-28 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979450#comment-14979450
 ] 

Mingliang Liu commented on HDFS-9332:
-

Looks good to me. +1 (non-binding)

> Fix Precondition failures from NameNodeEditLogRoller while saving namespace
> ---
>
> Key: HDFS-9332
> URL: https://issues.apache.org/jira/browse/HDFS-9332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-9332.001.patch
>
>
> The check for the # of txns in the open edit log does not first check that an 
> edit log segment is open, leading to a Precondition failure. This surfaced at 
> HDFS-7871 which fixed that it was printing in a tight loop, but the cause of 
> the Precondition failure is still present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9400) TestRollingUpgradeRollback fails on branch-2.

2015-11-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999177#comment-14999177
 ] 

Mingliang Liu commented on HDFS-9400:
-

The test report seems good?

> TestRollingUpgradeRollback fails on branch-2.
> -
>
> Key: HDFS-9400
> URL: https://issues.apache.org/jira/browse/HDFS-9400
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-9400-branch-2.001.patch, HDFS-9400-branch-2.patch
>
>
> During a Jenkins pre-commit run on branch-2 for the HDFS-9394 patch, we 
> noticed a pre-existing failure in {{TestRollingUpgradeRollback}}.  I have 
> confirmed that this test is failing in branch-2 only.  It passes in trunk, 
> and it passes in branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9407) TestFileTruncate fails with BindException

2015-11-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999151#comment-14999151
 ] 

Mingliang Liu commented on HDFS-9407:
-

{code}
-.nameNodePort(HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT)
+.nameNodePort(
+ServerSocketUtil.getPort(
+HdfsClientConfigKeys.DFS_NAMENODE_RPC_PORT_DEFAULT, 10))
{code}
I don't think it's necessary to specify the namenode RPC port. Even if we 
retry a specific port multiple times (here _10_), a port binding conflict can 
still happen. There should not be any logic in the test that depends on the 
port number.
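For tests that can tolerate an arbitrary port, one way to sidestep the race entirely is to bind to port 0 and let the OS pick a free ephemeral port. This is only a sketch of the general technique, not the actual test change under discussion:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    /** Bind to port 0 so the OS assigns a free port; avoids BindException races. */
    public static int anyFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // the port the OS actually chose
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bound to port " + anyFreePort());
    }
}
```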

> TestFileTruncate fails with BindException
> -
>
> Key: HDFS-9407
> URL: https://issues.apache.org/jira/browse/HDFS-9407
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9407.patch
>
>
>  https://builds.apache.org/job/Hadoop-Hdfs-trunk/2530/
> {noformat}
> java.net.BindException: Problem binding to [localhost:8020] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at org.apache.hadoop.ipc.Server.bind(Server.java:469)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:695)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2464)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:945)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:390)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:742)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:680)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:883)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:862)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1564)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setUp(TestFileTruncate.java:103)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9407) TestFileTruncate fails with BindException

2015-11-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999154#comment-14999154
 ] 

Mingliang Liu commented on HDFS-9407:
-

Thanks for reporting and working on this.

> TestFileTruncate fails with BindException
> -
>
> Key: HDFS-9407
> URL: https://issues.apache.org/jira/browse/HDFS-9407
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9407.patch
>
>
>  https://builds.apache.org/job/Hadoop-Hdfs-trunk/2530/
> {noformat}
> java.net.BindException: Problem binding to [localhost:8020] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:444)
> at sun.nio.ch.Net.bind(Net.java:436)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at org.apache.hadoop.ipc.Server.bind(Server.java:469)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:695)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2464)
> at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:945)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
> at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:390)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:742)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:680)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:883)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:862)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1564)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
> at 
> org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.setUp(TestFileTruncate.java:103)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9349) Reconfigure NN protected directories on the fly

2015-11-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999641#comment-14999641
 ] 

Mingliang Liu commented on HDFS-9349:
-

The v2 patch looks good overall to me.

Further minor comments:
{code}
-  private final SortedSet<String> protectedDirectories;
+  private SortedSet<String> protectedDirectories;
{code}
It seems we can keep {{protectedDirectories}} final if we clear/addAll instead 
of constructing a new set when reconfiguring it.

{code}
+  void setProtectedDirectories(String dirString) {
{code}
Some Javadoc may be helpful. Also, could the {{dirString}} parameter be renamed 
to {{protectedDirsString}} or something similar?
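For illustration, here is a minimal self-contained sketch of that pattern (class and method names are hypothetical, and comma-splitting stands in for Hadoop's {{normalizePaths}}/{{getTrimmedStringCollection}} helpers): the set stays {{final}} and is mutated in place under the write lock.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the actual FSDirectory code: the real patch uses
// normalizePaths() and StringUtils.getTrimmedStringCollection().
class ProtectedDirs {
  // The field can stay final: reconfiguration mutates the set in place.
  private final SortedSet<String> protectedDirectories = new TreeSet<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void setProtectedDirectories(String dirString) {
    lock.writeLock().lock();
    try {
      protectedDirectories.clear();
      if (dirString != null) {
        for (String dir : dirString.split(",")) {
          String trimmed = dir.trim();
          if (!trimmed.isEmpty()) {
            protectedDirectories.add(trimmed);
          }
        }
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  SortedSet<String> snapshot() {
    lock.readLock().lock();
    try {
      return new TreeSet<>(protectedDirectories);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```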

> Reconfigure NN protected directories on the fly
> ---
>
> Key: HDFS-9349
> URL: https://issues.apache.org/jira/browse/HDFS-9349
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9349.001.patch, HDFS-9349.002.patch
>
>
> This is to reconfigure
> {code}
> fs.protected.directories
> {code}
> without restarting NN.





[jira] [Commented] (HDFS-9349) Reconfigure NN protected directories on the fly

2015-11-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999523#comment-14999523
 ] 

Mingliang Liu commented on HDFS-9349:
-

Thanks for working on this, [~xiaobingo].

Is it possible to eliminate the {{new TreeSet<>()}} when reconfiguring the 
protected directories? Also, {{setProtectedDirectoriesToDefault()}} seems 
unnecessary, as {{StringUtils.getTrimmedStringCollection()}} accepts a null value.

{code}
private final SortedSet<String> protectedDirectories;
  
void setProtectedDirectories(String dirString) {
  protectedDirRWL.writeLock().lock();
  try {
protectedDirectories.clear();
protectedDirectories.addAll(normalizePaths(
StringUtils.getTrimmedStringCollection(dirString),
FS_PROTECTED_DIRECTORIES));
  } finally {
protectedDirRWL.writeLock().unlock();
  }
}
{code}

> Reconfigure NN protected directories on the fly
> ---
>
> Key: HDFS-9349
> URL: https://issues.apache.org/jira/browse/HDFS-9349
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9349.001.patch
>
>
> This is to reconfigure
> {code}
> fs.protected.directories
> {code}
> without restarting NN.





[jira] [Commented] (HDFS-9400) TestRollingUpgradeRollback fails on branch-2.

2015-11-09 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997837#comment-14997837
 ] 

Mingliang Liu commented on HDFS-9400:
-

By the way, [~brahmareddy], feel free to consolidate the v1 patch into a refined 
one. I did not fully debug the failing test and may be missing some context.

> TestRollingUpgradeRollback fails on branch-2.
> -
>
> Key: HDFS-9400
> URL: https://issues.apache.org/jira/browse/HDFS-9400
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-9400-branch-2.001.patch, HDFS-9400-branch-2.patch
>
>
> During a Jenkins pre-commit run on branch-2 for the HDFS-9394 patch, we 
> noticed a pre-existing failure in {{TestRollingUpgradeRollback}}.  I have 
> confirmed that this test is failing in branch-2 only.  It passes in trunk, 
> and it passes in branch-2.7.





[jira] [Updated] (HDFS-9400) TestRollingUpgradeRollback fails on branch-2.

2015-11-09 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9400:

Attachment: HDFS-9400-branch-2.001.patch

Thank you [~cnauroth] for narrowing down the cause of this failing test. 
[HDFS-8979] was not supposed to contain any logic change. Unfortunately, it 
brought in some changes that were actually added by [HDFS-8332] in trunk.

Thank you [~brahmareddy] for working on this and investigating the root cause. 
I totally agree with you that the root cause is the {{checkOpen()}} call in the 
{{DFSClient}} class, introduced by [HDFS-8979], which was not aware of the 
revert of [HDFS-8332] from {{branch-2}} at commit time.

The fix may be to simply revert the unnecessary {{checkOpen()}} method call 
introduced by [HDFS-8979]. I tested the v1 patch locally on my Gentoo Linux and 
Mac machines, and it seems to work.

> TestRollingUpgradeRollback fails on branch-2.
> -
>
> Key: HDFS-9400
> URL: https://issues.apache.org/jira/browse/HDFS-9400
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chris Nauroth
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Attachments: HDFS-9400-branch-2.001.patch, HDFS-9400-branch-2.patch
>
>
> During a Jenkins pre-commit run on branch-2 for the HDFS-9394 patch, we 
> noticed a pre-existing failure in {{TestRollingUpgradeRollback}}.  I have 
> confirmed that this test is failing in branch-2 only.  It passes in trunk, 
> and it passes in branch-2.7.





[jira] [Commented] (HDFS-9387) Parse namenodeUri parameter only once in NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()

2015-11-09 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997164#comment-14997164
 ] 

Mingliang Liu commented on HDFS-9387:
-

Thanks for your review [~xyao].

Yes, the implementation for parsing the {{namenode}} argument differs from that 
of the other parameters. When parsing {{-namenode}}, it calls 
{{StringUtils.popOptionWithArgument}}. If there is no following argument, that 
helper method throws an {{IllegalArgumentException}}; {{verifyOpArgument}} 
catches the exception and calls {{printUsage()}} to exit. I think it should 
work just fine?
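As a simplified, self-contained model of that parsing behavior (the real {{StringUtils.popOptionWithArgument}} lives in hadoop-common; this re-implementation is only illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of StringUtils.popOptionWithArgument: it removes the
// option and its value from args, or throws if the value is missing.
class OptionParser {
  static String popOptionWithArgument(String name, List<String> args) {
    int i = args.indexOf(name);
    if (i < 0) {
      return null;                 // option absent: nothing to pop
    }
    if (i + 1 >= args.size()) {
      // No following argument: mirror the IllegalArgumentException that
      // verifyOpArgument() catches before calling printUsage().
      throw new IllegalArgumentException("option " + name + " requires 1 argument");
    }
    args.remove(i);                // remove the option name
    return args.remove(i);         // remove and return its value
  }

  public static void main(String[] unused) {
    List<String> args =
        new ArrayList<>(Arrays.asList("-op", "all", "-namenode", "hdfs://nn:9000"));
    System.out.println(popOptionWithArgument("-namenode", args)); // hdfs://nn:9000
  }
}
```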

> Parse namenodeUri parameter only once in 
> NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()
> 
>
> Key: HDFS-9387
> URL: https://issues.apache.org/jira/browse/HDFS-9387
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9387.000.patch
>
>
> In {{NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()}}, the 
> {{namenodeUri}} is always parsed from the {{-namenode}} argument. This works 
> just fine if the {{-op}} parameter is not {{all}}, as the single benchmark 
> will need to parse the {{namenodeUri}} from args anyway.
> When {{-op}} is {{all}}, namely all sub-benchmarks will run, multiple 
> sub-benchmarks call the {{verifyOpArgument()}} method. In this case, the 
> first sub-benchmark reads the {{namenode}} argument and removes it from args. 
> The other sub-benchmarks will thereafter read a {{null}} value since the 
> argument has been removed. This contradicts the intention of providing 
> {{namenode}} for all sub-benchmarks.
> {code:title=current code}
>   try {
> namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
>   } catch (IllegalArgumentException iae) {
> printUsage();
>   }
> {code}
> The fix is to parse the {{namenodeUri}}, which is shared by all 
> sub-benchmarks, from {{-namenode}} argument only once. This follows the 
> convention of parsing other global arguments in 
> {{OperationStatsBase#verifyOpArgument()}}.
> {code:title=simple fix}
>   if (args.indexOf("-namenode") >= 0) {
> try {
>   namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
> } catch (IllegalArgumentException iae) {
>   printUsage();
> }
>   }
> {code}
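The double-read failure described above can be reproduced with a small self-contained model (a plain {{List<String>}} stands in for the benchmark's args handling; names are illustrative): the first pop consumes {{-namenode}} from the shared args, so every later pop returns null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArgsDemo {
  // Illustrative stand-in for StringUtils.popOptionWithArgument.
  static String pop(String name, List<String> args) {
    int i = args.indexOf(name);
    if (i < 0) {
      return null;
    }
    args.remove(i);           // drop the option name...
    return args.remove(i);    // ...and return its value
  }

  public static void main(String[] unused) {
    List<String> args =
        new ArrayList<>(Arrays.asList("-op", "all", "-namenode", "hdfs://nn:9000"));
    // The first sub-benchmark sees the URI and consumes the option...
    String first = pop("-namenode", args);
    // ...so every later sub-benchmark reads null from the same args list.
    String second = pop("-namenode", args);
    System.out.println(first + " / " + second); // hdfs://nn:9000 / null
  }
}
```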





[jira] [Commented] (HDFS-8979) Clean up checkstyle warnings in hadoop-hdfs-client module

2015-11-08 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995895#comment-14995895
 ] 

Mingliang Liu commented on HDFS-8979:
-

Thanks for reporting this, [~cnauroth]. This was not caught earlier because 
{{TestRollingUpgradeRollback#testRollbackWithHAQJM}} passes in {{trunk}}. This 
patch was not meant to change any logic, so any test failure it caused should 
be fixed. I'm having a look at [HDFS-9400].

> Clean up checkstyle warnings in hadoop-hdfs-client module
> -
>
> Key: HDFS-8979
> URL: https://issues.apache.org/jira/browse/HDFS-8979
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-8979.000.patch, HDFS-8979.001.patch, 
> HDFS-8979.002.patch
>
>
> This jira tracks the effort of cleaning up checkstyle warnings in 
> {{hadoop-hdfs-client}} module.





[jira] [Commented] (HDFS-9387) Parse namenodeUri parameter only once in NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()

2015-11-12 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003294#comment-15003294
 ] 

Mingliang Liu commented on HDFS-9387:
-

Yes, I agree with you. Fixing it would provide obvious value, but I see no easy 
way to do so. I'd block it in [HDFS-9421] if there is no more input from others.
{code}
- if (runAll || ReplicationStats.OP_REPLICATION_NAME.equals(type)) {
+ if ((runAll && namenodeUri == null) || ReplicationStats.OP_REPLICATION_NAME.equals(type)) {
{code}

> Parse namenodeUri parameter only once in 
> NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()
> 
>
> Key: HDFS-9387
> URL: https://issues.apache.org/jira/browse/HDFS-9387
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9387.000.patch
>
>
> In {{NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()}}, the 
> {{namenodeUri}} is always parsed from the {{-namenode}} argument. This works 
> just fine if the {{-op}} parameter is not {{all}}, as the single benchmark 
> will need to parse the {{namenodeUri}} from args anyway.
> When {{-op}} is {{all}}, namely all sub-benchmarks will run, multiple 
> sub-benchmarks call the {{verifyOpArgument()}} method. In this case, the 
> first sub-benchmark reads the {{namenode}} argument and removes it from args. 
> The other sub-benchmarks will thereafter read a {{null}} value since the 
> argument has been removed. This contradicts the intention of providing 
> {{namenode}} for all sub-benchmarks.
> {code:title=current code}
>   try {
> namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
>   } catch (IllegalArgumentException iae) {
> printUsage();
>   }
> {code}
> The fix is to parse the {{namenodeUri}}, which is shared by all 
> sub-benchmarks, from {{-namenode}} argument only once. This follows the 
> convention of parsing other global arguments in 
> {{OperationStatsBase#verifyOpArgument()}}.
> {code:title=simple fix}
>   if (args.indexOf("-namenode") >= 0) {
> try {
>   namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
> } catch (IllegalArgumentException iae) {
>   printUsage();
> }
>   }
> {code}





[jira] [Assigned] (HDFS-9421) NNThroughputBenchmark replication test NPE with -namenode option

2015-11-12 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9421:
---

Assignee: Mingliang Liu

> NNThroughputBenchmark replication test NPE with -namenode option
> 
>
> Key: HDFS-9421
> URL: https://issues.apache.org/jira/browse/HDFS-9421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Xiaoyu Yao
>Assignee: Mingliang Liu
>
> Hit the following NPE when reviewing fix for HDFS-9387 with manual tests as 
> NNThroughputBenchmark currently does not have JUnit tests. 
>  
> {code}
> HW11217:centos6.4 xyao$ hadoop 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op replication 
> -namenode hdfs://HW11217.local:9000
> 15/11/12 14:52:03 INFO namenode.NNThroughputBenchmark: Starting benchmark: 
> replication
> 15/11/12 14:52:03 ERROR namenode.NNThroughputBenchmark: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$ReplicationStats.generateInputs(NNThroughputBenchmark.java:1312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:280)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1509)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1534)
> ...
> {code}
> However, the root cause is different from HDFS-9387.  
> From ReplicationStats#generateInputs, *nameNode* is uninitialized before use, 
> which causes the NPE.
> {code}
>   final FSNamesystem namesystem = nameNode.getNamesystem();
> {code}
> From NNThroughputBenchmark#run, nameNode is only initialized when the 
> -namenode option is not specified. The fix is either to initialize it 
> properly in the else block when the -namenode option is specified, or to 
> block this test if it is not supported.
> {code}
>  if (namenodeUri == null) {
> nameNode = NameNode.createNameNode(argv, config);
> NamenodeProtocols nnProtos = nameNode.getRpcServer();
> nameNodeProto = nnProtos;
> clientProto = nnProtos;
> dataNodeProto = nnProtos;
> refreshUserMappingsProto = nnProtos;
> bpid = nameNode.getNamesystem().getBlockPoolId();
>   } else {
> FileSystem.setDefaultUri(getConf(), namenodeUri);
> DistributedFileSystem dfs = (DistributedFileSystem)
> FileSystem.get(getConf());
> final URI nnUri = new URI(namenodeUri);
> nameNodeProto = DFSTestUtil.getNamenodeProtocolProxy(config, nnUri,
> UserGroupInformation.getCurrentUser());
>  
> {code}





[jira] [Updated] (HDFS-9421) NNThroughputBenchmark replication test NPE with -namenode option

2015-11-13 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9421:

Attachment: HDFS-9241.000.patch

Per discussion in [HDFS-9387], the {{replication}} test does not support a 
namenode running in another process or on another host.

In the v0 patch:
- If the arguments include {{-op replication -namenode URI}}, the test is 
skipped, along with a warning message explaining that the {{replication}} test 
is ignored.
- If the arguments include {{-op all -namenode URI}}, all tests other than 
{{replication}} run against the standalone namenode URI, along with a warning 
message.
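A rough sketch of that guard logic (method and parameter names here are hypothetical, not taken from the actual patch):

```java
// Hypothetical sketch of the v0 patch's skip logic; names are illustrative,
// not from NNThroughputBenchmark itself.
class ReplicationGuard {
  static boolean shouldRunReplication(boolean replicationRequested,
                                      String namenodeUri,
                                      StringBuilder warnings) {
    if (replicationRequested && namenodeUri != null) {
      // The replication test drives an in-process NameNode directly, so it
      // cannot run against a standalone -namenode URI.
      warnings.append("Ignoring replication test: unsupported with -namenode ")
              .append(namenodeUri);
      return false;
    }
    return replicationRequested;
  }
}
```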

> NNThroughputBenchmark replication test NPE with -namenode option
> 
>
> Key: HDFS-9421
> URL: https://issues.apache.org/jira/browse/HDFS-9421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Reporter: Xiaoyu Yao
>Assignee: Mingliang Liu
> Attachments: HDFS-9241.000.patch
>
>
> Hit the following NPE when reviewing fix for HDFS-9387 with manual tests as 
> NNThroughputBenchmark currently does not have JUnit tests. 
>  
> {code}
> HW11217:centos6.4 xyao$ hadoop 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op replication 
> -namenode hdfs://HW11217.local:9000
> 15/11/12 14:52:03 INFO namenode.NNThroughputBenchmark: Starting benchmark: 
> replication
> 15/11/12 14:52:03 ERROR namenode.NNThroughputBenchmark: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$ReplicationStats.generateInputs(NNThroughputBenchmark.java:1312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:280)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1509)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1534)
> ...
> {code}
> However, the root cause is different from HDFS-9387.  
> From ReplicationStats#generateInputs, *nameNode* is uninitialized before use, 
> which causes the NPE.
> {code}
>   final FSNamesystem namesystem = nameNode.getNamesystem();
> {code}
> From NNThroughputBenchmark#run, nameNode is only initialized when the 
> -namenode option is not specified. The fix is either to initialize it 
> properly in the else block when the -namenode option is specified, or to 
> block this test if it is not supported.
> {code}
>  if (namenodeUri == null) {
> nameNode = NameNode.createNameNode(argv, config);
> NamenodeProtocols nnProtos = nameNode.getRpcServer();
> nameNodeProto = nnProtos;
> clientProto = nnProtos;
> dataNodeProto = nnProtos;
> refreshUserMappingsProto = nnProtos;
> bpid = nameNode.getNamesystem().getBlockPoolId();
>   } else {
> FileSystem.setDefaultUri(getConf(), namenodeUri);
> DistributedFileSystem dfs = (DistributedFileSystem)
> FileSystem.get(getConf());
> final URI nnUri = new URI(namenodeUri);
> nameNodeProto = DFSTestUtil.getNamenodeProtocolProxy(config, nnUri,
> UserGroupInformation.getCurrentUser());
>  
> {code}





[jira] [Commented] (HDFS-9421) NNThroughputBenchmark replication test NPE with -namenode option

2015-11-16 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007714#comment-15007714
 ] 

Mingliang Liu commented on HDFS-9421:
-

Thanks for reporting this issue and for reviewing and committing the patch, 
[~xyao]!

> NNThroughputBenchmark replication test NPE with -namenode option
> 
>
> Key: HDFS-9421
> URL: https://issues.apache.org/jira/browse/HDFS-9421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 2.8.0
>Reporter: Xiaoyu Yao
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9241.000.patch
>
>
> Hit the following NPE when reviewing fix for HDFS-9387 with manual tests as 
> NNThroughputBenchmark currently does not have JUnit tests. 
>  
> {code}
> HW11217:centos6.4 xyao$ hadoop 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op replication 
> -namenode hdfs://HW11217.local:9000
> 15/11/12 14:52:03 INFO namenode.NNThroughputBenchmark: Starting benchmark: 
> replication
> 15/11/12 14:52:03 ERROR namenode.NNThroughputBenchmark: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$ReplicationStats.generateInputs(NNThroughputBenchmark.java:1312)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:280)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1509)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1534)
> ...
> {code}
> However, the root cause is different from HDFS-9387.  
> From ReplicationStats#generateInputs, *nameNode* is uninitialized before use, 
> which causes the NPE.
> {code}
>   final FSNamesystem namesystem = nameNode.getNamesystem();
> {code}
> From NNThroughputBenchmark#run, nameNode is only initialized when the 
> -namenode option is not specified. The fix is either to initialize it 
> properly in the else block when the -namenode option is specified, or to 
> block this test if it is not supported.
> {code}
>  if (namenodeUri == null) {
> nameNode = NameNode.createNameNode(argv, config);
> NamenodeProtocols nnProtos = nameNode.getRpcServer();
> nameNodeProto = nnProtos;
> clientProto = nnProtos;
> dataNodeProto = nnProtos;
> refreshUserMappingsProto = nnProtos;
> bpid = nameNode.getNamesystem().getBlockPoolId();
>   } else {
> FileSystem.setDefaultUri(getConf(), namenodeUri);
> DistributedFileSystem dfs = (DistributedFileSystem)
> FileSystem.get(getConf());
> final URI nnUri = new URI(namenodeUri);
> nameNodeProto = DFSTestUtil.getNamenodeProtocolProxy(config, nnUri,
> UserGroupInformation.getCurrentUser());
>  
> {code}





[jira] [Commented] (HDFS-9387) Fix namenodeUri parameter parsing in NNThroughputBenchmark

2015-11-16 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007717#comment-15007717
 ] 

Mingliang Liu commented on HDFS-9387:
-

Thanks for your review and commit!

> Fix namenodeUri parameter parsing in NNThroughputBenchmark
> --
>
> Key: HDFS-9387
> URL: https://issues.apache.org/jira/browse/HDFS-9387
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9387.000.patch
>
>
> In {{NNThroughputBenchmark$OperationStatsBase#verifyOpArgument()}}, the 
> {{namenodeUri}} is always parsed from the {{-namenode}} argument. This works 
> just fine if the {{-op}} parameter is not {{all}}, as the single benchmark 
> will need to parse the {{namenodeUri}} from args anyway.
> When {{-op}} is {{all}}, namely all sub-benchmarks will run, multiple 
> sub-benchmarks call the {{verifyOpArgument()}} method. In this case, the 
> first sub-benchmark reads the {{namenode}} argument and removes it from args. 
> The other sub-benchmarks will thereafter read a {{null}} value since the 
> argument has been removed. This contradicts the intention of providing 
> {{namenode}} for all sub-benchmarks.
> {code:title=current code}
>   try {
> namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
>   } catch (IllegalArgumentException iae) {
> printUsage();
>   }
> {code}
> The fix is to parse the {{namenodeUri}}, which is shared by all 
> sub-benchmarks, from {{-namenode}} argument only once. This follows the 
> convention of parsing other global arguments in 
> {{OperationStatsBase#verifyOpArgument()}}.
> {code:title=simple fix}
>   if (args.indexOf("-namenode") >= 0) {
> try {
>   namenodeUri = StringUtils.popOptionWithArgument("-namenode", args);
> } catch (IllegalArgumentException iae) {
>   printUsage();
> }
>   }
> {code}





[jira] [Commented] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-16 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007756#comment-15007756
 ] 

Mingliang Liu commented on HDFS-9434:
-

Thanks for working on this. The patch looks good to me. +1 (non-binding), 
pending Jenkins.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h9434_20151116.patch
>
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called within 
> the namespace lock.  There is a (not very useful) log message printed in 
> processOverReplicatedBlock.  When a large number of blocks is stored in a 
> storage, printing the log message for each block can prevent the NN from 
> processing any other operations.  We did see that it could pause the NN for 
> 30 seconds for a storage with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.
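The quick fix above amounts to demoting the per-block message. A sketch of the pattern, using {{java.util.logging}} purely for illustration (BlockManager uses Hadoop's own logging API):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class RecommissionLogDemo {
  private static final Logger LOG = Logger.getLogger("BlockManagerDemo");

  static void processOverReplicatedBlock(String block) {
    // At trace level (FINEST here) the message is disabled by default, so
    // the per-block string is never even built while the lock is held.
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest("Processing over-replicated block " + block);
    }
    // ... actual over-replication handling would go here ...
  }
}
```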





[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager

2015-11-11 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9129:

Attachment: HDFS-9129.023.patch

The v23 patch is to address the findbugs warnings.

> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch, 
> HDFS-9129.011.patch, HDFS-9129.012.patch, HDFS-9129.013.patch, 
> HDFS-9129.014.patch, HDFS-9129.015.patch, HDFS-9129.016.patch, 
> HDFS-9129.017.patch, HDFS-9129.018.patch, HDFS-9129.019.patch, 
> HDFS-9129.020.patch, HDFS-9129.021.patch, HDFS-9129.022.patch, 
> HDFS-9129.023.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can moved to the 
> {{BlockManager}} class.




